| OLTP | OLAP | |
|---|---|---|
| In-Process | ||
| Stand-Alone |
sf loads entire datasets into memory → bottleneck with large datast_intersects, st_area, …)dplyr, data.table, arrow do relational transformations (select, join, aggregate, …)| OLAP (Analytical) | OLTP (Transactional) | |
|---|---|---|
| Workload | Read-mostly | Many small writes |
| Queries | Complex, scan large parts | Simple, touch individual rows |
| Updates | Bulk appends | Frequent row-level updates |
| OLTP | OLAP | |
|---|---|---|
| In-Process | ||
| Stand-Alone |
wald-kantone.duckdb from Moodle (forest + canton boundaries)WHERE st_intersects(a, b) checks every pair → slowVIEWsVIEWsVIEW = a named SQL query (virtual table)VIEW: a subset of the forest dataset (for faster iteration)VIEW:CREATE VIEW name AS to any SELECT…
w and k are aliases…
┌──────────────────────┬───────────────────────────────────────────────────────┐
│ name │ st_intersection(w.geom, k.geom) │
│ varchar │ geometry │
├──────────────────────┼───────────────────────────────────────────────────────┤
│ Genève │ POLYGON EMPTY │
│ Genève │ POLYGON EMPTY │
│ Genève │ POLYGON EMPTY │
│ Genève │ POLYGON EMPTY │
│ Genève │ POLYGON EMPTY │
│ · │ · │
│ · │ · │
│ · │ · │
│ Appenzell Innerrho… │ POLYGON Z ((2752421.3200000883 1239807.9399965794 9… │
│ Appenzell Innerrho… │ POLYGON EMPTY │
│ Appenzell Innerrho… │ POLYGON EMPTY │
│ Appenzell Innerrho… │ POLYGON EMPTY │
│ Appenzell Innerrho… │ POLYGON EMPTY │
├──────────────────────┴───────────────────────────────────────────────────────┤
│ 26000 rows (10 shown) 2 columns │
└──────────────────────────────────────────────────────────────────────────────┘
Run Time (s): real 1.292 user 1.293885 sys 0.005016
WHERE to filter early:SELECT
name,
st_intersection(w.geom, k.geom),
FROM wald2 w, kantone k
1WHERE st_intersects(w.geom, k.geom);┌──────────────────────┬───────────────────────────────────────────────────────┐
│ name │ st_intersection(w.geom, k.geom) │
│ varchar │ geometry │
├──────────────────────┼───────────────────────────────────────────────────────┤
│ Appenzell Innerrho… │ POLYGON Z ((2750577.8690000786 1236499.1299968604 1… │
│ St. Gallen │ POLYGON Z ((2733301.3710000697 1235901.1909968755 1… │
│ St. Gallen │ POLYGON Z ((2730828.668000072 1236533.6979968112 10… │
│ Appenzell Innerrho… │ POLYGON Z ((2751117.7790000793 1236511.4949968604 1… │
│ Zürich │ POLYGON Z ((2712200.8850000626 1236517.2959967866 6… │
│ · │ · │
│ · │ · │
│ · │ · │
│ Appenzell Innerrho… │ POLYGON Z ((2752421.3200000883 1239807.9399965794 9… │
│ Zürich │ POLYGON Z ((2684156.710000051 1239693.0049964704 51… │
│ St. Gallen │ POLYGON Z ((2734307.2300000787 1239757.5959965463 1… │
│ Appenzell Ausserrh… │ POLYGON Z ((2735958.5650000805 1239767.1879965463 1… │
│ Appenzell Ausserrh… │ POLYGON Z ((2738798.7040000805 1239749.8799965512 8… │
├──────────────────────┴───────────────────────────────────────────────────────┤
│ 1027 rows (10 shown) 2 columns │
└──────────────────────────────────────────────────────────────────────────────┘
Run Time (s): real 0.818 user 0.823316 sys 0.006000st_intersects vs st_intersectionThis query uses both: st_intersects in WHERE = predicate (true/false filter), st_intersection in SELECT = operation (computes geometry). Same pattern as in R/sf!
SELECT
name,
1 st_area(st_intersection(w.geom, k.geom)) as wald_area,
FROM wald2 w, kantone k
WHERE st_intersects(w.geom, k.geom);st_area calculates the area of the intersection
┌────────────────────────┬────────────────────┐
│ name │ wald_area │
│ varchar │ double │
├────────────────────────┼────────────────────┤
│ Appenzell Innerrhoden │ 3223.381481670876 │
│ St. Gallen │ 490040.98548061884 │
│ St. Gallen │ 1515.7608568453097 │
│ Appenzell Innerrhoden │ 5697.7576364166425 │
│ Zürich │ 9248.165822466484 │
│ · │ · │
│ · │ · │
│ · │ · │
│ Appenzell Innerrhoden │ 31971.908096492414 │
│ Zürich │ 11339.135541535818 │
│ St. Gallen │ 11570.958247649043 │
│ Appenzell Ausserrhoden │ 3659.7222327571385 │
│ Appenzell Ausserrhoden │ 1371.8734574811795 │
├────────────────────────┴────────────────────┤
│ 1027 rows (10 shown) 2 columns │
└─────────────────────────────────────────────┘
Run Time (s): real 0.810 user 0.811628 sys 0.018473VIEW before aggregating:1CREATE VIEW wald_kantone AS
SELECT
name,
st_area(st_intersection(w.geom, k.geom)) AS wald_area,
FROM wald2 w, kantone k
WHERE st_intersects(w.geom, k.geom);VIEW from the preceding query
VIEW like a table, then aggregate with GROUP BY:GROUP BY in a SQL query…
VIEW:kantone to get the total canton area and compute the fraction:LEFT JOIN appends columns from another table…
ON condition
wald_area / area
VIEW, ordered by forest share:CREATE VIEW kanton_frac AS
SELECT
kantone.name,
wald_area/area as waldanteil,
FROM wald_kanton_grp
LEFT JOIN kantone
ON wald_kanton_grp.name=kantone.name
1ORDER BY waldanteil DESC;ORDER BY to show us the highest values first
VIEWs, switching to the full dataset is trivial:VIEW now automatically uses the full data:VIEW vs CREATE TABLE ... ASVIEW = lazy (re-executed on every access)CREATE TABLE ... AS = materialized (stored on disk):1CREATE TABLE wald_kantone_mat AS
SELECT
name,
st_area(st_intersection(w.geom, k.geom)) AS wald_area
FROM wald2 w, kantone k
WHERE st_intersects(w.geom, k.geom);VIEW: