From 58adcf966500fa4691757c60a2a03fa6f68d459c Mon Sep 17 00:00:00 2001 From: Hugo Ledoux Date: Tue, 7 Jan 2025 16:49:20 +0100 Subject: [PATCH] Minor text edit while reading massive chapter --- massive/massive.tex | 19 +++++++++++-------- refs/tb.bib | 12 ++++++++++++ 2 files changed, 23 insertions(+), 8 deletions(-) diff --git a/massive/massive.tex b/massive/massive.tex index d44aa6a..e3808fa 100644 --- a/massive/massive.tex +++ b/massive/massive.tex @@ -19,7 +19,7 @@ \chapter{Handling and processing massive terrains}% Examples of massive datasets: \begin{enumerate} \item the point cloud dataset of a \qty{1.5}{km^2} of Dublin\sidenote{\url{https://bit.ly/32GXiFq}} contains around 1.4 billion points (density of \qty{300}{pts/m^2}), which was collected with airborne laser scanners; - \item the lidar dataset of the whole of the Netherlands (AHN) has about \qty{10}{pts/m^2} and its latest version (AHN4) has more than 900 billion points; + \item the lidar dataset of the whole of the Netherlands (AHN\sidenote{\url{https://www.ahn.nl/}}) has about \qty{10}{pts/m^2} and its latest version (AHN4) has more than 900 billion points; \item the global digital surface model \emph{ALOS World 3D---30m (AW3D30)}\sidenote{\url{https://www.eorc.jaxa.jp/ALOS/en/dataset/aw3d30/aw3d30_e.htm}} is a raster dataset with a resolution of \ang{;;1}. Thus we have about \num{8.4d11} pixels. \end{enumerate} @@ -34,11 +34,11 @@ \chapter{Handling and processing massive terrains}% What is ironic is that while datasets like those above are being collected in several countries, in practice they are seldom used directly since the tools that practitioners have, and are used to, usually cannot handle such massive datasets. Instead of the raw point clouds, gridded terrains are often derived (for example with a \qty{50}{cm} resolution), because those are easier to process with a personal computer. -Indeed, the traditional GISs and terrain modelling tools are limited by the main memory of computers: if a dataset is bigger then operations will be very slow, and will most likely not finish (and thus crash). +Indeed, the traditional GISs and terrain modelling tools are limited by the main memory of computers: if a dataset is bigger than the main memory, then operations will be very slow, and will most likely not finish (or even crash). % -This chapter discusses one method to visualise massive raster terrains, one to index point clouds (for fast retrieval of neighbours, useful for several processing of points), and one to construct massive Delaunay triangulations (and potentially process them). +This chapter discusses one method to visualise and potentially analyse massive raster terrains, one to index point clouds (for fast retrieval of neighbours, useful for several point-processing operations), and one to construct massive Delaunay triangulations (and potentially process them). @@ -71,7 +71,7 @@ \section{Raster pyramids}% Usually we downsample the resolution by a factor 2,% \index{downsampling}\marginnote{downsampling} \ie\ if we have $x$ columns and $y$ rows the first copy will have a size ($\frac{x}{2}$, $\frac{y}{2}$), the second ($\frac{x}{4}$, $\frac{y}{4}$), and so on (the number of images is arbitrary and defined by the user). -Notice that the extra storage will be maximum $\frac{1}{3}$ of the original raster: the first pyramid is $\frac{1}{4}$, the second $\frac{1}{16}$, the third $\frac{1}{64}$, etc.
+Notice that the extra storage will be at most about $\frac{1}{3}$ of the original raster: the first pyramid is $\frac{1}{4}$, the second $\frac{1}{16}$, the third $\frac{1}{64}$, etc. % @@ -122,7 +122,7 @@ \section{Indexing points in 3D space with the kd-tree}[kd-tree]% As shown in \reffig{fig:kdtree}, a $k$d-tree is a binary tree% \index{binary tree}\marginnote{binary tree} (thus each node has a maximum of 2 children, if any), and the main idea is that each level of the tree compares against one specific dimension. -We `cycle through' the dimensions as we walk down the levels of the tree. +We ``cycle through'' the dimensions as we walk down the levels of the tree. % @@ -165,7 +165,6 @@ \section{Indexing points in 3D space with the kd-tree}[kd-tree]% \labfig{fig:kdtree_insert} \end{figure} illustrates this for one point. - Observe that this insertion renders the tree unbalanced. Methods to balance a $k$d-tree exist but are out of scope for this book. @@ -305,7 +304,9 @@ \section{Indexing points in 3D space with the kd-tree}[kd-tree]% % The streaming paradigm can be used to process geometries (points, meshes, polygons, etc.) but it is slightly more involved than for a simple video. -Since the First Law of Geography of Tobler stipulates that ``everything is related to everything else, but near things are more related than distant things''\sidenote{Tobler W., (1970) \emph{A computer movie simulating urban growth in the Detroit region}. Economic Geography, 46(Supplement):234–240}, if we wanted to calculate the slope at one location in a point cloud we would need to retrieve all the neighbouring points and potentially calculate locally the DT\@. +Since Tobler's First Law of Geography stipulates that ``everything is related to everything else, but near things are more related than distant things'', +\marginnote{\citet{Tobler70}} +if we wanted to calculate the slope at one location in a point cloud we would need to retrieve all the neighbouring points and potentially calculate the DT locally. The question is: is it possible to do this without reading the whole file and only process one part of it? % @@ -392,7 +393,9 @@ \subsection{Spatial coherence} The construction of a DT with the streaming paradigm will only succeed (in the sense that the memory footprint will stay relatively low) if the \emph{spatial coherence}% \index{spatial coherence} of the input dataset is high. -It is defined by \sidecitet{Isenburg06} as: ``a correlation between the proximity in space of geometric entities and the proximity of their representations in [the file]''. +It is defined by \citeauthor{Isenburg06} as: +\marginnote{\citet{Isenburg06}} +``a correlation between the proximity in space of geometric entities and the proximity of their representations in [the file]''. They demonstrate that real-world point cloud datasets often have natural spatial coherence because the points are usually stored in the order they were collected. If we randomly shuffled the points in an input file, then the spatial coherence would be very low and the finalisation tags in the stream coming out of the finaliser would be located at the end (and not distributed in the stream).
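To make the finaliser idea above more concrete, here is a minimal two-pass sketch in Python (not the implementation used in the book; the callable read_points and all other names are illustrative): the first pass counts how many points fall in each cell of a coarse grid, and the second pass streams the points again, emitting a finalisation tag for a cell as soon as its last point has been read.

from collections import defaultdict

def cell_id(x, y, origin, cell_size):
    # Index (column, row) of the square grid cell containing (x, y).
    return (int((x - origin[0]) // cell_size),
            int((y - origin[1]) // cell_size))

def stream_with_finalisation(read_points, origin, cell_size):
    # Pass 1: count how many points fall in each grid cell.
    counts = defaultdict(int)
    for (x, y, z) in read_points():
        counts[cell_id(x, y, origin, cell_size)] += 1
    # Pass 2: re-read the points and interleave finalisation tags.
    seen = defaultdict(int)
    for (x, y, z) in read_points():
        c = cell_id(x, y, origin, cell_size)
        seen[c] += 1
        yield ('point', (x, y, z))
        if seen[c] == counts[c]:
            # All points of this cell have been streamed: downstream
            # modules may now safely process and flush this region.
            yield ('finalise', c)

# Example usage with a hypothetical in-memory "file" of 3D points.
pts = [(0.2, 0.1, 5.0), (0.8, 0.9, 6.1), (9.5, 9.5, 2.3)]
for tag in stream_with_finalisation(lambda: iter(pts), origin=(0.0, 0.0), cell_size=5.0):
    print(tag)

With a spatially coherent input the 'finalise' tags appear early and regularly in the output stream, so a downstream streaming Delaunay triangulator can free the memory of finalised regions; with a randomly shuffled input all the tags would cluster at the end, and the memory footprint would grow to roughly that of the whole dataset.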
diff --git a/refs/tb.bib b/refs/tb.bib index 90e8350..df878b7 100644 --- a/refs/tb.bib +++ b/refs/tb.bib @@ -1569,3 +1569,15 @@ @book{worboys04 publisher = {CRC Press}, title = {{GIS}: {A} computing perspective}, year = {2004}} + + +@article{Tobler70, + author = {Tobler, Waldo}, + journal = {Economic Geography}, + number = {Supplement}, + pages = {234--240}, + title = {{A} computer movie simulating urban growth in the {Detroit} region}, + volume = {46}, + year = {1970}} + +
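The kd-tree hunks above describe how each level of the tree compares against one specific dimension, and how we cycle through the dimensions while walking down the levels. The following minimal Python sketch (illustrative names, not the book's code) implements exactly that insertion; it performs no rebalancing, so repeated insertions can leave the tree unbalanced, as noted in the text.

class Node:
    """A kd-tree node storing one point and two optional subtrees."""
    def __init__(self, point):
        self.point = point
        self.left = None   # points with a smaller coordinate on the split dimension
        self.right = None  # points with a larger (or equal) coordinate

def insert(root, point, depth=0, k=3):
    # The dimension we compare against cycles with the depth: 0, 1, 2, 0, ...
    if root is None:
        return Node(point)
    dim = depth % k
    if point[dim] < root.point[dim]:
        root.left = insert(root.left, point, depth + 1, k)
    else:
        root.right = insert(root.right, point, depth + 1, k)
    return root

# Example: build a small 3D kd-tree point by point.
pts = [(2.0, 3.0, 1.0), (5.0, 4.0, 2.0), (9.0, 6.0, 7.0), (4.0, 7.0, 9.0), (8.0, 1.0, 5.0)]
root = None
for p in pts:
    root = insert(root, p)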