Minor text edit while reading massive chapter
hugoledoux committed Jan 7, 2025
1 parent 1e745f7 commit 58adcf9
Showing 2 changed files with 23 additions and 8 deletions.
19 changes: 11 additions & 8 deletions massive/massive.tex
@@ -19,7 +19,7 @@ \chapter{Handling and processing massive terrains}%
Examples of massive datasets:
\begin{enumerate}
\item the point cloud dataset of a \qty{1.5}{km^2} area of Dublin\sidenote{\url{https://bit.ly/32GXiFq}} contains around 1.4 billion points (density of \qty{300}{pts/m^2}), collected with airborne laser scanners;
-\item the lidar dataset of the whole of the Netherlands (AHN) has about \qty{10}{pts/m^2} and its latest version (AHN4) has more than 900 billion points;
+\item the lidar dataset of the whole of the Netherlands (AHN\sidenote{\url{https://www.ahn.nl/}}) has about \qty{10}{pts/m^2} and its latest version (AHN4) has more than 900 billion points;
\item the global digital surface model \emph{ALOS World 3D---30m (AW3D30)}\sidenote{\url{https://www.eorc.jaxa.jp/ALOS/en/dataset/aw3d30/aw3d30_e.htm}} is a raster dataset with a resolution of \ang{;;1}. Thus we have about \num{8.4d11} pixels.
\end{enumerate}
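As a quick sanity check, the AW3D30 pixel count follows directly from the \ang{;;1} resolution: the globe spans $360 \times 3600$ arcseconds of longitude and $180 \times 3600$ arcseconds of latitude, hence
\[
(360 \times 3600) \times (180 \times 3600)
  = 1\,296\,000 \times 648\,000
  \approx 8.4 \times 10^{11}\ \text{pixels}.
\]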

@@ -34,11 +34,11 @@ \chapter{Handling and processing massive terrains}%

What is ironic is that while datasets like those above are being collected in several countries, in practice they are seldom used directly since the tools that practitioners have, and are used to, usually cannot handle such massive datasets.
Instead of the raw point clouds, gridded terrains are often derived (for example with a \qty{50}{cm} resolution), because those are easier to process with a personal computer.
-Indeed, the traditional GISs and terrain modelling tools are limited by the main memory of computers: if a dataset is bigger then operations will be very slow, and will most likely not finish (and thus crash).
+Indeed, the traditional GISs and terrain modelling tools are limited by the main memory of computers: if a dataset is bigger than the main memory then operations will be very slow, and will most likely not finish (or even crash).

%

-This chapter discusses one method to visualise massive raster terrains, one to index point clouds (for fast retrieval of neighbours, useful for several processing of points), and one to construct massive Delaunay triangulations (and potentially process them).
+This chapter discusses one method to visualise and potentially analyse massive raster terrains, one to index point clouds (for fast retrieval of neighbours, useful for several point-processing operations), and one to construct massive Delaunay triangulations (and potentially process them).



@@ -71,7 +71,7 @@ \section{Raster pyramids}%
Usually we downsample the resolution by a factor of 2,%
\index{downsampling}\marginnote{downsampling}
\ie\ if we have $x$ columns and $y$ rows the first copy will have a size ($\frac{x}{2}$, $\frac{y}{2}$), the second ($\frac{x}{4}$, $\frac{y}{4}$), and so on (the number of images is arbitrary and defined by the user).
-Notice that the extra storage will be maximum $\frac{1}{3}$ of the original raster: the first pyramid is $\frac{1}{4}$, the second $\frac{1}{16}$, the third $\frac{1}{64}$, etc.
+Notice that the extra storage will be at most about $\frac{1}{3}$ of the original raster: the first pyramid is $\frac{1}{4}$, the second $\frac{1}{16}$, the third $\frac{1}{64}$, etc.
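The $\frac{1}{3}$ bound is the limit of the geometric series formed by the successive pyramid levels:
\[
\sum_{k=1}^{\infty} \left(\tfrac{1}{4}\right)^{k}
  = \frac{\tfrac{1}{4}}{1 - \tfrac{1}{4}}
  = \frac{1}{3}.
\]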

%

@@ -122,7 +122,7 @@ \section{Indexing points in 3D space with the kd-tree}[kd-tree]%
As shown in \reffig{fig:kdtree}, a $k$d-tree is a binary tree%
\index{binary tree}\marginnote{binary tree}
(thus each node has a maximum of 2 children, if any), and the main idea is that each level of the tree compares against one specific dimension.
-We `cycle through' the dimensions as we walk down the levels of the tree.
+We ``cycle through'' the dimensions as we walk down the levels of the tree.
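A minimal sketch of such a construction, with the splitting axis cycling through the dimensions at each level (illustrative code, not the book's implementation; the point coordinates are arbitrary):

```python
# Minimal kd-tree construction sketch: each level of the tree splits
# on one dimension, cycling x -> y -> x -> ... as we descend (k = 2 here).

def build_kdtree(points, depth=0):
    """Recursively build a kd-tree; returns None for an empty point set."""
    if not points:
        return None
    k = len(points[0])              # number of dimensions
    axis = depth % k                # cycle through the dimensions
    pts = sorted(points, key=lambda p: p[axis])
    median = len(pts) // 2          # split at the median along that axis
    return {
        "point": pts[median],
        "axis": axis,
        "left": build_kdtree(pts[:median], depth + 1),
        "right": build_kdtree(pts[median + 1:], depth + 1),
    }

tree = build_kdtree([(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)])
```

Splitting at the median at every level yields a balanced tree; inserting points one by one (as discussed below for \reffig{fig:kdtree_insert}) does not give that guarantee.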

%

@@ -165,7 +165,6 @@ \section{Indexing points in 3D space with the kd-tree}[kd-tree]%
\labfig{fig:kdtree_insert}
\end{figure}
illustrates this for one point.

Observe that this insertion renders the tree unbalanced.
Methods to balance a $k$d-tree exist, but they are out of scope for this book.

@@ -305,7 +304,9 @@ \section{Indexing points in 3D space with the kd-tree}[kd-tree]%
%

The streaming paradigm can be used to process geometries (points, meshes, polygons, etc.) but it is slightly more involved than for a simple video.
-Since the First Law of Geography of Tobler stipulates that ``everything is related to everything else, but near things are more related than distant things''\sidenote{Tobler W., (1970) \emph{A computer movie simulating urban growth in the Detroit region}. Economic Geography, 46(Supplement):234--240}, if we wanted to calculate the slope at one location in a point cloud we would need to retrieve all the neighbouring points and potentially calculate locally the DT\@.
+Since Tobler's First Law of Geography stipulates that ``everything is related to everything else, but near things are more related than distant things'',
+\marginnote{\citet{Tobler70}}
+if we wanted to calculate the slope at one location in a point cloud we would need to retrieve all the neighbouring points and potentially calculate the DT locally\@.
The question is: is it possible to do this without reading the whole file, processing only one part of it at a time?

%
@@ -392,7 +393,9 @@ \subsection{Spatial coherence}
The construction of a DT with the streaming paradigm will only succeed (in the sense that the memory footprint will stay relatively low) if the \emph{spatial coherence}%
\index{spatial coherence}
of the input dataset is high.
-It is defined by \sidecitet{Isenburg06} as: ``a correlation between the proximity in space of geometric entities and the proximity of their representations in [the file]''.
+It is defined by \citeauthor{Isenburg06} as:
+\marginnote{\citet{Isenburg06}}
+``a correlation between the proximity in space of geometric entities and the proximity of their representations in [the file]''.
They demonstrate that real-world point cloud datasets often have natural spatial coherence because the points are usually stored in the order they were collected.
If we randomly shuffled the points in an input file, the spatial coherence would be very low and the finalisation tags in the stream coming out of the finaliser would be concentrated at the end (instead of being distributed throughout the stream).
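This effect can be illustrated with a small self-contained sketch; the `mean_step` measure below is a crude proxy for spatial coherence (average distance between consecutive points in file order), not the definition used by Isenburg et al., and the scanline dataset is synthetic:

```python
import math
import random

def mean_step(points):
    """Average distance between consecutive points in file order:
    a crude proxy for spatial coherence (lower = more coherent)."""
    return sum(
        math.dist(a, b) for a, b in zip(points, points[1:])
    ) / (len(points) - 1)

# Points stored roughly in acquisition order: a scanline pattern,
# mimicking how airborne lidar points end up ordered in a file.
scan_order = [(x, y) for y in range(100) for x in range(100)]

shuffled = scan_order[:]
random.seed(42)
random.shuffle(shuffled)

# The scanline ordering keeps consecutive points close together in space,
# so its mean step is far smaller than that of the shuffled ordering.
print(mean_step(scan_order) < mean_step(shuffled))  # True
```

With the shuffled ordering, a point near the start of the file can be spatially adjacent to one near the end, which is exactly why the finaliser cannot emit finalisation tags early.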

12 changes: 12 additions & 0 deletions refs/tb.bib
@@ -1569,3 +1569,15 @@ @book{worboys04
publisher = {CRC Press},
title = {{GIS}: {A} computing perspective},
year = {2004}}


+@article{Tobler70,
+  author = {Tobler, Waldo},
+  journal = {Economic Geography},
+  number = {Supplement},
+  pages = {234--240},
+  title = {{A} computer movie simulating urban growth in the {Detroit} region},
+  volume = {46},
+  year = {1970}}

