dev and perf docs (#62)
kylebarron authored Oct 9, 2023
1 parent 79c49bf commit 43302fe
Showing 5 changed files with 120 additions and 0 deletions.
63 changes: 63 additions & 0 deletions DEVELOP.md
@@ -0,0 +1,63 @@
# Developer Documentation

## Python

This project uses [Poetry](https://python-poetry.org/) to manage Python dependencies.

After installing Poetry, run

```
poetry install
```

to install all dependencies.

To register the current Poetry-managed Python environment with JupyterLab, run

```
poetry run python -m ipykernel install --user --name "lonboard"
```

JupyterLab is an included dev dependency, so to start JupyterLab you can run

```
poetry run jupyter lab
```

Then you should see a tile on the home screen that lets you open a Jupyter Notebook in the `lonboard` environment. You should also be able to open up an example notebook from the `examples/` folder.

## JavaScript

The JavaScript dependencies are managed in `package.json` and tracked with Yarn or NPM (I haven't been consistent at using one or the other :sweat_smile:).

ESBuild is used for bundling into an ES Module that the Jupyter Widget loads at runtime. The ESBuild configuration is in `build.mjs`. You can run the script with

```
yarn build
```

I often run

```
fswatch -o src | xargs -n1 -I{} yarn build
```

to watch the `src` directory and run `yarn build` anytime it changes.

Currently, each Python model (the `ScatterplotLayer`, `PathLayer`, and `SolidPolygonLayer` classes) uses _its own individual JS entry point_. You can inspect this with the `_esm` key on each class, which is used by anywidget to load in the widget. The ESBuild script converts `scatterplot-layer.tsx`, `path-layer.tsx`, and `solid-polygon-layer.tsx` into bundles used by each class, respectively.

Anywidget and its dependency ipywidgets handle the serialization from Python into JS, automatically keeping each side in sync.

## Documentation website

The documentation website is generated with `mkdocs` and [`mkdocs-material`](https://squidfunk.github.io/mkdocs-material). After `poetry install`, you can serve the docs website locally with

```
poetry run mkdocs serve
```

and you can publish the docs to GitHub Pages with

```
poetry run mkdocs gh-deploy
```
2 changes: 2 additions & 0 deletions README.md
@@ -1,5 +1,7 @@
# lonboard

Python library for extremely fast geospatial data visualization in Jupyter.

![](docs/img/scatterplot-layer-network-speeds.jpg)

## Install
1 change: 1 addition & 0 deletions docs/index.md
@@ -0,0 +1 @@
# lonboard
53 changes: 53 additions & 0 deletions docs/performance.md
@@ -0,0 +1,53 @@
# Performance

Performance is a critical goal of lonboard. Below are a couple of pieces of information you should know to understand lonboard's performance characteristics, as well as some advice on how to get the best performance.

## Performance Characteristics

There are two distinct parts to the performance of **lonboard**: one is the performance of transferring data to the browser and the other is the performance of rendering the data once it's there.

In general, these parts are completely distinct. Even if it takes a while to load the data in your browser, the map might be snappy once it loads, and vice versa.

### Data Transfer

Lonboard creates an interactive visualization of your data in your browser. In order to do this, your GeoDataFrame needs to be transferred from your Python environment to your browser.

In the case where your Python session is running locally (on the same machine as your browser), this data transfer is extremely fast: less than a second in most cases.

However, in the case where your Python session is running on a remote server (such as [Google Colab](https://colab.research.google.com/), [Binder](https://mybinder.readthedocs.io/en/latest/introduction.html), or a JupyterHub instance), this data transfer means **downloading the data to your local browser**. Therefore, when running lonboard from a remote server, your internet speed and the quantity of data you pass into a layer will have large impacts on the data transfer speed.

Under the hood, lonboard uses efficient compression (in the form of [GeoParquet](https://geoparquet.org/)) to transfer data to the browser, but compression can only do so much; the data still needs to be downloaded.
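
As a rough illustration of why connection speed matters for remote sessions, the back-of-the-envelope sketch below estimates download time from payload size and bandwidth. The function name and the example numbers are hypothetical, and the estimate ignores latency and protocol overhead, so treat it as an order-of-magnitude guide only.

```python
def estimate_transfer_seconds(size_bytes: int, bandwidth_mbps: float) -> float:
    """Estimate seconds needed to download `size_bytes` over a link of
    `bandwidth_mbps` megabits per second (hypothetical helper, not part
    of lonboard)."""
    size_bits = size_bytes * 8
    return size_bits / (bandwidth_mbps * 1_000_000)


# Assumed example: a 200 MB compressed payload over a 25 Mbps connection.
seconds = estimate_transfer_seconds(200 * 1_000_000, 25)
print(seconds)  # 64.0
```

Halving the payload (for example, by dropping columns) or doubling the bandwidth halves this estimate.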

### Rendering Performance

Once the data has been transferred from your Python session to your browser, it needs to be rendered.

The biggest thing to note is that, in contrast to projects like [datashader](https://datashader.org/), lonboard **does not minimize the amount of data being rendered**. If you pass a GeoDataFrame with 10 million coordinates, lonboard will attempt to render all 10 million coordinates at once.

The primary determinant of the maximum amount of data you can render with lonboard is your computer's hardware. Via the underlying [deck.gl](https://deck.gl/) library, lonboard ultimately renders geometries using your computer's Graphics Processing Unit (GPU). If you have a more powerful GPU, you'll be able to visualize more data.

Lonboard is more efficient at rendering than previous libraries, but there will always be _some quantity of data_ beyond which your browser tab is likely to crash while attempting to render. Testing on a recent MacBook Pro M2 computer, lonboard has been able to render a few million points with minimal lag.

## Performance Advice

### Use a local Python session

Moving from a remote Python environment to a local Python environment is often impractical, but this change will make it much faster to visualize data, especially over slow internet connections.

### Remove columns before rendering

All columns included in the `GeoDataFrame` will be transferred to the browser for visualization. (In the future, these other columns will be used to display a tooltip when hovering over/clicking on a geometry.)

Especially in the case of a remote Python session, excluding unnecessary attribute columns will make data transfer to the browser faster.
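
A minimal sketch of this advice, assuming GeoPandas and Shapely are installed. The column names and values here are made up for illustration; only the column selection itself is the point.

```python
import geopandas as gpd
from shapely.geometry import Point

# Hypothetical GeoDataFrame with two attribute columns we don't need.
gdf = gpd.GeoDataFrame(
    {
        "name": ["a", "b"],
        "speed": [10.5, 20.1],
        "notes": ["first point", "second point"],
    },
    geometry=[Point(0, 0), Point(1, 1)],
    crs="EPSG:4326",
)

# Keep only the attribute used for styling plus the geometry column.
# Including the geometry column in the selection keeps the result a GeoDataFrame.
slim = gdf[["speed", "geometry"]]
print(list(slim.columns))
```

You would then pass `slim` rather than `gdf` to the layer, so the unused `name` and `notes` columns are never serialized or downloaded.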

### Use Arrow-based data types in Pandas

As of Pandas 2.0, Pandas supports two backends for data types: either the original NumPy-based data types or new data types based on Arrow and the pyarrow library.

The first thing that lonboard does when visualizing data is convert from Pandas to an Arrow representation. Any non-geometry attribute columns will be converted to Arrow, so if you're using Arrow-based data types in Pandas already, this step will be "free" as no conversion is needed.

See the pandas [guide on data types](https://pandas.pydata.org/docs/user_guide/pyarrow.html) and the [`pandas.ArrowDtype` class](https://pandas.pydata.org/docs/reference/api/pandas.ArrowDtype.html).

### Simplify geometries before rendering

Simplifying geometries before rendering reduces the total number of coordinates and can make a visualization snappier. At this point, lonboard does not offer built-in geometry simplification. This is something you would need to do before passing data to lonboard.
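
As an illustration, the sketch below simplifies a single line with Shapely's `simplify` method and compares vertex counts. The line and the tolerance are made up; the tolerance is expressed in the units of the geometry's coordinates, so an appropriate value depends on your data and coordinate reference system.

```python
from shapely.geometry import LineString

# Hypothetical line: 1001 vertices that zig-zag very slightly around y = 0.
line = LineString([(x / 100, 0.0001 * (x % 2)) for x in range(1001)])

# Drop vertices that deviate from the simplified line by less than the tolerance.
simplified = line.simplify(0.001, preserve_topology=True)

print(len(line.coords), len(simplified.coords))
```

For a whole GeoDataFrame, GeoPandas exposes the same operation as `GeoSeries.simplify`, so something like `gdf.geometry.simplify(tolerance)` can be applied before passing the data to lonboard.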
1 change: 1 addition & 0 deletions mkdocs.yml
@@ -21,6 +21,7 @@ nav:
- ScatterplotLayer: layers/scatterplot-layer.md
- PathLayer: layers/path-layer.md
- SolidPolygonLayer: layers/solid-polygon-layer.md
- Performance: performance.md
- "How it works?": how-it-works.md

theme: