Implement TripsLayer for animating moving objects and connect to MovingPandas #292

Merged: kylebarron merged 50 commits into main from kyle/trips-layer on Oct 7, 2024

kylebarron commented Dec 5, 2023

a minimal example
Screen Recording 2023-12-05 at 5 15 35 PM

Change list

  • Add new TripsLayer under the experimental module
  • Add dev dependency on movingpandas
  • Add from_movingpandas class method to construct a TripsLayer from a movingpandas TrajectoryCollection

Todo

  • Implement conversion from movingpandas.TrajectoryCollection to GeoArrow.
  • More input validation in TimestampAccessor. Validate that the timestamps have the same list offsets as the geometry in the main data frame. (Done in 37a64c6)
  • Re-implement this example: https://movingpandas.github.io/movingpandas-website/2-analysis-examples/ship-data.html
  • Store a time_offset integer on the TripsLayer that represents the minimum value of the trip data. Note that you'd need to recompute this when a new get_timestamps is assigned onto the layer. (Done in 6addb2e)
  • Implement custom serialization for the timestamp accessor. Subtract off the time offset when serializing the data, and cast to float32. (Done in b346810)
  • Add timezone parameter? (we infer the timezone from the input data)

Open questions

  • How to handle offset timestamps? deck.gl stores timestamps as float32, which doesn't have enough integer precision to store milliseconds or nanoseconds since the epoch. (Timestamp precision handling done in ca9dcd1)
  • Where to handle animation? It looks like syncing animation via an ipywidgets.Play widget (connected via jslink) is probably good enough for now, even if it appears to have a decent amount of overhead. The alternative would be to have a manual animation component on the JS side that maintains its own time state.
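
The float32 concern in the first open question can be reproduced with the standard library alone. The sketch below is illustrative and is not the serialization code from this PR; to_float32 is a hypothetical helper, and the epoch value is an arbitrary instant in December 2023.

```python
import struct


def to_float32(value: float) -> float:
    """Round-trip a number through IEEE 754 single precision."""
    return struct.unpack("<f", struct.pack("<f", float(value)))[0]


# Two instants one second apart, as milliseconds since the Unix epoch.
t0_ms = 1_701_804_000_000
t1_ms = t0_ms + 1_000

# float32 has a 24-bit significand, so values this large collapse together.
print(to_float32(t0_ms) == to_float32(t1_ms))  # True: the 1 s gap is lost

# Subtracting a per-layer minimum first keeps the values small enough.
time_offset = t0_ms
print(to_float32(t0_ms - time_offset), to_float32(t1_ms - time_offset))  # 0.0 1000.0
```

At this magnitude adjacent float32 values are 131072 ms (about two minutes) apart, which is why the offset has to be subtracted before the cast rather than after.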

Example repro

I got data from Access AIS, with a custom bounding box and time range, though it would probably be straightforward to use other data files as well.

import pyarrow as pa
import pandas as pd
import movingpandas as mpd
from lonboard import Map
from lonboard.experimental import TripsLayer
import ipywidgets

path = '/Users/kyle/Downloads/AIS_170180417406763049_2306-1701804175229.csv'
df = pd.read_csv(path)
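traj_collection = mpd.TrajectoryCollection(df, 'MMSI', t='BaseDateTime', x='LON', y='LAT')

# Build the layer from the trajectories and display it on a map
layer = TripsLayer.from_movingpandas(traj_collection)
layer.width_min_pixels = 5
layer.trail_length = 100000
m = Map(layer)
m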

play = ipywidgets.Play(
    value=0,
    min=0,
    max=86399000,
    step=50_000,
    interval=50,
    repeat=True
)
play
ipywidgets.jsdlink(
    (play, 'value'),
    (layer, 'current_time'),
)

cc @anitagraser, you may be interested in this, and/or have ideas for how to better integrate with movingpandas

@anitagraser

Thanks for tagging me. This development looks really exciting. Let me know if you have any movingpandas questions.

kdpenner commented Jul 12, 2024

Howdy, we talked at SciPy, posting to track this PR’s progress 🙂

kylebarron marked this pull request as ready for review September 19, 2024 10:11

for field_idx in range(batch.num_columns):
    field = batch.schema.field(field_idx)
    new_field = field.with_type(DataType.list(field))
Contributor

This relies on arro3's list(..) to convert the pyarrow field through the Arrow PyCapsule interface, and then on pyarrow's with_type() to convert again in the other direction?
Nice ;)

Contributor

Ah, no, I see that you converted the pyarrow Table to an arro3 one (attr_table = Table.from_arrow(attr_table)) before calling this function ;)

Member Author

Here batch is actually an arro3.RecordBatch; the conversion back to arro3 happened here:

attr_table = Table.from_arrow(attr_table)

But yes, these lines should work with pyarrow input as well!

Member Author

> Ah, no, I see that you converted the pyarrow Table to an arro3 one (attr_table = Table.from_arrow(attr_table)) before calling this function ;)

Right. I could've left it as a pyarrow table; the difference is mostly type hinting. It's nice to get IDE completions and type checking, and pyarrow doesn't have that yet.

kylebarron added this to the 0.10 milestone Sep 24, 2024
@kylebarron
Member Author

I'm not sure about the best API for current_time. The last commit created a current_time_as_datetime function to convert from an integer back to a datetime object. But current_time, despite being a public deck.gl API, is really an internal construct for the Lonboard TripsLayer, so we should probably make it private as _current_time. Since we manage the animation details in animate(), that should be fine. Should there then be a current_time method (a getter?) that converts _current_time from an int to a datetime object?

We should also check how datetime works with time zones.
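
One way to picture the getter being discussed is the standalone sketch below. It is not Lonboard's implementation: TripsClock is a hypothetical stand-in class, the offset is held as a timezone-aware datetime rather than an integer, and the internal counter is assumed to be in whole seconds.

```python
from datetime import datetime, timedelta, timezone


class TripsClock:
    """Hypothetical stand-in for the layer's time state (not Lonboard code)."""

    def __init__(self, time_offset: datetime) -> None:
        # Earliest timestamp in the trip data; assumed to be timezone-aware.
        self.time_offset = time_offset
        # Private animation state: whole seconds elapsed since time_offset.
        self._current_time = 0

    @property
    def current_time(self) -> datetime:
        """Convert the internal integer back to an absolute datetime."""
        return self.time_offset + timedelta(seconds=self._current_time)


clock = TripsClock(datetime(2023, 12, 5, tzinfo=timezone.utc))
clock._current_time = 3_600
print(clock.current_time.isoformat())  # 2023-12-05T01:00:00+00:00
```

Keeping the offset timezone-aware means the getter returns an aware datetime as well, which is one possible answer to the time zone question.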

@kylebarron
Member Author

Got an air traffic control example working too:

Screen.Recording.2024-10-04.at.5.11.10.PM.mov
Screen.Recording.2024-10-04.at.5.21.06.PM.mov

On Monday we can just clean up the examples a bit, add a display for the current time of the animation, and then publish the new version!

kdpenner commented Oct 5, 2024

I'm able to view ~4 million points on my laptop with reduced performance. 800K look great. if you are looking for more examples, here are open data affiliated with our project: https://osf.io/dg6t3/

the trajectories_* subfolders have (unfortunately only) 1 timestamp every 5 minutes.

for the wish list: will be great when the from_geopandas inherited class method works with native datetime or pandas timestamp objects 🙂

cc @hengoren



class TripsLayer(BaseArrowLayer):
    """The `TripsLayer` renders animated paths that represent vehicle trips.

not just vehicles!

Member Author

Generally I just copy that docstring from deck.gl (https://deck.gl/docs/api-reference/geo-layers/trips-layer), but in this case maybe we should choose a different description

@kylebarron
Member Author

@kdpenner if you'd like to create an example notebook in your own repo, we can link to it from our docs

@kylebarron
Member Author

kylebarron commented Oct 7, 2024

> for the wish list: will be great when the from_geopandas inherited class method works with native datetime or pandas timestamp objects 🙂

It is technically possible, though difficult, to use from_geopandas or from_duckdb with the TripsLayer, but you have to pass in the get_timestamps parameter separately, and ensure the list sizes match the LineString geometries.

The only way for it to work with native datetime/pandas objects would be to have a nested list inside a pandas column, and for now it's up to users to convert that to an Arrow array.

kdpenner commented Oct 7, 2024

yeah. would the geodataframe need to be grouped by a unique identifier? for example, if one gdf has 1000 trajectories in it, such that each timestamp has 1000 duplicates, would I need to iterate through gdf.groupby("agent_id")?

movingpandas groups internally, I think, so that each trajectory is unique to an agent

oh I see you specified LineString, rather than a gdf of Points

kdpenner commented Oct 7, 2024

FWIW layer.get_timestamps.to_numpy() raises a NotImplementedError:

NotImplementedError: Unsupported type in to_numpy List(Field { name: "", data_type: Timestamp(Second, None), nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} })

to_pylist() works

@kylebarron
Member Author

kylebarron commented Oct 7, 2024

> FWIW layer.get_timestamps.to_numpy() raises a NotImplementedError:

Yes, because it's a variable-size list and numpy doesn't have variable-size lists. (I suppose we should have a clearer error there)

You can flatten the list and then convert the underlying array to numpy, e.g. with pyarrow.array(layer.get_timestamps).values.to_numpy()

kylebarron enabled auto-merge (squash) October 7, 2024 19:04
kylebarron merged commit 25f7e13 into main Oct 7, 2024
5 checks passed
kylebarron deleted the kyle/trips-layer branch October 7, 2024 19:05
kylebarron mentioned this pull request Oct 7, 2024
4 participants