[Proposal] for "Condition" arrays #418
Comments
Could also be an interesting read for you @roaldarbol, though some aspects of this discussion are very xarray-centric. |
I think 3 is definitely do-able, though the function call signature might end up a bit more complicated. But I do echo the possible overkill concerns. To me, if a user is willing to go through the trouble of writing their own
condition_array = condition_function(data)
That said, I agree that we should still have some wrappers for the most common conditions we expect. Furthermore, having the ability to automatically combine them into one new "condition array" would be nice and does seem like a natural goal we want to reach eventually. Plus, running multiple conditions over a single (or even multiple)
compute_condition_array(
    data: Sequence[xr.DataArray],
    conditions: Sequence[Callable[..., xr.DataArray]],
) -> xr.DataArray
# returns a DataArray with 2 extra dimensions, 'data' and 'condition',
# whose lengths are equal to the lengths of the data and condition arguments respectively.
# We can provide coordinates for the two new axes by using the individual names of the data / conditions,
# and potentially even store (pointers to them) in the returned DataArray.attrs.
The final function signature of this method would likely be a bit more complex though (since we might need to pass keyword arguments to the
Maybe a starting point is a |
You are absolutely right about that! Writing the condition function is the hard bit, calling that on data to get a boolean array is trivial.
It is definitely cool, but I wouldn't start with that.
Yes, we should absolutely start with the |
Under polygon.compute_region_occupancy(data, other_regions) doesn't make that much sense (why is this a property of one of the regions we care about?) compared to compute_region_occupancy(data, [polygon1, polygon2, ...]). I guess we could do
class PolygonOfInterest:
    @staticmethod
    def compute_region_occupancy(data, regions) -> None:
        ...
but again, IMO a standalone function is cleaner. |
Yes, I meant as a standalone function, not inside the class. Perhaps put it in |
I've made a start on this over on #421. Just needs some tests added, but I'm teaching the next couple of days. @stellaprins might be able to pick this up as she's back with us Thursday |
@lochhh found this xarray-events package that could be useful here
|
We propose introducing a new type of data array in movement for storing boolean outcomes of various conditions. This idea evolved from discussions between me, @sfmig and @stellaprins about region-of-interest (RoI) occupancy but can be generalized to other use cases such as ethograms or social interaction detection.
1. Region-of-Interest (RoI) Occupancy
Consider a typical poses dataset ds with:
- N time points
- 2 spatial dimensions (x, y)
- 6 keypoints
- 3 individuals
The position array ds.position has shape (N, 2, 6, 3).
We have two example RoIs:
Using PR #413, we can check if a given point is within each polygon at each time point:
Each result is a boolean array with shape (N, 6, 3), effectively collapsing the spatial dimension. If we want multiple RoIs in a single structure, we could introduce a new dimension called conditions, resulting in an array of shape (time, keypoints, individuals, conditions), for instance (N, 6, 3, 2) if we have two polygons (nest and feeder).
A helper function might look like this:
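As a minimal sketch of such a helper (not the code from PR #413), the example below assumes axis-aligned rectangular RoIs in place of polygons. The `RectangleRoI` class, its `contains` method, and the toy `position` array are hypothetical stand-ins introduced only for illustration; the dimension names `time`, `space`, `keypoints` and `individuals` are assumed to match the dataset described above.

```python
from dataclasses import dataclass

import numpy as np
import xarray as xr


@dataclass
class RectangleRoI:
    """Hypothetical axis-aligned RoI, standing in for a polygon class."""

    name: str
    x_range: tuple[float, float]
    y_range: tuple[float, float]

    def contains(self, position: xr.DataArray) -> xr.DataArray:
        """Return a boolean array with the 'space' dimension collapsed."""
        x = position.sel(space="x", drop=True)
        y = position.sel(space="y", drop=True)
        return (
            (x >= self.x_range[0])
            & (x <= self.x_range[1])
            & (y >= self.y_range[0])
            & (y <= self.y_range[1])
        )


def compute_region_occupancy(
    position: xr.DataArray, regions: list[RectangleRoI]
) -> xr.DataArray:
    """Stack per-region boolean arrays along a new 'conditions' dimension."""
    occupancy = xr.concat(
        [region.contains(position) for region in regions], dim="conditions"
    )
    occupancy = occupancy.assign_coords(conditions=[r.name for r in regions])
    # Put the new dimension last: (time, keypoints, individuals, conditions)
    return occupancy.transpose(..., "conditions")


# Toy data: N=5 time points, 2 spatial dims, 6 keypoints, 3 individuals
rng = np.random.default_rng(0)
position = xr.DataArray(
    rng.uniform(0, 100, size=(5, 2, 6, 3)),
    dims=("time", "space", "keypoints", "individuals"),
    coords={"space": ["x", "y"]},
)
nest = RectangleRoI("nest", x_range=(0, 50), y_range=(0, 50))
feeder = RectangleRoI("feeder", x_range=(50, 100), y_range=(50, 100))

occupancy = compute_region_occupancy(position, [nest, feeder])
print(occupancy.dims, occupancy.shape)

# One possible rule over keypoints: require every keypoint to be inside
fully_inside = occupancy.all("keypoints")
```

Because the result carries a named `conditions` coordinate, a single RoI can be picked out with `occupancy.sel(conditions="nest")`.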
From such an occupancy array, we can:
- … the keypoints dimension based on rules (e.g., all keypoints inside an RoI).
- … nest to feeder).
- … nest to feeder that are shorter than 20 seconds and highlight them in red in a trajectory plot).
2. General Utility of Boolean "Condition" Arrays
This concept extends beyond RoI occupancy. Any scenario where we want to track a set of boolean conditions over time (and possibly across individuals or keypoints) can benefit from such arrays.
2.1 Ethograms
An ethogram shows when an individual exhibits specific behaviors. A boolean array (time, individuals, conditions) could indicate whether each individual is performing behaviors like grooming, feeding, or sleeping. This structure simplifies identifying onsets, durations, and transitions between behaviors.
2.2 Social Interactions
See issue #225. For instance, detecting snout-to-snout or snout-to-tail interactions between two mice can be framed as checking whether specific distance-based conditions are fulfilled at each time point. Storing these results in a (time, conditions) boolean array makes it straightforward to analyze interaction onsets and offsets.
2.3 Edge Case: Collapsing time
In some cases, the time dimension might be collapsed. For example, to determine whether a region was always in an animal's field of view throughout an entire session, we might reduce (time, individuals, conditions) to (individuals, conditions) by requiring all time points to be True.
Another example of a time-less array would be one that answers the question "which individuals stayed in box A for >= 10 seconds".
We can also label such time-less arrays as "Condition" arrays, but this is mostly a semantic choice.
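Both time-collapsing examples can be sketched on a toy array. The frame rate `fps`, the `in_box_a` data and the `longest_true_run` helper below are assumptions made for illustration, and the `conditions` dimension is omitted to keep the example short.

```python
import numpy as np
import xarray as xr


def longest_true_run(mask_1d: np.ndarray) -> int:
    """Length, in samples, of the longest run of consecutive True values."""
    longest = current = 0
    for value in mask_1d:
        current = current + 1 if value else 0
        longest = max(longest, current)
    return longest


fps = 2  # assumed sampling rate, in frames per second
n_time = 24  # 12 seconds of data at 2 fps

mask = np.zeros((n_time, 3), dtype=bool)
mask[:22, 0] = True  # one visit of 22 samples (11 s)
mask[:10, 1] = True  # two separate visits of 10 samples (5 s) each
mask[14:, 1] = True
mask[:, 2] = True  # never leaves the box

in_box_a = xr.DataArray(
    mask,
    dims=("time", "individuals"),
    coords={"individuals": ["id_0", "id_1", "id_2"]},
)

# "Always true" question: require every time point to be True
always_inside = in_box_a.all("time")

# "Stayed for >= 10 seconds" question: longest uninterrupted visit per individual
longest_run = xr.apply_ufunc(
    longest_true_run,
    in_box_a,
    input_core_dims=[["time"]],
    vectorize=True,
)
stayed_10s = (longest_run / fps) >= 10
```

Both results have only the `individuals` dimension left, which is what makes them "time-less".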
2.4 Benefits of a unified "Conditions" framework
Framing all the above problems as "Condition" array problems allows us to write general methods for post-processing them. For example: counting the entries to a region is conceptually (and computationally) identical to counting onsets of a specific behaviour in an ethogram; time spent in a given RoI is the same as time spent engaging in a specific type of social interaction; etc.
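To illustrate the point about shared post-processing, here is a small sketch in which the same two functions handle an RoI condition and a behaviour condition side by side. The function names `count_onsets` and `time_spent`, the toy data and the sampling interval `dt` are all hypothetical.

```python
import numpy as np
import xarray as xr


def count_onsets(conditions: xr.DataArray) -> xr.DataArray:
    """Count False -> True transitions along 'time'.

    A condition that is already True at the first sample counts as one onset.
    """
    rises = conditions.astype(int).diff("time") == 1
    starts_true = conditions.isel(time=0, drop=True).astype(int)
    return rises.sum("time") + starts_true


def time_spent(conditions: xr.DataArray, dt: float) -> xr.DataArray:
    """Total time for which each condition is True, given a sampling interval dt."""
    return conditions.sum("time") * dt


# Toy (time, conditions) array mixing an RoI and a behaviour
toy = xr.DataArray(
    np.array(
        [
            [0, 1, 1, 0, 0, 1, 0, 1],  # nest occupancy
            [1, 1, 0, 0, 1, 1, 1, 0],  # grooming
        ],
        dtype=bool,
    ).T,
    dims=("time", "conditions"),
    coords={"conditions": ["nest", "grooming"]},
)

onsets = count_onsets(toy)  # entries into the nest / grooming bouts
seconds = time_spent(toy, dt=0.5)  # assumed 0.5 s between samples
```

Neither function needs to know whether a condition came from a polygon test or from a behaviour classifier; both only rely on the `time` dimension.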
If we enforce a consistent dimension name, i.e. conditions (open to alternative names), our methods can do something meaningful when they encounter boolean arrays with that dimension.
3. Could We Have a General Function?
A specialized function like compute_region_occupancy works for RoIs, but we could consider a more general approach:
The data here could be position, velocity or any other variable that makes sense.
The callable condition function would implement custom logic, potentially collapsing (broadcasting over) different dimensions. This could even make good use of our recently acquired "broadcasting" decorators.
While this general approach might be powerful, it could also be overkill. We might be better off starting with specific functions (e.g., region occupancy) and exploring a more generalized approach later.
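One way the general approach could look is sketched below. The function name `apply_conditions`, its mapping-based signature and the two example conditions are assumptions for illustration, not an agreed API; the sketch also assumes that every condition returns a boolean array with the same dimensions.

```python
from collections.abc import Callable, Mapping

import numpy as np
import xarray as xr

ConditionFn = Callable[[xr.DataArray], xr.DataArray]


def apply_conditions(
    data: xr.DataArray, conditions: Mapping[str, ConditionFn]
) -> xr.DataArray:
    """Evaluate each named condition on data and stack the boolean results."""
    results = []
    for name, condition in conditions.items():
        result = condition(data)
        if result.dtype != bool:
            raise TypeError(f"Condition '{name}' did not return a boolean array")
        results.append(result)
    stacked = xr.concat(results, dim="conditions")
    stacked = stacked.assign_coords(conditions=list(conditions))
    return stacked.transpose(..., "conditions")


def is_moving(velocity: xr.DataArray) -> xr.DataArray:
    """Speed above an arbitrary threshold of 1.0; collapses 'space'."""
    speed = np.sqrt((velocity**2).sum("space"))
    return speed > 1.0


def is_moving_right(velocity: xr.DataArray) -> xr.DataArray:
    """Positive x-velocity; collapses 'space'."""
    return velocity.sel(space="x", drop=True) > 0


# Toy velocity with dims (time, space, individuals)
vx = np.array([[0.0, 1.0], [2.0, 0.0], [-3.0, 0.0], [0.5, -1.0]])
vy = np.array([[0.0, 1.0], [0.0, 0.0], [4.0, 0.2], [0.5, 0.0]])
velocity = xr.DataArray(
    np.stack([vx, vy], axis=1),
    dims=("time", "space", "individuals"),
    coords={"space": ["x", "y"]},
)

result = apply_conditions(
    velocity, {"moving": is_moving, "moving_right": is_moving_right}
)
print(result.dims)  # ('time', 'individuals', 'conditions')
```

Conditions that need extra keyword arguments, or that collapse different dimensions, would need a richer signature than this sketch provides.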
Feedback Requested
We’d appreciate input from @willGraham01 and others on the design and feasibility of introducing these boolean condition arrays into movement.
Thank you for reading and for any insights or suggestions!