convenience functions for icephys in schema 2.4 #316

Open
lvsltz opened this issue Oct 8, 2021 · 6 comments
Labels
category: proposal discussion of proposed enhancements or new features status: todo something needs to be done topic: matnwb-api related to improving the matnwb api

Comments

@lvsltz

lvsltz commented Oct 8, 2021

Is there any plan to include in matnwb some convenience functions for the new icephys tables introduced by nwb-schema 2.4, such as those mentioned in the pynwb tutorial (e.g. get_icephys_meta_parent_table or to_hierarchical_dataframe)?

@bendichter
Contributor

@lvsltz good question. In general, MatNWB does less type-specific customization, but we may be able to address particular pain points. to_hierarchical_dataframe looks interesting to me because it would be on the DynamicTable level, and that class already has some custom functionality, e.g. getRow. However, I don't know if MATLAB can easily represent hierarchical tables. We can look into that.

@lvsltz
Author

lvsltz commented Oct 8, 2021

Thanks @bendichter. I also don't know much about MATLAB tables, but even a flat representation (to_flat_table?) would be useful for querying.
I am currently using my own custom-written and possibly unoptimized function to prepare the tables. If it's of any use, I can share it.

@lawrence-mbf
Collaborator

What do you mean by a flat table? That might be equivalent to using getRow to retrieve all rows as that returns a MATLAB table object.

@oruebel
Contributor

oruebel commented Oct 8, 2021

What do you mean by a flat table?

In the icephys case, we have a hierarchy of tables where each table in the hierarchy represents a different phase in the experiment. E.g., the IntracellularRecordingsTable describes individual recordings, and the SimultaneousRecordingsTable then groups those recordings to identify which recordings were performed simultaneously and stores metadata that is common across simultaneous recordings. Each table in this hierarchy then contains a DynamicTableRegion column that selects corresponding rows in the next table, e.g., here the SimultaneousRecordingsTable has a DynamicTableRegion column that selects the rows in the IntracellularRecordingsTable that have been recorded simultaneously. This organization allows us to associate metadata with the different phases of the experiment and helps avoid duplication of metadata.

For analysis, however, it is often convenient to "join" all the tables and create a single "flat" table with all the data from all the tables in one place, i.e., recursively resolve the DynamicTableRegion columns and add the columns from the target table to the source table so that at the end you have one table with all data and no references to other tables. An additional complication is that the DynamicTableRegion columns are often also indexed via a VectorIndex, so that each row in the source table selects one or more rows in the target table. As such, we also need to replicate rows when joining tables, in addition to adding columns.
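
To make the row replication concrete, here is a minimal plain-Python sketch, not the HDMF or MatNWB implementation. It assumes the index stores cumulative end offsets into the flat list of selected row ids, which is how a VectorIndex is laid out. The function name and the toy values (`simultaneous_tags`, `electrodes`) are hypothetical and only serve the illustration.

```python
def rows_selected_per_source_row(region_data, region_index):
    """Split a flat list of target row ids into one list per source row.

    region_data:  flat list of row ids in the target table
    region_index: cumulative end offsets, one per source row
                  (assumed VectorIndex-style layout)
    """
    selections = []
    start = 0
    for end in region_index:
        selections.append(region_data[start:end])
        start = end
    return selections


# Hypothetical toy data: two simultaneous recordings that together
# reference three intracellular recordings.
simultaneous_tags = ["sweep A", "sweep B"]   # one value per source row
region_data = [0, 1, 2]                      # target row ids, flattened
region_index = [2, 3]                        # row 0 -> [0, 1], row 1 -> [2]
electrodes = ["e0", "e1", "e0"]              # one value per target row

# Flattening replicates each source row once per selected target row.
flat = []
selections = rows_selected_per_source_row(region_data, region_index)
for tag, selected in zip(simultaneous_tags, selections):
    for target_row in selected:
        flat.append((tag, electrodes[target_row]))
```

Here the two source rows become three flat rows, because the first source row selects two target rows.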

While this sort of thing is currently mainly used for the icephys tables, it is really a generic operation on DynamicTable and DynamicTableRegion. The following PyNWB tutorial gives an overview of how this currently looks: https://pynwb.readthedocs.io/en/stable/tutorials/domain/plot_icephys_pandas.html#query-intracellular-electrophysiology-metadata. Flattening the tables is done via the following function in HDMF: https://github.com/hdmf-dev/hdmf/blob/57211c94a5b89f24dd988b8c8683184ae3e7409d/src/hdmf/common/hierarchicaltable.py#L12-L140. I have to admit, while flattening tables by resolving DynamicTableRegion columns sounds simple enough, doing all the joins and resolving all the links recursively can get quite tricky.

With some hindsight from having implemented this in HDMF, I think that, to simplify the implementation, it may be easier to focus on implementing the join of one column instead of trying to resolve all DynamicTableRegion columns. It would then be up to the user to do the recursion for all the joins they want, but I think it would keep the code simpler and more flexible, and make things more explicit. I.e., the steps a user would need to take would look something like:

  1. convert the DynamicTable to a Matlab table,
  2. call the join_column function to resolve one DynamicTableRegion column on the Matlab table (but do not recurse),
  3. if step 2 added a new DynamicTableRegion column then it is up to the user to call the join_column function again if they choose to do so.

In this way the join_column function would not need to worry about finding which column to resolve, recursing over tables, and dealing with many tables at the same time. I think the join_column function in this case would probably need the following inputs:

  1. The source Matlab table that contains the DynamicTableRegion column to resolve
  2. The name of the DynamicTableRegion column to do the join on
  3. The target DynamicTable the DynamicTableRegion points to

As a result, the join_column function would again return a Matlab table. If the target table itself contained a DynamicTableRegion column, then it would be up to the user to call join_column again.
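
As a rough illustration of the three inputs listed above, here is a plain-Python sketch of what a non-recursive join_column could look like. It is not an existing MatNWB or HDMF function. Tables are modelled as lists of dicts (one dict per row) as a stand-in for a Matlab table, the region cell is assumed to already hold the selected target row ids, and the `column.key` naming of the added columns is my own assumption to avoid name clashes such as `id`.

```python
def join_column(source_rows, column_name, target_rows):
    """Resolve one region column against its target table, without recursing.

    source_rows: list of dicts, one per row of the source table
    column_name: name of the region column to resolve
    target_rows: list of dicts, one per row of the target table
    """
    joined = []
    for row in source_rows:
        selected = row[column_name]
        # A scalar reference selects one row; a list selects several.
        if isinstance(selected, int):
            selected = [selected]
        # Keep every source column except the one being resolved.
        rest = {key: value for key, value in row.items() if key != column_name}
        # Replicate the source row once per selected target row.
        # Note: a source row with an empty selection yields no output rows.
        for target_index in selected:
            new_row = dict(rest)
            for key, value in target_rows[target_index].items():
                new_row[f"{column_name}.{key}"] = value
            joined.append(new_row)
    return joined


# Hypothetical toy tables.
intracellular = [
    {"id": 0, "electrode": "e0"},
    {"id": 1, "electrode": "e1"},
    {"id": 2, "electrode": "e0"},
]
simultaneous = [
    {"id": 0, "recordings": [0, 1]},
    {"id": 1, "recordings": [2]},
]

flat = join_column(simultaneous, "recordings", intracellular)
```

If the target rows themselves contained a region column, it would show up in the result as `recordings.<name>`, still unresolved, and the caller could pass that name to a second join_column call.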

This approach would admittedly add some burden on the user, as they would need to explicitly define which columns to resolve and call join_column manually for each column that needs to be resolved, but I think it would help to keep the code simpler, remove a bunch of edge cases, and make the whole process much more explicit and easier to control. For the icephys tables, the columns that need to be resolved are known from the schema, so it should be easy enough to write a small function (either in MatNWB or the documentation) that would call the join_column function repeatedly to create the flat table for that case.
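
A small wrapper of that kind might look roughly like the following plain-Python sketch. Everything here is illustrative: `flatten_chain` is a hypothetical helper, tables are lists of dicts standing in for Matlab tables, and the region column names (`simultaneous_recordings`, `recordings`) reflect my reading of the icephys schema and are worth double-checking. The `column.key` naming of joined columns is likewise an assumption.

```python
def join_column(source_rows, column_name, target_rows):
    """Resolve one region column against its target table (no recursion)."""
    joined = []
    for row in source_rows:
        selected = row[column_name]
        if isinstance(selected, int):
            selected = [selected]
        rest = {key: value for key, value in row.items() if key != column_name}
        for target_index in selected:
            new_row = dict(rest)
            for key, value in target_rows[target_index].items():
                new_row[f"{column_name}.{key}"] = value
            joined.append(new_row)
    return joined


def flatten_chain(top_rows, chain):
    """Call join_column once per (region column, target table) pair.

    chain lists the pairs from the top of the hierarchy downwards. Each
    join prefixes the new columns with the resolved column's name, so the
    next region column is looked up under that prefixed name.
    """
    flat = top_rows
    prefix = ""
    for column_name, target_rows in chain:
        column = prefix + column_name
        flat = join_column(flat, column, target_rows)
        prefix = column + "."
    return flat


# Hypothetical toy tables for three levels of the icephys hierarchy.
intracellular = [
    {"id": 0, "electrode": "e0"},
    {"id": 1, "electrode": "e1"},
    {"id": 2, "electrode": "e0"},
]
simultaneous = [
    {"id": 0, "recordings": [0, 1]},
    {"id": 1, "recordings": [2]},
]
sequential = [
    {"id": 0, "stimulus_type": "square", "simultaneous_recordings": [0, 1]},
]

flat = flatten_chain(
    sequential,
    [("simultaneous_recordings", simultaneous), ("recordings", intracellular)],
)
```

The chain is spelled out by the caller, which keeps the explicitness argued for above: nothing is resolved unless it is listed.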

@lawrence-mbf lawrence-mbf added the category: proposal discussion of proposed enhancements or new features label Oct 12, 2021
@ehennestad ehennestad added status: todo something needs to be done topic: matnwb-api related to improving the matnwb api labels Oct 31, 2024
@ehennestad
Collaborator

MATLAB supports nested tables, so it should be doable to implement something like the Python to_hierarchical_dataframe function.

@ehennestad
Collaborator

Potential overlap with #496


5 participants