NWB search demo #85
@magland I'm trying to put together an example of a search over brain regions for electrophysiology. This works, but it takes about 15 minutes for me on the IBL dataset (to be fair, it's a very big dataset). Is there a faster way, or is this what you would recommend?

```python
import lindi
from tqdm import tqdm
from dandi.dandiapi import DandiAPIClient

brain_area = "MB"

dandi_api_client = DandiAPIClient()
dandiset_id = "000409"
dandiset = dandi_api_client.get_dandiset(dandiset_id)

elec_loc_path = 'general/extracellular_ephys/electrodes/location'

# local cache so repeated reads don't re-download the same lindi files
local_cache = lindi.LocalCache()

passing_assets = []
for asset in tqdm(list(dandiset.get_assets())):
    if not asset.path.endswith("nwb"):
        continue
    lindi_url = f'https://lindi.neurosift.org/dandi/dandisets/{dandiset_id}/assets/{asset.identifier}/nwb.lindi.json'
    lindi_file = lindi.LindiH5pyFile.from_lindi_file(lindi_url, local_cache=local_cache)
    if elec_loc_path in lindi_file and brain_area in lindi_file[elec_loc_path]:
        passing_assets.append(asset)
```
For comparison, this took 18:30:

```python
from tqdm import tqdm
from dandi.dandiapi import DandiAPIClient
import remfile
import h5py

brain_area = "V3"

dandi_api_client = DandiAPIClient()
dandiset_id = "000409"
dandiset = dandi_api_client.get_dandiset(dandiset_id)

elec_loc_path = 'general/extracellular_ephys/electrodes/location'

passing_assets = []
for asset in tqdm(list(dandiset.get_assets())):
    if not asset.path.endswith("nwb"):
        continue
    # stream the remote HDF5 file directly from S3 via remfile
    s3_url = asset.get_content_url(follow_redirects=1, strip_query=True)
    rem_file = remfile.File(s3_url)
    h5_file = h5py.File(rem_file, "r")
    if elec_loc_path in h5_file and brain_area in h5_file[elec_loc_path]:
        passing_assets.append(asset)
```

Actually, I am surprised there isn't a bigger time difference.
Yeah, it is surprising that the lindi method is not faster. The only part that takes any time is the download, and each file is only 1-2 MB. That should take less than a second per file, but it will depend on the network connection. I tried it on a GitHub Codespaces instance and it took 506 seconds (8:26) for 677 files (115 passing). I then tried the remfile method -- I didn't run it to completion, but it was going at around 2.2 seconds per file. So slower, but not hugely.

Where lindi would provide a much bigger advantage, I am speculating, is when you are loading more information per file -- for example, using pynwb, where a lot more metadata needs to be loaded. Another possibility is to prepare a single large lindi file for the entire dandiset, which could then be loaded much more efficiently. Yet another possibility is to cache the lindi files locally, as sketched below.
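A minimal sketch of that local-caching idea, assuming the lindi.neurosift.org URL pattern used above and that `from_lindi_file` accepts a local path as well as a URL; `CACHE_DIR` and `cached_lindi_path` are hypothetical names, not part of the lindi API:

```python
import os
import urllib.request

import lindi

CACHE_DIR = "lindi_cache"  # hypothetical local directory for cached index files

def cached_lindi_path(dandiset_id: str, asset_id: str) -> str:
    """Download an asset's nwb.lindi.json once; reuse the local copy afterwards."""
    os.makedirs(CACHE_DIR, exist_ok=True)
    local_path = os.path.join(CACHE_DIR, f"{dandiset_id}_{asset_id}.lindi.json")
    if not os.path.exists(local_path):
        url = (
            f"https://lindi.neurosift.org/dandi/dandisets/{dandiset_id}"
            f"/assets/{asset_id}/nwb.lindi.json"
        )
        urllib.request.urlretrieve(url, local_path)
    return local_path

# usage inside the search loop from above:
# lindi_file = lindi.LindiH5pyFile.from_lindi_file(
#     cached_lindi_path(dandiset_id, asset.identifier))
```

After the first pass over the dandiset, subsequent searches (for a different brain region, say) would read entirely from disk.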
At NeuroDataReHack, one of the students wanted to identify sessions within the IBL dataset that contained electrodes in a specific brain region. That's doable with the DANDI API, remfile, and pynwb, but can take a very long time because it requires streaming and initializing each NWB file. I think it would be much faster to do it with LINDI, particularly since the metadata they needed was stored in the json file as base64. It would be great if we had a tutorial that demonstrated how to use LINDI in this way. I think it could reduce search time substantially and would be a cool use-case for LINDI.
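For reference, a hedged sketch of the pynwb route described above -- the one that is slow because every NWB file has to be streamed and initialized. The `session_has_brain_area` helper is illustrative, not an existing API:

```python
import h5py
import remfile
from pynwb import NWBHDF5IO

def session_has_brain_area(s3_url: str, brain_area: str) -> bool:
    """Stream one remote NWB file and check its electrodes table for a brain region."""
    rem_file = remfile.File(s3_url)
    with h5py.File(rem_file, "r") as h5_file:
        with NWBHDF5IO(file=h5_file, load_namespaces=True) as io:
            # io.read() is the expensive step: it initializes the entire NWB file
            nwbfile = io.read()
            if nwbfile.electrodes is None:
                return False
            return brain_area in nwbfile.electrodes["location"][:]
```

Running this per asset is what makes the naive search so slow; a LINDI-based tutorial could show the same check against the small `nwb.lindi.json` index instead.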