Merge pull request #17 from catalystneuro/watters
Add probe locations, README updates, and small cleanups.
nwatters01 authored Dec 21, 2023
2 parents d8536a5 + dcaa0ad commit d1ab151
Showing 7 changed files with 172 additions and 128 deletions.
105 changes: 70 additions & 35 deletions src/jazayeri_lab_to_nwb/watters/README.md
@@ -1,9 +1,10 @@
# Watters data conversion pipeline
NWB conversion scripts for Watters data to the [Neurodata Without Borders](https://nwb-overview.readthedocs.io/) data format.

NWB conversion scripts for Nick Watters working memory data to the
[Neurodata Without Borders](https://nwb-overview.readthedocs.io/) data format.

## Usage
To run a specific conversion, you might need to install first some conversion specific dependencies that are located in each conversion directory:
To run a specific conversion, you might first need to install some
conversion-specific dependencies that are located in each conversion
directory:
```
pip install -r src/jazayeri_lab_to_nwb/watters/watters_requirements.txt
```
@@ -13,44 +14,78 @@ You can run a specific conversion with the following command:
python src/jazayeri_lab_to_nwb/watters/main_convert_session.py $SUBJECT $SESSION
```

### Watters working memory task data
The conversion function for this experiment, `session_to_nwb`, is found in `src/watters/main_convert_session.py`. The function takes arguments:
where `$SUBJECT` is in `['Perle', 'Elgar']` and `$SESSION` is a session date in
format `'YYYY-MM-DD'`. For example:
```
python src/jazayeri_lab_to_nwb/watters/main_convert_session.py Perle 2022-06-01
```

The conversion function for this experiment, `session_to_nwb`, is found in
`src/jazayeri_lab_to_nwb/watters/main_convert_session.py`. The function takes
arguments:
* `subject` subject name, either `'Perle'` or `'Elgar'`.
* `session` session date in format `'YYYY-MM-DD'`.
* `stub_test` indicates whether only a small portion of the data should be saved (mainly used by us for testing purposes).
* `stub_test` indicates whether only a small portion of the data should be
saved (used for testing purposes).
* `overwrite` indicates whether to overwrite nwb output files.
* `dandiset_id` optional dandiset ID.

The function can be imported in a separate script with and run, or you can run the file directly and specify the arguments in the `if name == "__main__"` block at the bottom.

The function expects the raw data in `data_dir_path` to follow this structure:

data_dir_path/
├── data_open_source
│   ├── behavior
│   │   └── eye.h.times.npy, etc.
│   ├── task
│   │   └── trials.start_times.json, etc.
│   └── probes.metadata.json
├── raw_data
│   ├── spikeglx
│   │   └── */*/*.ap.bin, */*/*.lf.bin, etc.
│   ├── v_probe_0
│   │   └── raw_data.dat
│   └── v_probe_{n}
│       └── raw_data.dat
├── spike_sorting_raw
│   ├── np
│   ├── vp_0
│   └── vp_{n}
├── sync_pulses

The function can be imported into a separate script and run, or you can run
the file directly and specify the arguments in the `if __name__ == "__main__"`
block at the bottom, as in the sketch below.
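
For example, a minimal driver script might look like the following (argument
values are placeholders; it assumes the script runs from
`src/jazayeri_lab_to_nwb/watters/` so that `main_convert_session.py` is
importable):
```
# Hypothetical sketch: import and call the conversion entrypoint.
from main_convert_session import session_to_nwb

session_to_nwb(
    subject="Perle",       # or "Elgar"
    session="2022-06-01",  # session date in format YYYY-MM-DD
    stub_test=False,       # True saves only a small portion of the data
    overwrite=True,        # replace existing output nwb files
)
```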

## Data format

The function expects data paths with the following structure:
```
trials
├── eye_h_calibrated.json
├── eye_v_calibrated.json
├── pupil_size_r.json
├── reward_line.json
├── sound.json
└── trials.json
data_open_source
└── probes.metadata.json
raw_data
├── spikeglx
│   └── */*/*.ap.bin, */*/*.lf.bin, etc.
├── v_probe_0
│   └── raw_data.dat
└── v_probe_{n}
    └── raw_data.dat
spike_sorting
├── np
├── v_probe_0
└── v_probe_{n}
sync_pulses
├── mworks
├── open_ephys
└── spikeglx
...
```
Each of the top-level directories may reside on a different filesystem. The
script `get_session_paths.py` contains a function to fetch them given a
subject and session, as in the sketch below.
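
A hypothetical sketch of how those paths are consumed (the attribute names
follow their usage in `main_convert_session.py`; see `get_session_paths.py`
for the actual signature):
```
# Hypothetical usage; the real signature lives in get_session_paths.py.
import get_session_paths

session_paths = get_session_paths.get_session_paths(
    subject="Perle",
    session="2022-06-01",
)
print(session_paths.data_open_source)  # contains probes.metadata.json
print(session_paths.sync_pulses)       # mworks / open_ephys / spikeglx
print(session_paths.output)            # destination of the output NWB files
```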

The converted data will be saved in two files in
`/om/user/nwatters/nwb_data_multi_prediction/staging/sub-$SUBJECT/`:
* `sub-$SUBJECT_ses-$SESSION_ecephys.nwb` --- Raw physiology
* `sub-$SUBJECT_ses-$SESSION_behavior+ecephys.nwb` --- Task, behavior, and
  sorted physiology

If you run into memory issues when writing the `{session_id}_raw.nwb` files,
you may want to set `buffer_gb` to a value smaller than 1 (its default) in the
`conversion_options` dicts for the recording interfaces, i.e.
[here](https://github.com/catalystneuro/jazayeri-lab-to-nwb/blob/vprobe_dev/src/jazayeri_lab_to_nwb/watters/main_convert_session.py#L189).
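
For instance, a sketch of that change (the interface key `RecordingVP0` is
illustrative; use whichever recording-interface keys appear in your
`conversion_options` dicts):
```
# Hypothetical tweak in main_convert_session.py: lower buffer_gb to reduce
# peak memory while writing raw electrophysiology data.
raw_conversion_options["RecordingVP0"] = dict(
    stub_test=stub_test,
    buffer_gb=0.5,  # default is 1; smaller values trade speed for memory
)
```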

The conversion will try to automatically fetch metadata from the provided data directory. However, some information, such as the subject's name and age, must be specified by the user in the file `src/jazayeri_lab_to_nwb/watters/metadata.yaml`. If any of the automatically fetched metadata is incorrect, it can also be overridden from this file, as in the example below.
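
For reference, a hypothetical override in `metadata.yaml` might look like this
(keys follow the NWB `NWBFile`/`Subject` schema; the values are placeholders):
```
NWBFile:
  lab: Jazayeri
  experimenter:
    - Watters, Nicholas
Subject:
  subject_id: Perle       # example override
  species: Macaca mulatta
  age: P10Y               # ISO 8601 duration
```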
## Uploading to DANDI

The converted data will be saved in two files, one called `{session_id}_raw.nwb`, which contains the raw electrophysiology data from the Neuropixels and V-Probes, and one called `{session_id}_processed.nwb` with behavioral data, trial info, and sorted unit spiking.
To upload from openmind to DANDI, first log into the openmind data transfer
node, e.g. `ssh [email protected]`. Then navigate to the directory
with the NWB files, e.g.
`/om/user/nwatters/nwb_data_multi_prediction/staging/`. Finally, run the steps
in the
[DANDI uploading pipeline](https://www.dandiarchive.org/handbook/13_upload/#data-uploadmanagement-workflow).

If you run into memory issues when writing the `{session_id}_raw.nwb` files, you may want to set `buffer_gb` to a value smaller than 1 (its default) in the `conversion_options` dicts for the recording interfaces, i.e. [here](https://github.com/catalystneuro/jazayeri-lab-to-nwb/blob/vprobe_dev/src/jazayeri_lab_to_nwb/watters/main_convert_session.py#L189).
Note that you must `pip install dandi` to run the uploading steps, and in
order to activate a conda environment on openmind-dtn you may have to run
`source ~/.bashrc` in the openmind-dtn terminal. Also note that uploading
entire sessions of raw data to DANDI can take a while, so it is convenient to
run it in a tmux session on openmind-dtn, as in the sketch below.
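
A rough sketch of those steps (the dandiset ID `000620` is the one referenced
in this repository's code, and the ssh address is a placeholder; substitute
your own):
```
ssh user@openmind-dtn            # placeholder address for the transfer node
source ~/.bashrc                 # may be needed to activate conda
pip install dandi
export DANDI_API_KEY=...         # token from your DANDI account settings
cd /om/user/nwatters/nwb_data_multi_prediction/staging/
dandi download https://dandiarchive.org/dandiset/000620/draft
cd 000620
dandi organize .. -f dry         # dry run to preview the layout
dandi organize ..
dandi upload
```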
158 changes: 77 additions & 81 deletions src/jazayeri_lab_to_nwb/watters/main_convert_session.py
@@ -1,11 +1,11 @@
"""Entrypoint to convert an entire session of data to NWB.
This converts a session to NWB format and writes the nwb files to
    /om/user/nwatters/nwb_data_multi_prediction/{$SUBJECT}/{$SESSION}
    /om/user/nwatters/nwb_data_multi_prediction/staging/sub-$SUBJECT/
Two NWB files are created:
    $SUBJECT_$SESSION_raw.nwb --- Raw physiology
    $SUBJECT_$SESSION_processed.nwb --- Task, behavior, and sorted physiology
These files can be automatically uploaded to a DANDI dataset.
    sub-$SUBJECT_ses-$SESSION_ecephys.nwb --- Raw physiology
    sub-$SUBJECT_ses-$SESSION_behavior+ecephys.nwb --- Task, behavior, and
        sorted physiology
Usage:
    $ python main_convert_session.py $SUBJECT $SESSION
@@ -17,23 +17,18 @@
    _REPO
    _STUB_TEST
    _OVERWRITE
    _DANDISET_ID
See comments below for descriptions of these variables.
"""

import glob
import json
import logging
import os
import sys
from pathlib import Path
from typing import Union
from uuid import uuid4
from zoneinfo import ZoneInfo

import get_session_paths
import nwb_converter
from neuroconv.tools.data_transfers import automatic_dandi_upload
from neuroconv.utils import dict_deep_update, load_dict_from_file

# Data repository. Either 'globus' or 'openmind'
@@ -42,8 +37,6 @@
_STUB_TEST = True
# Whether to overwrite output nwb files
_OVERWRITE = True
# ID of the dandiset to upload to, or None to not upload
_DANDISET_ID = None # '000620'

# Set logger level so that info is displayed in console
logging.getLogger().setLevel(logging.INFO)
@@ -54,7 +47,7 @@
}
_SUBJECT_TO_AGE = {
"Perle": "P10Y", # Born 6/11/2012
"Elgar": "P10Y", # Born 5/2/2012
"Elgar": "P11Y", # Born 5/2/2012
}


@@ -141,6 +134,7 @@ def _add_spikeglx_data(
    ]
    if len(spikeglx_dir) == 0:
        logging.info("Found no SpikeGLX data")
        return
    elif len(spikeglx_dir) == 1:
        spikeglx_dir = spikeglx_dir[0]
    else:
@@ -167,12 +161,75 @@
    )


def _update_metadata(metadata, subject, session_id, session_paths):
    """Update metadata."""

    # Add subject_id, session_id, sex, and age
    metadata["NWBFile"]["session_id"] = session_id
    metadata["Subject"]["subject_id"] = subject
    metadata["Subject"]["sex"] = _SUBJECT_TO_SEX[subject]
    metadata["Subject"]["age"] = _SUBJECT_TO_AGE[subject]

    # Add probe locations
    probe_metadata_file = (
        session_paths.data_open_source / "probes.metadata.json"
    )
    with open(probe_metadata_file, "r") as f:
        probe_metadata = json.load(f)
    for entry in metadata["Ecephys"]["ElectrodeGroup"]:
        if entry["device"] == "Neuropixel-Imec":
            neuropixel_metadata = [
                x for x in probe_metadata if x["probe_type"] == "Neuropixels"
            ][0]
            coordinate_system = neuropixel_metadata["coordinate_system"]
            coordinates = neuropixel_metadata["coordinates"]
            depth_from_surface = neuropixel_metadata["depth_from_surface"]
            entry["description"] = (
                f"{entry['description']}\n"
                f"{coordinate_system}\n"
                f"coordinates = {coordinates}\n"
                f"depth_from_surface = {depth_from_surface}"
            )
            entry["position"] = [
                coordinates[0],
                coordinates[1],
                depth_from_surface,
            ]
        elif "vprobe" in entry["device"]:
            probe_index = int(entry["device"].split("vprobe")[1])
            v_probe_metadata = [
                x for x in probe_metadata if x["probe_type"] == "V-Probe 64"
            ][probe_index]
            first_channel = v_probe_metadata["coordinates"]["first_channel"]
            last_channel = v_probe_metadata["coordinates"]["last_channel"]
            coordinate_system = v_probe_metadata["coordinate_system"]
            entry["description"] = (
                f"{entry['description']}\n"
                f"{coordinate_system}\n"
                f"first_channel = {first_channel}\n"
                f"last_channel = {last_channel}"
            )
            entry["position"] = first_channel

    # Update default metadata with the editable metadata in the corresponding yaml file
    editable_metadata_path = Path(__file__).parent / "metadata.yaml"
    editable_metadata = load_dict_from_file(editable_metadata_path)
    metadata = dict_deep_update(metadata, editable_metadata)

    # Ensure session_start_time exists in metadata
    if "session_start_time" not in metadata["NWBFile"]:
        raise ValueError(
            "Session start time was not auto-detected. Please provide it "
            "in `metadata.yaml`"
        )

    return metadata


def session_to_nwb(
    subject: str,
    session: str,
    stub_test: bool = False,
    overwrite: bool = True,
    dandiset_id: Union[str, None] = None,
):
    """
    Convert a single session to an NWB file.
@@ -189,27 +246,10 @@ def session_to_nwb(
    overwrite : boolean
        If the file exists already, True will delete and replace with a new file, False will append the contents.
        Default is True.
    dandiset_id : string, optional
        If you want to upload the file to the DANDI archive, specify the six-digit ID here.
        Requires the DANDI_API_KEY environment variable to be set.
        To set this in your bash terminal in Linux or macOS, run
            export DANDI_API_KEY=...
        or in Windows
            set DANDI_API_KEY=...
        Default is None.
    """
    if dandiset_id is not None:
        import dandi  # check importability

        assert os.getenv("DANDI_API_KEY"), (
            "Unable to find environment variable 'DANDI_API_KEY'. "
            "Please retrieve your token from DANDI and set this environment "
            "variable."
        )

    logging.info(f"stub_test = {stub_test}")
    logging.info(f"overwrite = {overwrite}")
    logging.info(f"dandiset_id = {dandiset_id}")

    # Get paths
    session_paths = get_session_paths.get_session_paths(
@@ -288,50 +328,19 @@
    )
    processed_conversion_options["Display"] = dict()

    # Create processed data converter
    # Create data converters
    processed_converter = nwb_converter.NWBConverter(
        source_data=processed_source_data,
        sync_dir=session_paths.sync_pulses,
    )

    # Add datetime and subject name to processed converter
    metadata = processed_converter.get_metadata()
    metadata["NWBFile"]["session_id"] = session_id
    metadata["Subject"]["subject_id"] = subject
    metadata["Subject"]["sex"] = _SUBJECT_TO_SEX[subject]
    metadata["Subject"]["age"] = _SUBJECT_TO_AGE[subject]

    # EcePhys
    probe_metadata_file = (
        session_paths.data_open_source / "probes.metadata.json"
    raw_converter = nwb_converter.NWBConverter(
        source_data=raw_source_data,
        sync_dir=str(session_paths.sync_pulses),
    )
    with open(probe_metadata_file, "r") as f:
        probe_metadata = json.load(f)
    neuropixel_metadata = [
        x for x in probe_metadata if x["probe_type"] == "Neuropixels"
    ][0]
    for entry in metadata["Ecephys"]["ElectrodeGroup"]:
        if entry["device"] == "Neuropixel-Imec":
            # TODO: uncomment when fixed in pynwb
            # entry.update(dict(position=[(
            #     neuropixel_metadata['coordinates'][0],
            #     neuropixel_metadata['coordinates'][1],
            #     neuropixel_metadata['depth_from_surface'],
            # )]
            logging.info("\n\n")
            logging.warning(" PROBE COORDINATES NOT IMPLEMENTED\n\n")

    # Update default metadata with the editable in the corresponding yaml file
    editable_metadata_path = Path(__file__).parent / "metadata.yaml"
    editable_metadata = load_dict_from_file(editable_metadata_path)
    metadata = dict_deep_update(metadata, editable_metadata)

    # Check if session_start_time was found/set
    if "session_start_time" not in metadata["NWBFile"]:
        raise ValueError(
            "Session start time was not auto-detected. Please provide it "
            "in `metadata.yaml`"
        )
    # Update metadata
    metadata = processed_converter.get_metadata()
    metadata = _update_metadata(metadata, subject, session_id, session_paths)

    # Run conversion
    logging.info("Running processed conversion")
@@ -344,25 +353,13 @@

logging.info("Running raw data conversion")
metadata["NWBFile"]["identifier"] = str(uuid4())
raw_converter = nwb_converter.NWBConverter(
source_data=raw_source_data,
sync_dir=str(session_paths.sync_pulses),
)
raw_converter.run_conversion(
metadata=metadata,
nwbfile_path=raw_nwb_path,
conversion_options=raw_conversion_options,
overwrite=overwrite,
)

# Upload to DANDI
if dandiset_id is not None:
logging.info(f"Uploading to dandiset id {dandiset_id}")
automatic_dandi_upload(
dandiset_id=dandiset_id,
nwb_folder_path=session_paths.output,
)


if __name__ == "__main__":
"""Run session conversion."""
@@ -374,6 +371,5 @@ def session_to_nwb(
        session=session,
        stub_test=_STUB_TEST,
        overwrite=_OVERWRITE,
        dandiset_id=_DANDISET_ID,
    )
    logging.info(f"\nFinished conversion for {subject}/{session}\n")
1 change: 1 addition & 0 deletions src/jazayeri_lab_to_nwb/watters/metadata.yaml
@@ -11,5 +11,6 @@ NWBFile:
  lab: Jazayeri
  experimenter:
    - Watters, Nicholas
    - Gabel, John
Subject:
  species: Macaca mulatta