-
Notifications
You must be signed in to change notification settings - Fork 6
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
support ripping with docker in parallel #8
base: master
Are you sure you want to change the base?
Changes from all commits
f419c4a
b3d708d
dea4140
c72acdf
8499113
66a942f
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -121,7 +121,7 @@ def get_dirname_hdf5(sess_name, rec_name): | |
prev_data = get_dirname_hdf5(sn, rn) / 'data' / 'data.h5' | ||
data_files.append(prev_data) | ||
data_files.append(fname_hdf5) | ||
run_suite2p(data_files, dirname_output, mdata) | ||
run_suite2p(data_files, dirname_output, mdata, args.ops_suite2p) | ||
|
||
if args.backup_output: | ||
backup(dirname_output, dirname_backup / 'output') | ||
|
@@ -156,8 +156,7 @@ def preprocess(basename_input, dirname_output, fname_csv, fname_uncorrected, fna | |
stim_channel_name, settle_time): | ||
"""Main method for running processing of TIFF files into HDF5.""" | ||
size = mdata['size'] | ||
|
||
df_voltage = pd.read_csv(fname_csv, index_col='Time(ms)', skipinitialspace=True) | ||
df_voltage = pd.read_csv(fname_csv, skipinitialspace=True) | ||
logger.info('Read voltage recordings from: %s, preview:\n%s', fname_csv, df_voltage.head()) | ||
fname_frame_start = dirname_output / 'frame_start.h5' | ||
frame_start = artefacts.get_frame_start(df_voltage, fname_frame_start) | ||
|
@@ -241,13 +240,19 @@ def run_cmd(cmd, expected_returncode, shell=False): | |
logger.info('Output:\n%s', result.stdout.decode('utf-8')) | ||
|
||
|
||
def run_suite2p(hdf5_list, dirname_output, mdata): | ||
def run_suite2p(hdf5_list, dirname_output, mdata, ops_file=None): | ||
z_planes = mdata['size']['z_planes'] | ||
fs_param = 1. / (mdata['period'] * z_planes) | ||
|
||
# Load suite2p only right before use, as it has a long load time. | ||
import suite2p | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This loads in all of suite2p (which includes the gui stuff and the detection modules). It's not fatal, but I've found suite2p to be heavy weight. So I had just imported the subset of libraries (run_s2p) that we use below. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. latest version of suite2p refactored a bit, e.g. need |
||
from suite2p import run_s2p | ||
default_ops = run_s2p.default_ops() | ||
if ops_file is None: | ||
ops = suite2p.default_ops() | ||
else: | ||
# from: | ||
# https://github.com/MouseLand/suite2p/blob/4b6c3a95b53e5581dbab1feb26d67878db866068/suite2p/gui/rungui.py#L472 | ||
ops = np.load(ops_file, allow_pickle=True).item() | ||
params = { | ||
'input_format': 'h5', | ||
'data_path': [str(f.parent) for f in hdf5_list], | ||
|
@@ -264,7 +269,7 @@ def run_suite2p(hdf5_list, dirname_output, mdata): | |
logger.info('Running suite2p on files:\n%s\n%s', '\n'.join(str(f) for f in hdf5_list), params) | ||
with open(dirname_output / 'recording_order.json', 'w') as fout: | ||
json.dump([str(e) for e in hdf5_list], fout, indent=4) | ||
run_s2p.run_s2p(ops=default_ops, db=params) | ||
run_s2p(ops=ops, db=params) | ||
|
||
|
||
def parse_args(): | ||
|
@@ -295,6 +300,8 @@ def parse_args(): | |
help='Top level directory for SLM setup data') | ||
group.add_argument('--output_dir', type=pathlib.Path, help='Top level directory of data processing') | ||
|
||
group.add_argument('--suite2p-ops', type=pathlib.Path, help='.ops file for suite2p', default=None) | ||
|
||
group.add_argument('--recording', | ||
type=str, | ||
help=('Name of a recording, given as a slash separated id of SESSION/RECORDING/OPTIONAL_PREFIX ' | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -15,8 +15,8 @@ | |
# Ripping process does not end cleanly, so the filesystem is polled to detect the | ||
# processing finishing. The following variables relate to the timing of that polling | ||
# process. | ||
RIP_TOTAL_WAIT_SECS = 3600 # Total time to wait for ripping before killing it. | ||
RIP_EXTRA_WAIT_SECS = 10 # Extra time to wait after ripping is detected to be done. | ||
RIP_TOTAL_WAIT_SECS = 60*60*10 # Total time to wait for ripping before killing it. | ||
RIP_EXTRA_WAIT_SECS = 60*60*10 # Extra time to wait after ripping is detected to be done. | ||
RIP_POLL_SECS = 10 # Time to wait between polling the filesystem. | ||
|
||
|
||
|
@@ -35,6 +35,7 @@ def determine_ripper(data_dir, ripper_dir): | |
version = root.attrib['version'] | ||
|
||
# Prairie View versions are given in the form A.B.C.D. | ||
# TODO: allow match when minor version mismatches, eg fall back to `version = '5.5.1.1'` | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Currently, this uses the major+minor to determine which ripper version to use. It seems, at least anecdotally, that the minor version is important. What behavior are you expecting on a minor version mismatch? One improvement I see is that we should throw an error right here if the ripper version is not in the repo. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. yes you're right i think minor version is sometimes (usually) important...annoying... There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Then this TODO probably isn't necessary. We currently handle 5.4 vs 5.5, and I'd like the code to fail loudly (not silently try something we haven't tested) if it detects anything else. |
||
match = re.match(r'^(?P<majmin>\d+\.\d+)\.\d+\.\d+$', version) | ||
if not match: | ||
raise RippingError('Could not parse version (expected A.B.C.D): %s' % version) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm curious why you choose to do this in a shell script, compared to launching multiple jobs (or an array job) via slurm?
The reason I ask is because this seems to be adding some complexity around xvfb and locking (which looks tricky and is just one more thing that could break....), whereas parallel launching via slurm is straightforward. Also, it gets tricky to manage ending gracefully and/or doing retries in case where one or more of the parallel threads fails for some reason.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've been running this script on my local computer to avoid filesystem slowness, so this is nice for parallelization, but perhaps better to be in new script..?