[Bug]: Cannot use parallel for checks #436

tuanpham96 · 2024-02-15T00:35:15Z

What happened?

I've been having issues with running the inspector in parallel.

Doing this would be fine

nwbinspector data/nwb --config dandi --report-file-path inspector-results.txt

But using --n-jobs would tend to end in error

nwbinspector data/nwb --config dandi --report-file-path inspector-results.txt --n-jobs 8

The traceback is below:

concurrent.futures.process._RemoteTraceback:                                                                                
"""
Traceback (most recent call last):
  File "/users/lab/anaconda/nwb/lib/python3.11/multiprocessing/queues.py", line 244, in _feed
    obj = _ForkingPickler.dumps(obj)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/users/lab/anaconda/nwb/lib/python3.11/multiprocessing/reduction.py", line 51, in dumps
    cls(buf, protocol).dump(obj)
AttributeError: Can't pickle local object 'register_check.<locals>.register_check_and_auto_parse.<locals>.auto_parse_some_output'
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/users/lab/anaconda/nwb/bin/nwbinspector", line 8, in <module>
    sys.exit(inspect_all_cli())
             ^^^^^^^^^^^^^^^^^
  File "/users/lab/anaconda/nwb/lib/python3.11/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/users/lab/anaconda/nwb/lib/python3.11/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
         ^^^^^^^^^^^^^^^^
  File "/users/lab/anaconda/nwb/lib/python3.11/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/users/lab/anaconda/nwb/lib/python3.11/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/users/lab/anaconda/nwb/lib/python3.11/site-packages/nwbinspector/nwbinspector.py", line 280, in inspect_all_cli
    messages = list(
               ^^^^^
  File "/users/lab/anaconda/nwb/lib/python3.11/site-packages/nwbinspector/nwbinspector.py", line 454, in inspect_all
    for message in future.result():
                   ^^^^^^^^^^^^^^^
  File "/users/lab/anaconda/nwb/lib/python3.11/concurrent/futures/_base.py", line 449, in result
    return self.__get_result()
           ^^^^^^^^^^^^^^^^^^^
  File "/users/lab/anaconda/nwb/lib/python3.11/concurrent/futures/_base.py", line 401, in __get_result
    raise self._exception
  File "/users/lab/anaconda/nwb/lib/python3.11/multiprocessing/queues.py", line 244, in _feed
    obj = _ForkingPickler.dumps(obj)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/users/lab/anaconda/nwb/lib/python3.11/multiprocessing/reduction.py", line 51, in dumps
    cls(buf, protocol).dump(obj)
AttributeError: Can't pickle local object 'register_check.<locals>.register_check_and_auto_parse.<locals>.auto_parse_some_output'

Operating System

Linux (not sure the exact OS, running on university cluster).

Python Version

Python 3.11.4

Were you streaming with ROS3?

No

Package Versions

hdmf                          3.11.0
numpy                         1.26.2
nwbinspector                  0.4.33
pandas                        2.1.3
pynwb                         2.5.0

Code of Conduct

I agree to follow this project's Code of Conduct
Have you ensured this bug was not already reported?
To the best of your ability, have you ensured this is a bug within the code that checks the NWBFile, rather than a bug in the NWBFile reader (e.g., PyNWB or MatNWB)?

The text was updated successfully, but these errors were encountered:

CodyCBakerPhD · 2024-02-15T03:48:01Z

Quite strange - we have tests for parallel usage in the API (https://github.com/NeurodataWithoutBorders/nwbinspector/blob/dev/tests/test_inspector.py#L212); and technically those leverage the same configuration wrapper via select mode rather than dandi

Tried adding a new API test for this call in #437, and I suppose I should add a CLI invocation and report saving just for good measure

Are you able by change to reproduce this in any way minimally? Would you be willing to share the NWB files in question or if on DANDI, point me towards the dandiset? Also can you share the results of the non-parallel report? (Wondering if it might be some specific check that in particular causes this, and that check is not in one of the base files we use for testing)

CodyCBakerPhD · 2024-02-15T04:06:00Z

Interesting; the CLI invocation reproduces it! https://github.com/NeurodataWithoutBorders/nwbinspector/actions/runs/7910884792/job/21594182388?pr=437

But the API usage doesn't... hmm

Will dig into this

CodyCBakerPhD · 2024-02-15T04:07:08Z

Though even more curiously seems to only affect linux and mac 😅

tuanpham96 · 2024-02-15T17:05:55Z

I tried to reproduce with simple test nwb files. Apparently the --config dandi is the cause.

import numpy as np
from datetime import datetime
from dateutil.tz import tzlocal
from pynwb import NWBFile, TimeSeries, NWBHDF5IO

num_files = 10

for i in range(num_files):
    start_time = datetime(2024, 1, 2, 3, 4, 5, tzinfo=tzlocal())
    name = f"test_{i:03d}"
    
    nwbfile = NWBFile(
        session_description=name,
        identifier=name,
        session_start_time=start_time,
    )

    ts = TimeSeries(
        name=f"ts{i}",
        data=np.random.randn(10, 200),
        unit="-",
        timestamps=np.arange(10),
    )
    nwbfile.add_acquisition(ts)

    with NWBHDF5IO(name + '.nwb', "w") as io:
        io.write(nwbfile)

Running with this is fine:

 nwbinspector . --n-jobs 4

But this would throw the pretty much the same error:

nwbinspector . --config dandi --n-jobs 4

Browsing the code, my guess is it has to do something with how the functions in available_checks are copied when a config is passed, but couldn't find a workaround.

tuanpham96 · 2024-02-15T17:26:50Z

Apparently, not using copy_check would be fine, although I'm not sure whether that is valid.

If I change this line:

nwbinspector/src/nwbinspector/nwbinspector.py

Line 160 in 3b91993

mapped_check = copy_check(check=check)

to just

 mapped_check = check

things would work fine.

CodyCBakerPhD · 2024-02-15T17:48:43Z

But is the full testing suite OK with that change?

CodyCBakerPhD · 2024-02-15T17:49:04Z

BTW I'm having trouble reproducing this locally on my Windows and Linux systems, hence a bit of delay on getting to this

tuanpham96 · 2024-02-15T17:56:05Z

But is the full testing suite OK with that change?

I haven't run the tests yet. I'll try to run them later today or tomorrow on my local machine.

tuanpham96 · 2024-02-15T21:22:30Z

I tried running the tests on your PR branch locally on my machine (Linux) without copy_check, and I have weird results with test_inspector.py

Testing on that file would have the *parallel* tests pass, but fail on *importance_threshold* tests

pytest -v tests/test_inspector.py                                       
================================================================ test session starts =================================================================
platform linux -- Python 3.10.10, pytest-8.0.0, pluggy-1.4.0 -- 
cachedir: .pytest_cache
plugins: cov-4.1.0
collected 20 items                                                                                                                                   

tests/test_inspector.py::TestInspector::test_command_line_on_directory_matches_file PASSED                                                     [  5%]
tests/test_inspector.py::TestInspector::test_command_line_organizes_levels PASSED                                                              [ 10%]
tests/test_inspector.py::TestInspector::test_command_line_runs_cli_only PASSED                                                                 [ 15%]
tests/test_inspector.py::TestInspector::test_command_line_runs_cli_only_parallel PASSED                                                        [ 20%]
tests/test_inspector.py::TestInspector::test_command_line_runs_saves_json PASSED                                                               [ 25%]
tests/test_inspector.py::TestInspector::test_command_line_saves_report PASSED                                                                  [ 30%]
tests/test_inspector.py::TestInspector::test_inspect_all PASSED                                                                                [ 35%]
tests/test_inspector.py::TestInspector::test_inspect_nwbfile PASSED                                                                            [ 40%]
tests/test_inspector.py::TestInspector::test_inspect_nwbfile_dandi_config PASSED                                                               [ 45%]
tests/test_inspector.py::TestInspector::test_inspect_nwbfile_importance_threshold_as_importance FAILED                                         [ 50%]
tests/test_inspector.py::TestInspector::test_inspect_nwbfile_importance_threshold_as_string FAILED                                             [ 55%]
tests/test_inspector.py::TestInspector::test_inspect_nwbfile_manual_iteration PASSED                                                           [ 60%]
tests/test_inspector.py::TestInspector::test_inspect_nwbfile_manual_iteration_stop PASSED                                                      [ 65%]
tests/test_inspector.py::TestInspector::test_iterable_check_function PASSED                                                                    [ 70%]
tests/test_inspector.py::TestDANDIConfig::test_command_line_runs_parallel_with_config_and_report PASSED                                        [ 75%]
tests/test_inspector.py::TestDANDIConfig::test_inspect_nwbfile_dandi_config_critical_only_entire_registry PASSED                               [ 80%]
tests/test_inspector.py::TestDANDIConfig::test_inspect_nwbfile_dandi_config_violation_and_above_entire_registry PASSED                         [ 85%]
tests/test_inspector.py::TestCheckUniqueIdentifiersPass::test_check_unique_identifiers_pass PASSED                                             [ 90%]
tests/test_inspector.py::TestCheckUniqueIdentifiersFail::test_check_unique_identifiers_fail PASSED                                             [ 95%]
tests/test_inspector.py::test_dandi_config_in_vitro_injection PASSED                                                                           [100%]

But then re-doing with only the failed tests seem to pass.

pytest -v tests/test_inspector.py -k test_inspect_nwbfile_importance_threshold                                                                    
================================================================ test session starts =================================================================
platform linux -- Python 3.10.10, pytest-8.0.0, pluggy-1.4.0 -- /home/penguinsfly/Documents/work/fleischmann-lab-brown/general/dev-tools/contrib/nwbin
spector/.venv/bin/python                                                                                                                              
cachedir: .pytest_cache                                                                                                                               
rootdir: /home/penguinsfly/Documents/work/fleischmann-lab-brown/general/dev-tools/contrib/nwbinspector                                                
plugins: cov-4.1.0                                                                                                                                    
collected 20 items / 18 deselected / 2 selected                                                                                                       
                                                                                                                                                      
tests/test_inspector.py::TestInspector::test_inspect_nwbfile_importance_threshold_as_importance PASSED                                         [ 50%] 
tests/test_inspector.py::TestInspector::test_inspect_nwbfile_importance_threshold_as_string PASSED                                             [100%]

Do you know why?

CodyCBakerPhD · 2024-02-15T21:26:59Z

Yeah, that's the reason the copy function exists in the first place. It's kind of messy to go into detail

Will try to think of broader workaround. For now I recommend running either without parallel and with config (slowest but most accurate), or without config and with parallel (recommended for very large number of files)

Note that the DANDI config just elevates certain subject related items to a high enough level to prevent validation - so you could theoretically just cross-reference the output against this file

tuanpham96 · 2024-02-15T21:32:20Z

Gotcha. Thanks for the suggestions!

tuanpham96 added the category: bug errors in the code or code behavior label Feb 15, 2024

CodyCBakerPhD mentioned this issue Feb 15, 2024

Add test case for DANDI config with parallel #437

Closed

stephprince added the priority: medium non-critical problem and/or affecting only a small set of NWB users label Nov 6, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Bug]: Cannot use parallel for checks #436

[Bug]: Cannot use parallel for checks #436

tuanpham96 commented Feb 15, 2024 •

edited

Loading

CodyCBakerPhD commented Feb 15, 2024

CodyCBakerPhD commented Feb 15, 2024

CodyCBakerPhD commented Feb 15, 2024

tuanpham96 commented Feb 15, 2024

tuanpham96 commented Feb 15, 2024 •

edited

Loading

CodyCBakerPhD commented Feb 15, 2024

CodyCBakerPhD commented Feb 15, 2024

tuanpham96 commented Feb 15, 2024

tuanpham96 commented Feb 15, 2024

CodyCBakerPhD commented Feb 15, 2024 •

edited

Loading

tuanpham96 commented Feb 15, 2024

[Bug]: Cannot use parallel for checks #436

[Bug]: Cannot use parallel for checks #436

Comments

tuanpham96 commented Feb 15, 2024 • edited Loading

What happened?

Operating System

Python Version

Were you streaming with ROS3?

Package Versions

Code of Conduct

CodyCBakerPhD commented Feb 15, 2024

CodyCBakerPhD commented Feb 15, 2024

CodyCBakerPhD commented Feb 15, 2024

tuanpham96 commented Feb 15, 2024

tuanpham96 commented Feb 15, 2024 • edited Loading

CodyCBakerPhD commented Feb 15, 2024

CodyCBakerPhD commented Feb 15, 2024

tuanpham96 commented Feb 15, 2024

tuanpham96 commented Feb 15, 2024

CodyCBakerPhD commented Feb 15, 2024 • edited Loading

tuanpham96 commented Feb 15, 2024

tuanpham96 commented Feb 15, 2024 •

edited

Loading

tuanpham96 commented Feb 15, 2024 •

edited

Loading

CodyCBakerPhD commented Feb 15, 2024 •

edited

Loading