Create manifest creation script. #55

morriscb · 2024-08-05T19:19:20Z

Add a function, ported from a Jupyter notebook, to create manifests given a set of data.

danielsf

Just some cosmetic changes

danielsf · 2024-08-05T22:36:08Z

src/manifest_builder/manifest_builder.py

+        "directory_listing": {}
+    }
+    for directory in os.walk(base_dir):
+        versions = []


Do you mind changing this to something like version_list? The one-character difference between version and versions might be confusing (channeling my inner Russell Owen here).

danielsf · 2024-08-05T22:48:39Z

src/manifest_builder/manifest_builder.py

+        Base release manifest dictionary with all directories, links, paths,
+        and files populated.
+    """
+    datasets = {}


Optionally: same comment as before regarding just calling this the plural of dataset. Maybe this could be dataset_lookup. The need is not as acute because datasets and data_set are a little more distinguishable.

morriscb · 2024-08-09

Changed to dataset_lookup and dataset.

danielsf · 2024-08-05T22:49:13Z

src/manifest_builder/manifest_builder.py

+            ver_dict = ds_dict[data_kind]
+
+            data_dir = os.path.join(base_dir, ver_dict['relative_path'])
+            # print('---',data_dir)


remove commented-out line

(along with commented print lines below)

danielsf · 2024-08-05T22:54:59Z

src/manifest_builder/manifest_builder.py

+    return release
+
+
+def populate_paths_and_urls(


Can you give this function a name that more clearly indicates that it populates information at the level of directories/datasets, rather than individual files? I got a little tripped up by the double implementation of bucket_prefix + relative_path between this function and populate_datasets. It took me a moment to realize this function was operating at a different level than the other.

morriscb · 2024-08-09

Changed to populate_directories.

danielsf · 2024-08-05T22:58:37Z

src/manifest_builder/manifest_builder.py

+                        datasets[data_set][data_kind][tag][norm]['files'] = {}
+                        datasets[data_set][data_kind][tag][norm]['files'][
+                            ext] = {}
+                        datasets[data_set][data_kind][tag][norm]['files'][


Rather than individually setting 'version', 'url', 'file_hash', etc. in each section of this if/else block, can you define a file_stats dict that has each of these elements before entering the if/else block, and then place file_stats under whatever chain of keys you need to, based on the extension?

morriscb · 2024-08-09

Nice catch.
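The refactor suggested above could look roughly like the following. This is a minimal sketch, not the code that was merged: the helper names `build_file_stats` and `place_file_stats`, the file names, and the assumption that `.h5ad` stems end in `-<norm>` are all hypothetical; only the 'version', 'url', and 'file_hash' keys and the `[tag][norm]['files'][ext]` key chain come from the review thread.

```python
import os


def build_file_stats(version, url, file_hash):
    """Collect the per-file fields once, before any branching on extension."""
    return {"version": version, "url": url, "file_hash": file_hash}


def place_file_stats(kind_dict, file_name, file_stats):
    """Store file_stats under the key chain implied by the file extension.

    Assumed (hypothetical) layout:
      *.h5ad  -> kind_dict[tag][norm]['files']['h5ad']
      others  -> kind_dict[stem]['files'][ext]
    """
    stem, ext = os.path.splitext(file_name)
    ext = ext.lstrip(".")
    if ext == "h5ad":
        # Assumes the stem contains at least one '-' separating tag and norm.
        tag, norm = stem.rsplit("-", 1)
        target = kind_dict.setdefault(tag, {}).setdefault(norm, {})
    else:
        target = kind_dict.setdefault(stem, {})
    # The branches only choose the location; the payload is built once.
    target.setdefault("files", {})[ext] = file_stats
    return kind_dict


kind_dict = {}
stats = build_file_stats("20230630", "https://example.org/a.h5ad", "abc123")
place_file_stats(kind_dict, "WMB-10Xv2-TH-log2.h5ad", stats)
place_file_stats(kind_dict, "cell_metadata.csv", stats)
```

Keeping the payload construction out of the if/else block means a new field only has to be added in one place.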

danielsf · 2024-08-05T23:02:09Z

src/manifest_builder/manifest_builder.py

+from abc_atlas_access.abc_atlas_cache.utils import file_hash_from_path
+
+
+BUCKET_PREFIX = 'https://allen-brain-cell-atlas.s3.us-west-2.amazonaws.com/'


These global variables do not appear to be used

danielsf

Just one comment on a docstring. This is good to merge, though.

danielsf · 2024-08-12T15:27:17Z

src/manifest_builder/manifest_builder.py

@@ -280,7 +280,7 @@ def populate_datasets(
        "--datasets_to_skip",
        nargs='+',
        type=str,
-        default=['releases', 'SEAAD', 'Zhuang-C57BL6J'],
+        default=['releases', 'SEA', 'Zhuang-C57BL6J'],
        help="Skip a given project for all directories that start with the "
             "given pattern. (e.g. SEAD will exclude all directories that "


Should the example in the docstring be SEAAD (two As), since that is what we call the actual dataset?

morriscb · 2024-08-12

No. Pretty sure there are also directories with SEA-AD and we also want to skip those.
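The prefix-matching behavior discussed here can be illustrated with a short sketch. The function name `should_skip` and the directory names are hypothetical, and the actual script may implement the check differently; the point is only that a shorter pattern such as 'SEA' covers both spellings, while 'SEAAD' does not.

```python
def should_skip(directory_name, datasets_to_skip):
    """Return True if the directory name starts with any skip pattern."""
    # str.startswith accepts a tuple and matches if any prefix matches.
    return directory_name.startswith(tuple(datasets_to_skip))


skip_patterns = ["releases", "SEA", "Zhuang-C57BL6J"]

# Hypothetical directory names, used only to show the matching rule.
for name in ["SEAAD-taxonomy", "SEA-AD-10X", "WMB-10X"]:
    print(name, should_skip(name, skip_patterns))
```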

morriscb · 2024-08-13T21:08:20Z

Running a check of the manifest (downloading and verifying hashes). Once that's done I'll merge the changes unless there is anything else you'd like addressed.
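A hash check of this kind can be sketched as below. This is an illustration under assumptions, not the check that was actually run: the helper names are hypothetical, and SHA-256 is used only as a stand-in, since the algorithm behind the project's `file_hash_from_path` is not stated in this thread.

```python
import hashlib


def hash_file(path, algorithm="sha256", chunk_size=1024 * 1024):
    """Hash a file in chunks so large files are not read into memory at once."""
    hasher = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def verify_file(path, expected_hash, algorithm="sha256"):
    """Compare a downloaded file's hash with the value recorded in a manifest."""
    return hash_file(path, algorithm) == expected_hash
```

For each manifest entry, one would download the file from its recorded URL and then call `verify_file(local_path, entry["file_hash"])`, assuming the entry stores the hash under that key.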

Add manifest builder function. Add doc strings. Add json and h5 file manifest creation. Hide mapmycells for now.

morriscb changed the base branch from main to rc/0.2.0 August 5, 2024 19:19

morriscb requested a review from danielsf August 5, 2024 19:19

danielsf requested changes Aug 5, 2024

morriscb requested a review from danielsf August 9, 2024 17:19

danielsf approved these changes Aug 12, 2024

Add manifest creator code.

a7c5369

morriscb force-pushed the u/morriscb/autoManifest branch from b7b3131 to a7c5369 Compare August 15, 2024 16:31

morriscb merged commit 455d65f into rc/0.2.0 Aug 15, 2024
12 checks passed
