Downloading images for a particular plate #74

Open
wjn0 opened this issue Feb 3, 2025 · 3 comments

wjn0 commented Feb 3, 2025

I have a plate ID (BR00116991) and I'd like to download the images for that plate. Is this possible with jump_portrait or one of the other tools in this repo?

Many thanks!


afermg commented Feb 3, 2025

This plate is part of the JUMP pilot experiments (cpg0000), which do not follow the same structure as "main" JUMP (cpg0016). jump_portrait does not yet support Cell Painting Gallery prefixes other than cpg0016. That said, you can find all the images (but not the metadata) here, and instructions on how to download them using the AWS S3 CLI here. You just need to replace the {batch} section with the correct URL suffix.
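As a rough sketch of the AWS CLI route (not a copy of the linked instructions), the snippet below only assembles a recursive `aws s3 cp` command for one plate's image folder. The prefix is the image directory that this plate's load_data CSV points to; `build_s3_copy_command` and the local `BR00116991/` destination are hypothetical names, and `--no-sign-request` assumes the bucket allows anonymous reads.

```python
import shlex
from typing import List


def build_s3_copy_command(s3_prefix: str, dest_dir: str) -> List[str]:
    """Assemble (but do not run) a recursive, unsigned `aws s3 cp` call."""
    return ["aws", "s3", "cp", s3_prefix, dest_dir, "--recursive", "--no-sign-request"]


# Image folder for plate BR00116991, as listed in its load_data CSV.
PLATE_PREFIX = (
    "s3://cellpainting-gallery/cpg0000-jump-pilot/source_4/images/"
    "2020_11_04_CPJUMP1/images/BR00116991__2020-11-05T19_51_35-Measurement1/Images/"
)

cmd = build_s3_copy_command(PLATE_PREFIX, "BR00116991/")
# Print the shell command so it can be inspected before running it.
print(shlex.join(cmd))
```

Printing the command first makes it easy to check the prefix before starting a large transfer; passing `cmd` to `subprocess.run` would execute it.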


afermg commented Feb 3, 2025

Here is a way to get all the filenames with duckdb, so you can pipe them into another tool such as wget for download (no Python needed). Be aware that each image file holds a single channel, so the total volume of data is non-negligible.

❯ duckdb -csv -c 'UNPIVOT (SELECT COLUMNS("URL_*Orig*") FROM "https://cellpainting-gallery.s3.amazonaws.com/cpg0000-jump-pilot/source_4/workspace/load_data_csv/2020_11_04_CPJUMP1/BR00116991/load_data_with_illum.csv") ON *;' | cut -f2 -d',' | head

value
s3://cellpainting-gallery/cpg0000-jump-pilot/source_4/images/2020_11_04_CPJUMP1/images/BR00116991__2020-11-05T19_51_35-Measurement1/Images/r01c01f01p01-ch5sk1fk1fl1.tiff
s3://cellpainting-gallery/cpg0000-jump-pilot/source_4/images/2020_11_04_CPJUMP1/images/BR00116991__2020-11-05T19_51_35-Measurement1/Images/r01c01f01p01-ch7sk1fk1fl1.tiff
s3://cellpainting-gallery/cpg0000-jump-pilot/source_4/images/2020_11_04_CPJUMP1/images/BR00116991__2020-11-05T19_51_35-Measurement1/Images/r01c01f01p01-ch6sk1fk1fl1.tiff
s3://cellpainting-gallery/cpg0000-jump-pilot/source_4/images/2020_11_04_CPJUMP1/images/BR00116991__2020-11-05T19_51_35-Measurement1/Images/r01c01f01p01-ch8sk1fk1fl1.tiff
s3://cellpainting-gallery/cpg0000-jump-pilot/source_4/images/2020_11_04_CPJUMP1/images/BR00116991__2020-11-05T19_51_35-Measurement1/Images/r01c01f01p01-ch1sk1fk1fl1.tiff
s3://cellpainting-gallery/cpg0000-jump-pilot/source_4/images/2020_11_04_CPJUMP1/images/BR00116991__2020-11-05T19_51_35-Measurement1/Images/r01c01f01p01-ch2sk1fk1fl1.tiff
s3://cellpainting-gallery/cpg0000-jump-pilot/source_4/images/2020_11_04_CPJUMP1/images/BR00116991__2020-11-05T19_51_35-Measurement1/Images/r01c01f01p01-ch4sk1fk1fl1.tiff
s3://cellpainting-gallery/cpg0000-jump-pilot/source_4/images/2020_11_04_CPJUMP1/images/BR00116991__2020-11-05T19_51_35-Measurement1/Images/r01c01f01p01-ch3sk1fk1fl1.tiff
s3://cellpainting-gallery/cpg0000-jump-pilot/source_4/images/2020_11_04_CPJUMP1/images/BR00116991__2020-11-05T19_51_35-Measurement1/Images/r01c01f02p01-ch5sk1fk1fl1.tiff
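Note that wget speaks HTTP(S) and FTP, not `s3://`, so the URIs above need translating before they can be fed to it. A minimal sketch, assuming the bucket is readable over HTTPS at the same virtual-hosted address the load_data CSV is served from (`s3_to_https` is a hypothetical helper name):

```python
def s3_to_https(uri: str) -> str:
    """Map s3://<bucket>/<key> to https://<bucket>.s3.amazonaws.com/<key>."""
    prefix = "s3://"
    if not uri.startswith(prefix):
        raise ValueError(f"not an s3 URI: {uri}")
    # Split the remainder into the bucket name and the object key.
    bucket, _, key = uri[len(prefix):].partition("/")
    return f"https://{bucket}.s3.amazonaws.com/{key}"


example = (
    "s3://cellpainting-gallery/cpg0000-jump-pilot/source_4/images/"
    "2020_11_04_CPJUMP1/images/BR00116991__2020-11-05T19_51_35-Measurement1/"
    "Images/r01c01f01p01-ch5sk1fk1fl1.tiff"
)
print(s3_to_https(example))
```

Applied line by line to the duckdb output (skipping the `value` header), this yields a URL list that a tool like `wget -i` can consume.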


afermg commented Feb 4, 2025

In an unrelated refactoring we also wrote an example of how to automate downloading images. See the following Python code (you will need jump_portrait>=0.26, to be published soon). I added a break that makes it stop once the 9 sites of the first well (A01) have been loaded into memory.

import polars as pl
from jump_portrait.fetch import get_jump_image

# load_data CSV for plate BR00116991
url = 'https://cellpainting-gallery.s3.amazonaws.com/cpg0000-jump-pilot/source_4/workspace/load_data_csv/2020_11_04_CPJUMP1/BR00116991/load_data.csv'
# URL layout: https://<host>/<dataset>/<source>/workspace/load_data_csv/<batch>/<plate>/load_data.csv
parts = url.split("/")
dataset, source, batch = parts[3], parts[4], parts[7]

ldc = pl.read_csv(url)
channel = "DNA"
correction = None  # or "Illum"

imgs = {}
for plate, well, site in ldc.select(pl.col(f"Metadata_{x}" for x in ("Plate", "Well", "Site"))).rows():
    # Stop once the first well (A01) is done; this relies on the rows being ordered by well.
    if well == "A02":
        break
    imgs[(plate, well, site)] = get_jump_image(source, batch, plate, well, channel, site, correction, dataset=dataset)
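The positional `split("/")` indexing above silently depends on the URL layout. A small sketch that makes the layout explicit and fails loudly if it changes; `parse_load_data_url` and `LoadDataLocation` are hypothetical names (not part of jump_portrait), and the path layout is assumed from this one URL:

```python
from typing import NamedTuple
from urllib.parse import urlparse


class LoadDataLocation(NamedTuple):
    dataset: str
    source: str
    batch: str
    plate: str


def parse_load_data_url(url: str) -> LoadDataLocation:
    """Split a load_data CSV URL into its dataset, source, batch and plate."""
    # Assumed path layout:
    # /<dataset>/<source>/workspace/load_data_csv/<batch>/<plate>/<file>.csv
    parts = urlparse(url).path.strip("/").split("/")
    if len(parts) != 7 or parts[2:4] != ["workspace", "load_data_csv"]:
        raise ValueError(f"unexpected load_data URL layout: {url}")
    dataset, source, _, _, batch, plate, _ = parts
    return LoadDataLocation(dataset, source, batch, plate)


loc = parse_load_data_url(
    "https://cellpainting-gallery.s3.amazonaws.com/cpg0000-jump-pilot/source_4/"
    "workspace/load_data_csv/2020_11_04_CPJUMP1/BR00116991/load_data.csv"
)
print(loc)
```

The resulting `loc.dataset`, `loc.source` and `loc.batch` can stand in for the three variables derived from the URL in the loop above.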
