SchapiroLabor/macsima_pipeline

MACSIMA pipeline for hpc usage

This repo describes how to use [MCMICRO](https://mcmicro.org/) to process multiplexed images generated with the MACSima platform. The instructions below focus on running the pipeline on an HPC cluster with Slurm as the job scheduler. The repo contains the bash scripts (sh) and config files needed to execute the pipeline.

This repo has been created as a temporary solution to process MACSima data with the current MCMICRO version. A more convenient solution is in the works as part of the upcoming nf-core/mcmicro.

Usage instructions:

The pipeline consists of two steps. In the first one, the raw tiles are staged so that they can be used directly as input for ASHLAR, the registration and stitching algorithm used by MCMICRO.

The staging step reorders the raw tiles of each cycle: all tiles of a cycle that were acquired with the same rack, well, roi and exposure level are written into a single file together with their corresponding OME metadata.

The second step is simply the execution of MCMICRO with a specific set of parameters.

  1. Staging

    1. Download the container (v1.1.0) of macsima2mc with the following command:
    singularity pull docker://ghcr.io/schapirolabor/multiplex_macsima:v1.1.0 
    
    2. Create a tab-separated sample array file (e.g. acquisitions.tsv) with two columns: ArrayTaskID and Sample. Each row of the first column is an integer that represents the TaskID. Each row of the second column is the absolute path of the folder that contains the N cycles of the acquisition. As of now, this folder is expected to contain multiple folders named x_CycleN. See the images below for reference of the acquisitions.tsv file and the cycles folder. In this representative example we apply the pipeline to the cycles in the folder mydir/acquisition_A.

    Screenshot of the sample array file

    Screenshot of cycles inside acquisition_A
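
The sample array file can also be written from the command line. The following is a minimal sketch, assuming the file has a header row with the two column names and that `/mydir/acquisition_A` is the absolute path of the acquisition folder; both the file name and the path are placeholders to adapt.

```shell
#!/usr/bin/env bash
set -eu

# Hypothetical locations; replace with your own absolute paths.
acquisitions_file="acquisitions.tsv"
sample_dir="/mydir/acquisition_A"

# Header row (assumed), then one row per acquisition.
# printf expands \t to a real tab, so the columns are tab-separated.
printf 'ArrayTaskID\tSample\n' > "$acquisitions_file"
printf '%s\t%s\n' 1 "$sample_dir" >> "$acquisitions_file"

# Print the file for a visual check.
cat "$acquisitions_file"
```

For more than one acquisition, append one extra `printf` line per sample with the next integer as ArrayTaskID.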

    3. Create a directory for the outputs of macsima2mc, e.g. output_dir.

    4. Open the staging.sh file (see figure below, red arrows) and specify the following inputs:

      • SBATCH --array=1 : range of ArrayTaskID values on which the staging will be applied. In this example we only have ID 1. For m samples, use SBATCH --array=1-m.

      • acquisitions : path to the sample array file, e.g. acquisitions=/myarrays/acquisitions_array.tsv.

      • staging_container : absolute path to the macsima2mc container downloaded in point 1, e.g. staging_container=/mycontainers/multiplex_macsima_v1.1.0.sif.

      • output_dir : absolute path to the output folder created above, e.g. output_dir=/home/output_dir.

      Screenshot of staging.sh
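
To illustrate how the `--array` range relates to the sample array file, the sketch below shows one common way for a Slurm array job to pick its row: each task reads `SLURM_ARRAY_TASK_ID` and looks up the matching `Sample` path. This is an assumption about the general pattern, not a copy of the staging.sh in this repo; the file content and the second sample path are hypothetical.

```shell
#!/usr/bin/env bash
set -eu

# Outside of Slurm the variable is unset, so default to task 1 for a dry run.
task_id="${SLURM_ARRAY_TASK_ID:-1}"

# Hypothetical sample array file, created here only to keep the sketch self-contained.
acquisitions="$(mktemp)"
printf 'ArrayTaskID\tSample\n1\t/mydir/acquisition_A\n2\t/mydir/acquisition_B\n' > "$acquisitions"

# Select the Sample column of the row whose ArrayTaskID matches this task.
sample="$(awk -F'\t' -v id="$task_id" '$1 == id {print $2}' "$acquisitions")"

echo "Task ${task_id} would stage: ${sample}"
rm -f "$acquisitions"
```

With `SBATCH --array=1-2`, Slurm would start two tasks, and each would resolve a different folder from the same file.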

    5. Save the changes to the staging.sh file and run it:

    sbatch staging.sh
    
    6. Outputs: once the staging script has finished, the restructured MACSima data sets will be found in the output_dir:

      • One folder per acquisition group is created there. An acquisition group is an image data set with the same acquisition conditions in terms of rack_number, well_number, roi_number and exposure level. In our example there are two such groups, rack-01-well-B01-roi-001-exp-1 and rack-01-well-B01-roi-001-exp-2. The rack, well and roi are extracted from the information in the raw tiles, whereas the exposure level is a ranking from lowest to highest exposure time generated by the macsima2mc tool. The actual exposure time in ms can be found in the metadata of the output files described below.

      Screenshot of output_dir content

      • Inside the folder of each acquisition group there is a folder named raw, in which two ome.tiff files per cycle are written: one of these files corresponds to the signal of the markers (src-S) and the other one to the background images (src-B) (see figure below).

      Screenshot of acquisition group content

      Screenshot of acquisition raw content

      • A file markers.csv inside each acquisition group folder. This file is a required input for the next processing step, which is running MCMICRO. The content of the markers.csv is shown in the figure below.

      Screenshot of markers.csv

    7. Important notes

      • In the staging.sh script provided here, the -ic flag of macsima2mc is enabled. This means that the tiles are illumination-corrected when running macsima2mc, i.e. the ome.tiff images in raw are already corrected tiles and their file names are tagged with the prefix corr. The illumination profiles used for the correction are generated with BaSiCPy.
      • Do not move the markers.csv files to another location; MCMICRO expects them to be at the same level as the raw folder.
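
Before moving on to MCMICRO, it can help to confirm that every acquisition group folder contains both a raw folder and a markers.csv at the same level. The following is a minimal sketch of such a check. It builds a mock output directory (with one markers.csv deliberately missing) so that it runs anywhere; to check real data, set `output_dir` to your own folder and drop the mock setup and the final cleanup line.

```shell
#!/usr/bin/env bash
set -eu

# Mock output_dir for illustration only; the group names follow the example above.
output_dir="$(mktemp -d)"
mkdir -p "$output_dir/rack-01-well-B01-roi-001-exp-1/raw"
touch "$output_dir/rack-01-well-B01-roi-001-exp-1/markers.csv"
mkdir -p "$output_dir/rack-01-well-B01-roi-001-exp-2/raw"   # markers.csv left out on purpose

# Count the acquisition groups that lack a raw folder or a markers.csv.
missing=0
for group in "$output_dir"/*/; do
    name="$(basename "$group")"
    if [ -d "${group}raw" ] && [ -f "${group}markers.csv" ]; then
        echo "OK       $name"
    else
        echo "MISSING  $name"
        missing=$((missing + 1))
    fi
done
echo "groups with a missing raw folder or markers.csv: $missing"

# Remove the mock directory (do not do this on real data).
rm -rf "$output_dir"
```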
  2. Execution of MCMICRO

    1. Create a sample array (samples.tsv) with the paths to the folders of the acquisition groups that will be processed with MCMICRO. Those folders are the output of the staging process described above.

    Screenshot of samples

    2. Download the params.yml file provided here and specify the parameters used for the processing. The parameters are divided into two sections, workflow and options. The first determines the general flow of the pipeline, i.e. which steps to execute and which algorithms to use in those steps. In the second section the user specifies the arguments for the selected algorithms; these are the same arguments exposed by each algorithm's CLI. Consult the documentation of each tool for its CLI; as examples, see the CLI documentation of cellpose and ashlar.

      1. workflow:
      • start-at: Start at registration; do not use the illumination option.
      • stop-at: Options are registration, background, segmentation, quantification and downstream. Select at which of those options the processing will stop.
      • background: true
      • segmentation-channel: Specify the index or indices of the channels to be used for segmentation (1-based index). According to our markers.csv, DAPI can be found on channels 4 and 11.
      • segmentation-recyze: true or false. If true, only the channels specified in segmentation-channel will be ingested by the segmentation algorithm.
      • segmentation: Select the segmentation algorithm to use. Options are cellpose, mesmer, ilastik and unmicst.
      2. options:
      • segmentation: arguments of the segmentation algorithm selected in the workflow section. In this example we use those corresponding to cellpose.
      • ashlar: arguments of the registration and stitching algorithm. For MACSima data it is important to provide the argument --flip-y in this section, to account for the tile positions being given with an inverted y-axis.

      Screenshot of params.yml
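
As a rough illustration of the two sections, the sketch below writes a small parameter file from the shell. The file name `params_example.yml` and the chosen values (for instance stopping at quantification) are assumptions made for this example, not the contents of the params.yml shipped with this repo; the segmentation arguments are left out on purpose because they depend on the selected tool.

```shell
#!/usr/bin/env bash
set -eu

# Write an illustrative parameter file; the values are examples, not recommendations.
# A separate file name is used so that an existing params.yml is not overwritten.
cat > params_example.yml <<'EOF'
workflow:
  start-at: registration
  stop-at: quantification
  background: true
  segmentation-channel: 4 11
  segmentation-recyze: true
  segmentation: cellpose
options:
  ashlar: --flip-y
  # Arguments for the selected segmentation algorithm also go in this section.
EOF

# Print the file for a visual check.
cat params_example.yml
```

The heredoc delimiter is quoted (`'EOF'`) so that the shell writes the text verbatim without expanding anything inside it.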

    3. Open the macsima_job.sh file and provide the paths to the singularity.config file, the params.yml file and the samples.tsv file. Save the changes and run the job.

    sbatch macsima_job.sh
    

    Screenshot of macsima_job
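
For orientation, the three paths typically end up in a single MCMICRO invocation per sample. The sketch below only prints such a command as a dry run. It assumes the standard `nextflow run labsyspharm/mcmicro` entry point with `--in` and `--params`, plus Nextflow's `-c` option for the extra config; the actual command in macsima_job.sh may differ, and all paths are hypothetical.

```shell
#!/usr/bin/env bash
set -eu

# Hypothetical paths; adapt them to your own cluster layout.
config="/home/configs/singularity.config"
params="/home/configs/params.yml"
sample="/home/output_dir/rack-01-well-B01-roi-001-exp-1"

# Assemble the command as text; in a real job the sample path would come from samples.tsv.
cmd="nextflow run labsyspharm/mcmicro --in \"$sample\" --params \"$params\" -c \"$config\""

# Dry run: print the command instead of executing it.
echo "$cmd"
```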