# fMRI Data Analysis Workflows with Apache Spark and Thunder

In this section we describe step by step how to install and run an fmriFlow application.
1. Download Spark from the official website: http://spark.apache.org/downloads.html
2. Extract Spark:

   ```
   tar -zxvf spark-1.6.0-bin-hadoop2.6.tgz
   ```

3. Set the environment variables: open `~/.bashrc` with your favorite editor, e.g. `nano ~/.bashrc`, and append the following two lines:

   ```
   export SPARK_HOME=(the path where Spark was extracted at step 2)
   export PATH=$PATH:$SPARK_HOME/bin
   ```

4. Install Thunder:

   ```
   git clone [email protected]:thunder-project/thunder.git
   cd thunder-project
   python setup.py install
   rm -rf thunder-project
   ```
Then just clone this repository:

```
git clone [email protected]:gsvic/fmriFlow.git
```

In order to run an application you just need to define the workflow in a Python file and submit it to Spark. To run the provided `test.py`, just type `spark-submit test.py`. In this example we use sample input data from the Thunder project.
A new workflow can be defined in a Python script just like the example above. In detail:

1. Define the workflow by providing a SparkContext and an input path (a `.nii` file):

   ```python
   flow1 = Workflow(datapath, sc)
   ```

2. Add some operators:

   ```python
   flow1 = Workflow(datapath, sc).extract().clustering(k=5).visualize()
   ```

3. Execute the workflow:

   ```python
   flow1.execute()
   ```

   or print the execution plan:

   ```python
   print flow1.explain()
   ```
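The fluent, lazy style of the workflow API above can be illustrated with a minimal stand-in. This `Workflow` class is a simplified sketch for illustration only, not fmriFlow's actual implementation: each operator call merely appends a step to a plan, and nothing runs until `execute()`.

```python
# Simplified sketch of a fluent, lazily evaluated workflow (NOT fmriFlow's
# real Workflow class): operator calls record plan steps and return self,
# so they can be chained; explain() prints the plan without running it.

class Workflow:
    def __init__(self, datapath, sc=None):
        self.datapath = datapath
        self.sc = sc                         # SparkContext (unused here)
        self.plan = ["load(%s)" % datapath]

    def extract(self):
        self.plan.append("extract()")
        return self                          # returning self enables chaining

    def clustering(self, k):
        self.plan.append("clustering(k=%d)" % k)
        return self

    def visualize(self):
        self.plan.append("visualize()")
        return self

    def explain(self):
        return " -> ".join(self.plan)

    def execute(self):
        # A real implementation would submit each step to Spark here.
        return self.plan


flow1 = Workflow("bold_dico.nii").extract().clustering(k=5).visualize()
print(flow1.explain())
# load(bold_dico.nii) -> extract() -> clustering(k=5) -> visualize()
```

Returning `self` from every operator is what makes the one-line chained definition in step 2 possible.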
The available operators are:

- `extract()`: extracts features into time series
- `clustering(k)`: K-Means clustering
- `visualizeBrain()`: visualizes a specific slice of the brain
- `visualize(nsamples)`: visualizes `nsamples` data points
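The idea behind `extract()` can be sketched in plain NumPy: an fMRI scan is a 4-D array (x, y, z, time), and extracting features into time series amounts to flattening the three spatial axes so that each row is one voxel's time series. This mirrors the concept only; Thunder's actual extraction is distributed over Spark.

```python
import numpy as np

# Conceptual sketch of feature extraction from a 4-D fMRI volume:
# collapse the spatial axes (x, y, z) into one voxel axis, so each
# row of the result is a single voxel's time series.

def to_time_series(volume_4d):
    x, y, z, t = volume_4d.shape
    return volume_4d.reshape(x * y * z, t)


scan = np.zeros((4, 4, 3, 10))   # toy 4x4x3 volume with 10 time points
series = to_time_series(scan)
print(series.shape)              # (48, 10): 48 voxels, 10 time points
```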
It is also possible to execute operations via bash using the scripts in the `/scripts` folder with the following parameters:

`run.sh`

- `--path`: the input path
- `--operator`: the operator
- `--model`: a serialized model from a previous execution
- `--vector`: a neuron vector to be given as input to the model above in order to compute its corresponding cluster
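For illustration, the flags above could map onto a Python argument parser as follows. The flag names come from the list above; the parser itself is an assumption for the sketch, not fmriFlow's actual code.

```python
import argparse

# Illustrative sketch of how run.sh's command-line flags might be parsed.
# Flag names match the README's list; the parser is hypothetical.

def build_parser():
    p = argparse.ArgumentParser(description="fmriFlow runner (sketch)")
    p.add_argument("--path", help="the input path (.nii file)")
    p.add_argument("--operator", required=True,
                   help="the operator to run, e.g. ts or pr")
    p.add_argument("--model", help="a serialized model from a previous run")
    p.add_argument("--vector", help="a neuron vector to classify")
    return p


args = build_parser().parse_args(["--path", "bold_dico.nii", "--operator", "ts"])
print(args.operator)   # ts
```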
- Train and save a model:

  ```
  sbin/run.sh --path ../bold_dico.nii --operator ts
  ```

  Runs a K-Means clustering on the input dataset and serializes the model to disk.

- Load a trained model:

  ```
  sbin/run.sh --operator pr --model model --vector "[...]"
  ```

  Predicts the cluster center of the input vector using the input model.
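Conceptually, the prediction step assigns the input vector to its nearest cluster center, as in this NumPy sketch. The real script deserializes the trained model from disk; here the centers are supplied directly for illustration.

```python
import numpy as np

# Sketch of the "pr" (predict) step: given the cluster centers of a
# trained K-Means model, the predicted cluster of an input vector is
# the index of the nearest center (Euclidean distance).

def predict_cluster(centers, vector):
    centers = np.asarray(centers, dtype=float)
    vector = np.asarray(vector, dtype=float)
    distances = np.linalg.norm(centers - vector, axis=1)
    return int(np.argmin(distances))


centers = [[0.0, 0.0], [10.0, 10.0]]
print(predict_cluster(centers, [9.0, 11.0]))   # 1: closer to (10, 10)
```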
```
visualizeBrain.sh $INPUT
visualizeData.sh $INPUT $NSAMPLES
visualizeClusters.sh $INPUT $K
```
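The slicing behind `visualizeBrain.sh` can be sketched as picking one 2-D plane out of a 3-D brain volume, e.g. the middle slice along an axis. The real script loads a `.nii` file and renders the slice; the volume here is synthetic.

```python
import numpy as np

# Sketch of selecting the 2-D slice that visualizeBrain.sh would render:
# take the middle plane of a 3-D volume along the chosen axis.

def middle_slice(volume_3d, axis=2):
    index = volume_3d.shape[axis] // 2
    return np.take(volume_3d, index, axis=axis)


volume = np.arange(3 * 3 * 4).reshape(3, 3, 4)
print(middle_slice(volume).shape)   # (3, 3): one plane of the volume
```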
### Understanding fMRI Data

- http://www.biostat.jhsph.edu/~mlindqui/Papers/STS282.pdf
- http://psydata.ovgu.de/forrest_gump/
- http://studyforrest.org/7tmusicdata.html
- https://github.com/hanke/gumpdata
- http://klab.smpp.northwestern.edu/wiki/images/9/9b/Big_data_klab.pdf
- Neuroimaging Informatics Technology Initiative (NIfTI): http://nifti.nimh.nih.gov
- NiBabel: http://nipy.org/nibabel
## Acknowledgments

This project was developed for the purposes of the Digital Image Processing (HY620) course of the Dept. of Informatics at the Ionian University.

Course page: http://di.ionio.gr/en/component/content/article/19-modules/semester-5/58-digital-image-processing.html