Merge pull request #112 from alexandreroutier/aibl

Make AIBL-2-BIDS compatible with latest clinical data; Set minimal Python version to 3.7
aramis-lab · Aug 10, 2020 · eaaf3b2
2 parents 681c2ec + b7a5378
commit eaaf3b2
Showing 8 changed files with 133 additions and 116 deletions.
diff --git a/README.md b/README.md
@@ -40,83 +40,129 @@
 
 ## About The Project
 
-Clinica is a software platform for clinical research studies involving patients with neurological and psychiatric diseases and the acquisition of multimodal data (neuroimaging, clinical and cognitive evaluations, genetics...), most often with longitudinal follow-up.
-
-Clinica is command-line driven and written in Python. It uses the [Nipype](https://nipype.readthedocs.io/) system for pipelining and combines widely-used software packages for neuroimaging data analysis ([ANTs](http://stnava.github.io/ANTs/), [FreeSurfer](https://surfer.nmr.mgh.harvard.edu/), [FSL](https://fsl.fmrib.ox.ac.uk/fsl/fslwiki), [MRtrix](https://www.mrtrix.org/), [PETPVC](https://github.com/UCL/PETPVC), [SPM](https://www.fil.ion.ucl.ac.uk/spm/)), machine learning ([Scikit-learn](https://scikit-learn.org/stable/)) and the [BIDS standard](http://bids-specification.readthedocs.io/) for data organization.
-
-Clinica provides tools to convert publicly available neuroimaging datasets into BIDS, namely:
+Clinica is a software platform for clinical research studies involving patients
+with neurological and psychiatric diseases and the acquisition of multimodal
+data (neuroimaging, clinical and cognitive evaluations, genetics...), most
+often with longitudinal follow-up.
+
+Clinica is command-line driven and written in Python. It uses the
+[Nipype](https://nipype.readthedocs.io/) system for pipelining and combines
+widely-used software packages for neuroimaging data analysis
+([ANTs](http://stnava.github.io/ANTs/),
+[FreeSurfer](https://surfer.nmr.mgh.harvard.edu/),
+[FSL](https://fsl.fmrib.ox.ac.uk/fsl/fslwiki),
+[MRtrix](https://www.mrtrix.org/), [PETPVC](https://github.com/UCL/PETPVC),
+[SPM](https://www.fil.ion.ucl.ac.uk/spm/)), machine learning
+([Scikit-learn](https://scikit-learn.org/stable/)) and the [BIDS
+standard](http://bids-specification.readthedocs.io/) for data organization.
+
+Clinica provides tools to convert publicly available neuroimaging datasets into
+BIDS, namely:
 
 - [ADNI: Alzheimer’s Disease Neuroimaging Initiative](http://www.clinica.run/doc/Converters/ADNI2BIDS)
 - [AIBL: Australian Imaging, Biomarker & Lifestyle Flagship Study of Ageing](http://www.clinica.run/doc/Converters/AIBL2BIDS)
 - [NIFD: Neuroimaging in Frontotemporal Dementia](http://www.clinica.run/doc/Converters/NIFD2BIDS)
 - [OASIS: Open Access Series of Imaging Studies](http://www.clinica.run/doc/Converters/OASIS2BIDS)
 
-Clinica can process any BIDS-compliant dataset with a set of complex processing pipelines involving different software packages for the analysis of neuroimaging data (T1-weighted MRI, diffusion MRI and PET data). It also provides integration between feature extraction and statistics, machine learning or deep learning.
+Clinica can process any BIDS-compliant dataset with a set of complex processing
+pipelines involving different software packages for the analysis of
+neuroimaging data (T1-weighted MRI, diffusion MRI and PET data). It also
+provides integration between feature extraction and statistics, machine
+learning or deep learning.
 
 ![ClinicaPipelines](http://www.clinica.run/img/clinica_pipelines.png)
 
 <p align="center">
   <i>Current pipelines are indicated in blue while new or updated pipelines are indicated in purple (will be released in Summer 2020).</i>
 </p>
 
-Clinica is also showcased as a framework for the reproducible classification of Alzheimer's disease using [machine learning](https://github.com/aramis-lab/AD-ML) and [deep learning](https://github.com/aramis-lab/AD-DL).
+Clinica is also showcased as a framework for the reproducible classification of
+Alzheimer's disease using [machine
+learning](https://github.com/aramis-lab/AD-ML) and [deep
+learning](https://github.com/aramis-lab/AD-DL).
 
 
 
 ## Getting Started
 > Full instructions for installation and additional information can be found in
 the [user documentation](http://www.clinica.run/doc).
 
-Clinica currently supports macOS and Linux. It can be installed:
+Clinica currently supports macOS and Linux. It can be installed by typing the
+following command:
 
-- With `conda` (recommended):
 ```sh
-conda create --name clinicaEnv python=3.6 clinica -c Aramislab -c conda-forge
+pip install clinica
 ```
 
-- With `pip` (needs Python 3.6)
+To avoid conflicts with other versions of the dependency packages installed by
+pip, it is strongly recommended to create a virtual environment before the
+installation.  For example, use
+[Conda](https://docs.conda.io/en/latest/miniconda.html), to create a virtual
+environment and activate it before installing clinica (you can also use
+`virtualenv`):
+
 ```sh
-pip install clinica
+conda create --name clinicaEnv python=3.7
+conda activate clinicaEnv
 ```
 
-- Using the [Developer installation](http://www.clinica.run/doc/Installation/#developer-installation)
-
-Depending on the pipeline that you want to use, you need to install pipeline-specific interfaces. Not all the dependencies are necessary to run Clinica. Please refer to this [page](http://www.clinica.run/doc/Third-party/) to determine which third-party libraries you need to install.
+Depending on the pipeline that you want to use, you need to install
+pipeline-specific interfaces. Not all the dependencies are necessary to run
+Clinica. Please refer to this [page](http://www.clinica.run/doc/Third-party/)
+to determine which third-party libraries you need to install.
 
 ## Example
 
-Diagram illustrating the Clinica pipelines involved when performing a group comparison of FDG PET data projected on the cortical surface between patients with Alzheimer's disease and healthy controls from the ADNI database:
+Diagram illustrating the Clinica pipelines involved when performing a group
+comparison of FDG PET data projected on the cortical surface between patients
+with Alzheimer's disease and healthy controls from the ADNI database:
 ![ClinicaExample](http://www.clinica.run/img/clinica_example.png)
-1. Clinical and neuroimaging data are downloaded from the ADNI website and data are converted into BIDS with the [`adni-to-bids` converter](http://www.clinica.run/doc/Converters/ADNI2BIDS).
-2. Estimation of the cortical and white surface is then produced by the [`t1-freesurfer` pipeline](http://www.clinica.run/doc/Pipelines/T1_FreeSurfer).
-3. FDG PET data can be projected on the subject’s cortical surface and normalized to the FsAverage template from FreeSurfer using the [`pet-surface` pipeline](http://www.clinica.run/doc/Pipelines/PET_Surface).
-4. TSV file with demographic information of the population studied is given to the [`statistics-surface` pipeline](http://www.clinica.run/doc/Pipelines/Stats_Surface) to generate the results of the group comparison.
+1. Clinical and neuroimaging data are downloaded from the ADNI website and data
+   are converted into BIDS with the [`adni-to-bids`
+   converter](http://www.clinica.run/doc/Converters/ADNI2BIDS).
+2. Estimation of the cortical and white surface is then produced by the
+   [`t1-freesurfer`
+   pipeline](http://www.clinica.run/doc/Pipelines/T1_FreeSurfer).
+3. FDG PET data can be projected on the subject’s cortical surface and
+   normalized to the FsAverage template from FreeSurfer using the
+   [`pet-surface` pipeline](http://www.clinica.run/doc/Pipelines/PET_Surface).
+4. TSV file with demographic information of the population studied is given to
+   the [`statistics-surface`
+   pipeline](http://www.clinica.run/doc/Pipelines/Stats_Surface) to generate
+   the results of the group comparison.
 
-> For more examples and details, please refer to the [Documentation](http://www.clinica.run/doc/).
+> For more examples and details, please refer to the
+> [Documentation](http://www.clinica.run/doc/).
 
 
 
 
 
 ## Support
 - [Report an issue on GitHub](https://github.com/aramis-lab/clinica/issues)
-- Use the [Clinica Google Group](https://groups.google.com/forum/#!forum/clinica-user) to ask for help!
+- Use the [Clinica Google
+  Group](https://groups.google.com/forum/#!forum/clinica-user) to ask for help!
 
 
 
 
 
 <!--
 ## Contributing
-We encourage you to contribute to Clinica! Please check out the [Contributing to Clinica guide](Contributing.md) for guidelines about how to proceed. Do not hesitate to ask questions if something is not clear for you, report an issue, etc.
+We encourage you to contribute to Clinica! Please check out the [Contributing
+to Clinica guide](Contributing.md) for guidelines about how to proceed. Do not
+hesitate to ask questions if something is not clear for you, report an issue,
+etc.
 -->
 
 
 
 
 ## License
 
-This software is distributed under the MIT License. See [license file](https://github.com/aramis-lab/clinica/blob/dev/LICENSE.txt) for more information.
+This software is distributed under the MIT License. See [license
+file](https://github.com/aramis-lab/clinica/blob/dev/LICENSE.txt) for more
+information.
 
 
 ## Related Repositories

diff --git a/clinica/iotools/converters/aibl_to_bids/aibl_to_bids.py b/clinica/iotools/converters/aibl_to_bids/aibl_to_bids.py
@@ -4,15 +4,6 @@
 Convert the AIBL dataset (http://www.aibl.csiro.au/) into BIDS.
 """
 
-__author__ = "Simona Bottani"
-__copyright__ = "Copyright 2016-2019 The Aramis Lab Team"
-__credits__ = ["Simona Bottani"]
-__license__ = "See LICENSE.txt file"
-__version__ = "0.1.0"
-__maintainer__ = "Simona Bottani"
-__email__ = "[email protected]"
-__status__ = "Development"
-
 
 def convert_images(path_to_dataset, path_to_csv, bids_dir):
 

diff --git a/clinica/iotools/converters/aibl_to_bids/aibl_to_bids_cli.py b/clinica/iotools/converters/aibl_to_bids/aibl_to_bids_cli.py
@@ -4,23 +4,16 @@
 
 
 class AiblToBidsCLI(ce.CmdParser):
-    """
-    todo:add description
-    """
-
     def define_name(self):
-        """Define the sub-command name to run this pipelines.
-        """
+        """Define the sub-command name to run this pipeline."""
         self._name = 'aibl-to-bids'
 
     def define_description(self):
-        """Define a description of this pipeline.
-        """
-        self._description = 'Convert AIBL (https://aibl.csiro.au/adni/index.html”) into BIDS.'
+        """Define a description of this pipeline."""
+        self._description = 'Convert AIBL (https://aibl.csiro.au/adni/index.html) into BIDS.'
 
     def define_options(self):
-        """Define the sub-command arguments
-        """
+        """Define the sub-command arguments."""
         self._args.add_argument("dataset_directory",
                                 help='Path to the AIBL images directory.')
         self._args.add_argument("clinical_data_directory",
@@ -29,36 +22,20 @@ def define_options(self):
                                 help='Path to the BIDS directory.')
 
     def run_command(self, args):
-        from clinica.iotools.converters.aibl_to_bids.aibl_to_bids import convert_clinical_data, convert_images
-        from clinica.utils.stream import cprint
         from os.path import exists
         from os import makedirs
         from colorama import Fore
-        from clinica.iotools.converter_utils import check_bin
-        import sys
-
-        # Check existence of binaries dcm2nii, dcm2niix and mri_convert
-        # If they are not found we warn the user and tell him that the bin
-        # were not foud. He then has the possibility to run the converter anyway
-        missing_bin = 0
-        bin_to_test = ['dcm2nii', 'dcm2niix', 'mri_convert']
-        for binary in bin_to_test:
-            missing_bin = missing_bin + check_bin(binary)
-        if missing_bin > 0:
-            cprint(Fore.RED + str(missing_bin) + ' binary(es) is (are) missing. '
-                   + 'Most important are : dcm2nii and dcm2niix.' + Fore.RESET)
-            while True:
-                cprint('Do you still want to run the converter ? (yes/no): ')
-                answer = input('')
-                if answer.lower() in ['yes', 'no']:
-                    break
-                else:
-                    cprint('Possible answers are yes or no.\n')
-            if answer.lower() == 'yes':
-                cprint('Running the pipeline anyway.')
-            if answer.lower() == 'no':
-                cprint('Exiting clinica...')
-                sys.exit()
+        from clinica.iotools.converters.aibl_to_bids.aibl_to_bids import convert_clinical_data, convert_images
+        from clinica.utils.check_dependency import is_binary_present, check_freesurfer
+        from clinica.utils.exceptions import ClinicaMissingDependencyError
+
+        list_binaries = ['dcm2niix', 'dcm2nii']
+        for binary in list_binaries:
+            if not is_binary_present(binary):
+                raise ClinicaMissingDependencyError(
+                    '%s\n[Error] Clinica could not find %s software: it is not present in your '
+                    'PATH environment.%s' % (Fore.RED, binary, Fore.RESET))
+        check_freesurfer()
 
         if not exists(args.bids_directory):
             makedirs(args.bids_directory)
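
Note on the dependency check above: the new code raises on the first missing binary instead of prompting the user. The following is a minimal standalone sketch of the same idea, not Clinica's implementation. It assumes that a PATH lookup via `shutil.which` is an acceptable stand-in for `is_binary_present`, and `MissingDependencyError`, `find_missing_binaries` and `require_binaries` are hypothetical names introduced only for illustration. Unlike the diff, it collects every missing binary before raising, so that the user sees the full list at once.

```python
import shutil


class MissingDependencyError(RuntimeError):
    """Hypothetical stand-in for ClinicaMissingDependencyError."""


def find_missing_binaries(binaries):
    """Return the names in `binaries` that cannot be found on PATH."""
    return [name for name in binaries if shutil.which(name) is None]


def require_binaries(binaries):
    """Raise one error that lists every binary missing from PATH."""
    missing = find_missing_binaries(binaries)
    if missing:
        raise MissingDependencyError(
            "Could not find the following software in your PATH: "
            + ", ".join(missing)
        )


if __name__ == "__main__":
    # Same binaries as in the diff; raises if any of them is not installed.
    require_binaries(["dcm2niix", "dcm2nii"])
```

Reporting all missing tools in one pass is a design choice of this sketch; failing fast on the first one, as the diff does, is equally valid.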

diff --git a/clinica/iotools/converters/aibl_to_bids/aibl_utils.py b/clinica/iotools/converters/aibl_to_bids/aibl_utils.py
@@ -4,15 +4,6 @@
 Utils to convert AIBL dataset in BIDS
 """
 
-__author__ = "Simona Bottani"
-__copyright__ = "Copyright 2016-2019 The Aramis Lab Team"
-__credits__ = ["Simona Bottani"]
-__license__ = "See LICENSE.txt file"
-__version__ = "0.1.0"
-__maintainer__ = "Simona Bottani"
-__email__ = "[email protected]"
-__status__ = "Development"
-
 
 def listdir_nohidden(path):
     """
@@ -463,10 +454,11 @@ def find_path_to_T1(path_to_dataset, path_to_csv):
     """
     import os
     import pandas
+    import glob
 
     # two csv_files contain information regarding the T1w MRI images
-    mri_meta = pandas.read_csv(os.path.join(path_to_csv, "aibl_mrimeta_28-Apr-2015.csv"))
-    mri_3meta = pandas.read_csv(os.path.join(path_to_csv, "aibl_mri3meta_28-Apr-2015.csv"))
+    mri_meta = pandas.read_csv(glob.glob(os.path.join(path_to_csv, "aibl_mrimeta_*.csv"))[0])
+    mri_3meta = pandas.read_csv(glob.glob(os.path.join(path_to_csv, "aibl_mri3meta_*.csv"))[0])
     file_mri = [mri_meta, mri_3meta]
     subjects_ID = listdir_nohidden(path_to_dataset)
     # list of all the folders which correspond to the subject_ID
@@ -508,6 +500,7 @@ def paths_to_bids(path_to_dataset, path_to_csv, bids_dir, modality):
     from clinica.utils.stream import cprint
     from multiprocessing.dummy import Pool
     from multiprocessing import cpu_count, Value
+    import glob
 
     if modality.lower() not in ['t1', 'av45', 'flute', 'pib']:
         # This should never be reached
@@ -569,13 +562,16 @@ def create_file(image):
     if modality == 't1':
         images = find_path_to_T1(path_to_dataset, path_to_csv)
     else:
-        path_to_csv_pet_modality = join(path_to_csv, 'aibl_' + modality
-                                        + 'meta_28-Apr-2015.csv')
+        path_to_csv_pet_modality = glob.glob(join(
+            path_to_csv, 'aibl_' + modality + 'meta_*.csv')
+        )[0]
         if not exists(path_to_csv_pet_modality):
             raise FileNotFoundError(path_to_csv_pet_modality
                                     + ' file not found in clinical data folder')
-        # separator information : either ; or ,
-        df_pet = pds.read_csv(path_to_csv_pet_modality, sep=',|;')
+        # Latest version of Flutemetamol CSV file (aibl_flutemeta_01-Jun-2018.csv)
+        # has an extra column for some rows. However, each CSV file (regarding PET tracers)
+        # contains the same columns. The usecols fixes this issue.
+        df_pet = pds.read_csv(path_to_csv_pet_modality, sep=',|;', usecols=list(range(0, 36)))
         images = find_path_to_pet_modality(path_to_dataset,
                                            df_pet)
     images.to_csv(join(bids_dir, modality + '_paths_aibl.tsv'),
@@ -615,6 +611,7 @@ def create_participants_df_AIBL(input_path, clinical_spec_path, clinical_data_di
     from os import path
     import re
     import numpy as np
+    import glob
 
     fields_bids = ['participant_id']
     fields_dataset = []
@@ -660,9 +657,9 @@ def create_participants_df_AIBL(input_path, clinical_spec_path, clinical_data_di
                 file_to_read_path = path.join(clinical_data_dir, location)
 
                 if file_ext == '.xlsx':
-                    file_to_read = pd.read_excel(file_to_read_path, sheet_name=sheet)
+                    file_to_read = pd.read_excel(glob.glob(file_to_read_path)[0], sheet_name=sheet)
                 elif file_ext == '.csv':
-                    file_to_read = pd.read_csv(file_to_read_path)
+                    file_to_read = pd.read_csv(glob.glob(file_to_read_path)[0])
                 prev_location = location
                 prev_sheet = sheet
 
@@ -717,6 +714,7 @@ def create_sessions_dict_AIBL(input_path, clinical_data_dir, clinical_spec_path)
     """
     import pandas as pd
     from os import path
+    import glob
     import numpy as np
 
     # Load data
@@ -745,7 +743,7 @@ def create_sessions_dict_AIBL(input_path, clinical_data_dir, clinical_spec_path)
             tmp = field_location[i]
             location = tmp[0]
             file_to_read_path = path.join(clinical_data_dir, tmp)
-            files_to_read.append(file_to_read_path)
+            files_to_read.append(glob.glob(file_to_read_path)[0])
             sessions_fields_to_read.append(sessions_fields[i])
 
     rid = pd.read_csv(files_to_read[0], dtype={'text': str}, low_memory=False).RID
@@ -801,18 +799,20 @@ def create_sessions_dict_AIBL(input_path, clinical_data_dir, clinical_spec_path)
 
 
 def get_examdates(rid, examdates, viscodes, clinical_data_dir):
-
+    import glob
     from os import path
     from datetime import datetime
     from dateutil.relativedelta import relativedelta
     import pandas as pd
     res_examdates = []
-    csv_list = ('aibl_mri3meta_28-Apr-2015.csv',
-                'aibl_mrimeta_28-Apr-2015.csv',
-                'aibl_cdr_28-Apr-2015.csv',
-                'aibl_flutemeta_28-Apr-2015.csv',
-                'aibl_mmse_28-Apr-2015.csv',
-                'aibl_pibmeta_28-Apr-2015.csv')
+    csv_list = [
+        glob.glob(path.join(clinical_data_dir, 'aibl_mri3meta_*.csv'))[0],
+        glob.glob(path.join(clinical_data_dir, 'aibl_mrimeta_*.csv'))[0],
+        glob.glob(path.join(clinical_data_dir, 'aibl_cdr_*.csv'))[0],
+        glob.glob(path.join(clinical_data_dir, 'aibl_flutemeta_*.csv'))[0],
+        glob.glob(path.join(clinical_data_dir, 'aibl_mmse_*.csv'))[0],
+        glob.glob(path.join(clinical_data_dir, 'aibl_pibmeta_*.csv'))[0]
+    ]
 
     for e in range(len(examdates)):
         exam = examdates[e]
@@ -823,7 +823,10 @@ def get_examdates(rid, examdates, viscodes, clinical_data_dir):
 
         # If EXAMDATE does not exist (-4) we try to obtain it from another .csv file
         for csv_file in csv_list:
-            csv_data = pd.read_csv(path.join(clinical_data_dir, csv_file), low_memory=False)
+            if 'aibl_flutemeta' in csv_file:
+                csv_data = pd.read_csv(csv_file, low_memory=False, usecols=list(range(0, 36)))
+            else:
+                csv_data = pd.read_csv(csv_file, low_memory=False)
             exam_date = csv_data[(csv_data.RID == rid) & (csv_data.VISCODE == viscodes[e])]
             if not exam_date.empty and exam_date.iloc[0].EXAMDATE != '-4':
                 exam = exam_date.iloc[0].EXAMDATE
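
Note on the `glob.glob(...)[0]` pattern used throughout this file: `glob.glob` returns an empty list when nothing matches, so indexing with `[0]` raises `IndexError` before any later `exists` check can produce a readable message, and when several dated exports are present the first match is picked silently. The sketch below shows one possible way to make the lookup explicit. `find_single_csv` is a hypothetical helper, not part of Clinica, and treating multiple matches as an error is an assumption about the desired behaviour.

```python
import glob
import os


def find_single_csv(clinical_data_dir, pattern):
    """Resolve a wildcard pattern to exactly one file in clinical_data_dir.

    Raises FileNotFoundError if nothing matches and ValueError if the
    pattern is ambiguous (hypothetical policy, chosen for illustration).
    """
    matches = sorted(glob.glob(os.path.join(clinical_data_dir, pattern)))
    if not matches:
        raise FileNotFoundError(
            "No file matching %s found in %s" % (pattern, clinical_data_dir)
        )
    if len(matches) > 1:
        raise ValueError(
            "Several files match %s: %s" % (pattern, ", ".join(matches))
        )
    return matches[0]
```

With such a helper, a call like `find_single_csv(clinical_data_dir, 'aibl_mmse_*.csv')` could replace each `glob.glob(path.join(...))[0]` expression while keeping the date-independent file names introduced by this change.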