
4. Accessing, Reading and Writing MethylationArray Data

Joshua Levy edited this page Jun 26, 2019 · 1 revision

The output of the preprocessing pipeline is a .pkl file, i.e. a Python pickle file. Much like the .RData file produced when saving data in R, a pickle stores Python objects that can later be loaded back. This file contains a Python dictionary with the keys pheno and beta, which hold a phenotype pandas DataFrame and a beta-value DataFrame, respectively. In both DataFrames, samples make up the rows (index). The columns of the beta matrix are CpGs, while the columns of the phenotype DataFrame are covariates and other sample information. These two DataFrames are used to construct a MethylationArray object; the pickle is stored in preprocess_outputs/ (or combined_outputs/ when the data are batched or split up).
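Because the file is an ordinary pickled dictionary, it can also be opened with just the standard `pickle` module and pandas. The sketch below assumes the layout described above (a dict with `'pheno'` and `'beta'` DataFrames, both indexed by sample). The sample IDs, CpG names and `age` column are made-up toy values, and `load_pheno_beta` is a hypothetical helper, not part of pymethylprocess.

```python
import os
import pickle
import tempfile

import pandas as pd


def load_pheno_beta(path):
    """Hypothetical helper: read a {'pheno': DataFrame, 'beta': DataFrame} pickle."""
    with open(path, "rb") as f:
        data = pickle.load(f)
    pheno, beta = data["pheno"], data["beta"]
    # Both frames are indexed by sample, so their row indices should match.
    if not pheno.index.equals(beta.index):
        raise ValueError("pheno and beta are not indexed by the same samples")
    return pheno, beta


# Toy data: 3 samples (rows) x 2 CpGs (columns); values are illustrative only.
samples = ["sample_1", "sample_2", "sample_3"]
pheno = pd.DataFrame({"age": [34, 51, 47]}, index=samples)
beta = pd.DataFrame(
    {"cg00000029": [0.12, 0.80, 0.45], "cg00000108": [0.93, 0.88, 0.91]},
    index=samples,
)

# Write the toy dictionary in the assumed layout, then load it back.
path = os.path.join(tempfile.mkdtemp(), "methyl_array.pkl")
with open(path, "wb") as f:
    pickle.dump({"pheno": pheno, "beta": beta}, f)

pheno2, beta2 = load_pheno_beta(path)
print(beta2.shape)  # (3, 2): samples x CpGs
```

For real pipeline output, loading through `MethylationArray.from_pickle` (shown next) is the documented route; this sketch only illustrates what the file holds.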

Let's open up this object by running Python interactively:

python
>>> from pymethylprocess.MethylationDataTypes import MethylationArray
>>> methyl_arr = MethylationArray.from_pickle('preprocess_outputs/methyl_array.pkl')
>>> methyl_arr.pheno  # phenotype DataFrame: samples x covariates
>>> methyl_arr.beta  # beta-value DataFrame: samples x CpGs

The outputs of the last two commands are not shown here; the point is that the MethylationArray object stores the pheno and beta values as pandas DataFrames, and that the data can be loaded this way. The object has many more methods, which are not covered here; consult the help docs for more information.
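Since `pheno` and `beta` are plain pandas DataFrames sharing a sample index, ordinary pandas indexing is enough to subset samples while keeping the two tables aligned. The sketch below uses small toy frames as stand-ins for `methyl_arr.pheno` and `methyl_arr.beta`; the `disease` column and its values are hypothetical, and no MethylationArray methods are used.

```python
import pandas as pd

# Toy stand-ins for methyl_arr.pheno and methyl_arr.beta (samples as rows).
samples = ["s1", "s2", "s3", "s4"]
pheno = pd.DataFrame(
    {"disease": ["case", "control", "case", "control"]}, index=samples
)
beta = pd.DataFrame(
    {"cg_a": [0.10, 0.90, 0.20, 0.80], "cg_b": [0.50, 0.40, 0.60, 0.30]},
    index=samples,
)

# Select samples by a phenotype value, then pull the matching beta rows
# by index label so the two tables stay aligned.
case_pheno = pheno[pheno["disease"] == "case"]
case_beta = beta.loc[case_pheno.index]

# Mean beta value per CpG (column-wise) within the selected samples.
cpg_means = case_beta.mean(axis=0)
print(cpg_means)
```

Selecting beta rows with `.loc[...]` on the phenotype index, rather than by position, keeps the result correct even if the two frames were stored in different row orders.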

To write the object to a .pkl file ('some_pickle_file.pkl') for use in Python, or to CSVs (pheno.csv and beta.csv in a directory such as 'some_output_directory/') for loading into R, use:

>>> methyl_arr.write_csvs('some_output_directory/')  # writes pheno.csv and beta.csv
>>> methyl_arr.write_pickle('some_pickle_file.pkl')  # writes a .pkl file for Python
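When reading the exported CSVs back, the first column should be used as the sample index. The sketch below assumes `write_csvs` stores each table with its sample IDs in the first column, as pandas `DataFrame.to_csv` does by default; it mimics that export with `to_csv` on toy data rather than calling pymethylprocess, then reads the files back with `pandas.read_csv(..., index_col=0)`. In R, the corresponding call would be `read.csv('beta.csv', row.names = 1)`.

```python
import os
import tempfile

import pandas as pd

# Toy pheno/beta frames; sample IDs and CpG names are illustrative only.
samples = ["s1", "s2"]
pheno = pd.DataFrame({"age": [40, 62]}, index=samples)
beta = pd.DataFrame({"cg_a": [0.25, 0.75], "cg_b": [0.5, 0.125]}, index=samples)

# Mimic the assumed export: one CSV per table, sample IDs in the first column.
out_dir = tempfile.mkdtemp()
pheno.to_csv(os.path.join(out_dir, "pheno.csv"))
beta.to_csv(os.path.join(out_dir, "beta.csv"))

# Read back, restoring the sample IDs as the row index.
pheno_in = pd.read_csv(os.path.join(out_dir, "pheno.csv"), index_col=0)
beta_in = pd.read_csv(os.path.join(out_dir, "beta.csv"), index_col=0)
print(beta_in.loc["s2", "cg_b"])  # 0.125
```

Without `index_col=0`, the sample IDs would come back as an ordinary data column and the beta matrix would gain a spurious non-CpG column.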