Network expansion on biological seed compounds
The easiest way to use this package is to check out the example notebooks. Here you can see the functionality that motivated the construction of this package.
They can be found in the examples directory. The data needed to run these example notebooks is not included in the BioXP repository, but the notebooks should give you an idea of how to structure your code.
Example of how to run basic network expansions over many different organisms in a nested file structure.
Example of how to generate ordered seed sets for many different organisms in a nested file structure.
Example of how to run minimal seed set network expansions over many different organisms in a nested file structure (remember that you'll need to have generated your ordered seed sets in order to do the minimal seed set expansions).
Just like the example above, except this time there's a function in the notebook that allows us to fix certain seeds to the front of the ordered seed set lists, so that they are not allowed to be removed. This lets you figure out the smallest number of compounds you need to add to an existing seed set.
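To make the idea of fixed seeds concrete, here is a minimal sketch in Python (not the BioXP Julia API). It assumes a greedy procedure: walk the ordered seed list, drop each compound in turn, and keep it dropped only if the targets are still reachable. Whether BioXP implements exactly this procedure is an assumption, and the names `reachable_from`, `minimal_seed_set_sketch`, and `n_fixed` are hypothetical.

```python
from typing import FrozenSet, Iterable, List, Sequence, Set, Tuple

# Hypothetical reaction representation: (reactants, products).
Reaction = Tuple[FrozenSet[str], FrozenSet[str]]


def reachable_from(reactions: Sequence[Reaction], seeds: Iterable[str]) -> Set[str]:
    """Return every compound producible from the seeds (fixed-point closure)."""
    compounds = set(seeds)
    changed = True
    while changed:
        changed = False
        for reactants, products in reactions:
            # A reaction fires once all of its reactants are available.
            if reactants <= compounds and not products <= compounds:
                compounds |= products
                changed = True
    return compounds


def minimal_seed_set_sketch(
    reactions: Sequence[Reaction],
    ordered_seeds: Sequence[str],
    targets: Iterable[str],
    n_fixed: int = 0,
) -> List[str]:
    """Greedily drop seeds in list order; the first n_fixed are never dropped."""
    wanted = set(targets)
    kept = list(ordered_seeds)
    if not wanted <= reachable_from(reactions, kept):
        raise ValueError("targets are not reachable from the full seed list")
    for candidate in list(ordered_seeds)[n_fixed:]:
        trial = [c for c in kept if c != candidate]
        # Keep the removal only if the targets survive without this seed.
        if wanted <= reachable_from(reactions, trial):
            kept = trial
    return kept


# Toy network (made up for illustration): A + B -> C, C -> T, X + Y -> C.
toy = [
    (frozenset({"A", "B"}), frozenset({"C"})),
    (frozenset({"C"}), frozenset({"T"})),
    (frozenset({"X", "Y"}), frozenset({"C"})),
]
order = ["A", "X", "Y", "B"]
fixed_result = minimal_seed_set_sketch(toy, order, {"T"}, n_fixed=1)
free_result = minimal_seed_set_sketch(toy, order, {"T"}, n_fixed=0)
print(fixed_result)  # ['A', 'B']
print(free_result)   # ['X', 'Y']
```

With `A` fixed at the front, only one extra compound (`B`) is needed. Without fixing, the same ordering discards `A` first and ends up with a different set, which is why the result depends on the ordering.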
This is the core of the package, containing the code for all functions which "do" the network expansion and find minimal seed sets. It also contains helper functions for this, including a simple writing function (which is terribly bloated in terms of how much space it takes up, since it writes JSON). All functions/types can be accessed through this file because all the functions/types which are meant to be called externally are imported and exported here.
- expand -- Does a network expansion
- find_minimal_seed_set -- Finds minimal seed sets
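For readers new to the idea, a network expansion can be sketched as follows. This is a Python illustration of the general technique, not BioXP's Julia implementation: the name `expand_sketch` and the dictionary-of-reactions input format are assumptions made for this example.

```python
from typing import Dict, FrozenSet, Iterable, List, Set, Tuple

# Hypothetical representation: reaction id -> (reactants, products).
Reaction = Tuple[FrozenSet[str], FrozenSet[str]]


def expand_sketch(
    reactions: Dict[str, Reaction], seeds: Iterable[str]
) -> Tuple[Set[str], List[str]]:
    """Grow the compound set from the seeds until no new reaction can fire.

    Reactions fire as soon as their reactants are available, so this does
    not track generations; the final compound set is the same either way.
    """
    compounds = set(seeds)
    fired: List[str] = []
    remaining = dict(reactions)
    changed = True
    while changed:
        changed = False
        for rid, (reactants, products) in list(remaining.items()):
            if reactants <= compounds:
                compounds |= products
                fired.append(rid)
                del remaining[rid]  # each reaction is recorded once
                changed = True
    return compounds, fired


# Toy network (made up for illustration).
toy_reactions = {
    "R1": (frozenset({"A", "B"}), frozenset({"C"})),
    "R2": (frozenset({"C"}), frozenset({"D"})),
    "R3": (frozenset({"E"}), frozenset({"F"})),
}
reachable, fired = expand_sketch(toy_reactions, ["A", "B"])
print(sorted(reachable))  # ['A', 'B', 'C', 'D']
print(fired)              # ['R1', 'R2']
```

`R3` never fires because `E` is neither a seed nor produced by any reachable reaction.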
Functions that restrict which reactions are accessible based on user-inputted free energy thresholds and other options.
- filter_reactions_by_dg -- Returns a Vector{Bool} with one entry per reaction, indicating whether the reaction is allowed based on your specified dg constraints. Returns forward and backward vectors.
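The kind of mask such a filter produces can be sketched like this, in Python rather than Julia. Everything specific here is an assumption for illustration: that a direction is allowed when its free energy is at or below the threshold, that the backward free energy is the negative of the forward one, and that reactions with no known value are controlled by an `allow_missing` flag. The actual options of the BioXP function may differ.

```python
from typing import List, Optional, Tuple


def filter_by_dg_sketch(
    dgs: List[Optional[float]],
    threshold: float,
    allow_missing: bool = True,
) -> Tuple[List[bool], List[bool]]:
    """Return (forward, backward) masks, one boolean per reaction."""
    forward: List[bool] = []
    backward: List[bool] = []
    for dg in dgs:
        if dg is None:
            # No free energy known: fall back to the user's choice.
            forward.append(allow_missing)
            backward.append(allow_missing)
        else:
            forward.append(dg <= threshold)
            # Assumption: reversing a reaction flips the sign of dg.
            backward.append(-dg <= threshold)
    return forward, backward


# Made-up free energies for four reactions; None means "unknown".
example_dgs = [-20.0, 5.0, 40.0, None]
fwd, bwd = filter_by_dg_sketch(example_dgs, threshold=10.0)
print(fwd)  # [True, True, False, True]
print(bwd)  # [False, True, True, True]
```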
Functions that format output from the expansions/minimal seed sets.
- formatbioxpoutput -- Takes the simple outputs from the expansions/minimal seed sets and adds a lot of information you may or may not care about, to avoid some "preprocessing" when doing data analysis.
Functions that coerce data from the ecg repo output into appropriate inputs for the BioXP repo. Also serves as a check on types.
- readmaster -- coerces the "master" file from ecg
- readcompounds -- coerces the compound dir from ecg
- readids -- coerces the seeds/targets
- readkeyedids -- coerces the seeds/targets (if they're keyed)
Functions that randomize orderings of compounds, which is useful for doing the minimal seed set expansions.
- randomizecompounds -- Returns a vector of run results. Each run result is a randomized list of biosystem_compounds.
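The shape of that output can be sketched as below, in Python rather than Julia. The function name, the `n_runs` parameter, and the seeded generator are assumptions for illustration and are not taken from the BioXP source.

```python
import random
from typing import List, Sequence


def randomized_orderings_sketch(
    compounds: Sequence[str], n_runs: int, seed: int = 0
) -> List[List[str]]:
    """Return n_runs independently shuffled copies of the compound list."""
    rng = random.Random(seed)  # seeded so that runs are reproducible
    runs: List[List[str]] = []
    for _ in range(n_runs):
        order = list(compounds)  # copy, so the input is left untouched
        rng.shuffle(order)
        runs.append(order)
    return runs


# Placeholder compound ids, made up for illustration.
biosystem_compounds = ["cpd_a", "cpd_b", "cpd_c", "cpd_d"]
orderings = randomized_orderings_sketch(biosystem_compounds, n_runs=3)
for order in orderings:
    print(order)
```

Each ordering can then serve as one ordered seed list for a minimal seed set run, and comparing the results across runs shows how much the answer depends on the ordering.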
Definitions of structs and types used in BioXP (these are basically the equivalent of Python classes).
Compound, Compounds, Reaction, Reactions, IDs
Unfortunately there is none right now (I know, lame). But I hope the Julia code is relatively easy to read and the example files help show how the package is meant to be used.
- Add example files
- It would be nice to move all the ecg-generated data to an AWS bucket, and to store and access it using quilt. Then I could easily pull the quilt data for the example files.
This repository is citable thanks to Zenodo. There is currently no accompanying paper.