-
Notifications
You must be signed in to change notification settings - Fork 81
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Generate taxa by summary data #208
Changes from 6 commits
f731cef
216b413
9a7d6d8
face672
553f777
284613d
a8254bf
5c2da8a
1e6aa1e
1eb39c8
5dab310
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,36 @@ | ||
from os.path import join | ||
from copy import copy | ||
|
||
from biom import load_table | ||
|
||
import americangut.notebook_environment as agenv | ||
from .util import collapse_full, collapse_taxonomy, get_existing_path | ||
|
||
|
||
def cat_taxa_summaries(): | ||
"""Creates taxa summary files for each available summary category per site | ||
""" | ||
paths = copy(agenv.paths['collapsed']['notrim']['1k']) | ||
out_dir = get_existing_path( | ||
agenv.path['populated-templates']['result-taxa']) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. should this be There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. yeah, fixed |
||
del paths['ag-biom'] | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. why are you deleting this path? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. ah because it's a biom table including all sites ..? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. correct |
||
for name, path in paths.items(): | ||
# consistent naming as stool for all participant items | ||
name = name.replace('-biom', '').replace('fecal', 'stool') | ||
table = load_table(get_existing_path(path)) | ||
if len(name.split('-')) == 2: | ||
# Have entire cohort of samples for site, so need to get averages | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. get the medians? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. fixed |
||
table = collapse_full(table) | ||
table = collapse_taxonomy(table) | ||
ids = table.ids(axis='observation') | ||
for col in table.ids(): | ||
if col == 'Unknown': | ||
continue | ||
cleaned_col = col.split('(')[0].strip().replace(' ', '_') | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Is the text in braces not useful anymore? ex. Occasionally (1-2 times/week) -> Occasionally There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. never mind, this is only done for file naming .. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. yeah, makes the file names managable |
||
filename = '-'.join([name, cleaned_col]) + '.txt' | ||
|
||
with open(join(out_dir, filename), 'w') as f: | ||
for otu, val in zip(ids, table.data(col)): | ||
if val == 0.0: | ||
continue | ||
f.write('%s\t%s\n' % (otu, str(val))) |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -11,6 +11,7 @@ | |
from itertools import izip | ||
from StringIO import StringIO | ||
from collections import defaultdict | ||
from biom import Table | ||
|
||
from lxml import etree | ||
from skbio.parse.sequences import parse_fastq, parse_fasta | ||
|
@@ -635,3 +636,58 @@ def get_single_id_lists(map_, depths): | |
single_ids[depths[idx - 1]].extend(single_ids[depths[idx]]) | ||
|
||
return single_ids | ||
|
||
|
||
def collapse_taxonomy(_bt, level=5): | ||
"""Collapses OTUs by taxonomy | ||
|
||
Parameters | ||
---------- | ||
_bt : biom table | ||
Table to collapse | ||
level : int, optional | ||
Level to collapse to. 0=kingdom, 1=phylum,...,5=genus, 6=species | ||
Default 5 | ||
|
||
Returns | ||
------- | ||
biom table | ||
Collapsed biom table | ||
|
||
Citations | ||
--------- | ||
[1] http://biom-format.org/documentation/table_objects.html | ||
""" | ||
def collapse_f(id_, md): | ||
return '; '.join(md['taxonomy'][:level + 1]) | ||
collapsed = _bt.collapse(collapse_f, axis='observation', norm=False) | ||
return collapsed | ||
|
||
|
||
def collapse_full(_bt): | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. is it not possible to pass a collapse function to There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I tried that, but they need to collapse by sample not by taxa, and I don't think that's possible. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. so in There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. sorry, collapse by sample data, not by taxa metadata. I'll dig into it and see if it's possible. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. You might be able to do it with return Table([np.median(v) for v in _bt.iter_data(axis='observation')],
_bt.ids(axis='observation'),
observation_metadata=_bt.metadata(axis='observation')) Unfortunately, because of the rolling issue, you can't do the "cleanest" thing which would be a # won't work... :(
return Table.reduce(np.median, axis='observation') There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. ...at one point, we had discussed adding in There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. little more complicated than the given return, but greatly simplified. Thanks. |
||
"""Collapses full biom table to median of each OTU | ||
|
||
Parameters | ||
---------- | ||
_bt : biom table | ||
Table to collapse | ||
|
||
Returns | ||
------- | ||
biom table | ||
Collapsed biom table, one sample containing median of each OTU, | ||
normalized. | ||
""" | ||
data = np.empty([_bt.length('observation'), 1]) | ||
obs = [] | ||
observation_metadata = [] | ||
pos = 0 | ||
for vals, id_, metadata in _bt.iter(axis='observation'): | ||
data[pos][0] = np.median(vals) | ||
obs.append(id_) | ||
observation_metadata.append(metadata) | ||
pos += 1 | ||
table = Table(data, obs, ['average'], | ||
observation_metadata=observation_metadata) | ||
table.norm(inplace=True) | ||
return table |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is there a test for this function?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added