def print_report(final, ofile):
    '''Write a Markdown report on the fraction of unigenes kept after filtering.

    `final` is a DataFrame whose index labels start with the study name
    followed by '/'; `ofile` is the path of the Markdown file to write.
    `to_markdown()` requires the `tabulate` package.
    '''
    # Drop columns with missing values and convert fractions to percentages
    # (this builds a new frame, so `final` itself is left untouched)
    frac = final.dropna(axis=1) * 100
    # The study name is the part of the index label before the first '/'
    frac['study'] = frac.index.str.split('/').str[0]
    # Per-study summary statistics, restricted to the 'strict' results
    table = frac.groupby('study').describe().round(2).T.loc['strict']
    # Number of samples in each study
    study_counts = final.index.str.split('/').str[0].value_counts()
    study_counts.index.name = 'study'
    with open(ofile, 'w') as f:
        f.write(f'''
# Results of filtering

We estimate the fraction of unigenes (sequences) kept after filtering the
[GMGCv1](https://gmgc.embl.de).

Using two datasets:

1. Human gut dataset (from [Zeller et al., 2014](https://doi.org/10.15252/msb.20145645))
2. Dog gut dataset (from [Coelho et al., 2018](https://microbiomejournal.biomedcentral.com/articles/10.1186/s40168-018-0450-3))

Dataset size (number of samples):

{study_counts.to_markdown()}

## Fraction (%) of elements kept after filtering (with [NGLess](https://ngless.embl.de) preprocessing)

### Using a minimum of 1 matched hash per unigene (for it to be kept)

{table.loc['25q-45ell', 'min1'].drop('count').to_markdown()}

### Using a minimum of 2 matched hashes per unigene (for it to be kept)

{table.loc['25q-45ell', 'min2'].drop('count').to_markdown()}

## Fraction (%) of elements kept after filtering (passthru, i.e., no FQ preprocessing)

### Using a minimum of 1 matched hash per unigene (for it to be kept)

{table.loc['passthru', 'min1'].drop('count').to_markdown()}

### Using a minimum of 2 matched hashes per unigene (for it to be kept)

{table.loc['passthru', 'min2'].drop('count').to_markdown()}

Note that the _fraction_ of kept unigenes is counted, which will not correspond
to the fraction of basepairs as longer genes are more likely to be kept.
''')
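

# Illustrative sketch only, not part of the report logic: the closing note in
# the report says that the fraction of unigenes kept differs from the fraction
# of basepairs kept when longer genes are more likely to be kept. The helper
# below is hypothetical (the name `_kept_fractions` and the numbers in the
# example are made up for illustration) and uses only plain Python.
def _kept_fractions(lengths, kept):
    '''Return (fraction of genes kept, fraction of basepairs kept).

    `lengths` are gene lengths in basepairs; `kept` is a parallel sequence
    of booleans saying whether each gene passed the filter.
    '''
    n_kept = sum(1 for k in kept if k)
    bp_kept = sum(length for length, k in zip(lengths, kept) if k)
    return n_kept / len(lengths), bp_kept / sum(lengths)


# Example with made-up lengths: three short genes are dropped and one long
# gene is kept, so 1 of 4 genes but 3000 of 4000 basepairs survive:
#     _kept_fractions([300, 300, 400, 3000], [False, False, False, True])
#     returns (0.25, 0.75)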