Modified Organism Screening by Enrichment
Loosely based on the Debode paper:
- Quality trimming (fastp)
- Align reads to reference genomes of big 6 (bwa)
- Extract RPKM values for each host
- Reference alignment to elements (SAUTE)
- Find elements in contigs (BLAST)
- Count reads in contigs and get coverage
- Try to detect event-specific sequences (JRC-EURL Methods) in contigs
- Create reports
- Check whether kraken could be used instead of bwa
- Chop contigs and re-BLAST the parts with no matches? Set a minimum length to chop (100 bp?). Use the full BLAST db (local). Hit selection (best hit?)
- BLASTGraph improvement, check out ggcontigs (R)
- Compare bwa to simple kraken?
- If bwa is kept, upgrade to bwa2
- Automated event detection (API to BCH/Euginius)?
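A minimal sketch of the RPKM step listed above, assuming the per-sequence read counts come from `samtools idxstats` (a tool these notes do not name) run on the bwa alignment. The function name `rpkm_from_idxstats` is hypothetical. It reports one value per reference sequence; summing counts and lengths per host before the division is left out and would be needed for multi-contig assemblies.

```shell
# Hypothetical helper: read `samtools idxstats` output on stdin
# (columns: name, length, mapped reads, unmapped reads) and print
# "name<TAB>RPKM" per reference sequence.
# RPKM = mapped_reads * 1e9 / (sequence_length * total_mapped_reads)
rpkm_from_idxstats() {
    awk -F'\t' '
        # Skip the "*" line (unplaced reads) and zero-length entries
        $1 != "*" && $2 > 0 {
            n++
            name[n] = $1; len[n] = $2; cnt[n] = $3
            total += $3
        }
        END {
            # No mapped reads at all: nothing to normalize against
            if (total == 0) exit 1
            for (i = 1; i <= n; i++)
                printf "%s\t%.2f\n", name[i], cnt[i] * 1e9 / (len[i] * total)
        }
    '
}
```

Possible usage, with `sample.bam` as a placeholder for a sorted, indexed alignment: `samtools idxstats sample.bam | rpkm_from_idxstats > rpkm.tsv`.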
1- Get assemblies from e.g. GenBank (wget or use the NCBI browser)
2- Concatenate assemblies:
cd path/to/assemblies
cat *.fna > merged_refs.fa
3- Index genomes
bwa index merged_refs.fa
Then use merged_refs.fa as the reference.
It must be in multi-FASTA format with unwrapped sequences (each sequence on a single line)!
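Downloaded assemblies are often line-wrapped, so the unwrapped-sequence requirement above may need an extra step. A minimal sketch using awk; the function name `unwrap_fasta` is hypothetical and not part of the pipeline.

```shell
# Hypothetical helper: read FASTA on stdin and write it back with each
# sequence joined onto a single line; header lines are left unchanged.
unwrap_fasta() {
    awk '
        /^>/ {
            # New record: flush the previous sequence, then print the header
            if (seq != "") print seq
            print
            seq = ""
            next
        }
        { seq = seq $0 }                    # append wrapped sequence lines
        END { if (seq != "") print seq }    # flush the last sequence
    '
}
```

Possible usage when building the reference: `cat *.fna | unwrap_fasta > merged_refs.fa`, followed by `bwa index merged_refs.fa`.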