Commit

Merge pull request #5 from xinehc/dev
v3.1
xinehc authored Jun 9, 2022
2 parents 6879fd4 + 2c2fb20 commit 1747a6d
Showing 28 changed files with 34,534 additions and 26,274 deletions.
63 changes: 35 additions & 28 deletions README.md
@@ -3,37 +3,24 @@ This repository was created by Xiaole Yin (_xiaole99_) and is currently maintain

If you have any questions, please contact Xiaole Yin ([[email protected]]([email protected])).

## Changes
The change log of this version (June, 2022) includes:
+ We updated the SARG database and the corresponding structure file to version 3.0 ([SARG v3.0-M](https://smile.hku.hk/pipeline/#/Indexing/download)).
+ We dropped bbmap and usearch from the pipeline; args_oap now supports both Linux and OSX.
+ We modified the 16S estimation process by changing minimap2 to bwa + blastn, as minimap2 does not work well for very short reads (e.g. below 100 bp, see [https://github.com/lh3/minimap2/issues/363#issuecomment-473387994](https://github.com/lh3/minimap2/issues/363#issuecomment-473387994)).
+ We pinned the version of diamond to 0.9.24 (and python to 3.7.\*), as the latest version of diamond (2.0.15) gives ~10% more hits of USCMGs and ARGs. The sensitivity of the newer version of diamond is under evaluation. We hope to remove this constraint in future updates.
+ Bugs fixed:
    + Fixed a bug that caused the worst hit (instead of the best) to be picked in stagetwo's blastx when multiple ARG candidates are found.
    + Fixed a bug that caused some multi-component ARG hits to go uncounted in stagetwo's aggregation process.
    + Fixed a bug in stageone that caused USCMGs to be slightly overestimated.
    + Fixed a bug in stageone that caused parameters -x -y -v to be ignored.

## Installation
Conda (osx-64/linux-64):
```bash
conda install -c bioconda -c conda-forge xinehc::args_oap=3.0
conda install -c bioconda -c conda-forge xinehc::args_oap=3.1
```

We suggest creating a new conda environment (here `-n args_oap` is used as an example) to avoid potential dependency conflicts:
```bash
conda create -n args_oap -c bioconda -c conda-forge xinehc::args_oap=3.0
conda create -n args_oap -c bioconda -c conda-forge xinehc::args_oap=3.1
conda activate args_oap
```

Args_oap depends on `python==3.7`, `diamond==0.9.24`, `bwa>=0.7.17`, `blast>=2.12`, `samtools>=1.15`, `fastp>=0.23.2`, `pandas`. If your OS has all the dependencies, then it can be built from source:
Args_oap depends on `python>=3.7`, `diamond>=2.0.15`, `bwa>=0.7.17`, `blast>=2.12`, `samtools>=1.15`, `fastp>=0.23.2`, `pandas`. If your OS has all the dependencies, then it can be built from source:
```bash
git clone https://github.com/xinehc/args_oap.git
cd args_oap
python setup.py install # use python3 if needed
```
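Before building from source, it may help to confirm that the external tools listed above are reachable on `PATH`. The following is a minimal sketch using only the Python standard library. The executable names are assumptions (for instance, that the `blast` package provides `blastn`), and the check does not verify versions.

```python
import shutil

# Executable names are assumptions about how the dependencies are installed;
# `blast` is assumed to provide the `blastn` executable.
REQUIRED_TOOLS = ["diamond", "bwa", "blastn", "samtools", "fastp"]


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the subset of `tools` that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing from PATH: " + ", ".join(missing))
    else:
        print("All required tools were found on PATH.")
```

Version requirements still need to be checked by hand, for example with each tool's own version flag.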
**Please note that currently only the 0.9.24 version of diamond is supported. We hope to remove this constraint in future updates.**

## Example
Two examples (100k paired-end reads, 100 bp each) can be found [here](https://dl.dropboxusercontent.com/s/054ufvfahchfk7f/example.tar.gz). The zipped file can be downloaded manually or using `wget`:
@@ -53,23 +40,24 @@ After `stage_one`, a `meta_data_online.txt` file can be found in `output`. It su

| SampleID | Name | Category | ReadLength | #ofReads | #of16Sreads | CellNumber |
|----------|----------|----------|------------|----------|------------------|------------------|
| 1 | STAS | ST | 100 | 200000 | 9.35754189944134 | 2.95276936764951 |
| 2 | SWHAS104 | SWH | 100 | 200000 | 8.5195530726257 | 3.30218758791575 |
| 1 | STAS | ST | 100 | 200000 | 9.35754189944134 | 2.82910169634418 |
| 2 | SWHAS104 | SWH | 100 | 200000 | 8.5195530726257 | 3.16376328499407 |
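
For downstream scripting, the table above can be loaded with the Python standard library. This is a minimal sketch that assumes `meta_data_online.txt` is tab-separated with the header row shown above; the sample text is copied from the table, and `read_meta_data` is a hypothetical helper, not part of args_oap.

```python
import csv
import io

# Sample content copied from the table above.
# Tab separation is an assumption about the real file format.
SAMPLE = (
    "SampleID\tName\tCategory\tReadLength\t#ofReads\t#of16Sreads\tCellNumber\n"
    "1\tSTAS\tST\t100\t200000\t9.35754189944134\t2.82910169634418\n"
    "2\tSWHAS104\tSWH\t100\t200000\t8.5195530726257\t3.16376328499407\n"
)


def read_meta_data(handle):
    """Parse a tab-separated meta-data table into a list of dicts,
    converting the numeric columns to int or float."""
    rows = []
    for row in csv.DictReader(handle, delimiter="\t"):
        rows.append({
            "SampleID": row["SampleID"],
            "Name": row["Name"],
            "Category": row["Category"],
            "ReadLength": int(row["ReadLength"]),
            "#ofReads": int(row["#ofReads"]),
            "#of16Sreads": float(row["#of16Sreads"]),
            "CellNumber": float(row["CellNumber"]),
        })
    return rows


samples = read_meta_data(io.StringIO(SAMPLE))
for sample in samples:
    print(sample["Name"], sample["#of16Sreads"], sample["CellNumber"])
```

To read a real file, pass an open file handle instead of the `io.StringIO` wrapper.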

After `stagetwo`, the normalized ARG abundances (copies per 16S, copies per cell, or hits per read) are written to several `*_normalized_*.txt` files. For example, `output.normalize_16s.type` means:
+ **normalized_16s** - normalized against 16S rRNA copies
+ **type** - Type of ARGs (the hierarchy in SARG is type -> subtype -> gene)

| Type | STAS | SWHAS104 |
|----------------|----------------------|-----------------------|
| MLS | 0.0 | 0.006280321819611637 |
| aminoglycoside | 0.014248756218905473 | 0.05689225096549162 |
| bacitracin | 0.012526379093543273 | 0.02387830588588363 |
| beta-lactam | 0.0 | 0.06118010268499747 |
| mupirocin | 0.002609025186567164 | 0.0037497024423531647 |
| quinolone | 0.1272415290809021 | 0.036200398345334756 |
| sulfonamide | 0.011830148152227792 | 0.056019782667944225 |
| tetracycline | 0.004097610108964381 | 0.04088706547697995 |
| Type | STAS | SWHAS104 |
|-------------------------------------|----------------------|-----------------------|
| aminoglycoside | 0.014248756218905473 | 0.05778644603054001 |
| bacitracin | 0.012526379093543273 | 0.02387830588588363 |
| beta_lactam | 0.0 | 0.06118010268499747 |
| macrolide-lincosamide-streptogramin | 0.0 | 0.009842691110578379 |
| multidrug | 0.004327798193369976 | 0.013092162368663536 |
| mupirocin | 0.002609025186567164 | 0.0037497024423531647 |
| quinolone | 0.1272415290809021 | 0.036200398345334756 |
| sulfonamide | 0.011830148152227792 | 0.056019782667944225 |
| tetracycline | 0.004097610108964381 | 0.04088706547697995 |
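
A table like the one above can be compared across samples with a few lines of Python. This is a minimal sketch that assumes the output file is tab-separated with a `Type` column followed by one column per sample; the sample text is a subset of the rows shown above, and `read_type_table` and `top_type` are hypothetical helpers, not part of args_oap.

```python
import csv
import io

# A subset of the rows shown above.
# Tab separation is an assumption about the real file format.
SAMPLE = (
    "Type\tSTAS\tSWHAS104\n"
    "aminoglycoside\t0.014248756218905473\t0.05778644603054001\n"
    "bacitracin\t0.012526379093543273\t0.02387830588588363\n"
    "beta_lactam\t0.0\t0.06118010268499747\n"
    "quinolone\t0.1272415290809021\t0.036200398345334756\n"
)


def read_type_table(handle):
    """Return {arg_type: {sample_name: normalized_abundance}}."""
    table = {}
    for row in csv.DictReader(handle, delimiter="\t"):
        arg_type = row.pop("Type")
        table[arg_type] = {name: float(value) for name, value in row.items()}
    return table


def top_type(table, sample_name):
    """Return the ARG type with the highest abundance in one sample."""
    return max(table, key=lambda arg_type: table[arg_type][sample_name])


table = read_type_table(io.StringIO(SAMPLE))
for name in ("STAS", "SWHAS104"):
    print(name, top_type(table, name))
```

With these rows, the most abundant type is `quinolone` in STAS and `beta_lactam` in SWHAS104.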

### (mandatory) Prepare the meta-data.txt file
(We hope to remove the manual meta-data.txt preparation step in future updates)
@@ -120,6 +108,25 @@ After a while or so, you will notice that there are four files generated for you

For questions raised by users, please refer to the [FAQ](https://github.com/biofuture/Ublastx_stageone/wiki/FAQ). To run ARG OAP locally, users should download the source code to a local computer system (Unix/Linux). Users can upload the generated files for stage two to our Galaxy analysis platform (http://smile.hku.hk/SARGs) or use the local version of the stage two script.


## Change log
#### Version 3.1 (09. June, 2022)
+ Minor changes to the SARG database (the v3.0-M database now includes both multi-component and two-component ARGs).
+ Removed the constraint on diamond's version; added a new parameter `-w` and changed the default value of `-v` in stageone to compensate for the change.
+ Added a `mt_mode` switch to stagetwo's blastx to make it faster when more than 5 cores are available.
+ Added a logger to make stagetwo's output clearer.

#### Version 3.0 (04. June, 2022)
+ We updated the SARG database and the corresponding structure file to version 3.0 ([SARG v3.0-M](https://smile.hku.hk/pipeline/#/Indexing/download)).
+ We dropped bbmap and usearch from the pipeline; args_oap now supports both Linux and OSX.
+ We modified the 16S estimation process by changing minimap2 to bwa + blastn, as minimap2 does not work well for very short reads (e.g. below 100 bp, see [https://github.com/lh3/minimap2/issues/363#issuecomment-473387994](https://github.com/lh3/minimap2/issues/363#issuecomment-473387994)).
+ We pinned the version of diamond to 0.9.24 (and python to 3.7.\*), as the latest version of diamond (2.0.15) gives ~10% more hits of USCMGs and ARGs. The sensitivity of the newer version of diamond is under evaluation. We hope to remove this constraint in future updates.
+ Bugs fixed:
    + Fixed a bug that caused the worst hit (instead of the best) to be picked in stagetwo's blastx when multiple ARG candidates are found.
    + Fixed a bug that caused some multi-component ARG hits to go uncounted in stagetwo's aggregation process.
    + Fixed a bug in stageone that caused USCMGs to be slightly overestimated.
    + Fixed a bug in stageone that caused parameters `-x` `-y` `-v` to be ignored.

---
**Notice:**
