diff --git a/docs/api.md b/docs/api.md index be52978..c5b37e1 100644 --- a/docs/api.md +++ b/docs/api.md @@ -13,7 +13,7 @@ This page provides API docs for the Python API of blast2galaxy. show_signature_annotations: false --> - + + + +::: blast2galaxy + handler: python + options: + show_source: false + annotations_path: brief + show_signature: true + separate_signature: true + show_signature_annotations: false \ No newline at end of file diff --git a/docs/assets/Thumbs.db b/docs/assets/Thumbs.db new file mode 100644 index 0000000..df716b1 Binary files /dev/null and b/docs/assets/Thumbs.db differ diff --git a/docs/assets/usegalaxy_eu_00.jpg b/docs/assets/usegalaxy_eu_00.jpg new file mode 100644 index 0000000..9efdbe6 Binary files /dev/null and b/docs/assets/usegalaxy_eu_00.jpg differ diff --git a/docs/assets/usegalaxy_eu_01.jpg b/docs/assets/usegalaxy_eu_01.jpg new file mode 100644 index 0000000..094bbff Binary files /dev/null and b/docs/assets/usegalaxy_eu_01.jpg differ diff --git a/docs/assets/usegalaxy_eu_02.jpg b/docs/assets/usegalaxy_eu_02.jpg new file mode 100644 index 0000000..4ce993f Binary files /dev/null and b/docs/assets/usegalaxy_eu_02.jpg differ diff --git a/docs/assets/usegalaxy_eu_03.jpg b/docs/assets/usegalaxy_eu_03.jpg new file mode 100644 index 0000000..103b784 Binary files /dev/null and b/docs/assets/usegalaxy_eu_03.jpg differ diff --git a/docs/assets/usegalaxy_eu_04.jpg b/docs/assets/usegalaxy_eu_04.jpg new file mode 100644 index 0000000..3b63238 Binary files /dev/null and b/docs/assets/usegalaxy_eu_04.jpg differ diff --git a/docs/assets/usegalaxy_eu_05.jpg b/docs/assets/usegalaxy_eu_05.jpg new file mode 100644 index 0000000..85deb68 Binary files /dev/null and b/docs/assets/usegalaxy_eu_05.jpg differ diff --git a/docs/cli.md b/docs/cli.md index 0c4947f..a91be74 100644 --- a/docs/cli.md +++ b/docs/cli.md @@ -4,4 +4,6 @@ This page provides documentation for our command line tools. 
::: mkdocs-click :module: blast2galaxy.cli - :command: cli \ No newline at end of file + :command: cli + :style: plain + :list_subcommands: True \ No newline at end of file diff --git a/docs/configuration.md b/docs/configuration.md index 0edcdf5..768c54c 100644 --- a/docs/configuration.md +++ b/docs/configuration.md @@ -1,5 +1,17 @@ # Configuration +blast2galaxy provides two ways to configure servers and profiles to be used with blast2galaxy. + +- **TOML file based configuration**
+ This type of configuration can be used when using the CLI or Python API of blast2galaxy. + +- **Python API based configuration**
+ This type of configuration can only be used when using the Python API of blast2galaxy. + +Both ways of configuration are described in the following sections of this page. + +## TOML file based configuration + To connect to an API of an existing Galaxy server, the user of blast2galaxy has to provide API access credentials in the form of a Galaxy API key. Furthermore, it is possible to connect via the username and password of the user account on the specific Galaxy server. For security reasons, the latter variant should only be used if the use of an API key is not possible. @@ -10,12 +22,16 @@ If it can't find a configuration file in the current working directory it looks If it can't find any configuration file an error message will be displayed. An individually named configuration file at a storage location of your choice can be set via the `--configfile=PATH` parameter of the CLI. -Example: `--configfile=/opt/myapps/config/app1.blast2galaxy.config.toml` + +Example: +``` +--configfile=/opt/myapps/config/app1.blast2galaxy.config.toml +```
-

General Structure of the config TOML

+### General Structure of the config TOML file The configuration file has two types of sections: @@ -28,7 +44,7 @@ Where `###` has to be replaced with either `default` or a unique server-ID / pro !!! note A config file must consist at least one `[servers.]` entry and one `[profiles.]` entry. - If you provide `[servers.default]` and `[profiles.default]` the `--profile` parameter of the CLI can be omitted. + If you provide `[servers.default]` and `[profiles.default]` the `--server=` and `--profile=` parameter of the CLI commands can be omitted. @@ -38,7 +54,7 @@ Where `###` has to be replaced with either `default` or a unique server-ID / pro
-

Servers

+### Servers The servers section holds one or multiple Galaxy server instances with their corresponding URLs and API-Keys. @@ -52,28 +68,28 @@ api_key = "65dcb*******************************" !!! tip - After configuration of at least one default server you can use the `list-tools` command of the CLI to get a table with all NCBI BLAST+ tools and DIAMOND available on that Galaxy server. The table also contains the Tool-IDs for configuration of the profiles and also all available sequences databases corresponding to the specific tool. + After configuration of at least one default server you can use the `list-tools` command of the CLI to get a table with all compatible NCBI BLAST+ tools and DIAMOND available on that Galaxy server. The table also contains the Tool-IDs for configuration of the profiles. List all available tools of the default server: ```bash blast2galaxy list-tools ``` - List all available tools of the server with ID server_id: + List all available tools of the server with ID `SERVER_ID`: ```bash - blast2galaxy list-tools --server=server_id + blast2galaxy list-tools --server=SERVER_ID ```
-

Profiles

+### Profiles The profiles section holds one or multiple profiles where each profile configures at least the Galaxy server and the Tool-ID to be used. Mandatory fields for each profile are: - `server`   *An ID of an configured server in the servers section* -- `tool_id`   *Tool-ID for the tool on the Galaxy server* +- `tool`   *Tool-ID for the tool on the Galaxy server* Optional fields for each profile are: @@ -84,10 +100,22 @@ Example: ```toml [profiles.default] server = "default" -tool_id = "toolshed.g2.bx.psu.edu/repos/devteam/ncbi_blast_plus/ncbi_blastn_wrapper/2.14.1+galaxy2" +tool = "toolshed.g2.bx.psu.edu/repos/devteam/ncbi_blast_plus/ncbi_blastn_wrapper/2.14.1+galaxy2" ``` +!!! tip + + After configuration of at least one default profile you can use the `list-dbs` command of the CLI to get a table with all available sequence databases for a specific tool. + + List all available databases of the tool with ID `TOOL_ID` on the default server: + ```bash + blast2galaxy list-dbs --tool=TOOL_ID + ``` + List all available databases of the tool with ID `TOOL_ID` on the server with ID `SERVER_ID`: + ```bash + blast2galaxy list-dbs --server=SERVER_ID --tool=TOOL_ID + ``` @@ -105,24 +133,75 @@ api_key = "1k32z*******************************" [profiles.default] server = "default" -tool_id = "toolshed.g2.bx.psu.edu/repos/devteam/ncbi_blast_plus/ncbi_blastn_wrapper/2.14.1+galaxy2" +tool = "toolshed.g2.bx.psu.edu/repos/devteam/ncbi_blast_plus/ncbi_blastn_wrapper/2.14.1+galaxy2" [profiles.blastp] server = "default" -tool_id = "toolshed.g2.bx.psu.edu/repos/devteam/ncbi_blast_plus/ncbi_blastp_wrapper/2.14.1+galaxy2" +tool = "toolshed.g2.bx.psu.edu/repos/devteam/ncbi_blast_plus/ncbi_blastp_wrapper/2.14.1+galaxy2" [profiles.blastp_plantae_genes] server = "default" -tool_id = "toolshed.g2.bx.psu.edu/repos/devteam/ncbi_blast_plus/ncbi_blastp_wrapper/2.14.1+galaxy2" +tool = "toolshed.g2.bx.psu.edu/repos/devteam/ncbi_blast_plus/ncbi_blastp_wrapper/2.14.1+galaxy2" database = 
"plant_proteins" [profiles.diamond_blastp_plantae_genes] server = "default" -tool_id = "toolshed.g2.bx.psu.edu/repos/bgruening/diamond/bg_diamond/2.0.15+galaxy0" +tool = "toolshed.g2.bx.psu.edu/repos/bgruening/diamond/bg_diamond/2.0.15+galaxy0" database = "plant_proteins" [profiles.blastn_vertebrata] server = "default" -tool_id = "toolshed.g2.bx.psu.edu/repos/devteam/ncbi_blast_plus/ncbi_blastn_wrapper/2.14.1+galaxy2" +tool = "toolshed.g2.bx.psu.edu/repos/devteam/ncbi_blast_plus/ncbi_blastn_wrapper/2.14.1+galaxy2" database = "vertebrata_proteins" +``` + +## Python API based configuration + +If you use the Python API of blast2galaxy it is also possible to provide the configuration programmatically without the need for an `.blast2galaxy.toml` file. + +Example for API based configuration during runtime for setting a default server and a default profile: +```python +import blast2galaxy + +blast2galaxy.config.add_default_server( + server_url = 'https://usegalaxy.eu', + api_key = 'your_api_key' +) + +blast2galaxy.config.add_default_profile( + server = 'default', + tool = 'toolshed.g2.bx.psu.edu/repos/devteam/ncbi_blast_plus/ncbi_blastp_wrapper/2.14.1+galaxy2' +) + +blast2galaxy.blastp( + query = 'prot.fasta', + db = 'database_id', + out = 'result_blastp.txt', + outfmt = '6' +) +``` + +If you want to add further servers and profiles beside the defaults you can use `blast2galaxy.config.add_server()` and `blast2galaxy.config.add_profile()`: +```python +import blast2galaxy + +blast2galaxy.config.add_server( + server = 'myserver', + server_url = 'https://usegalaxy.eu', + api_key = 'your_api_key' +) + +blast2galaxy.config.add_profile( + profile = 'diamond', + server = 'myserver', + tool = 'toolshed.g2.bx.psu.edu/repos/bgruening/diamond/bg_diamond/2.0.15+galaxy0' +) + +blast2galaxy.diamond( + profile = 'diamond', + query = 'prot.fasta', + db = 'database_id', + out = 'result_diamond_blastp.txt', + outfmt = '6' +) ``` \ No newline at end of file diff --git a/docs/index.md 
b/docs/index.md index 9bc3187..fc3fd70 100644 --- a/docs/index.md +++ b/docs/index.md @@ -1,16 +1,20 @@ -# Welcome to blast2galaxy +# Welcome to blast2galaxy documentation blast2galaxy provides a Python API and CLI to perform BLAST and DIAMOND queries against Galaxy servers that have the NCBI BLAST+ tools [[1]](#1) and DIAMOND [[2]](#2) installed. +blast2galaxy is available as a PyPI package (pip) and tested to be working with the following Python versions: 3.10, 3.11, 3.12 + ![Screenshot](figure_1_v3.png) -*Figure 1: Blast2galaxy provides a high-level convenience layer between any Galaxy server with NCBI BLAST+ tools and/or DIAMOND installed and different types of clients and usage scenarios. Researchers, applications and computational pipelines can either use the Python-API or the CLI of blast2galaxy to send requests for a BLAST search to any compatible Galaxy server. The use of BLAST+ tools and/or DIAMOND by multiple applications and the provision of corresponding BLAST databases can be centralized and made reusable by use of a Galaxy server.* +*Figure 1: blast2galaxy provides a high-level convenience layer between any Galaxy server with NCBI BLAST+ tools and/or DIAMOND installed and different types of clients and usage scenarios. Researchers, applications and computational pipelines can either use the Python-API or the CLI of blast2galaxy to send requests for a BLAST search to any compatible Galaxy server. The use of BLAST+ tools and/or DIAMOND by multiple applications and the provision of corresponding BLAST databases can be centralized and made reusable by use of a Galaxy server.* + Please read the [CLI Reference](cli.md) if you want to use blast2galaxy on the command line as drop-in replacement for NCBI BLAST+ tools or DIAMOND or read the [API Reference](api.md) if you want to use blast2galaxy inside a Python application to perform BLAST or DIAMOND queries against a Galaxy server. +

References

[1] Peter J. A. Cock, John M. Chilton, Björn Grüning, James E. Johnson, Nicola Soranzo, NCBI BLAST+ integrated into Galaxy, GigaScience, Volume 4, Issue 1, December 2015, s13742-015-0080-7, [https://doi.org/10.1186/s13742-015-0080-7](https://doi.org/10.1186/s13742-015-0080-7) diff --git a/docs/tutorial.md b/docs/tutorial.md new file mode 100644 index 0000000..1cb8634 --- /dev/null +++ b/docs/tutorial.md @@ -0,0 +1,150 @@ +# Tutorial + +This tutorial describes the whole process, from installation to performing a BLAST search on the usegalaxy.eu instance. + +## 1. Obtaining an API key from usegalaxy.eu + +1. Log in to your account at [usegalaxy.eu](https://usegalaxy.eu/). If you don't have an account yet, you can create one using LifeScience Login or the registration form. +

+ +2. After logging in, click on `User` in the main menu at the top of the page and then on `Preferences` in the submenu that appears: ![Galaxy User Settings](assets/usegalaxy_eu_00.jpg) +

+ +3. On the `User Preferences` page that appears, click on `Manage API Key`: + ![Galaxy User Preferences](assets/usegalaxy_eu_01.jpg) +

+ +4. On the `Manage API Key` page that appears, click on the `Create a new key` button: + ![Galaxy Manage API Key](assets/usegalaxy_eu_02.jpg) +

+ +5. Click on the `Copy Key` button to copy your newly created API key to your clipboard. + ![Galaxy Manage API Key](assets/usegalaxy_eu_03.jpg) + +6. Paste the API key into an editor for later use. + + +## 2. Installation of blast2galaxy + +Install blast2galaxy by using the following command. + +!!! note + It is highly recommended to install blast2galaxy in an isolated environment created with an environment management tool like conda/mamba, pixi, virtualenv or similar. + +``` +pip install blast2galaxy +``` + +After installation you can check if the blast2galaxy CLI is available by executing the following command: + +``` +blast2galaxy --help +``` + +You should then see some help information about the blast2galaxy CLI: + +``` +Usage: blast2galaxy [OPTIONS] COMMAND [ARGS]... + + Main entrypoint. + +Options: + --help Show this message and exit. + +Commands: + blastn + blastp + blastx + diamond-blastp + diamond-blastx + list-tools + show-config + tblastn +``` + +## 3. Configuration of blast2galaxy + +To configure blast2galaxy, create a config file named `.blast2galaxy.toml` in your home directory or in the current working directory where you execute blast2galaxy. + +Let's create a config file in your home directory: + +``` +touch ~/.blast2galaxy.toml +``` + +You can now add the usegalaxy.eu server as the default server by pasting the following content into the config file. +Please replace `PASTE_YOUR_API_KEY_HERE` with the API key you created and stored previously. 
+ +``` +[servers.default] +server_url = "https://usegalaxy.eu" +api_key = "PASTE_YOUR_API_KEY_HERE" +``` + +After this initial configuration you can use the following command to get a list of compatible BLAST and DIAMOND tools available on usegalaxy.eu: + +``` +blast2galaxy list-tools +``` + +The output of this command should look similar to this: + +![BLAST and DIAMOND tools on usegalaxy.eu](assets/usegalaxy_eu_04.jpg) + +You can use this information to configure a default profile for blast2galaxy. + +To do this add the following content to your existing `.blast2galaxy.toml` config file: + +``` +[profiles.default] +server = "default" +tool = "toolshed.g2.bx.psu.edu/repos/bgruening/diamond/bg_diamond/2.0.15+galaxy0" +``` + +## 4. Obtaining available sequence databases for a BLAST or DIAMOND tool + +Once you have configured a default server in the `.blast2galaxy.toml` config file you can get a list of available sequence databases for a specific tool. + +For this tutorial we want to list all databases for the DIAMOND tool with the ID `toolshed.g2.bx.psu.edu/repos/bgruening/diamond/bg_diamond/2.0.15+galaxy0`: +``` +blast2galaxy list-dbs --tool=toolshed.g2.bx.psu.edu/repos/bgruening/diamond/bg_diamond/2.0.15+galaxy0 +``` + +The output of this command should look similar to this: + +![BLAST and DIAMOND tools on usegalaxy.eu](assets/usegalaxy_eu_05.jpg) + +For the example search request in step 5 we will use the database with ID `uniprot_swissprot_2023_03`. + + +## 5. Executing search requests + +In the previous steps you have configured a default server and a default profile for blast2galaxy. +The default profile points to the DIAMOND tool on usegalaxy.eu. +Therefore you can now execute DIAMOND searches on usegalaxy.eu: + +1. 
Create an example FASTA file called `query_protein.fasta` with a protein sequence to be searched with DIAMOND on usegalaxy.eu: + ``` + >sp|P62805|H4_HUMAN Histone H4 OS=Homo sapiens OX=9606 GN=H4C1 PE=1 SV=2 + MSGRGKGGKGLGKGGAKRHRKVLRDNIQGITKPAIRRLARRGGVKRISGLIYEETRGVLK + VFLENVIRDAVTYTEHAKRKTVTAMDVVYALKRQGRTLYGFGG + ``` + +2. Execute DIAMOND search with blast2galaxy: + ``` + blast2galaxy diamond-blastp --query=query_protein.fasta --db=uniprot_swissprot_2023_03 --out=result_diamond_query_protein.txt --outfmt=6 + ``` + +3. Check the result of the DIAMOND search: + ``` + less result_diamond_query_protein.txt + ``` + The result file should contain a search result similar to the following content: + ``` + sp|P62805|H4_HUMAN sp|P62803|H4_BOVIN 100 103 0 0 1 103 1 103 5.60e-66 196 + sp|P62805|H4_HUMAN sp|P62800|H4_CAIMO 100 103 0 0 1 103 1 103 5.60e-66 196 + sp|P62805|H4_HUMAN sp|Q7KQD1|H4_CHAVR 100 103 0 0 1 103 1 103 5.60e-66 196 + sp|P62805|H4_HUMAN sp|P62801|H4_CHICK 100 103 0 0 1 103 1 103 5.60e-66 196 + sp|P62805|H4_HUMAN sp|Q4R362|H4_MACFA 100 103 0 0 1 103 1 103 5.60e-66 196 + ... + ``` \ No newline at end of file diff --git a/docs/usage.md b/docs/usage.md index 01f0cb6..fb53bd1 100644 --- a/docs/usage.md +++ b/docs/usage.md @@ -5,7 +5,7 @@ After installation of blast2galaxy you can use the `blast2galaxy` CLI to perform BLAST and DIAMOND searches against the Galaxy servers you have configured in the `.blast2galaxy.config.toml` file -`blast2galaxy blastn --help` + You can find all possible subcommands and parameters in the [CLI reference](cli.md). @@ -13,13 +13,36 @@ You can find all possible subcommands and parameters in the [CLI reference](cli. 
-### List available tools and sequence database of a Galaxy server +### List available and compatible BLAST+ and DIAMOND tools of a Galaxy server -After configuration of at least one default server you can use the `list-tools` command of the CLI to get a table with all NCBI BLAST+ tools and DIAMOND available on that Galaxy server. The table also contains the Tool-IDs for configuration of the blast2galaxy profiles and also all available sequences databases corresponding to the specific tool. +After configuration of at least one default server you can use the `list-tools` command of the CLI to get a table with all NCBI BLAST+ tools and DIAMOND available on that Galaxy server. The table also contains the Tool-IDs for configuration of the blast2galaxy profiles. -List all available tools of the default server: `blast2galaxy list-tools` +List all available tools of the default server: +```shell +blast2galaxy list-tools +``` + +List all available tools of the server with the ID `SERVER_ID`: +``` +blast2galaxy list-tools --server=SERVER_ID +``` + + +### List available tools and sequence databases of a Galaxy server -List all available tools of the server with ID server_id: `blast2galaxy list-tools --server=server_id` +After configuration of at least one default profile you can use the `list-dbs` command of the CLI to get a table with all available sequence databases for a specific tool. + +List all available databases of the tool with ID `TOOL_ID` on the default server: +```bash +blast2galaxy list-dbs --tool=TOOL_ID +``` + +If you have configured multiple servers in the config file `.blast2galaxy.toml`, you can also obtain the available databases for a tool on a specific server other than the default server. 
+ +List all available databases of the tool with ID `TOOL_ID` on the server with ID `SERVER_ID`: +```bash +blast2galaxy list-dbs --server=SERVER_ID --tool=TOOL_ID +``` ### Perform search requests diff --git a/mkdocs.yml b/mkdocs.yml index f9bf7d4..bf4b0e3 100644 --- a/mkdocs.yml +++ b/mkdocs.yml @@ -1,7 +1,5 @@ site_name: blast2galaxy Documentation - -site_url: https://example.com/ - +repo_name: blast2galaxy repo_url: https://github.com/IPK-BIT/blast2galaxy edit_uri: edit/main/docs/ @@ -10,6 +8,7 @@ nav: - Installation: installation.md - Configuration: configuration.md - Usage: usage.md + - Tutorial: tutorial.md - CLI Reference: cli.md - API Reference: api.md @@ -17,12 +16,18 @@ theme: name: material icon: repo: fontawesome/brands/github + features: + - content.code.annotate + - content.code.copy + - content.tabs.link + - search.highlight plugins: - mkdocstrings markdown_extensions: + - attr_list - mkdocs-click - admonition - pymdownx.details diff --git a/pyproject.toml b/pyproject.toml index 0c52c69..32dc35e 100644 --- a/pyproject.toml +++ b/pyproject.toml @@ -1,6 +1,6 @@ [tool.poetry] name = "blast2galaxy" -version = "0.1.0" +version = "0.1.0a1" keywords = ["bioinformatics", "blast", "sequence alignment"] description = "" authors = ["Patrick König "] @@ -34,3 +34,38 @@ blast2galaxy = "blast2galaxy.cli:cli" [build-system] requires = ["poetry-core>=1.0.0"] build-backend = "poetry.core.masonry.api" + +[project] +name = "blast2galaxy" +description = "A Python package with a CLI and API to perform BLAST queries against Galaxy servers" +requires-python = ">=3.10" +license = "MIT" +authors = [ + {name = "Patrick König"}, +] +maintainers = [ + {name = "Patrick König", email = "koenig@ipk-gatersleben.de"}, +] +readme = "README.md" + +classifiers = [ + "Development Status :: 5 - Production/Stable", + "Intended Audience :: Developers", + "Intended Audience :: Science/Research", + "License :: OSI Approved :: MIT License", + "Natural Language :: English", + "Operating 
System :: MacOS :: MacOS X", + "Operating System :: Microsoft :: Windows", + "Operating System :: POSIX :: Linux", + "Programming Language :: Python :: 3", + "Programming Language :: Python :: 3.10", + "Programming Language :: Python :: 3.11", + "Programming Language :: Python :: 3.12", + "Programming Language :: Python :: Implementation :: CPython", + "Topic :: Scientific/Engineering :: Bio-Informatics", +] + +[project.urls] +Documentation = "https://ipk-bit.github.io/blast2galaxy/" +Source = "https://github.com/IPK-BIT/blast2galaxy/" +Homepage = "https://github.com/IPK-BIT/blast2galaxy/" \ No newline at end of file diff --git a/src/blast2galaxy/__init__.py b/src/blast2galaxy/__init__.py index 9807b7c..a1bcc0f 100644 --- a/src/blast2galaxy/__init__.py +++ b/src/blast2galaxy/__init__.py @@ -1,5 +1,9 @@ +from typing import Optional +from typing_extensions import Annotated + import click +from .api.choices import ChoicesBlastType, ChoicesTaskBlastn, ChoicesTaskTblastn, ChoicesTaskBlastp, ChoicesTaskBlastx, ChoicesOutfmt, ChoicesOutfmtDiamond, ChoicesYesNo, ChoicesStrand from . 
import cli @@ -23,25 +27,420 @@ def __check_required_but_missing_params(_method, _kwargs): def __invoke(cli_method, _kwargs): __check_required_but_missing_params(cli_method, _kwargs) ctx = click.Context(cli_method) - ctx.invoke(cli_method, **_kwargs) + return ctx.invoke(cli_method, **_kwargs) + + + + + +def list_tools( + server: Optional[str] = 'default', + type: Optional[ChoicesBlastType | None] = None, + ): + """ + list_tools + + list available and compatible BLAST+ and DIAMOND tools installed on a Galaxy server + + Arguments: + server: Server-ID + type: limit the list to a specific tool type (blastn, tblast, blastp, blastx, diamond) + """ + params = locals() + params['calltype'] = 'api' + return __invoke(cli.list_tools, params) + + + +def list_dbs( + tool: str, + server: Optional[str] = 'default' + ): + """ + list_dbs + + list available databases of a BLAST+ or DIAMOND tool installed on a Galaxy server + + Arguments: + server: Server-ID + tool: Tool-ID + """ + params = locals() + params['calltype'] = 'api' + return __invoke(cli.list_dbs, params) + + + +#def blastn(**kwargs): +# __invoke(cli.blastn, kwargs) + + +def blastn( + profile: Optional[str] = '', + query: str = '', + task: Optional[ChoicesTaskBlastn] = ChoicesTaskBlastn.megablast, + db: Optional[str | None] = None, + evalue: Optional[str] = '0.001', + out: str = '', + outfmt: Optional[ChoicesOutfmt] = ChoicesOutfmt.tab_std.value, + html: Optional[bool] = False, + dust: Optional[ChoicesYesNo] = ChoicesYesNo.yes.value, + strand: Optional[ChoicesStrand] = ChoicesStrand.both.value, + max_hsps: Optional[int | None] = None, + perc_identity: Optional[float] = 0.0, + word_size: Optional[int | None] = None, + ungapped: Optional[bool] = False, + parse_deflines: Optional[bool] = False, + qcov_hsp_perc: Optional[float] = 0.0, + window_size: Optional[int | None] = None, + gapopen: Optional[int | None] = None, + gapextend: Optional[int | None] = None + ): + """ + blastn + + search nucleotide databases using a 
nucleotide query + + Arguments: + profile: the profile from .blast2galaxy.config.toml + query: file path with your query sequence + task: the blastn task to run (default: megablast) + db: the BLAST database to search in + evalue: Expectation value cutoff + out: Path / filename of file to store the BLAST result + outfmt: Output format + html: Format output as HTML document + dust: Filter out low complexity regions (with DUST) + strand: Query strand(s) to search against database/subject + max_hsps: Maximum number of HSPs (alignments) to keep for any single query-subject pair + perc_identity: Percent identity cutoff + word_size: Word size for wordfinder algorithm + ungapped: Perform ungapped alignment only? + parse_deflines: Should the query and subject defline(s) be parsed? + qcov_hsp_perc: Minimum query coverage per hsp (percentage, 0 to 100) + window_size: Multiple hits window size: use 0 to specify 1-hit algorithm, leave blank for default + gapopen: Cost to open a gap + gapextend: Cost to extend a gap + """ + + params = locals() + params['calltype'] = 'api' + __invoke(cli.blastn, params) + + + +def tblastn( + profile: str = '', + query: str = '', + task: Optional[ChoicesTaskTblastn] = ChoicesTaskTblastn.tblastn, + db: Optional[str] = None, + evalue: Optional[str] = '0.001', + out: str = '', + outfmt: Optional[ChoicesOutfmt] = ChoicesOutfmt.tab_std.value, + html: Optional[bool] = False, + seg: Optional[ChoicesYesNo] = ChoicesYesNo.yes.value, + db_gencode: Optional[int] = None, + matrix: Optional[str] = None, + max_target_seqs: Optional[int] = None, + num_descriptions: Optional[int] = None, + num_alignments: Optional[int] = None, + threshold: Optional[float] = None, + max_hsps: Optional[int] = None, + word_size: Optional[int] = None, + ungapped: Optional[bool] = False, + parse_deflines: Optional[bool] = False, + qcov_hsp_perc: Optional[float] = 0.0, + window_size: Optional[int] = None, + gapopen: Optional[int] = None, + gapextend: Optional[int] = None, + 
comp_based_stats: Optional[str] = '2', + ): + """ + tblastn + + search translated nucleotide databases using a protein query + + Arguments: + profile: the profile from .blast2galaxy.config.toml + query: file path with your query sequence + task: the blastn task: megablast or something + db: the BLAST database to search in + evalue: Expectation value cutoff + out: Path / filename of file to store the BLAST result + outfmt: Output format + html: Format output as HTML document + seg: Filter out low complexity regions (with SEG) + db_gencode: Genetic code to use to translate database/subjects (see user manual for details) + matrix: Scoring matrix name (normally BLOSUM62) + max_target_seqs: Maximum number of aligned sequences to keep (value of 5 or more is recommended) Default = 500 + num_descriptions: Number of database sequences to show one-line descriptions for. Not applicable for outfmt > 4. Default = 500 * Incompatible with: max_target_seqs + num_alignments: Number of database sequences to show alignments for. Default = 250 * Incompatible with: max_target_seqs + threshold: Minimum word score such that the word is added to the BLAST lookup table + max_hsps: Maximum number of HSPs (alignments) to keep for any single query-subject pair + word_size: Word size for wordfinder algorithm + ungapped: Perform ungapped alignment only? + parse_deflines: Should the query and subject defline(s) be parsed? 
+ qcov_hsp_perc: Minimum query coverage per hsp (percentage, 0 to 100) + window_size: Multiple hits window size: use 0 to specify 1-hit algorithm, leave blank for default + gapopen: Cost to open a gap + gapextend: Cost to extend a gap + comp_based_stats: Use composition-based statistics: D or d: default (equivalent to 2 ); 0 or F or f: No composition-based statistics; 1: Composition-based statistics as in NAR 29:2994-3005, 2001; 2 or T or t : Composition-based score adjustment as in Bioinformatics 21:902-911, 2005, conditioned on sequence properties; 3: Composition-based score adjustment as in Bioinformatics 21:902-911, 2005, unconditionally + """ + + params = locals() + params['calltype'] = 'api' + __invoke(cli.tblastn, params) + + + + +def blastp( + profile: str = '', + query: str = '', + task: Optional[ChoicesTaskBlastp] = ChoicesTaskBlastp.blastp, + db: Optional[str] = None, + evalue: Optional[str] = '0.001', + out: str = '', + outfmt: Optional[ChoicesOutfmt] = ChoicesOutfmt.tab_std.value, + html: Optional[bool] = False, + seg: Optional[ChoicesYesNo] = ChoicesYesNo.yes.value, + matrix: Optional[str] = None, + max_target_seqs: Optional[int] = None, + num_descriptions: Optional[int] = None, + num_alignments: Optional[int] = None, + threshold: Optional[float] = None, + max_hsps: Optional[int] = None, + word_size: Optional[int] = None, + ungapped: Optional[bool] = False, + parse_deflines: Optional[bool] = False, + qcov_hsp_perc: Optional[float] = 0.0, + window_size: Optional[int] = None, + gapopen: Optional[int] = None, + gapextend: Optional[int] = None, + comp_based_stats: Optional[str] = '2', + use_sw_tback: Optional[bool] = False + ): + """ + blastp + + search protein databases using a protein query + + Arguments: + profile: the profile from .blast2galaxy.config.toml + query: file path with your query sequence + task: the blastn task: megablast or something + db: the BLAST database to search in + evalue: Expectation value cutoff + out: Path / filename of file to 
store the BLAST result + outfmt: Output format + html: Format output as HTML document + seg: Filter out low complexity regions (with SEG) + matrix: Scoring matrix name (normally BLOSUM62) + max_target_seqs: Maximum number of aligned sequences to keep (value of 5 or more is recommended) Default = 500 + num_descriptions: Number of database sequences to show one-line descriptions for. Not applicable for outfmt > 4. Default = 500 * Incompatible with: max_target_seqs + num_alignments: Number of database sequences to show alignments for. Default = 250 * Incompatible with: max_target_seqs + threshold: Minimum word score such that the word is added to the BLAST lookup table + max_hsps: Maximum number of HSPs (alignments) to keep for any single query-subject pair + word_size: Word size for wordfinder algorithm + ungapped: Perform ungapped alignment only? + parse_deflines: Should the query and subject defline(s) be parsed? + qcov_hsp_perc: Minimum query coverage per hsp (percentage, 0 to 100) + window_size: Multiple hits window size: use 0 to specify 1-hit algorithm, leave blank for default + gapopen: Cost to open a gap + gapextend: Cost to extend a gap + comp_based_stats: Use composition-based statistics: D or d: default (equivalent to 2 ); 0 or F or f: No composition-based statistics; 1: Composition-based statistics as in NAR 29:2994-3005, 2001; 2 or T or t : Composition-based score adjustment as in Bioinformatics 21:902-911, 2005, conditioned on sequence properties; 3: Composition-based score adjustment as in Bioinformatics 21:902-911, 2005, unconditionally + use_sw_tback: Compute locally optimal Smith-Waterman alignments? 
+ """ + + params = locals() + params['calltype'] = 'api' + __invoke(cli.blastp, params) + + + +def blastx( + profile: str = '', + query: str = '', + task: Optional[ChoicesTaskBlastx] = ChoicesTaskBlastx.blastx, + db: Optional[str] = None, + evalue: Optional[str] = '0.001', + out: str = '', + outfmt: Optional[ChoicesOutfmt] = ChoicesOutfmt.tab_std.value, + html: Optional[bool] = False, + seg: Optional[ChoicesYesNo] = ChoicesYesNo.yes.value, + matrix: Optional[str] = None, + max_target_seqs: Optional[int] = None, + num_descriptions: Optional[int] = None, + num_alignments: Optional[int] = None, + threshold: Optional[float] = None, + max_hsps: Optional[int] = None, + word_size: Optional[int] = None, + ungapped: Optional[bool] = False, + parse_deflines: Optional[bool] = False, + qcov_hsp_perc: Optional[float] = 0.0, + window_size: Optional[int] = None, + gapopen: Optional[int] = None, + gapextend: Optional[int] = None, + comp_based_stats: Optional[str] = '2', + ): + """ + blastx + + search protein databases using a translated nucleotide query + + Arguments: + profile: the profile from .blast2galaxy.config.toml + query: file path with your query sequence + task: the blastn task: megablast or something + db: the BLAST database to search in + evalue: Expectation value cutoff + out: Path / filename of file to store the BLAST result + outfmt: Output format + html: Format output as HTML document + seg: Filter out low complexity regions (with SEG) + matrix: Scoring matrix name (normally BLOSUM62) + max_target_seqs: Maximum number of aligned sequences to keep (value of 5 or more is recommended) Default = 500 + num_descriptions: Number of database sequences to show one-line descriptions for. Not applicable for outfmt > 4. Default = 500 * Incompatible with: max_target_seqs + num_alignments: Number of database sequences to show alignments for. 
Default = 250 * Incompatible with: max_target_seqs + threshold: Minimum word score such that the word is added to the BLAST lookup table + max_hsps: Maximum number of HSPs (alignments) to keep for any single query-subject pair + word_size: Word size for wordfinder algorithm + ungapped: Perform ungapped alignment only? + parse_deflines: Should the query and subject defline(s) be parsed? + qcov_hsp_perc: Minimum query coverage per hsp (percentage, 0 to 100) + window_size: Multiple hits window size: use 0 to specify 1-hit algorithm, leave blank for default + gapopen: Cost to open a gap + gapextend: Cost to extend a gap + comp_based_stats: Use composition-based statistics: D or d: default (equivalent to 2 ); 0 or F or f: No composition-based statistics; 1: Composition-based statistics as in NAR 29:2994-3005, 2001; 2 or T or t : Composition-based score adjustment as in Bioinformatics 21:902-911, 2005, conditioned on sequence properties; 3: Composition-based score adjustment as in Bioinformatics 21:902-911, 2005, unconditionally + """ + + params = locals() + params['calltype'] = 'api' + __invoke(cli.blastx, params) + + + +def diamond_blastp( + profile: str = '', + query: str = '', + task: Optional[ChoicesTaskBlastp] = ChoicesTaskBlastp.blastp, + db: Optional[str] = None, + evalue: Optional[str] = '0.001', + out: str = '', + outfmt: Optional[ChoicesOutfmtDiamond] = ChoicesOutfmtDiamond.blast_pairwise.value, + faster: Optional[bool] = False, + fast: Optional[bool] = False, + mid_sensitive: Optional[bool] = False, + sensitive: Optional[bool] = False, + more_sensitive: Optional[bool] = False, + very_sensitive: Optional[bool] = False, + ultra_sensitive: Optional[bool] = False, + strand: Optional[ChoicesStrand] = ChoicesStrand.both.value, + matrix: Optional[str] = 'BLOSUM62', + max_target_seqs: Optional[int] = None, + max_hsps: Optional[int] = None, + window: Optional[int] = None, + gapopen: Optional[int] = None, + gapextend: Optional[int] = None, + comp_based_stats: 
Optional[str] = '1', + ): + """ + diamond_blastp + search protein databases using a protein query with DIAMOND + Arguments: + profile: the profile from .blast2galaxy.toml + query: file path with your query sequence + task: the task to run (default: blastp) + db: the BLAST database to search in + evalue: Expectation value cutoff + out: Path / filename of file to store the BLAST result + outfmt: Output format + faster: faster mode + fast: fast mode + mid_sensitive: mid_sensitive mode + sensitive: sensitive mode + more_sensitive: more_sensitive mode + very_sensitive: very_sensitive mode + ultra_sensitive: ultra_sensitive mode + strand: Query strand(s) to search against database/subject + matrix: Scoring matrix name (normally BLOSUM62) + max_target_seqs: Maximum number of aligned sequences to keep (value of 5 or more is recommended) Default = 500 + max_hsps: Maximum number of HSPs (alignments) to keep for any single query-subject pair + window: Multiple hits window size: use 0 to specify 1-hit algorithm, leave blank for default + gapopen: Cost to open a gap + gapextend: Cost to extend a gap + comp_based_stats: Use composition-based statistics: D or d: default (equivalent to 2 ); 0 or F or f: No composition-based statistics; 1: Composition-based statistics as in NAR 29:2994-3005, 2001; 2 or T or t : Composition-based score adjustment as in Bioinformatics 21:902-911, 2005, conditioned on sequence properties; 3: Composition-based score adjustment as in Bioinformatics 21:902-911, 2005, unconditionally + """ + params = locals() + params['calltype'] = 'api' + __invoke(cli.diamond_blastp, params) -def list_tools(**kwargs): - __invoke(cli.list_tools, kwargs) -def blastn(**kwargs): - __invoke(cli.blastn, kwargs) +def diamond_blastx( + profile: str = '', + query: str = '', + task: Optional[ChoicesTaskBlastp] = ChoicesTaskBlastp.blastp, + db: Optional[str] = None, + evalue: Optional[str] = '0.001', + out: str = '', + outfmt: Optional[ChoicesOutfmtDiamond] = 
ChoicesOutfmtDiamond.blast_pairwise.value, + faster: Optional[bool] = False, + fast: Optional[bool] = False, + mid_sensitive: Optional[bool] = False, + sensitive: Optional[bool] = False, + more_sensitive: Optional[bool] = False, + very_sensitive: Optional[bool] = False, + ultra_sensitive: Optional[bool] = False, + strand: Optional[ChoicesStrand] = ChoicesStrand.both.value, + matrix: Optional[str] = 'BLOSUM62', + max_target_seqs: Optional[int] = None, + max_hsps: Optional[int] = None, + window: Optional[int] = None, + gapopen: Optional[int] = None, + gapextend: Optional[int] = None, + comp_based_stats: Optional[str] = '1', + ): + """ + diamond_blastx -def tblastn(**kwargs): - __invoke(cli.tblastn, kwargs) + search protein databases using a translated nucleotide query with DIAMOND + + Arguments: + profile: the profile from .blast2galaxy.toml + query: file path with your query sequence + task: the task to run (default: blastp) + db: the BLAST database to search in + evalue: Expectation value cutoff + out: Path / filename of file to store the BLAST result + outfmt: Output format + faster: faster mode + fast: fast mode + mid_sensitive: mid_sensitive mode + sensitive: sensitive mode + more_sensitive: more_sensitive mode + very_sensitive: very_sensitive mode + ultra_sensitive: ultra_sensitive mode + strand: Query strand(s) to search against database/subject + matrix: Scoring matrix name (normally BLOSUM62) + max_target_seqs: Maximum number of aligned sequences to keep (value of 5 or more is recommended) Default = 500 + max_hsps: Maximum number of HSPs (alignments) to keep for any single query-subject pair + window: Multiple hits window size: use 0 to specify 1-hit algorithm, leave blank for default + gapopen: Cost to open a gap + gapextend: Cost to extend a gap + comp_based_stats: Use composition-based statistics: D or d: default (equivalent to 2 ); 0 or F or f: No composition-based statistics; 1: Composition-based statistics as in NAR 29:2994-3005, 2001; 2 
or T or t : Composition-based score adjustment as in Bioinformatics 21:902-911, 2005, conditioned on sequence properties; 3: Composition-based score adjustment as in Bioinformatics 21:902-911, 2005, unconditionally + """ + params = locals() + params['calltype'] = 'api' + __invoke(cli.diamond_blastx, params) -__all__ = ['blastn'] \ No newline at end of file +__all__ = ['list_tools', 'list_dbs', 'blastn', 'tblastn', 'blastp', 'blastx', 'diamond_blastp', 'diamond_blastx'] \ No newline at end of file diff --git a/src/blast2galaxy/api/blast_request.py b/src/blast2galaxy/api/blast_request.py index 24fa525..0705116 100644 --- a/src/blast2galaxy/api/blast_request.py +++ b/src/blast2galaxy/api/blast_request.py @@ -335,7 +335,7 @@ def request(params): tool_inputs_dict = tool_inputs.to_dict() print('='*150) print(json.dumps(tool_inputs_dict, indent=4)) - print('+ tool_id = ', profile['tool_id']) + print('+ tool_id = ', profile['tool']) print('+ history_id = ', history_id) print('='*150) @@ -344,7 +344,7 @@ def request(params): run_tool_result = gi.tools.run_tool( history_id = history_id, - tool_id = str(profile['tool_id']), + tool_id = str(profile['tool']), tool_inputs = tool_inputs ) diff --git a/src/blast2galaxy/api/server_info.py b/src/blast2galaxy/api/server_info.py index 1c74d1d..1a263d3 100644 --- a/src/blast2galaxy/api/server_info.py +++ b/src/blast2galaxy/api/server_info.py @@ -8,7 +8,7 @@ def get_available_tools_and_databases(server = 'default', blast_type = None): gi = config.get_galaxy_instance(server=server) blast_tool_ids = [] - blast_tools_databases = [] + #blast_tools_databases = [] blast_tools_databases_dict = {} tool_id_pattern_to_tool_type = { @@ -19,6 +19,14 @@ def get_available_tools_and_databases(server = 'default', blast_type = None): 'bg_diamond/': 'diamond' } + compatible_versions = { + 'blastn': ['2.10.1+galaxy0', '2.14.1+galaxy0', '2.14.1+galaxy1'], + 'tblastn': ['2.10.1+galaxy0', '2.14.1+galaxy0', '2.14.1+galaxy1'], + 'blastx': ['2.10.1+galaxy0', 
'2.14.1+galaxy0', '2.14.1+galaxy1'], + 'blastp': ['2.10.1+galaxy0', '2.14.1+galaxy0', '2.14.1+galaxy1'], + 'diamond': ['2.0.15+galaxy0'] + } + tool_name_by_tool_id = {} if blast_type: @@ -32,30 +40,34 @@ def get_available_tools_and_databases(server = 'default', blast_type = None): blast_tool_ids_to_match = tool_id_pattern_to_tool_type.keys() tools = gi.tools.get_tools() + for tool in tools: matches = [x in tool['id'] for x in blast_tool_ids_to_match] if any(matches): - blast_tool_ids.append(tool['id']) which_tool = list(compress(blast_tool_ids_to_match, matches))[0] - tool_name_by_tool_id[tool['id']] = tool_id_pattern_to_tool_type[which_tool] + tool_name = tool_id_pattern_to_tool_type[which_tool] + tool_name_by_tool_id[tool['id']] = tool_name + + if tool['version'] in compatible_versions[tool_name]: + blast_tool_ids.append(tool['id']) + + #if tool['id'] == 'ncbi_tblastn_wrapper_faba': + # print(json.dumps(tool, indent=2)) for blast_tool_id in blast_tool_ids: - blast_tool_io_details = gi.tools.show_tool(blast_tool_id, io_details=True) + blast_tool_details = gi.tools.show_tool(blast_tool_id, io_details=True) blast_tool_databases = {} - blast_tool_types = {} - for _input in blast_tool_io_details['inputs']: + #if blast_tool_id == 'ncbi_tblastn_wrapper_faba': + # print(json.dumps(blast_tool_details, indent=2)) - if _input['name'] == 'blast_type': - for _option in _input['options']: - blast_tool_types[_option[1]] = _option[0] + for _input in blast_tool_details['inputs']: # NCBI BLAST+ if _input['name'] == 'db_opts': for _case in _input['cases']: if _case['value'] == 'db': for __input in _case['inputs']: - #print(__input) if __input['name'] == 'database': for _database in __input['options']: blast_tool_databases[_database[1]] = _database[0] @@ -65,17 +77,15 @@ def get_available_tools_and_databases(server = 'default', blast_type = None): for _case in _input['cases']: if _case['value'] == 'indexed': for __input in _case['inputs']: - #print(__input) if __input['name'] == 
'index': for _database in __input['options']: blast_tool_databases[_database[1]] = _database[0] - blast_tool_entry = {'blast_tool_id': blast_tool_id, 'available_databases': blast_tool_databases, 'available_blast_types': blast_tool_types} - blast_tools_databases.append(blast_tool_entry) + blast_tools_databases_dict[blast_tool_id] = { 'tool_name': tool_name_by_tool_id[blast_tool_id], - 'available_databases': blast_tool_databases.keys(), - 'available_blast_types': blast_tool_types.keys() + 'version': blast_tool_details['version'], + 'available_databases': blast_tool_databases } - return blast_tool_ids, blast_tools_databases, blast_tools_databases_dict \ No newline at end of file + return blast_tools_databases_dict \ No newline at end of file diff --git a/src/blast2galaxy/cli.py b/src/blast2galaxy/cli.py index 5d053d9..9c86ad0 100644 --- a/src/blast2galaxy/cli.py +++ b/src/blast2galaxy/cli.py @@ -2,6 +2,10 @@ from typing import Optional from typing_extensions import Annotated +from rich.console import Console +from rich.table import Table +from rich import box + from .api import blast_request as api from .api.choices import ChoicesBlastType, ChoicesTaskBlastn, ChoicesTaskTblastn, ChoicesTaskBlastp, ChoicesTaskBlastx, ChoicesOutfmt, ChoicesOutfmtDiamond, ChoicesYesNo, ChoicesStrand @@ -15,15 +19,7 @@ @click.group(name='blast2galaxy') def cli(): - """Main entrypoint.""" - - - - -#@cli.command() -#@click.option("-d", "--debug", help="Include debug output.") -#def build(debug): -# """Build production assets.""" + pass @@ -31,15 +27,14 @@ def cli(): @cli.command() def show_config(): - from rich.console import Console - from rich.table import Table - from rich import box + """ + Show information about the currently available configuration loaded from a .blast2galaxy.toml file + """ from . 
import config try: - config_toml_path = config.get_config_toml_path() - config = config.load_config_toml() + config, config_toml_path = config.load_config_toml() table = Table(show_lines=True, box=box.SQUARE) table.add_column('Server ID', justify='left', style='white', no_wrap=True) @@ -60,7 +55,7 @@ def show_config(): table.add_column('Tool ID', justify='left', style='white') for profile_id, profile_config in config['profiles'].items(): - table.add_row(profile_id, profile_config['server'], profile_config['tool_id']) + table.add_row(profile_id, profile_config['server'], profile_config['tool']) console = Console() console.print('\n[underline]Configured profiles:') @@ -74,71 +69,92 @@ def show_config(): -# @cli.command() -# def test_rich(): -# from rich.console import Console -# from rich.table import Table - -# table = Table() - -# table.add_column("Released", justify="right", style="cyan", no_wrap=True) -# table.add_column("Title", style="magenta") -# table.add_column("Box Office", justify="right", style="green") - -# table.add_row("Dec 20, 2019", "Star Wars: The Rise of Skywalker", "$952,110,690") -# table.add_row("May 25, 2018", "Solo: A Star Wars Story", "$393,151,347") -# table.add_row("Dec 15, 2017", "Star Wars Ep. V111: The Last Jedi", "$1,332,539,889") -# table.add_row("Dec 16, 2016", "Rogue One: A Star Wars Story", "$1,332,439,889") - -# console = Console() -# console.print(table) - - - - -# @cli.command() -# def test_config(): -# from . 
import config -# test = config.get_profile(profile='univec') -# print(test) - - - - - @cli.command() @click.option('--server', help='Server-ID as in your config TOML', type=str, default='default', show_default=True) -@click.option('--type', help='Type of BLAST search', type=click.Choice(ChoicesBlastType, case_sensitive=False)) +@click.option('--type', help='Type of BLAST search', type=click.Choice(ChoicesBlastType, case_sensitive=False), default=None) def list_tools( server: str = '', type: Optional[ChoicesBlastType | None] = None, + **kwargs ): - from rich.console import Console - from rich.table import Table - from rich import box + """ + list available and compatible BLAST+ and DIAMOND tools installed on a Galaxy server + """ + + #print('==============================') + #print(server) + #print(type) + #print(kwargs) + #exit() - blast_tool_ids, blast_tools_databases, blast_tools_databases_dict = server_info.get_available_tools_and_databases( + blast_tools_databases_dict = server_info.get_available_tools_and_databases( server = server, blast_type = get_value(type) ) + if 'calltype' in kwargs and kwargs['calltype'] == 'api': + return blast_tools_databases_dict + table = Table(show_lines=True, box=box.SQUARE) # MINIMAL_DOUBLE_HEAD SQUARE table.add_column('Tool', justify='left', style='white', no_wrap=True) table.add_column('Tool ID', justify='left', style='white', no_wrap=True) - table.add_column('Available databases', justify='left', style='white') + table.add_column('Tool Version', justify='left', style='white') for tool_id, tool_specs in blast_tools_databases_dict.items(): - dbs = ', '.join(list(tool_specs['available_databases'])) - table.add_row(tool_specs['tool_name'], tool_id, dbs) + table.add_row(tool_specs['tool_name'], tool_id, tool_specs['version']) console = Console() console.print('\n[underline]Available BLAST tools and corresponding databases:\n') console.print(table) + # print('+++ TEST +++') + # #global config + # print(config) + # 
print(config.get_conf()) + # print('///////////////') + # print(conf.config) +@cli.command() +@click.option('--server', help='Server-ID as in your config TOML', type=str, default='default', show_default=True) +@click.option('--tool', help='Tool-ID of a tool available on the Galaxy server', type=str, required=True) +def list_dbs( + server: str = '', + tool: str = '', + **kwargs + ): + """ + list available databases of a BLAST+ or DIAMOND tool installed on a Galaxy server + """ + + tool_id = tool + + blast_tools_databases_dict = server_info.get_available_tools_and_databases(server = server) + + if tool_id in blast_tools_databases_dict: + + if 'calltype' in kwargs and kwargs['calltype'] == 'api': + return blast_tools_databases_dict[tool_id]['available_databases'] + + table = Table(show_lines=True, box=box.SQUARE) # MINIMAL_DOUBLE_HEAD SQUARE + table.add_column('Database ID', justify='left', style='white', no_wrap=True) + table.add_column('Database Description', justify='left', style='white', no_wrap=True) + + for db_id, db_desc in blast_tools_databases_dict[tool_id]['available_databases'].items(): + table.add_row(db_id, db_desc) + + console = Console() + console.print(f'\n[underline]Available databases for tool with ID `{tool_id}`:\n') + console.print(table) + + else: + console = Console() + console.print(f'\n[red]ERROR: A tool with ID `{tool_id}` does not exist on the Galaxy server `{server}`.\n') + + + @cli.command() @@ -162,7 +178,7 @@ def list_tools( @click.option('--gapopen', help=HELP.gapopen, type=click.IntRange(0)) @click.option('--gapextend', help=HELP.gapextend, type=click.IntRange(0)) def blastn( - profile: str = '', + profile: Optional[str] = '', query: str = '', task: Optional[ChoicesTaskBlastn] = ChoicesTaskBlastn.megablast, db: Optional[str | None] = None, @@ -183,33 +199,32 @@ def blastn( gapextend: Optional[int | None] = None ): """ - blastn - - blastn for searching nucleotide query sequence in a nucleotides BLAST database + search nucleotide 
databases using a nucleotide query + """ + """ Arguments: profile: the profile from .blast2galaxy.config.toml query: file path with your query sequence task: the blastn task: megablast or something db: the BLAST database to search in - evalue: todo - out: todo - outfmt: todo - html: todo - dust: todo - strand: todo - max_hsps: todo - perc_identity: todo - word_size: todo - ungapped: todo - parse_deflines: todo - qcov_hsp_perc: todo - window_size: todo - gapopen: todo - gapextend: todo + evalue: Expectation value cutoff + out: Path / filename of file to store the BLAST result + outfmt: Output format + html: Format output as HTML document + dust: Filter out low complexity regions (with DUST) + strand: Query strand(s) to search against database/subject + max_hsps: Maximum number of HSPs (alignments) to keep for any single query-subject pair + perc_identity: Percent identity cutoff + word_size: Word size for wordfinder algorithm + ungapped: Perform ungapped alignment only? + parse_deflines: Should the query and subject defline(s) be parsed? 
+ qcov_hsp_perc: Minimum query coverage per hsp (percentage, 0 to 100) + window_size: Multiple hits window size: use 0 to specify 1-hit algorithm, leave blank for default + gapopen: Cost to open a gap + gapextend: Cost to extend a gap """ - params = locals() params['tool'] = 'blastn' #print(params) @@ -280,6 +295,37 @@ def tblastn( gapextend: Optional[int] = None, comp_based_stats: Optional[str] = '2', ): + """ + search translated nucleotide databases using a protein query + """ + + """ + Arguments: + profile: the profile from .blast2galaxy.config.toml + query: file path with your query sequence + task: the blastn task: megablast or something + db: the BLAST database to search in + evalue: Expectation value cutoff + out: Path / filename of file to store the BLAST result + outfmt: Output format + html: Format output as HTML document + seg: Filter out low complexity regions (with SEG) + db_gencode: Genetic code to use to translate database/subjects (see user manual for details) + matrix: Scoring matrix name (normally BLOSUM62) + max_target_seqs: Maximum number of aligned sequences to keep (value of 5 or more is recommended) Default = 500 + num_descriptions: Number of database sequences to show one-line descriptions for. Not applicable for outfmt > 4. Default = 500 * Incompatible with: max_target_seqs + num_alignments: Number of database sequences to show alignments for. Default = 250 * Incompatible with: max_target_seqs + threshold: Minimum word score such that the word is added to the BLAST lookup table + max_hsps: Maximum number of HSPs (alignments) to keep for any single query-subject pair + word_size: Word size for wordfinder algorithm + ungapped: Perform ungapped alignment only? + parse_deflines: Should the query and subject defline(s) be parsed? 
+ qcov_hsp_perc: Minimum query coverage per hsp (percentage, 0 to 100) + window_size: Multiple hits window size: use 0 to specify 1-hit algorithm, leave blank for default + gapopen: Cost to open a gap + gapextend: Cost to extend a gap + comp_based_stats: Use composition-based statistics: D or d: default (equivalent to 2 ); 0 or F or f: No composition-based statistics; 1: Composition-based statistics as in NAR 29:2994-3005, 2001; 2 or T or t : Composition-based score adjustment as in Bioinformatics 21:902-911, 2005, conditioned on sequence properties; 3: Composition-based score adjustment as in Bioinformatics 21:902-911, 2005, unconditionally + """ params = locals() params['tool'] = 'tblastn' @@ -344,6 +390,37 @@ def blastp( comp_based_stats: Optional[str] = '2', use_sw_tback: Optional[bool] = False ): + """ + search protein databases using a protein query + """ + + """ + Arguments: + profile: the profile from .blast2galaxy.config.toml + query: file path with your query sequence + task: the blastn task: megablast or something + db: the BLAST database to search in + evalue: Expectation value cutoff + out: Path / filename of file to store the BLAST result + outfmt: Output format + html: Format output as HTML document + seg: Filter out low complexity regions (with SEG) + matrix: Scoring matrix name (normally BLOSUM62) + max_target_seqs: Maximum number of aligned sequences to keep (value of 5 or more is recommended) Default = 500 + num_descriptions: Number of database sequences to show one-line descriptions for. Not applicable for outfmt > 4. Default = 500 * Incompatible with: max_target_seqs + num_alignments: Number of database sequences to show alignments for. 
Default = 250 * Incompatible with: max_target_seqs + threshold: Minimum word score such that the word is added to the BLAST lookup table + max_hsps: Maximum number of HSPs (alignments) to keep for any single query-subject pair + word_size: Word size for wordfinder algorithm + ungapped: Perform ungapped alignment only? + parse_deflines: Should the query and subject defline(s) be parsed? + qcov_hsp_perc: Minimum query coverage per hsp (percentage, 0 to 100) + window_size: Multiple hits window size: use 0 to specify 1-hit algorithm, leave blank for default + gapopen: Cost to open a gap + gapextend: Cost to extend a gap + comp_based_stats: Use composition-based statistics: D or d: default (equivalent to 2 ); 0 or F or f: No composition-based statistics; 1: Composition-based statistics as in NAR 29:2994-3005, 2001; 2 or T or t : Composition-based score adjustment as in Bioinformatics 21:902-911, 2005, conditioned on sequence properties; 3: Composition-based score adjustment as in Bioinformatics 21:902-911, 2005, unconditionally + use_sw_tback: Compute locally optimal Smith-Waterman alignments? + """ params = locals() params['tool'] = 'blastp' @@ -407,6 +484,36 @@ def blastx( gapextend: Optional[int] = None, comp_based_stats: Optional[str] = '2', ): + """ + search protein databases using a translated nucleotide query + """ + + """ + Arguments: + profile: the profile from .blast2galaxy.config.toml + query: file path with your query sequence + task: the blastn task: megablast or something + db: the BLAST database to search in + evalue: Expectation value cutoff + out: Path / filename of file to store the BLAST result + outfmt: Output format + html: Format output as HTML document + seg: Filter out low complexity regions (with SEG) + matrix: Scoring matrix name (normally BLOSUM62) + max_target_seqs: Maximum number of aligned sequences to keep (value of 5 or more is recommended) Default = 500 + num_descriptions: Number of database sequences to show one-line descriptions for. 
Not applicable for outfmt > 4. Default = 500 * Incompatible with: max_target_seqs + num_alignments: Number of database sequences to show alignments for. Default = 250 * Incompatible with: max_target_seqs + threshold: Minimum word score such that the word is added to the BLAST lookup table + max_hsps: Maximum number of HSPs (alignments) to keep for any single query-subject pair + word_size: Word size for wordfinder algorithm + ungapped: Perform ungapped alignment only? + parse_deflines: Should the query and subject defline(s) be parsed? + qcov_hsp_perc: Minimum query coverage per hsp (percentage, 0 to 100) + window_size: Multiple hits window size: use 0 to specify 1-hit algorithm, leave blank for default + gapopen: Cost to open a gap + gapextend: Cost to extend a gap + comp_based_stats: Use composition-based statistics: D or d: default (equivalent to 2 ); 0 or F or f: No composition-based statistics; 1: Composition-based statistics as in NAR 29:2994-3005, 2001; 2 or T or t : Composition-based score adjustment as in Bioinformatics 21:902-911, 2005, conditioned on sequence properties; 3: Composition-based score adjustment as in Bioinformatics 21:902-911, 2005, unconditionally + """ params = locals() params['tool'] = 'blastx' @@ -467,14 +574,41 @@ def diamond_blastp( strand: Optional[ChoicesStrand] = ChoicesStrand.both.value, matrix: Optional[str] = 'BLOSUM62', max_target_seqs: Optional[int] = None, - #threshold: Optional[float] = None, max_hsps: Optional[int] = None, - #ungapped: Optional[bool] = False, window: Optional[int] = None, gapopen: Optional[int] = None, gapextend: Optional[int] = None, comp_based_stats: Optional[str] = '1', ): + """ + search protein databases using a protein query with DIAMOND + """ + + """ + Arguments: + profile: the profile from .blast2galaxy.config.toml + query: file path with your query sequence + task: the blastn task: megablast or something + db: the BLAST database to search in + evalue: Expectation value cutoff + out: Path / filename 
of file to store the BLAST result + outfmt: Output format + faster: faster mode + fast: fast mode + mid_sensitive: mid_sensitive mode + sensitive: sensitive mode + more_sensitive: more_sensitive mode + very_sensitive: very_sensitive mode + ultra_sensitive: ultra_sensitive mode + strand: Query strand(s) to search against database/subject + matrix: Scoring matrix name (normally BLOSUM62) + max_target_seqs: Maximum number of aligned sequences to keep (value of 5 or more is recommended) Default = 500 + max_hsps: Maximum number of HSPs (alignments) to keep for any single query-subject pair + window: Multiple hits window size: use 0 to specify 1-hit algorithm, leave blank for default + gapopen: Cost to open a gap + gapextend: Cost to extend a gap + comp_based_stats: Use composition-based statistics: D or d: default (equivalent to 2 ); 0 or F or f: No composition-based statistics; 1: Composition-based statistics as in NAR 29:2994-3005, 2001; 2 or T or t : Composition-based score adjustment as in Bioinformatics 21:902-911, 2005, conditioned on sequence properties; 3: Composition-based score adjustment as in Bioinformatics 21:902-911, 2005, unconditionally + """ params = locals() params['tool'] = 'diamond_blastp' @@ -510,9 +644,7 @@ def diamond_blastp( @click.option('--strand', help=HELP.strand, type=click.Choice(ChoicesStrand, case_sensitive=False), default=ChoicesStrand.both.value, show_default=True) @click.option('--matrix', help = HELP.matrix, type=str, default='BLOSUM62', show_default=True) @click.option('--max-target-seqs', help = HELP.max_target_seqs, type=click.IntRange(1), default=500, show_default=True) -#@click.option('--threshold', help = HELP.threshold, type=click.FloatRange(0.0)) @click.option('--max-hsps', help=HELP.max_hsps, type=int) -#@click.option('--ungapped', help=HELP.ungapped, is_flag=True) @click.option('--window', help=HELP.window_size, type=click.IntRange(1)) @click.option('--gapopen', help=HELP.gapopen, type=click.IntRange(0)) 
@click.option('--gapextend', help=HELP.gapextend, type=click.IntRange(0)) @@ -535,14 +667,41 @@ def diamond_blastx( strand: Optional[ChoicesStrand] = ChoicesStrand.both.value, matrix: Optional[str] = 'BLOSUM62', max_target_seqs: Optional[int] = None, - #threshold: Optional[float] = None, max_hsps: Optional[int] = None, - #ungapped: Optional[bool] = False, window: Optional[int] = None, gapopen: Optional[int] = None, gapextend: Optional[int] = None, comp_based_stats: Optional[str] = '1', ): + """ + search protein databases using a translated nucleotide query with DIAMOND + """ + + """ + Arguments: + profile: the profile from .blast2galaxy.config.toml + query: file path with your query sequence + task: the blastn task: megablast or something + db: the BLAST database to search in + evalue: Expectation value cutoff + out: Path / filename of file to store the BLAST result + outfmt: Output format + faster: faster mode + fast: fast mode + mid_sensitive: mid_sensitive mode + sensitive: sensitive mode + more_sensitive: more_sensitive mode + very_sensitive: very_sensitive mode + ultra_sensitive: ultra_sensitive mode + strand: Query strand(s) to search against database/subject + matrix: Scoring matrix name (normally BLOSUM62) + max_target_seqs: Maximum number of aligned sequences to keep (value of 5 or more is recommended) Default = 500 + max_hsps: Maximum number of HSPs (alignments) to keep for any single query-subject pair + window: Multiple hits window size: use 0 to specify 1-hit algorithm, leave blank for default + gapopen: Cost to open a gap + gapextend: Cost to extend a gap + comp_based_stats: Use composition-based statistics: D or d: default (equivalent to 2 ); 0 or F or f: No composition-based statistics; 1: Composition-based statistics as in NAR 29:2994-3005, 2001; 2 or T or t : Composition-based score adjustment as in Bioinformatics 21:902-911, 2005, conditioned on sequence properties; 3: Composition-based score adjustment as in Bioinformatics 21:902-911, 2005, 
unconditionally + """ params = locals() params['tool'] = 'diamond_blastx' diff --git a/src/blast2galaxy/config.py b/src/blast2galaxy/config.py index 17af768..bb8182a 100644 --- a/src/blast2galaxy/config.py +++ b/src/blast2galaxy/config.py @@ -10,7 +10,7 @@ class ConfigHolder: def __init__(self): - self.config = None + self.config = {} conf = ConfigHolder() @@ -24,43 +24,87 @@ def set_config(config): print('Set conf to: ', conf.config) +def add_server(server, server_url, api_key): + if 'servers' not in conf.config: + conf.config['servers'] = {} + + conf.config['servers'][server] = { + 'server_url': server_url, + 'api_key': api_key + } + +def add_profile(profile, server, tool): + if 'profiles' not in conf.config: + conf.config['profiles'] = {} + + conf.config['profiles'][profile] = { + 'server': server, + 'tool': tool + } + +def add_default_server(server_url, api_key): + add_server('default', server_url, api_key) + +def add_default_profile(server_id, tool): + add_profile('default', server_id, tool) + + def load_config_toml(): + + if conf.config: + #print('HAS runtime config!!!') + #print(conf) + #print(conf.config) + #print('^^^^^^^^^^^^^^^^^^^^^^^^^^^^') + return conf.config, False + config_path_cwd = Path.cwd().joinpath('.blast2galaxy.toml') config_path_home_dir = Path.home().joinpath('.blast2galaxy.toml') try: with open(config_path_cwd, 'rb') as f: config = tomllib.load(f) - return config + return config, config_path_cwd except FileNotFoundError as e: try: with open(config_path_home_dir, 'rb') as f: config = tomllib.load(f) - return config + return config, config_path_home_dir except FileNotFoundError as e: err_msg = 'Could not find the config file `.blast2galaxy.toml` in the current working directory or in your home directory: ' + str(Path.home()) - raise Exception(err_msg) from e + #raise Exception(err_msg) from e + print('ERROR: ', err_msg) + exit(1) def get_profile(server='default', profile=None): - config = load_config_toml() + config, _ = load_config_toml() 
+ #print('================================') + #print(config) + #import json #print(json.dumps(config, indent=4)) #exit() if profile: - if profile in config['profiles'].keys(): - config_profile = config['profiles'][profile] - else: # use default profile - config_profile = config['profiles']['default'] - - if config_profile['server'] in config['servers'].keys(): - config_server = config['servers'][ config_profile['server'] ] - else: - exit(f'ERROR: The server `{server}` is not defined in the config TOML!') - - config_merged = config_server | config_profile + try: + if profile in config['profiles'].keys(): + config_profile = config['profiles'][profile] + else: # use default profile + config_profile = config['profiles']['default'] + + if config_profile['server'] in config['servers'].keys(): + config_server = config['servers'][ config_profile['server'] ] + else: + exit(f'ERROR: The server `{server}` is not defined in the config TOML!') + + config_merged = config_server | config_profile + + except KeyError as e: + err_msg = f'The profile `{profile}` could not be found in the configuration.' + print('ERROR: ', err_msg) + exit(1) else: # no profile given, just use server argument of get_profile()