diff --git a/docs/databases_klifs_statement_of_need.rst b/docs/databases_klifs_statement_of_need.rst
index 95fb15a3..1ae26cd8 100644
--- a/docs/databases_klifs_statement_of_need.rst
+++ b/docs/databases_klifs_statement_of_need.rst
@@ -1,24 +1,25 @@
 Statement of need
 =================
 
-OpenCADD-KLIFS is aimed at current and future users of the KLIFS database who seek to
-integrate kinase resources into Python-based research projects.
-This module offers access to KLIFS data [Kanev_2021]_ such as information about kinases,
-structures, ligands,
+The KLIFS resource [Kanev_2021]_ contains information about kinases, structures, ligands,
 interaction fingerprints, and bioactivities.
 KLIFS thereby focuses especially on the ATP binding site, defined as a set of 85 residues and
-aligned across all structures using a multiple sequence alignment (MSA) [vanLinden_2014]_.
-With OpenCADD-KLIFS, KLIFS data can be queried either locally from a KLIFS download or remotely
-from the KLIFS webserver.
-The presented module provides identical APIs for the remote and local queries for KLIFS data and
-streamlines all output into
-standardized `Pandas `_ DataFrames to allow for easy and
-quick downstream data analyses (Figure 1). This Pandas-focused setup is ideal to work with in
-Jupyter notebooks [Kluyver_2016]_.
+aligned across all structures using a multiple sequence alignment [vanLinden_2014]_.
+Fetching, filtering, and integrating the KLIFS content on a larger scale into Python-based
+pipelines is currently not straight-forward, especially for users without a background in
+online queries.
+Furthermore, switching between data queries from a *local* KLIFS download and
+the *remote* KLIFS database is not readily possible.
 
-`OpenCADD-KLIFS `_
-(``opencadd.databases.klifs``) is a part of the `OpenCADD `_
-package, a collection of Python modules for structural cheminformatics.
+OpenCADD-KLIFS is aimed at current and future users of the KLIFS database who seek to
+integrate kinase resources into Python-based research projects.
+With OpenCADD-KLIFS, KLIFS data can be queried either locally from a KLIFS download or
+remotely from the KLIFS webserver.
+The presented module provides identical APIs for the remote and local queries and
+streamlines all output into standardized Pandas DataFrames
+`Pandas `_ to allow for easy and quick
+downstream data analyses (Figure 1).
+This Pandas-focused setup is ideal if you work with Jupyter notebooks [Kluyver_2016]_.
 
 .. raw:: html
 
@@ -29,45 +30,6 @@ package, a collection of Python modules for structural cheminformatics.
 *Figure 1*: OpenCADD-KLIFS fetches KLIFS data offline from a KLIFS download or online from
 the KLIFS database and formats the output as user-friendly Pandas DataFrames.
 
-The KLIFS database offers a REST API compliant with the OpenAPI specification
-(`KLIFS OpenAPI `_).
-Our module OpenCADD-KLIFS uses `bravado `_ to dynamically
-generate a Python client based on the OpenAPI definitions and adds wrappers to enable the
-following functionalities:
-
-- A session is set up, which allows access to various KLIFS *data sources* by different
-  *identifiers* with the API ``session.data_source.by_identifier``. *Data sources* currently
-  include kinases, structures and annotated conformations, modified residues, pockets, ligands,
-  drugs, and bioactivities; *identifiers* refer to kinase names, PDB IDs, KLIFS IDs, and more.
-  For example, ``session.structures.by_kinase_name`` fetches information on all structures for a
-  query kinase.
-- The same API is used for local and remote sessions.
-- The returned data follows the same schema regardless of the session type (local/remote); all
-  results obtained with bravado are formatted as Pandas DataFrames with standardized column names,
-  data types, and handling of missing data.
-- Files with the structural 3D coordinates deposited on KLIFS include full complexes or selections
-  such as proteins, pockets, ligands, and more. These files can be downloaded to disc or loaded
-  via biopandas [Raschka_2017]_ or `RDKit `_.
-
-OpenCADD-KLIFS is especially convenient whenever users are interested in multiple or more
-complex queries such as "fetching all structures for the kinase EGFR in the DFG-in conformation"
-or "fetching the measured bioactivity profiles for all ligands that are structurally resolved in
-complex with EGFR". Formatting the output as DataFrames facilitates subsequent filtering steps
-and DataFrame merges in case multiple KLIFS datasets need to be combined.
-OpenCADD-KLIFS is currently used in several projects
-from the `Volkamer Lab `_
-including
-`TeachOpenCADD `_,
-`OpenCADD-pocket `_,
-`KiSSim `_,
-`KinoML `_, and
-`PLIPify `_.
-For example, OpenCADD-KLIFS is applied in a
-`TeachOpenCADD tutorial `_
-to demonstrate how to fetch all kinase-ligand interaction profiles for all available EGFR kinase
-structures to visualize the per-residue interaction types and frequencies with only a few
-lines of code.
-
 .. [Kanev_2021] Kanev et al., (2021),
     KLIFS: an overhaul after the first 5 years of supporting kinase research,
     Nucleic Acids Research,
@@ -80,7 +42,4 @@ lines of code.
 .. [Kluyver_2016] Kluyver et al., (2016),
     Jupyter Notebooks – a publishing format for reproducible computational workflows,
     In Positioning and Power in Academic Publishing: Players, Agents and Agendas. IOS Press. pp. 87-90,
-    doi:10.3233/978-1-61499-649-1-87.
-.. [Raschka_2017] Raschka, (2017),
-    BioPandas: Working with molecular structures in pandas DataFrames, Journal of Open Source Software,
-    2(14), 279, doi:10.21105/joss.00279.
\ No newline at end of file
+    doi:10.3233/978-1-61499-649-1-87.
\ No newline at end of file