add new features #30

Open
JaydeepBhat opened this issue Nov 17, 2024 · 10 comments
Labels
enhancement New feature or request

Comments

@JaydeepBhat

JaydeepBhat commented Nov 17, 2024

Background
The script addon.py already has functions for preprocessing, normalization, clustering, cell type prediction, gene lists per cell/tissue, gene expression per cell type, differentially expressed genes (DEG) per tissue type, and UMAP plotting by cell type.

Suggested features
Adding the following features would complete the omics agent task:
1. Data import module
2. CSV reader function for the agent
3. Omics data QC
4. Batch effect correction
5. DEG per cell type
6. UMAP plotting by tissue type
7. Cell annotation and cell discovery
8. Lineage/trajectory inference analysis
9. Functional enrichment, ontology, and pathway analysis
10. Gene regulatory networks
11. Cell-cell communication
12. Cell compositional analysis
13. Gene perturbation modeling
14. Metabolic modeling
15. Multi-omics data integration
16. Method-specific features (e.g., TFBS from scATAC-seq, immune receptors)

Next steps
1. Unit testing
2. Connect these tasks to a LangChain agent

JaydeepBhat added the enhancement label Nov 17, 2024
@dmccloskey
Member

dmccloskey commented Nov 22, 2024

1. Updated data import functionality

Replace the hard-coded file location with a dataset source such as cellxgene (search over metadata and download datasets by ID).

@dmccloskey
Member

2. Ask dataframe agent

Functionality to ask questions of a dataframe
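
A minimal sketch of what a CSV reader tool for the agent might look like, using only the Python standard library. The function name `summarize_csv` and the fields it returns are assumptions for illustration, not part of addon.py, and wrapping it as a LangChain tool is left out.

```python
import csv
import io


def summarize_csv(text, max_rows=3):
    """Parse CSV text and return a small summary an agent could reason over.

    Hypothetical helper: returns the column names, the row count, and the
    first few rows as dictionaries.
    """
    reader = csv.DictReader(io.StringIO(text))
    rows = list(reader)
    return {
        "columns": reader.fieldnames or [],
        "n_rows": len(rows),
        "head": rows[:max_rows],
    }


# Toy input for illustration only.
example = "cell_id,cell_type,tissue\nc1,T cell,blood\nc2,B cell,spleen\n"
summary = summarize_csv(example)
print(summary["columns"], summary["n_rows"])
```

Returning a compact summary rather than the full table keeps the payload small enough to pass to a language model as context.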

@dmccloskey
Member

3. Omics data QC

Add filtering based on QC metrics such as mitochondrial gene counts, doublets, # of genes, etc.
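
The filtering described above could be sketched roughly as follows, in plain Python. The function names, the `MT-` gene prefix convention, and the thresholds are assumptions chosen for illustration; doublet detection is not covered here and would need a dedicated method.

```python
def qc_metrics(counts, mito_prefix="MT-"):
    """Compute per-cell QC metrics from a {gene: count} mapping."""
    total = sum(counts.values())
    n_genes = sum(1 for c in counts.values() if c > 0)
    mito = sum(c for g, c in counts.items() if g.upper().startswith(mito_prefix))
    pct_mito = 100.0 * mito / total if total else 0.0
    return {"total_counts": total, "n_genes": n_genes, "pct_mito": pct_mito}


def passes_qc(metrics, min_genes=2, max_pct_mito=20.0):
    """Keep cells with enough detected genes and a low mitochondrial fraction.

    The default thresholds are placeholders, not recommended values.
    """
    return metrics["n_genes"] >= min_genes and metrics["pct_mito"] <= max_pct_mito


# Toy counts for illustration only.
cells = {
    "cell_a": {"CD3E": 5, "CD19": 0, "ACTB": 14, "MT-CO1": 1},
    "cell_b": {"CD3E": 0, "CD19": 1, "ACTB": 1, "MT-CO1": 8},
}
kept = [name for name, counts in cells.items() if passes_qc(qc_metrics(counts))]
print(kept)
```

Separating metric computation from the pass/fail decision would let the agent report the metrics to the user before any cells are dropped.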

@dmccloskey
Member

4. Batch effect correction and data visualization

Possibly call Seurat for batch correction and UMAP.

@dmccloskey
Member

6. Plot UMAP for tissue types

Unify the plotting by cell type or by tissue type. In general, one should be able to plot a UMAP on whatever dimension they are interested in.

@dmccloskey
Member

7a. Cell annotation

Option 1: the user provides a custom gene list. Option 2: use a database or reference tool, e.g., ImmuCellAI or CIBERSORT.
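
As a rough illustration of Option 1, the sketch below assigns each cluster the cell type whose marker genes have the highest mean expression. The function name `annotate_clusters`, the input layout, and the expression values are assumptions for illustration. This is a simple scoring heuristic, not the method used by any of the databases named above.

```python
def annotate_clusters(cluster_means, markers):
    """Label each cluster with the best-scoring cell type.

    cluster_means: {cluster_id: {gene: mean expression}}
    markers: {cell_type: [marker genes]} (the user-provided gene list)
    """
    labels = {}
    for cluster, expr in cluster_means.items():
        # Score = average expression of the marker genes; missing genes count as 0.
        scores = {
            cell_type: sum(expr.get(g, 0.0) for g in genes) / len(genes)
            for cell_type, genes in markers.items()
        }
        labels[cluster] = max(scores, key=scores.get)
    return labels


# Toy marker list and expression values for illustration only.
markers = {"T cell": ["CD3E", "CD3D"], "B cell": ["CD19", "MS4A1"]}
cluster_means = {
    "0": {"CD3E": 2.0, "CD3D": 1.5, "CD19": 0.1},
    "1": {"CD19": 1.8, "MS4A1": 2.2, "CD3E": 0.0},
}
labels = annotate_clusters(cluster_means, markers)
print(labels)
```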

7b. Cell discovery

Clustering and detection of clusters via a defined metric

@dmccloskey
Member

12. Cell compositional analysis

Quantify the proportions of cell types in a tissue or region of interest.
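
A minimal sketch of this calculation, assuming each cell already carries a tissue and a cell type label. The function name `cell_type_proportions` and the input layout are hypothetical.

```python
from collections import Counter, defaultdict


def cell_type_proportions(cells):
    """Return {tissue: {cell_type: fraction of cells}}.

    cells: iterable of (tissue, cell_type) pairs, one per cell.
    """
    counts = defaultdict(Counter)
    for tissue, cell_type in cells:
        counts[tissue][cell_type] += 1
    return {
        tissue: {ct: n / sum(c.values()) for ct, n in c.items()}
        for tissue, c in counts.items()
    }


# Toy labels for illustration only.
cells = [
    ("blood", "T cell"),
    ("blood", "T cell"),
    ("blood", "B cell"),
    ("blood", "T cell"),
    ("spleen", "B cell"),
]
props = cell_type_proportions(cells)
print(props)
```

The same function would work for a region of interest if the first element of each pair is a region label instead of a tissue.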

@dmccloskey
Member

14. Metabolic modeling

Integration with the GRN to constrain a metabolic model, with applications in cancer, immunological, and inflammation-related diseases, etc.

@dmccloskey
Member

15. Multi-omics data integration

  1. Dual ATAC-seq and RNA-seq kits from 10x
  2. Data fusion algorithms such as MOFA

@dmccloskey
Member

dmccloskey commented Nov 22, 2024

Prerequisites

  • Label each of the above features as either a "tool" or a "microagent" (e.g., cell2Sentence). For each one, also propose the function, script, class, etc. that would serve as the "tool", or the single-cell foundation model that would serve as the "microagent".
  • Map out the folder and file layout for the talk2cells agent.
  • Refactor and reorganize the current code into tools and microagents.
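
One way the tool/microagent labeling could be recorded is a small registry like the sketch below. Everything here is an assumption for illustration: the class names, the implementation names, and in particular which feature is marked as a tool or a microagent. The actual assignments are still to be decided in this issue.

```python
from dataclasses import dataclass
from enum import Enum


class Kind(Enum):
    TOOL = "tool"
    MICROAGENT = "microagent"


@dataclass(frozen=True)
class Feature:
    name: str
    kind: Kind
    # For a tool: the function, script, or class. For a microagent: the model.
    implementation: str


# Placeholder assignments only; not a decision about the real features.
FEATURES = [
    Feature("Omics data QC", Kind.TOOL, "hypothetical function filter_cells_by_qc"),
    Feature("Cell compositional analysis", Kind.TOOL, "hypothetical function cell_type_proportions"),
    Feature("Cell annotation", Kind.MICROAGENT, "single-cell foundation model, e.g. cell2Sentence"),
]


def by_kind(kind):
    """List the names of the features registered under the given kind."""
    return [f.name for f in FEATURES if f.kind is kind]


print(by_kind(Kind.TOOL))
print(by_kind(Kind.MICROAGENT))
```

A registry like this could also drive the folder layout, for example one module per entry grouped by kind.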
