An awesome repository of community-curated applications of ChatGPT and other LLMs in computational biology!
Any material is welcome, as long as it is:
- Free to read without accounts
- High quality (subjective, but we know it when we see it)
- Relevant to bioinformaticians
To contribute, just add the link to this document and open a Pull Request!
Have fun!
- Data science prompts: https://github.com/travistangvh/ChatGPT-Data-Science-Prompts
- ChatGPT cheatsheet for Data Science: https://www.datacamp.com/cheat-sheet/chatgpt-cheat-sheet-data-science
- Fun and useful prompts: https://github.com/f/awesome-chatgpt-prompts
- List of awesome ChatGPT lists: https://github.com/OpenMindClub/awesome-chatgpt
- Collection of prompts for developers (YouTube): https://www.youtube.com/watch?v=sTeoEFzVNSc&t=901s
- GPT for Google Sheets: https://gptforwork.com/
- GPT for R Studio: https://github.com/MichelNivard/gptstudio
- GPT for R Developers: https://jameshwade.github.io/gpttools/
- GPT for PDFs (in a web interface, freemium service): https://chatpdf.com
- GPT directly from the command line: https://github.com/npiv/chatblade
- GPT-based chatbot fine-tuned for bioinformatics https://ai.tinybio.cloud/chat (discussed on Biostars)
- Very powerful question answering using GPT and internet searches: https://www.perplexity.ai/
- Python framework for connecting conversational AI to bioinformatics pipelines https://github.com/biocypher/biochatter and ChatGSE
- Quick guide of best practices for Prompt Engineering: https://help.openai.com/en/articles/6654000-best-practices-for-prompt-engineering-with-openai-api
- More advanced guidance on Prompt Engineering: https://www.promptingguide.ai/
- Good 20 min blogpost covering core aspects of Prompt Engineering: https://lilianweng.github.io/posts/2023-03-15-prompt-engineering/
- A very good post by Stephen Wolfram on the details on how it works: https://writings.stephenwolfram.com/2023/02/what-is-chatgpt-doing-and-why-does-it-work/
- List of papers published by OpenAI: https://openai.com/research
- Basic YouTube Crash Course on ChatGPT (12/December/2022, so a bit old): https://www.youtube.com/watch?v=JTxsNm9IdYU
- Technnical overview of details on how LLMs work and what can be their future (Eight Things to Know about Large Language Models): https://cims.nyu.edu/~sbowman/eightthings.pdf
- "Pause Giant AI Experiments: An Open Letter" https://futureoflife.org/open-letter/pause-giant-ai-experiments/
-
Ten Quick Tips for Harnessing the Power of ChatGPT/GPT-4 in Computational Biology (our article): https://arxiv.org/abs/2303.16429 and https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1011319
-
Single-cell RNA-seq analysis in Python guided by ChatGPT (nice video, from Cell Ranger output to final plots): https://www.youtube.com/watch?v=fkuLFlC2ZWk
-
Using ChatGPT in bioinformatics and biomedical research (good blog post): https://omicstutorials.com/using-chatgpt-in-bioinformatics-and-biomedical-research/
-
ChatGPT for bioinformatics (on the perspective of a bioinformatician): https://medium.com/@91mattmoore/chatgpt-for-bioinformatics-404c6d0817a1
-
ChatGPT and bioinformatics careers (Reddit forum discussion): https://www.reddit.com/r/bioinformatics/comments/11wwnqj/chatgpt_and_bioinformatics_careers/
-
ChatG-PPi-T: Finding Interactions with OpenAI (far from production-ready, but creative idea): https://www.linkedin.com/pulse/chatg-ppi-t-finding-interactions-openai-jon-hill/
-
BioGPT: generative pre-trained transformer for biomedical text generation and mining: https://github.com/microsoft/BioGPT
-
A Platform for the Biomedical Application of Large Language Models: https://arxiv.org/abs/2305.06488
-
scGPT: Foundation model for single-cell biology: https://github.com/bowang-lab/scGPT (Paper (not OA))
- LangChain combines LLM/GPT API requests with (1) access to external documents and (2) abilities to talk to the wider web, creating semi-autonomous agents: https://python.langchain.com/en/latest/
- Llama Index is an interface for combining LLMs with external data (such as documents): https://gpt-index.readthedocs.io/en/latest/index.html
- guardrails is an application to improve and refine LLM outputs in Python: https://github.com/ShreyaR/guardrails
- Embedchain is a framework to create ChatGPT like bots over your personalised dataset in 3 lines of code: https://embedchain.ai/
- The Hugging Face Hub open LLM leaderboard collects and ranks open-source LLMs; very important for accessibility and reproducibility of LLM-based bioinformatics
The following prompts may not be the most efficient in terms of prompt engineering, but they are quick, useful and exemplify usage of ChatGPT for computational biologists.
- “Add explanatory comments to this code: {code here}”
- “Rename the variables for clarity: {code here}”
- "Render roxygen2 documentation for the function: {R code here}”.
- "Extract functions to increase modularity: {code here}"
- "Write a unit test for the following function and help me implement it: {code here}"
- "Re-write and optimize this for loop: {code here}"
- "Write me regex for R/python/Excel with a pattern that will extract {} from {}"
- "Act as a table. Add a new column with consistent labels to this dataset:"
- "Create a ggplot2 violin plot with a log10 Y axis"
- "Change my code to make the plot color-blind friendly"
- "Translate this plot from ggplot2 to matplotlib syntax"
- "A Guide to Using ChatGPT For Data Science Projects" : https://www.datacamp.com/tutorial/chatgpt-data-science-projects