Hugging Face articles:
- Model parallelism basics: https://huggingface.co/docs/transformers/v4.15.0/en/parallelism#zero-data-parallel
- FSDP: https://huggingface.co/docs/accelerate/en/usage_guides/fsdp
- Deepspeed: https://huggingface.co/docs/accelerate/en/usage_guides/deepspeed
- Megatron-LM: https://huggingface.co/docs/accelerate/en/usage_guides/megatron_lm
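The FSDP, DeepSpeed, and Megatron-LM guides above share the same entry point: a standard training loop wrapped with Accelerate, with the actual parallelism backend chosen through `accelerate config` (or an Accelerate config file). Below is a minimal sketch of that pattern, assuming nothing from the linked guides; the tiny linear model, random data, and hyperparameters are placeholders.

```python
# Minimal Accelerate training-loop sketch. FSDP or DeepSpeed is selected
# via `accelerate config`; the loop itself stays the same.
import torch
from torch.utils.data import DataLoader, TensorDataset
from accelerate import Accelerator

accelerator = Accelerator()  # reads the distributed config (DDP/FSDP/DeepSpeed)

model = torch.nn.Linear(1024, 1024)   # placeholder for a real transformer
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
dataset = TensorDataset(torch.randn(64, 1024), torch.randn(64, 1024))
loader = DataLoader(dataset, batch_size=8)

# prepare() wraps the objects for the chosen parallelism backend
model, optimizer, loader = accelerator.prepare(model, optimizer, loader)

model.train()
for x, y in loader:
    optimizer.zero_grad()
    loss = torch.nn.functional.mse_loss(model(x), y)
    accelerator.backward(loss)   # replaces loss.backward()
    optimizer.step()
```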
Training Optimization: https://developer.nvidia.com/blog/mastering-llm-techniques-training
Inference Optimization: https://developer.nvidia.com/blog/mastering-llm-techniques-inference-optimization
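Among the techniques the training-optimization post discusses are mixed precision and gradient accumulation. The sketch below shows how the two are commonly combined in PyTorch; it assumes a CUDA device, and the one-layer model, random batches, and step counts are placeholders rather than anything from the linked article.

```python
# Sketch of two common training optimizations: mixed precision
# (autocast + GradScaler) and gradient accumulation. Assumes a GPU.
import torch

model = torch.nn.Linear(4096, 4096).cuda()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
scaler = torch.cuda.amp.GradScaler()
accum_steps = 4  # accumulate gradients over 4 micro-batches

for step in range(16):
    x = torch.randn(8, 4096, device="cuda")
    with torch.autocast(device_type="cuda", dtype=torch.float16):
        loss = model(x).float().pow(2).mean() / accum_steps
    scaler.scale(loss).backward()      # fp16-safe scaled backward
    if (step + 1) % accum_steps == 0:
        scaler.step(optimizer)         # unscale gradients, then step
        scaler.update()
        optimizer.zero_grad(set_to_none=True)
```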
RAG: https://www.youtube.com/watch?v=YuRFba27_1w
Agent: https://www.youtube.com/watch?v=q1XFm21I-VQ
https://www.youtube.com/watch?v=45Zs12Xlg2g
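At its core, RAG is retrieve-then-generate: embed a corpus, retrieve the chunks closest to the query, and prepend them to the prompt. The toy sketch below illustrates only that flow; the hash-based embed(), the three-sentence corpus, and the final print are stand-ins for a real embedding model, vector store, and LLM call.

```python
# Toy retrieval-augmented generation (RAG) sketch.
import numpy as np

def embed(text: str) -> np.ndarray:
    # Placeholder: a real system would call an embedding model here.
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    v = rng.standard_normal(128)
    return v / np.linalg.norm(v)

corpus = [
    "FSDP shards parameters, gradients and optimizer states across ranks.",
    "Megatron-LM splits individual layers across GPUs (tensor parallelism).",
    "DeepSpeed ZeRO stages trade memory for communication.",
]
doc_vecs = np.stack([embed(d) for d in corpus])

def retrieve(query: str, k: int = 2) -> list[str]:
    scores = doc_vecs @ embed(query)          # cosine similarity (unit vectors)
    return [corpus[i] for i in np.argsort(scores)[::-1][:k]]

query = "How does FSDP save memory?"
context = "\n".join(retrieve(query))
prompt = f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
print(prompt)  # a real pipeline would send this prompt to an LLM
```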
Papers:
- Two-Tree Algorithms for Full Bandwidth Broadcast, Reduction and Scan
- TPU v4: An Optically Reconfigurable Supercomputer for Machine Learning with Hardware Support for Embeddings
- Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism
- Efficient Large-Scale Language Model Training on GPU Clusters Using Megatron-LM
- Reducing Activation Recomputation in Large Transformer Models
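The two Megatron-LM papers above describe tensor (intra-layer) model parallelism: the first MLP weight is split by columns across ranks, the second by rows, so each rank produces a partial output and a single all-reduce recovers the full result. The sketch below only simulates that split on one process to show the arithmetic; the dimensions and the simulated world size of 2 are arbitrary choices, not taken from the papers.

```python
# Conceptual sketch of Megatron-LM-style tensor parallelism for a 2-layer MLP,
# simulating 2 ranks in a single process (sum() stands in for the all-reduce).
import torch

torch.manual_seed(0)
d_model, d_ff, world_size = 8, 16, 2

x = torch.randn(4, d_model)       # input activations (replicated on all ranks)
A = torch.randn(d_model, d_ff)    # first linear weight
B = torch.randn(d_ff, d_model)    # second linear weight

# Reference (single-device) forward pass.
ref = torch.nn.functional.gelu(x @ A) @ B

# Tensor-parallel forward: rank r holds a column slice of A and a row slice of B.
A_shards = A.chunk(world_size, dim=1)     # column parallel
B_shards = B.chunk(world_size, dim=0)     # row parallel
partials = [torch.nn.functional.gelu(x @ A_shards[r]) @ B_shards[r]
            for r in range(world_size)]
out = sum(partials)                        # the all-reduce across ranks

print(torch.allclose(out, ref, atol=1e-5))  # True: sharded result matches
```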
Something critical:
- Do not try to use an LLM to enhance the learning process, e.g., by generating questions and answers; you will get nothing out of it.
- Consult an expert to save the most time.
Thanks to Liyue Zhang and Guangnan Feng