Pallas Fusion: GPU Turbocharged 🚀
EasyDeL version 0.0.65
New Features
- Pallas Flash Attention now runs on CPU, GPU, and TPU via FJFormer, and it supports attention bias.
- The ORPO Trainer has been added and is now in your bag.
- A WebSocket serve engine has been added.
- EasyDeL is now 30% faster on GPUs.
- JAX-Triton is no longer needed to run GPU kernels.
- You can now specify the backward kernel implementation for Pallas Attention (see the sketch after this list).
- You now have to import EasyDeL as `easydel` instead of `EasyDel` (example below).
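
For example, existing scripts need only a one-line change to pick up the new package name:

```python
# Before 0.0.65 (no longer works):
# import EasyDel

# From 0.0.65 on:
import easydel
# or with the usual alias:
import easydel as ed
```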
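
The attention kernel (and, per the notes above, its backward implementation) is chosen through the model config. Below is a minimal sketch, assuming `config_kwargs` and `attn_mechanism` are the relevant knobs; the exact keyword names and accepted string values are assumptions here and may differ in your installed version, so check the EasyDeL documentation.

```python
# A minimal sketch, assuming `config_kwargs` / `attn_mechanism` select the
# attention kernel (names and values below are assumptions, not confirmed API).
import jax.numpy as jnp
import easydel as ed

model, params = ed.AutoEasyDeLModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3-8B",  # any HF checkpoint EasyDeL supports
    config_kwargs={
        # run attention through the Pallas flash kernel
        "attn_mechanism": "pallas_flash",
        # hypothetical key: the release notes say the backward kernel
        # implementation is selectable, but the real config name may differ
        "attention_backward_pass_impl": "triton",
    },
    dtype=jnp.float16,
)
```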
New Models
- The OpenELM model series is now available.
- The DeepseekV2 model series is now available.
Fixed Bugs
- cuDNN FlashAttention bugs have been fixed.
- 8-bit parameter quantization for the Llama3 model has been significantly improved.
- Splash Attention bugs on TPUs have been fixed.
- Dbrx model bugs have been fixed.
- DPOTrainer bugs (dataset creation) have been fixed.
Known Bugs
- Splash Attention won't work on TPUv3.
- Pallas Attention won't work on TPUv3.
- You need to install flash_attn in order to convert HF DeepseekV2 checkpoints to EasyDeL (due to a bug in the original authors' DeepseekV2 implementation).
- Some examples are outdated.
Full Changelog: 0.0.63...0.0.65