Releases · erfanzar/EasyDeL
EasyDeL version 0.0.69
This release brings significant scalability improvements, new models, bug fixes, and usability enhancements to EasyDeL.
Highlights:
- Multi-host GPU Training: EasyDeL now scales seamlessly across multiple GPUs and hosts for demanding training workloads.
- New Models: Expand your NLP arsenal with the addition of Gemma2, OLMo, and Aya models.
- Improved KV Cache Quantization: Enhanced KV cache quantization delivers a +21% accuracy gain over the previous version.
- Simplified Model Management: Load and save pretrained models effortlessly using the new `model.from_pretrained` and `model.save_pretrained` methods (see the sketch after this list).
- Enhanced Generation Pipeline: `GenerationPipeLine` now supports streaming token generation, ideal for real-time applications.
- Introducing the ApiEngine: Leverage the power of the new `ApiEngine` and `engine_client` for seamless integration with your applications.
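A minimal sketch of the new load/save and streaming APIs highlighted above. The checkpoint id, the `(model, params)` return shape, and the `GenerationPipeLine` constructor and streaming arguments are illustrative assumptions, not signatures taken from this release:

```python
# Hedged usage sketch for the 0.0.69 highlights; everything marked
# "assumed" is illustrative, not a documented signature.
import easydel as ed
from transformers import AutoTokenizer

# Load a pretrained model (checkpoint id and (model, params) return
# shape are assumed).
model, params = ed.AutoEasyDeLModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf"
)

# Save it back with the matching new method (`params=` is assumed).
model.save_pretrained("my-easydel-checkpoint", params=params)

# Stream tokens as they are generated (constructor args and the
# `stream=True` flag are assumed).
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")
pipeline = ed.GenerationPipeLine(model=model, params=params, tokenizer=tokenizer)
for token in pipeline.generate("Hello, EasyDeL!", stream=True):
    print(token, end="", flush=True)
```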
Other Changes:
- Fixed GPU Flash Attention bugs for increased stability.
- Updated the required `jax` version to `>=0.4.28` for optimal performance; versions `0.4.29` or higher are recommended if available.
- Streamlined the `structure` import process and resolved multi-host training issues.
Upgrade:
To upgrade to EasyDeL v0.0.69, use the following command:
```shell
pip install --upgrade easydel==0.0.69
```
EasyDeL - 0.0.67
New Features
- `GenerationPipeLine` was added for fast streaming and easy generation with JAX.
- `Int8Params` are now used instead of `LinearBitKernel`.
- Better GPU support.
- `EasyDeLState` is improved and supports more general options.
- Trainers now support `.save_pretrained(to_torch)` and training logging.
- `EasyDeLState` supports `to_8bit`.
- All of the models support `to_8bit` for params (see the sketch after this list).
- Imports are now 91x faster in EasyDeL version 0.0.67.
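A hedged sketch of the 8-bit parameter path named in this release; only the `to_8bit` name comes from the notes, and the call shape is an assumption:

```python
# Hedged sketch; only `to_8bit` itself comes from the release notes.
import easydel as ed

# Load a model (checkpoint id and return shape are assumed).
model, params = ed.AutoEasyDeLModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-v0.1"
)

# Convert the parameter pytree to 8-bit (Int8Params) storage; whether
# this is a model method or a standalone helper is an assumption.
params_8bit = model.to_8bit(params)
```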
Removed API
- `JAXServe` is no longer available.
- `PyTorchServe` is no longer available.
- `EasyServe` is no longer available.
- `LinearBitKernel` is no longer available.
- `EasyDeL` partitioners are no longer available.
- `Llama/Mistral/Falcon/Mpt` static converters or transforms are no longer available.
Known Issues
- The LoRA kernel sometimes crashes.
- `GenerationPipeLine` has a compilation problem when more than 4 devices are available and 8-bit params are used.
- Most of the features won't work on TPU-v3 or on GPUs with compute capability lower than 7.5.
- Kaggle sessions crash after importing EasyDeL (Kaggle's latest environment is unstable; this is not related to EasyDeL). (Fixed in EasyDeL version 0.0.67)
Pallas Fusion: GPU Turbocharged 🚀
EasyDeL version 0.0.65
New Features
- Pallas Flash Attention on CPU/GPU/TPU via FJFormer, with bias support.
- The ORPO Trainer is added and now it's in your bag.
- WebSocket Serve Engine.
- EasyDeL is now 30% faster on GPUs.
- JAX-Triton is no longer needed to run GPU kernels.
- You can now specify the backward kernel implementation for Pallas Attention.
- You now have to import EasyDeL as `easydel` instead of `EasyDel` (see below).
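The package rename breaks every existing import line; the fix is one line:

```python
# Before 0.0.65 the package was imported as `EasyDel`:
# import EasyDel as ed
# From 0.0.65 onward, the lowercase name is required:
import easydel as ed
```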
New Models
- The OpenELM model series is now available.
- The DeepseekV2 model series is now available.
Fixed Bugs
- CUDNN FlashAttention bugs are now fixed.
- 8-bit quantization of Llama3 model parameters received a lot of improvements.
- Splash Attention bugs on TPUs are now fixed.
- Dbrx model bugs are fixed.
- DPOTrainer bugs are fixed (dataset creation).
Known Bugs
- Splash Attention won't work on TPU-v3.
- Pallas Attention won't work on TPU-v3.
- You need to install `flash_attn` in order to convert HF DeepseekV2 to EasyDeL (a bug in the original authors' DeepseekV2 implementation).
- Some examples are outdated.
Full Changelog: 0.0.63...0.0.65
0.0.63
What's Changed
- Phi3 Model Added.
- Dbrx Model Added.
- Arctic Model Added.
- Lora Fine-Tuning Bugs Fixed.
- Vanilla Attention is Optimized.
- Sharded Vanilla is the default attention mechanism now.
Full Changelog: 0.0.61...0.0.63
EasyDeL-0.0.61 Dynamic Changes
What's Changed
- Add support for iterable dataset loading by @yhavinga in #138
- `SFTTrainer` bugs are fixed.
- Parameter quantization is now available for all of the models.
- `AutoEasyDeLModelForCausalLM` now supports `load_in_8bit` (see the sketch after this list).
- Memory management improved.
- The `Gemma` models' generation issue is now fixed.
- Trainers are now 2~8% faster.
- The attention operation is improved.
- The `Cohere` model is now present.
- `JAXServer` is improved.
- Due to recent changes, a lot of examples in the documentation have changed, and more will be updated soon.
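A hedged sketch of 8-bit loading; the `load_in_8bit` flag comes from these notes, while the checkpoint id and return shape are assumptions:

```python
import easydel as ed

# Load with parameters quantized to 8-bit at load time
# (`load_in_8bit` is from the notes; the rest is assumed).
model, params = ed.AutoEasyDeLModelForCausalLM.from_pretrained(
    "google/gemma-7b",
    load_in_8bit=True,
)
```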
Full Changelog: 0.0.60...0.0.61
EasyDeL Version 0.0.60
What's Changed
- `SFTTrainer` is now available (see the sketch after this list).
- `VideoCausalLanguageModelTrainer` is now available.
- New models such as Grok-1, Qwen2Moe, Mamba, Rwkv, and Whisper are available.
- MoE models had some speed improvements.
- Training Speed is now 18%~42% faster.
- Normal Attention is now 12%~30% faster (#131).
- DPOTrainer Bugs Fixed.
- CausalLanguageModelTrainer is now more customizable.
- WANDB logging has improved.
- Performance Mode is added to Training Arguments.
- Model configs pass attributes to PretrainedConfig to prevent override… by @yhavinga in #122
- Ignore token label smooth z loss by @yhavinga in #123
- Time the whole train loop instead of only call to train step function by @yhavinga in #124
- Add save_total_limit argument to delete older checkpoints by @yhavinga in #127
- Add gradient norm logging, fix metric collection on multi-worker setup by @yhavinga in #135
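A hedged sketch of what `SFTTrainer` usage might look like; only the `SFTTrainer` name comes from these notes, and every class, field, and argument name below is an assumption modeled on common trainer APIs:

```python
import easydel as ed
from datasets import load_dataset
from transformers import AutoTokenizer

# Dataset and tokenizer choices are purely illustrative.
tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-v0.1")
train_dataset = load_dataset("tatsu-lab/alpaca", split="train")

# The TrainArguments class and all argument names are assumptions.
trainer = ed.SFTTrainer(
    arguments=ed.TrainArguments(model_name="sft-demo", num_train_epochs=1),
    train_dataset=train_dataset,
    tokenizer=tokenizer,
)
trainer.train()
```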
Full Changelog: 0.0.55...0.0.60
EasyDeL Version 0.0.55
- JAX `DPOTrainer` bugs fixed.
- StableLM models are supported with FlashAttention and RING-Attention.
- RingAttention is supported for up to 512K or 1M token training and inference.
- Chunked MLP is supported for up to 512K or 1M token training and inference.
- All of the models now support sharded key and value caching for high-context-length inference, which can be enabled via `use_sharded_kv_caching=True` in the model config (see the sketch after this list).
- EasyDeL successfully passed 1,256,000-token context length inference on TPUs (Llama model tested).
- A Vision Trainer is added; you might expect some bugs from it.
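The item above points to examples for sharded KV caching; here is a hedged sketch of enabling it. Only `use_sharded_kv_caching=True` comes from the notes; the checkpoint id and the exact place the flag lives are assumptions:

```python
import easydel as ed

# Load a model (checkpoint id and return shape are assumed).
model, params = ed.AutoEasyDeLModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf"
)

# Enable sharded key/value caching for long-context inference; the flag
# name is from the notes, setting it on the config object is assumed.
model.config.use_sharded_kv_caching = True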
Full Changelog: 0.0.50...0.0.55
0.0.50 Mixture of EasyDeL experts
What's Changed
- Optimize mean loss and accuracy calculation by @yhavinga in #100
- Mixtral models are fully supported, and they are `PJIT-compatible`.
- A wider range of models now supports FlashAttention on TPU.
- Qwen 1, Qwen 2, Phi 2, and RoBERTa are newly added models which support FlashAttention on TPU and `EasyBIT`.
- LoRA support for the trainer is now added (`EasyDeLXRapTureConfig`; see the sketch after this list).
- Adding EasyDel Serve Engine APIs.
- Adding Prompter (beta; might be removed in future updates).
- The training process is now 21% faster in `0.0.50` than in `0.0.42`.
- Transform functions are now automated for all the models (except `MosaicMPT`, for which you still have to use the static methods).
- The Trainer APIs have changed and are now faster, more dynamic, and more hackable.
- The default JAX version has changed to 0.4.22 for `FJFormer` custom Pallas kernels usage.
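A hedged sketch of wiring LoRA into a trainer via `EasyDeLXRapTureConfig`; only the class name comes from these notes, and every field shown is an assumption about what such a config might take:

```python
import easydel as ed

# Field names below are illustrative assumptions.
rapture_config = ed.EasyDeLXRapTureConfig(
    lora_dim=64,                                  # assumed LoRA rank field
    fully_fine_tune_parameters=["embed_tokens"],  # assumed full-FT escape hatch
)

# The config would then be handed to the trainer's arguments; the exact
# plumbing (e.g. a `rapture_config=` field) is an assumption.
```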
New Contributors
Full Changelog: 0.0.42...0.0.50
Version 0.0.42 Easy State
New Features:
- `EasyDelState` is added.
- Auto converters from torch > huggingface > jax > flax > EasyDel are added.
- The Trainer has a lot of improvements.
Full Changelog: 0.0.41...0.0.42
0.0.41
What has changed so far in 0.0.41
- API Changes
- The CausalLanguageModel Trainer is now separated from the other trainers
- Custom Errors added
- Timer bugs fixed
- `AutoEasyDelForCasualLM` is now more automated, and Falcon bugs have been fixed
- A 4D Mesh is being used for better partitioning
- And many, many more
Full Changelog: 0.0.40...0.0.41