Releases: erfanzar/EasyDeL

EasyDeL version 0.0.69

04 Jul 15:21

This release brings significant scalability improvements, new models, bug fixes, and usability enhancements to EasyDeL.

Highlights:

  • Multi-host GPU Training: EasyDeL now scales seamlessly across multiple GPUs and hosts for demanding training workloads.
  • New Models: Expand your NLP arsenal with the addition of Gemma2, OLMo, and Aya models.
  • Improved KV Cache Quantization: Enjoy a substantial accuracy boost with enhanced KV cache quantization, achieving +21% accuracy compared to the previous version.
  • Simplified Model Management: Load and save pretrained models effortlessly using the new model.from_pretrained and model.save_pretrained methods.
  • Enhanced Generation Pipeline: The GenerationPipeLine now supports streaming token generation, ideal for real-time applications.
  • Introducing the ApiEngine: Leverage the power of the new ApiEngine and engine_client for seamless integration with your applications.
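
To illustrate what streaming token generation means in practice, here is a minimal, library-independent sketch of the pattern. It does not use the GenerationPipeLine API: `fake_decode_step` and `stream_tokens` are hypothetical names invented for this example, and the fixed vocabulary stands in for a real model.

```python
def fake_decode_step(prompt_tokens, generated):
    """Hypothetical stand-in for one model forward pass; returns the next token."""
    vocabulary = ["Hello", ",", " world", "!", "<eos>"]
    return vocabulary[min(len(generated), len(vocabulary) - 1)]


def stream_tokens(prompt_tokens, max_new_tokens=16, eos_token="<eos>"):
    """Yield each token as soon as it is produced instead of returning the full text."""
    generated = []
    for _ in range(max_new_tokens):
        token = fake_decode_step(prompt_tokens, generated)
        if token == eos_token:
            break
        generated.append(token)
        yield token


if __name__ == "__main__":
    # The consumer can display partial output while generation is still running.
    for token in stream_tokens(["Say", " hi"]):
        print(token, end="", flush=True)
    print()
```

Because the producer is a generator, a caller such as a chat UI or a web socket handler can forward tokens one at a time rather than waiting for the whole completion.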

Other Changes:

  • Fixed GPU Flash Attention bugs for increased stability.
  • Updated the required JAX version to >=0.4.28 for optimal performance. Version 0.4.29 or higher is recommended if available.
  • Streamlined the structure import process and resolved multi-host training issues.

Upgrade:

To upgrade to EasyDeL v0.0.69, use the following command:

pip install --upgrade easydel==0.0.69
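
After upgrading, it can be useful to confirm that the installed versions match the requirements stated in these notes (JAX >=0.4.28, EasyDeL 0.0.69). The following is a small sketch using only the Python standard library; the helper names `parse_version`, `meets_minimum`, and `check_package` are made up for this example, and the version parsing is deliberately simple (it only looks at the first three numeric components).

```python
from importlib import metadata


def parse_version(text):
    """Turn '0.4.28' into (0, 4, 28), ignoring suffixes such as 'rc1'."""
    parts = []
    for piece in text.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits or 0))
    # Pad short versions such as "0.4" to three components.
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def meets_minimum(installed, minimum):
    """Return True when the installed version is at least the minimum."""
    return parse_version(installed) >= parse_version(minimum)


def check_package(name, minimum):
    """Describe whether an installed distribution satisfies a minimum version."""
    try:
        installed = metadata.version(name)
    except metadata.PackageNotFoundError:
        return f"{name}: not installed"
    status = "ok" if meets_minimum(installed, minimum) else f"needs >= {minimum}"
    return f"{name} {installed}: {status}"


if __name__ == "__main__":
    print(check_package("jax", "0.4.28"))
    print(check_package("easydel", "0.0.69"))
```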

EasyDeL - 0.0.67

02 Jun 14:24
d33d2e8
  • New Features

    • GenerationPipeLine was added for fast streaming and easy generation with JAX.
    • Int8Params is now used instead of LinearBitKernel.
    • Better GPU support.
    • EasyDeLState is now better and supports more general options.
    • Trainers now support .save_pretrained(to_torch) and training logging.
    • EasyDeLState supports to_8bit.
    • All of the models support to_8bit for params.
    • Imports are now 91x faster in EasyDeL version 0.0.67.
  • Removed API

    • JAXServe is no longer available.
    • PyTorchServe is no longer available.
    • EasyServe is no longer available.
    • LinearBitKernel is no longer available.
    • EasyDeL partitioners are no longer available.
    • Llama/Mistral/Falcon/Mpt static convertors or transforms are no longer available.
  • Known Issues

    • The LoRA kernel sometimes crashes.
    • GenerationPipeLine has a compiling problem when the number of available devices is more than 4 and using 8_bit params.
    • Most of the features won't work for TPU-v3 and GPUs with compute capability lower than 7.5.
    • Kaggle sessions crash after importing EasyDeL. This is caused by instability in Kaggle's latest environment, not by EasyDeL. (Fixed in EasyDeL version 0.0.67)
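
For readers unfamiliar with 8-bit parameters (the to_8bit and Int8Params items above), the sketch below shows the general idea of symmetric int8 quantization in plain Python. It is an illustration of the technique only, not EasyDeL's implementation; `quantize_int8` and `dequantize_int8` are hypothetical helper names, and a single per-tensor scale is an assumption made to keep the example short.

```python
def quantize_int8(values):
    """Symmetric per-tensor quantization: floats -> ints in [-127, 127] plus one scale."""
    max_abs = max((abs(v) for v in values), default=0.0)
    # Avoid dividing by zero when every value is 0 (or the list is empty).
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def dequantize_int8(quantized, scale):
    """Recover approximate floats from the stored integers and the scale."""
    return [q * scale for q in quantized]


if __name__ == "__main__":
    weights = [0.5, -1.0, 0.25, 0.0]
    q, scale = quantize_int8(weights)
    restored = dequantize_int8(q, scale)
    worst = max(abs(a - b) for a, b in zip(weights, restored))
    print("int8 values:", q)
    print("worst-case reconstruction error:", worst)
```

Each weight is stored in one byte instead of four, at the cost of a rounding error of at most half a quantization step per value.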

Pallas Fusion: GPU Turbocharged 🚀

16 May 09:33

EasyDeL version 0.0.65

  • New Features

    • Pallas Flash Attention is available on CPU/GPU/TPU via FJFormer, with bias support.
    • ORPO Trainer is added and now it's in your bag.
    • WebSocket Serve Engine.
    • Now EasyDeL is 30% faster on GPUs.
    • JAX-Triton is no longer needed to run GPU kernels.
    • Now you can specify the backward kernel implementation for Pallas Attention.
    • You now have to import EasyDeL as easydel instead of EasyDel.
  • New Models

    • OpenELM model series are now present.
    • DeepseekV2 model series are now present.
  • Fixed Bugs

    • CUDNN FlashAttention Bugs are now fixed.
    • Llama3 Model 8Bit quantization of parameters had a lot of improvements.
    • Splash Attention bugs on TPUs are now fixed.
    • Dbrx Model Bugs are fixed.
    • DPOTrainer Bugs are Fixed (creating dataset).
  • Known Bugs

    • Splash Attention won't work on TPUv3.
    • Pallas Attention won't work on TPUv3.
    • You need to install flash_attn in order to convert HF DeepseekV2 to EasyDeL (bug in DeepseekV2 implementation from original authors).
    • Some examples are outdated.

Full Changelog: 0.0.63...0.0.65

0.0.63

27 Apr 12:56

What's Changed

  • Phi3 Model Added.
  • Dbrx Model Added.
  • Arctic Model Added.
  • Lora Fine-Tuning Bugs Fixed.
  • Vanilla Attention is Optimized.
  • Sharded Vanilla is the default attention mechanism now.

Full Changelog: 0.0.61...0.0.63

EasyDeL-0.0.61 Dynamic Changes

17 Apr 15:45

What's Changed

  • Add support for iterable dataset loading by @yhavinga in #138
  • SFTTrainer bugs are fixed.
  • Parameter quantization is now available for all of the models.
  • AutoEasyDeLModelForCausalLM now supports load_in_8bit.
  • Memory Management improved.
  • Gemma Models Generation Issue is now Fixed.
  • Trainers are now 2~8% faster.
  • Attention Operation is improved.
  • The Cohere Model is now present.
  • JAXServer is improved.
  • Due to recent changes, many of the documentation examples are out of date and will be updated soon.
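
Iterable datasets yield examples one at a time and generally cannot be indexed or measured with len(), so batching has to consume them lazily. The sketch below shows that general pattern with the standard library only; it is not the code from #138, and `batched` and `example_stream` are hypothetical names for this illustration.

```python
from itertools import islice


def batched(examples, batch_size):
    """Group any iterable into lists of batch_size items without calling len()."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(examples)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def example_stream(limit):
    """Hypothetical streaming source that produces records on demand."""
    for index in range(limit):
        yield {"text": f"example {index}"}


if __name__ == "__main__":
    for step, batch in enumerate(batched(example_stream(5), batch_size=2)):
        print(step, [record["text"] for record in batch])
```

The final batch may be smaller than batch_size; a real training loop would typically either drop it or pad it, depending on how the step function is compiled.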

Full Changelog: 0.0.60...0.0.61

EasyDeL Version 0.0.60

06 Apr 15:50

What's Changed

  • SFTTrainer is now available.
  • VideoCausalLanguageModelTrainer is now available.
  • New models such as Grok-1, Qwen2Moe, Mamba, Rwkv, and Whisper are available.
  • MoE models had some speed improvements.
  • Training Speed is now 18%~42% faster.
  • Normal Attention is now faster by 12%~30% (#131).
  • DPOTrainer Bugs Fixed.
  • CausalLanguageModelTrainer is now more customizable.
  • WANDB logging has improved.
  • Performance Mode is added to Training Arguments.
  • Model configs pass attributes to PretrainedConfig to prevent override… by @yhavinga in #122
  • Ignore token label smooth z loss by @yhavinga in #123
  • Time the whole train loop instead of only call to train step function by @yhavinga in #124
  • Add save_total_limit argument to delete older checkpoints by @yhavinga in #127
  • Add gradient norm logging, fix metric collection on multi-worker setup by @yhavinga in #135
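
The save_total_limit item above describes deleting older checkpoints once a limit is reached. A minimal sketch of that idea follows. It is not the code from #127: the function name `prune_checkpoints` and the `checkpoint-<step>` folder layout are assumptions made for this example.

```python
import shutil
import tempfile
from pathlib import Path


def prune_checkpoints(directory, save_total_limit):
    """Delete the oldest checkpoint folders so at most save_total_limit remain."""
    if save_total_limit is None or save_total_limit < 1:
        return []
    # Assumed layout: one sub-folder per checkpoint, named "checkpoint-<step>".
    candidates = [
        path
        for path in Path(directory).glob("checkpoint-*")
        if path.is_dir() and path.name.split("-")[-1].isdigit()
    ]
    # Sort numerically so that step 1000 counts as newer than step 300.
    candidates.sort(key=lambda path: int(path.name.split("-")[-1]))
    stale = candidates[:-save_total_limit]
    for path in stale:
        shutil.rmtree(path)
    return [path.name for path in stale]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        for step in (100, 200, 300, 1000):
            (Path(root) / f"checkpoint-{step}").mkdir()
        print("removed:", prune_checkpoints(root, save_total_limit=2))
        print("kept:", sorted(p.name for p in Path(root).iterdir()))
```

Sorting by the numeric step rather than by name matters here, because a plain string sort would place checkpoint-1000 before checkpoint-200.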

Full Changelog: 0.0.55...0.0.60

EasyDeL Version 0.0.55

03 Mar 09:30

  • JAX DPOTrainer Bugs Fixed
  • StableLM models are supported with FlashAttention and RingAttention.
  • RingAttention is supported for training and inference with up to 512K or 1M tokens.
  • Chunked MLP is supported for training and inference with up to 512K or 1M tokens.
  • All models now support sharded key and value caching for high-context-length inference; enable it with use_sharded_kv_caching=True in the model config (see examples).
  • EasyDeL successfully passed 1256000 Context Length Inference on TPUs (Llama Model Tested)
  • Vision Trainer is added; you might expect some bugs from it.

Full Changelog: 0.0.50...0.0.55

0.0.50 Mixture of EasyDeL experts

08 Feb 11:40

What's Changed

  • Optimize mean loss and accuracy calculation by @yhavinga in #100
  • Mixtral Models are fully supported and they are PJIT-compatible
  • A Wider range of models now support FlashAttention on TPU
  • Qwen 1, Qwen 2, Phi 2, and RoBERTa are newly added models that support FlashAttention on TPU and EasyBIT
  • LoRA support for the trainer is now Added (EasyDeLXRapTureConfig)
  • Adding EasyDel Serve Engine APIs
  • Adding Prompter (Beta and might be removed in future updates)
  • The training process is now 21% faster in 0.0.50 than in 0.0.42.
  • Transform functions are now automated for all models except MosaicMPT, which still requires the static methods.
  • The Trainer APIs have changed and are now faster, more dynamic, and more hackable.
  • The default JAX version is now 0.4.22, to support the FJFormer custom Pallas kernels.

Full Changelog: 0.0.42...0.0.50

Version 0.0.42 Easy State

11 Jan 12:56

New Features:

  • EasyDelState is added
  • Auto Convertors from torch > huggingface > jax > flax > EasyDel are added
  • Trainer has a lot of improvements

Full Changelog: 0.0.41...0.0.42

0.0.41

26 Dec 17:41

What has changed so far in 0.0.41

  • API Changes
  • The CausalLanguageModel Trainer is now separated from the other trainers
  • Custom Errors added
  • Timer bugs fixed
  • AutoEasyDelForCasualLM is now more automated, and Falcon bugs have been fixed
  • 4D Mesh being used for better partitioning
  • And many many more

Full Changelog: 0.0.40...0.0.41