Bump transformers from 4.25.1 to 4.36.2 #455

dependabot · 2023-12-18T19:52:11Z

Bumps transformers from 4.25.1 to 4.36.2.

Release notes

Patch release: v4.36.2

Patch release to resolve some critical issues relating to the recent cache refactor, the flash attention refactor, and training in multi-GPU and multi-node settings:

Resolve training bug with PEFT + GC #28031

Resolve cache issue when going beyond context window for Mistral/Mixtral FA2 #28037

Re-enable passing config to from_pretrained with FA #28043

Fix resuming from checkpoint when using FSDP with FULL_STATE_DICT #27891

Resolve bug when saving a checkpoint in the multi-node setting #28078

Patch release: v4.36.1

A patch release, mostly for critical torch issues:

Fix SDPA correctness following torch==2.1.2 regression #27973

[Tokenizer Serialization] Fix the broken serialisation #27099

Fix bug with rotating checkpoints #28009

Hot-fix-mixstral-loss (#27948)

🔥

v4.36: Mixtral, Llava/BakLlava, SeamlessM4T v2, AMD ROCm, F.sdpa wide-spread support

New model additions

Mixtral

Mixtral is the new open-source model from Mistral AI, announced in the blog post Mixtral of Experts. According to the benchmark results shared in that blog post, its capabilities are comparable to those of ChatGPT.

The architecture is a sparse Mixture of Experts with a Top-2 routing strategy, similar to the NllbMoe architecture in transformers. You can use it through the AutoModelForCausalLM interface:
>>> import torch
>>> from transformers import AutoModelForCausalLM, AutoTokenizer
>>> # Load the model and the tokenizer from the same checkpoint
>>> checkpoint = "mistralai/Mixtral-8x7B-v0.1"
>>> model = AutoModelForCausalLM.from_pretrained(checkpoint, torch_dtype=torch.float16, device_map="auto")
>>> tokenizer = AutoTokenizer.from_pretrained(checkpoint)
>>> prompt = "My favourite condiment is"
>>> # device_map="auto" already places the weights, so only the inputs are moved
>>> model_inputs = tokenizer([prompt], return_tensors="pt").to(model.device)
>>> generated_ids = model.generate(**model_inputs, max_new_tokens=100, do_sample=True)
>>> tokenizer.batch_decode(generated_ids)[0]
The model is compatible with existing optimisation tools such as Flash Attention 2, bitsandbytes and the PEFT library. The checkpoints are released under the mistralai organisation on the Hugging Face Hub.
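To make the Top-2 routing mentioned above concrete, here is a minimal pure-Python sketch of how a router can pick two experts for one token and mix their outputs. It is a generic illustration, not the transformers implementation: the function names `top2_route` and `moe_forward` are hypothetical, the experts are toy scalar functions, and normalising the two selected weights so they sum to 1 is an assumption about the gating scheme.

```python
import math


def top2_route(router_logits):
    """Select the two highest-scoring experts for a single token.

    router_logits: one float per expert.
    Returns [(expert_index, weight), (expert_index, weight)], where the two
    weights are renormalised to sum to 1 (an assumption of this sketch).
    """
    if len(router_logits) < 2:
        raise ValueError("top-2 routing needs at least two experts")
    # Softmax over all experts; subtracting the max keeps exp() stable.
    peak = max(router_logits)
    exps = [math.exp(x - peak) for x in router_logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Indices of the two most probable experts, best first.
    top2 = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:2]
    norm = probs[top2[0]] + probs[top2[1]]
    return [(i, probs[i] / norm) for i in top2]


def moe_forward(x, router_logits, experts):
    """Combine the outputs of the two selected experts, weighted by the router."""
    return sum(weight * experts[i](x) for i, weight in top2_route(router_logits))


# Toy usage: four "experts" that simply scale their input by 0, 1, 2 and 3.
toy_experts = [lambda x, k=k: k * x for k in range(4)]
print(top2_route([0.0, 2.0, 1.0, -1.0]))
print(moe_forward(1.0, [0.0, 2.0, 1.0, -1.0], toy_experts))
```

Only the two selected experts are evaluated per token, which is what makes the layer sparse: the parameter count grows with the number of experts while the per-token compute stays close to that of two experts.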

Llava / BakLlava

... (truncated)

Commits

a7cab3c Release: v4.36.2
f6d6189 Fix bug for checkpoint saving on multi node training setting (#28078)
64bcf77 fix resuming from ckpt when using FSDP with FULL_STATE_DICT (#27891)
780376f [Modeling / Mixtral] Fix GC + PEFT issues with Mixtral (#28061)
6e4429f [FA-2] Fix fa-2 issue when passing config to from_pretrained (#28043)
f33b061 Generate: Mistral/Mixtral FA2 cache fix when going beyond the context window ...
d1dec79 [core / modeling] Fix training bug with PEFT + GC (#28031)
c48787f fix seamless import
bd65410 Release: v4.36.1
6342b9b Fix bug with rotating checkpoints (#28009)
Additional commits viewable in compare view

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.

Dependabot commands and options

You can trigger Dependabot actions by commenting on this PR:

@dependabot rebase will rebase this PR
@dependabot recreate will recreate this PR, overwriting any edits that have been made to it
@dependabot merge will merge this PR after your CI passes on it
@dependabot squash and merge will squash and merge this PR after your CI passes on it
@dependabot cancel merge will cancel a previously requested merge and block automerging
@dependabot reopen will reopen this PR if it is closed
@dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
@dependabot show <dependency name> ignore conditions will show all of the ignore conditions of the specified dependency
@dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
@dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
@dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)

Bumps [transformers](https://github.com/huggingface/transformers) from 4.25.1 to 4.36.2.
- [Release notes](https://github.com/huggingface/transformers/releases)
- [Commits](huggingface/transformers@v4.25.1...v4.36.2)

---
updated-dependencies:
- dependency-name: transformers
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <[email protected]>
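The commit metadata labels this bump as `version-update:semver-minor`. As a rough sketch of how such a label follows from the two version strings, the helper below compares the `X.Y.Z` components in order. The name `classify_update` is hypothetical and this is not Dependabot's own logic; it assumes plain three-part numeric versions with no pre-release suffixes.

```python
def classify_update(old: str, new: str) -> str:
    """Return 'major', 'minor', 'patch' or 'none' for two X.Y.Z versions."""
    old_parts = tuple(int(part) for part in old.split("."))
    new_parts = tuple(int(part) for part in new.split("."))
    if len(old_parts) != 3 or len(new_parts) != 3:
        raise ValueError("expected versions of the form X.Y.Z")
    # The first component that differs decides the update type.
    for label, before, after in zip(("major", "minor", "patch"), old_parts, new_parts):
        if before != after:
            return label
    return "none"


print(classify_update("4.25.1", "4.36.2"))  # minor
```

Because the major component (4) is unchanged and the minor component moves from 25 to 36, the bump counts as a minor update even though it spans eleven minor releases.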

dependabot bot added the dependencies and python labels Dec 18, 2023

dependabot bot mentioned this pull request Dec 18, 2023

Bump transformers from 4.25.1 to 4.36.0 #451

Closed
