Quanto serialization #1
Draft: SunMarc wants to merge 170 commits into quanto_integration from quanto-serialization
Conversation
…with more than 2 devices (huggingface#29609) add new arg
* Refactor TFP call to just sigmoid()
* Make sure we cast to the right dtype

* fix batching tests for new models
* Update tests/models/seggpt/test_modeling_seggpt.py
Co-authored-by: amyeroberts <[email protected]>
…#29632) update Co-authored-by: ydshieh <[email protected]>
* Added pytests for pvt-v2, all passed
* Added pvt_v2 to docs/source/en/model_doc
* Ran fix-copies and fixup. All checks passed
* Added additional ReLU for linear attention mode
* pvt_v2_b2_linear converted and working
* copied models/pvt to adapt to pvt_v2
* First commit of pvt_v2
* PvT-v2 now works in AutoModel
* Reverted batch eval changes for PR
* Expanded type support for Pvt-v2 config
* Fixed config docstring. Added channels property
* Fixed model names in tests
* Fixed config backbone compat. Added additional type support for image size in config
* Allowed for batching of eval metrics
* Set key and value layers to use separate linear modules. Fixed pruning function
* Set AvgPool to 7
* Fixed issue in init
* Successful conversion of pretrained weights for PVT-v2
* Successful conversion of pretrained weights for PVT-v2 models
* Updated index.md
* Ran fix-copies
* Fixed PvtV2Backbone tests
* Added TFRegNet to OBJECTS_TO_IGNORE in check_docstrings.py
* Fixed backbone stuff and fixed tests: all passing
* Ran make fixup
* Made modifications for code checks
* Remove ONNX config from configuration_pvt_v2.py
* Use explicit image size dict in test_modeling_pvt_v2.py
* Make image_size optional in test_modeling_pvt_v2.py
* Remove _ntuple use in modeling_pvt_v2.py
* Remove reference to fp16_enabled
* Model modules now take config as first argument even when not used
* Replaced abbreviations for "SR" and "AP" with explicit "spatialreduction" and "averagepooling"
* All LayerNorm now instantiates with config.layer_norm_eps
* Added docstring for depth-wise conv layer
* PvtV2Config now only takes Union[int, Tuple[int, int]] for image size
* Refactored PVTv2 in prep for gradient checkpointing
* Gradient checkpointing ready to test
* Removed override of _set_gradient_checkpointing
* Cleaned out old code
* Applied code fixup
* Began debug of pvt_v2 tests
* Leave handling of num_labels to base pretrained config class
* Deactivated gradient checkpointing tests until it is fixed
* Removed PvtV2ImageProcessor which duped PvtImageProcessor
* Fixed issue from rebase
* Set tests for gradient checkpointing to skip those using reentrant since it isn't supported
* Changed model name in docs
* Removed duplicate PvtV2Backbone
* Work around type switching issue in tests
* Fix model name in config comments
* Update docs/source/en/model_doc/pvt_v2.md
* Changed name of variable from 'attn_reduce' to 'sr_type'
* Changed from using 'sr_type' to 'linear_attention' for clarity
* Update src/transformers/models/pvt_v2/modeling_pvt_v2.py: removed old/outdated code
* Fixed class names to be more descriptive
* Moved paper abstract to single line in pvt_v2.md
* Added usage tips to pvt_v2.md
* Simplified module inits by passing layer_idx
* Fixed typing for hidden_act in PvtV2Config
* Removed unused import
* Add pvt_v2 to docs/source/en/_toctree.yml
* Updated documentation in docs/source/en/model_doc/pvt_v2.md to be more comprehensive
* Move function parameters to single line in modeling_pvt_v2.py
* Update year of copyright to 2024
* Make code more explicit
* Updated sr_ratio to be more explicit spatial_reduction_ratio
* Removed excess type hints in modeling_pvt_v2.py
* Removed needless comment in modeling_pvt_v2.py
* Updated copyright date in pvt_v2.md and configuration_pvt_v2.py
* Cleaned comments in modeling_pvt_v2.py
* Renamed spatial_reduction Conv2D operation
* Revert "Update src/transformers/models/pvt_v2/modeling_pvt_v2.py" (reverts commit c4a0441)
* Updated conversion script to reflect module name change
* Deprecated reshape_last_stage option in config
* Removed unused imports
* Code formatting
* Fixed outdated decorators on test_inference_fp16
* Added "Copied from" comments in test_modeling_pvt_v2.py
* Fixed import listing
* Updated model name
* Force empty commit for PR refresh
* Fixed linting issue
* Removed # Copied from comments
* Added PVTv2 to README_fr.md
* Ran make fix-copies
* Replace all FoamoftheSea hub references with OpenGVLab
* Fixed out_indices and out_features logic in configuration_pvt_v2.py
* Made ImageNet weight conversion verification optional in convert_pvt_v2_to_pytorch.py
* Ran code fixup
* Fixed order of parent classes in PvtV2Config to fix the to_dict method override
Co-authored-by: amyeroberts <[email protected]>
Co-authored-by: Arthur <[email protected]>
…ingface#29643) * remove ChatML link from en/ * remove ChatML link in ja/ * remove ChatML link in zh/
Add newly added models to all README files. Also fix one relative path in README_ru.md.
… saved on TPU (huggingface#29388)
* Fix for saving adapter weights when using PEFT
* Change supported classes to PushToHubMixin
Manually call sync step
* add arg --------- Co-authored-by: ydshieh <[email protected]>
* update --------- Co-authored-by: ydshieh <[email protected]>
…a dict of templates (huggingface#29658)
* Allow apply_chat_template to pass kwargs to the template
* Fix priority for template_kwargs
* Fix docstring
* style fix
* Add the option for the model to have a dict of templates
* Error message cleanup
* Add test for chat template dicts
* Simplify the chat template dict test and apply it to all tokenizers in self.get_tokenizers()
* Save chat template dicts as lists with fixed key names
* Add test for serialization/reloading
* Add require_jinja just to be safe, even though I don't think we use it
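A minimal sketch of the feature this enables, assuming the dict form of `chat_template` and name-based selection land as described; the template names and Jinja bodies below are illustrative, not shipped defaults:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")

# Hypothetical template dict: keys name the templates, values are Jinja strings.
tokenizer.chat_template = {
    "default": "{% for m in messages %}{{ m['role'] }}: {{ m['content'] }}\n{% endfor %}",
    "brief": "{% for m in messages %}{{ m['content'] }}\n{% endfor %}",
}

messages = [{"role": "user", "content": "Hello!"}]

# Select a named template from the dict by passing its key.
print(tokenizer.apply_chat_template(messages, chat_template="brief", tokenize=False))
```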
…#29661) * docs:inaccurate_code_example * Inaccurate code example within inline code-documentation
…9000)
* Extend import utils to cover "editable" torch versions
* Re-add type hint
* Remove whitespaces
* Double quote strings
* Update comment
* Restore package_exists
* Revert "Restore package_exists" (reverts commit 66fd2cd)
Co-authored-by: Yih-Dar <[email protected]>
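For context, the failure mode is that `importlib.metadata.version()` can raise `PackageNotFoundError` for editable (`pip install -e`) builds even though the package imports fine. A hedged sketch of the pattern, with an illustrative helper name rather than the exact one in transformers:

```python
import importlib.metadata
import importlib.util

def package_available(pkg_name: str) -> tuple[bool, str]:
    """Check presence via find_spec, then best-effort version lookup.

    Editable installs may import fine while metadata lookup fails,
    so fall back to the module's __version__ attribute.
    """
    if importlib.util.find_spec(pkg_name) is None:
        return False, "N/A"
    try:
        return True, importlib.metadata.version(pkg_name)
    except importlib.metadata.PackageNotFoundError:
        module = __import__(pkg_name)
        return True, getattr(module, "__version__", "unknown")

print(package_available("torch"))
```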
* Cohere Model Release (#1)
* Remove unnecessary files and code (huggingface#2): some cleanup
* Delete cohere-model directory (huggingface#3)
* Make Fix (huggingface#5)
* Pr fixes (huggingface#6): fixes for pr, pr fixes for the format, src/transformers/models/auto/tokenization_auto.py
* Tokenizer test (huggingface#8): tokenizer test, format fix
* Adding Docs and other minor changes (huggingface#7)
* Add modeling tests (huggingface#9)
* Smol Fix (huggingface#11): tokenization tests are fixed, format fixes
* fix pr doc tests
* fix pr style check
* small changes in cohere.md
* FIX: Address final comments for transformers integration (huggingface#13): fix modeling final nits and add proper test file, for now leave empty tests, add integration test, push new test
* fix modeling cohere (huggingface#14)
* Update chat templates to use the new API (huggingface#15)
Co-authored-by: ahmetustun <[email protected]>
Co-authored-by: Younes Belkada <[email protected]>
Co-authored-by: Matt <[email protected]>
* fix
* fix style
* remove equivalent tests
* add back for image_processor
* remove again
Removed static_real_features from AutoformerForPrediction example code Signed-off-by: Maciej Torhan <[email protected]>
…nvironment before testing (huggingface#29477)
* fix
* fix style
* add warning
* revert
* no newline
* revert
* revert
* add CUDA as well
update Co-authored-by: ydshieh <[email protected]>
Signed-off-by: guoguangwu <[email protected]>
* Update run_glue.py
* Update run_glue.py
* Update run_glue_no_trainer.py
* start integration
* fix
* add and debug tests
* update tests
* make pytorch serialization work
* compatible with device_map and offload
* fix tests
* make style
* add ref
* guard against safetensors
* add float8 and style
* fix is_serializable
* Fix shard_checkpoint compatibility with quanto
* more tests
* docs
* adjust memory
* better
* style
* pass tests
* Update src/transformers/modeling_utils.py
* add is_safe_serialization instead
* Update src/transformers/quantizers/quantizer_quanto.py
* add QBitsTensor tests
* fix tests
* simplify activation list
* Update docs/source/en/quantization.md
* better comment
* Update tests/quantization/quanto_integration/test_quanto.py
* find and fix edge case
* pass weights_only_kwarg instead
* fix shard_checkpoint loading
* simplify update_missing_keys
* recursion to get all tensors
* block serialization
* skip serialization tests
* fix
* change by cuda:0 for now
* fix regression
* update device_map
* fix doc
* add notebook
* update torch_dtype
* update doc
* typo
* remove comm
Co-authored-by: Younes Belkada <[email protected]>
Co-authored-by: David Corvoysier <[email protected]>
Co-authored-by: Arthur <[email protected]>
* replace breaks by a loop condition * Update src/transformers/generation/utils.py Co-authored-by: amyeroberts <[email protected]> --------- Co-authored-by: amyeroberts <[email protected]>
* fix speech_to_text generation tests
* Add details to comment
* Update tests/models/speech_to_text/test_modeling_speech_to_text.py
Co-authored-by: Yih-Dar <[email protected]>
Co-authored-by: amyeroberts <[email protected]>
Revert "Fix wrong condition used in `filter_models` (huggingface#29673)" This reverts commit 174aecd.
* add attention to es/ and edit es/_toctree.yml
* translate attention.md
* fix transformers
* fix transformers
* fix bug and add tests
* nit
* other way to get the cur len instead of attention mask
* more places where this might have been broken
* nit
* oops
* inputs_embeds vs input_embeds
* test generated outputs
* style
* nit
* fix
* skip failing biogpt
…LM (huggingface#29904)
* Fix sinusoidal_embeddings in FlaubertModel
* Fix for Informer
* Fix for XLM
* Move sinusoidal emb for XLM
* Move sinusoidal emb for Flaubert
* Small cleanup
* Add comments on tests code copied from
* Add with Distilbert->
* fix issue with logit processor in beam search in Flax
* adding FlaxNoRepeatNGramLogitsProcessor class + unit test
* style correction and code verification
* add FlaxNoRepeatNGramLogitsProcessor to the test_processor_list and test_processor_list_jitted tests
* fix an issue where ngrams are banned only if they appear ==1 time + update description of get_previous_ngrams
* replace non-jit compatible masking of ngrams that are not yet generated with jittable version
* Revert "fix issue with logit processor in beam search in Flax" (reverts commit 09b70d7)
* add FlaxNoRepeatNGramLogitsProcessor to _get_logits_processor
* change the method of casting to boolean of banned tokens indices
* fix code style
* remove some useless operations + significantly faster computation of update indices using jax.lax.fori_loop
* remove useless loop iterations
* set some variables that were calculated and used multiple times
* fix format
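At the user level, this wires the processor into Flax generation. A hedged sketch, assuming the standard `no_repeat_ngram_size` flag reaches `_get_logits_processor` as the bullets above describe:

```python
from transformers import AutoTokenizer, FlaxAutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = FlaxAutoModelForCausalLM.from_pretrained("gpt2")

inputs = tokenizer("The weather today is", return_tensors="np")

# no_repeat_ngram_size=2 should now ban repeated bigrams in Flax generation too.
out = model.generate(inputs.input_ids, max_length=32, no_repeat_ngram_size=2)
print(tokenizer.decode(out.sequences[0], skip_special_tokens=True))
```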
…gface#29939)
* add FA2 to o.g Musicgen
* make style
* add FA2 support to Musicgen Melody
* add generation FA2 tests to o.g Musicgen
* make style and fix copies
* add Musicgen to FA2 docs + deprecate list
* add sdpa support to Musicgen
* make style and fix copies
* refactor attention implementation arguments
* add Copied from to sdpa tests
* add copied from in sdpa tests melody
* add copied from for FA2 generation tests
* add FA2 inference copied from
* make style
…face#29311)
* Fix skip_special_tokens process for Wav2Vec2CTCTokenizer._decode
* Fix skip_special_tokens for Wav2Vec2CTCTokenizer._decode
* Exclude pad_token filtering since it is used as CTC-blank token
* Add small test for skip_special_tokens
* Update decoding test for added new token
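A quick hedged illustration of the decode path this touches; the checkpoint is just a common Wav2Vec2 CTC model, not one from the PR:

```python
from transformers import Wav2Vec2CTCTokenizer

tokenizer = Wav2Vec2CTCTokenizer.from_pretrained("facebook/wav2vec2-base-960h")

ids = tokenizer("HELLO WORLD").input_ids

# skip_special_tokens=True should drop special tokens from the output text,
# while the pad token (reused as the CTC blank) is still handled by the
# CTC grouping logic rather than naively filtered out.
print(tokenizer.decode(ids, skip_special_tokens=True))
```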
* Hard error when ignoring tensors. (huggingface#27484)
* [WIP] Hard error when ignoring tensors.
* Better selection/error when saving a checkpoint:
  - Find all names we should normally drop (those are in the transformers config)
  - Find all disjoint tensors (for those we can safely trigger a copy to get rid of the sharing before saving)
  - Clone those disjoint tensors, getting rid of the issue
  - Find all identical names (those should be declared in the config, but we try to find them all anyway)
  - For all identical names: if they are in the config, just ignore them, everything is fine; if they are not, warn about them
  - For all remaining tensors which are shared yet neither identical nor disjoint, raise a hard error
* Adding a failing test on `main` that passes here.
* We don't need to keep the subfolder logic in this test.
* Apply suggestions from code review
* Add small tests.
* Dead variable.
* Fixup.
* Fixing tied_weights_keys on generic models.
* Fixup + T5 encoder/decoder tying (with different layers)
* Code quality.
* Dynamic member.
* trigger
* Fixing encoder name for other types of encoder/decoder combos.
* Fix scoping.
* Update .github/workflows/self-scheduled.yml
* Fixing the tied_weights after the call.
Co-authored-by: Arthur <[email protected]>
Co-authored-by: ydshieh <[email protected]>
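To make the disjoint/identical distinction concrete, here is a small self-contained sketch (not the transformers implementation) of detecting aliased storage in a state dict before saving:

```python
import torch

linear_a = torch.nn.Linear(4, 4)
linear_b = torch.nn.Linear(4, 4)
linear_b.weight = linear_a.weight  # tie the weights: same storage, two names

state = {"a.weight": linear_a.weight, "b.weight": linear_b.weight}

# Tensors sharing storage have the same data_ptr; safetensors cannot
# serialize such aliases, so they must be declared (tied weights) or cloned.
ptrs = {}
for name, tensor in state.items():
    ptrs.setdefault(tensor.untyped_storage().data_ptr(), []).append(name)

shared = {ptr: names for ptr, names in ptrs.items() if len(names) > 1}
print(shared)  # e.g. {140...: ['a.weight', 'b.weight']}
```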
* fix norm * fix logits processors doctests
update Co-authored-by: ydshieh <[email protected]>
* fix encodec onnx export for musicgen * simplification * fix quality * better style
Certain combinations of the `image_size`, `patch_size` and `depths` configuration parameters produced NaN logits. This adds an assertion that the resulting `window_size` field in the model's self-attention class is greater than 1, preventing division by zero when normalizing `relative_coords_table`. Fix: huggingface#28675
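A hedged sketch of why `window_size == 1` is fatal: in Swin-v2-style relative position bias, the coordinate table is normalized by `window_size - 1`, hence the guard (an assertion, in the actual fix):

```python
import torch

def relative_coords_table(window_size: int) -> torch.Tensor:
    # Swin-v2-style table of relative coordinates, normalized into [-1, 1].
    coords = torch.arange(-(window_size - 1), window_size, dtype=torch.float32)
    table = torch.stack(torch.meshgrid(coords, coords, indexing="ij"), dim=-1)
    # window_size == 1 would divide by zero -> inf/NaN in downstream logits.
    if window_size <= 1:
        raise ValueError("window_size must be > 1 to normalize the coords table")
    return table / (window_size - 1)

print(relative_coords_table(7).shape)  # torch.Size([13, 13, 2])
```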
qwen2: fixed tokens starting with # in slow tokenizer; add tests Co-authored-by: jklj077 <[email protected]>
* Fix generate_with_fallback **kwargs
* Change pop to get
* Delete keys from kwargs to prevent overriding generation_config
* Revert to passing kwargs by reference, but make a (shallow) copy
* dict -> copy.copy
* Add test_whisper_longform_multi_batch_beam
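The core pattern in isolation (function and key names are illustrative, not Whisper's actual internals): shallow-copy the kwargs before popping, so later fallback passes still see the caller's original values:

```python
import copy

def generate_with_fallback_sketch(generation_config: dict, kwargs: dict) -> dict:
    # Shallow-copy so the pops below don't mutate the caller's dict,
    # which is reused across fallback passes.
    kwargs = copy.copy(kwargs)
    for key in ("temperature", "num_beams"):
        if key in kwargs:
            generation_config[key] = kwargs.pop(key)
    return {**generation_config, **kwargs}

caller_kwargs = {"temperature": 0.2, "max_new_tokens": 16}
generate_with_fallback_sketch({"num_beams": 1}, caller_kwargs)
print(caller_kwargs)  # unchanged: {'temperature': 0.2, 'max_new_tokens': 16}
```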
…uting scores (huggingface#29248) * Fix is_scores_logprobs in WhisperNoSpeechDetection * Add test_whisper_longform_no_speech_detection * Fix typo
* fix vipllava generation * consistent llava code * revert llava tests changes
new audio file
* fix * sort imports
* Docstring to note about zero init
* Check for accelerate
* Change conditional return
* Tweak
* Add new accelerate-specific zero3 check
* Fix import
* Revert to RTFM
* Update src/transformers/modeling_utils.py
Co-authored-by: amyeroberts <[email protected]>
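A hedged sketch of the kind of guard involved: `is_deepspeed_zero3_enabled` is a real transformers helper, while the wrapper around it is illustrative:

```python
from transformers.integrations import is_deepspeed_zero3_enabled

def should_defer_weight_init() -> bool:
    # Under ZeRO-3, parameters are partitioned across ranks, so eager
    # initialization on a single device must be skipped.
    return is_deepspeed_zero3_enabled()

print(should_defer_weight_init())
```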
feat: enable multi-device for efficientnet
* implement convert_mamba_ssm_checkpoint_to_pytorch
* Add test test_model_from_mamba_ssm_conversion
* moved convert_ssm_config_to_hf_config to inside mamba_ssm_available check
* fix skipif clause
* moved skips to inside test since skipif decorator isn't working for some reason
* Added validation
* removed test
* fixup
* only compare logits
* remove weight rename
* Update src/transformers/models/mamba/convert_mamba_ssm_checkpoint_to_pytorch.py
* nits
Co-authored-by: amyeroberts <[email protected]>
* Defaulted IdeficsProcessor padding to 'longest', removed manual padding
* make fixup
* Defaulted processor call to padding=False
* Add padding to processor call in IdeficsModelIntegrationTest as well
* redefaulted padding=longest again
* fixup/doc
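For reference, a hedged sketch of the processor call whose default changes; the checkpoint, prompt, and exact argument names follow the Idefics API as of this PR and may differ in newer versions:

```python
from transformers import IdeficsProcessor

processor = IdeficsProcessor.from_pretrained("HuggingFaceM4/idefics-9b")

prompts = [["User: what is in this image?", "https://upload.wikimedia.org/wikipedia/commons/8/86/Id%C3%A9fix.JPG"]]

# With padding defaulting to "longest", batched prompts are padded to the
# longest sequence in the batch; pass padding=False to opt out.
inputs = processor(prompts, padding="longest", return_tensors="pt")
print(inputs.input_ids.shape)
```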
What does this do?
Fix serialization with safetensors + weights_only
Needs this branch of quanto: huggingface/optimum-quanto#120
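A hedged sketch of the round trip this PR targets; the model name is arbitrary and, per the commit history ("guard against safetensors", "pass weights_only_kwarg instead"), saving may need to go through the torch pickle path:

```python
import torch
from transformers import AutoModelForCausalLM, QuantoConfig

# Quantize on the fly with quanto (int8 weights).
model = AutoModelForCausalLM.from_pretrained(
    "facebook/opt-125m",
    quantization_config=QuantoConfig(weights="int8"),
    torch_dtype=torch.float16,
)

# Safetensors is guarded against for quanto tensors, so saving may need
# safe_serialization=False (torch pickle path with the weights_only fix
# this PR mentions).
model.save_pretrained("opt-125m-quanto", safe_serialization=False)
reloaded = AutoModelForCausalLM.from_pretrained("opt-125m-quanto")
```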
cc @younesbelkada