Update OpenVINO documentation links in README.md #1

Open · wants to merge 9 commits into base: main

8 changes: 4 additions & 4 deletions README.md
@@ -10,7 +10,7 @@

Intel [Neural Compressor](https://www.intel.com/content/www/us/en/developer/tools/oneapi/neural-compressor.html) is an open-source library enabling the use of the most popular compression techniques such as quantization, pruning and knowledge distillation. It supports automatic accuracy-driven tuning strategies so that users can easily generate quantized models. Users can apply static, dynamic and quantization-aware training approaches while specifying an expected accuracy criterion. It also supports different weight pruning techniques, enabling the creation of pruned models that meet a predefined sparsity target.
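
For instance, post-training dynamic quantization can be applied through the `INCQuantizer` API; the following is a minimal sketch (the checkpoint name and output directory are illustrative placeholders):

```python
# Minimal sketch: post-training dynamic quantization with Intel Neural Compressor via Optimum Intel.
from neural_compressor.config import PostTrainingQuantConfig
from optimum.intel import INCQuantizer
from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained("distilbert-base-uncased-finetuned-sst-2-english")
# Dynamic quantization needs no calibration dataset.
quantization_config = PostTrainingQuantConfig(approach="dynamic")
quantizer = INCQuantizer.from_pretrained(model)
quantizer.quantize(quantization_config=quantization_config, save_directory="quantized_model")
```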

-[OpenVINO](https://docs.openvino.ai/latest/index.html) is an open-source toolkit that enables high performance inference capabilities for Intel CPUs, GPUs, and special DL inference accelerators ([see](https://docs.openvino.ai/latest/openvino_docs_OV_UG_supported_plugins_Supported_Devices.html) the full list of supported devices). It is supplied with a set of tools to optimize your models with compression techniques such as quantization, pruning and knowledge distillation. Optimum Intel provides a simple interface to optimize your Transformers and Diffusers models, convert them to the OpenVINO Intermediate Representation (IR) format and run inference using OpenVINO Runtime.
+[OpenVINO](https://docs.openvino.ai) is an open-source toolkit that enables high performance inference capabilities for Intel CPUs, GPUs, and special DL inference accelerators ([see](https://docs.openvino.ai/2024/about-openvino/compatibility-and-support/supported-devices.html) the full list of supported devices). It is supplied with a set of tools to optimize your models with compression techniques such as quantization, pruning and knowledge distillation. Optimum Intel provides a simple interface to optimize your Transformers and Diffusers models, convert them to the OpenVINO Intermediate Representation (IR) format and run inference using OpenVINO Runtime.


## Installation
@@ -20,7 +20,7 @@ To install the latest release of 🤗 Optimum Intel with the corresponding requi
| Accelerator | Installation |
|:-----------------------------------------------------------------------------------------------------------------|:---------------------------------------------------------------------|
| [Intel Neural Compressor](https://www.intel.com/content/www/us/en/developer/tools/oneapi/neural-compressor.html) | `pip install --upgrade-strategy eager "optimum[neural-compressor]"` |
-| [OpenVINO](https://docs.openvino.ai/latest/index.html) | `pip install --upgrade-strategy eager "optimum[openvino,nncf]"` |
+| [OpenVINO](https://docs.openvino.ai) | `pip install --upgrade-strategy eager "optimum[openvino]"` |
| [Intel Extension for PyTorch](https://intel.github.io/intel-extension-for-pytorch/#introduction) | `pip install --upgrade-strategy eager "optimum[ipex]"` |

The `--upgrade-strategy eager` option is needed to ensure `optimum-intel` is upgraded to the latest version.
@@ -68,11 +68,11 @@ For more details on the supported compression techniques, please refer to the [d

## OpenVINO

-Below are the examples of how to use OpenVINO and its [NNCF](https://docs.openvino.ai/latest/tmo_introduction.html) framework to accelerate inference.
+Below are examples of how to use OpenVINO and its [NNCF](https://docs.openvino.ai/2024/openvino-workflow/model-optimization-guide/compressing-models-during-training.html) framework to accelerate inference.
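
One common NNCF-backed optimization is 8-bit weight compression applied when a model is loaded; here is a minimal sketch, assuming the installed `optimum-intel` version supports the `load_in_8bit` argument:

```python
# Minimal sketch: NNCF 8-bit weight compression applied while exporting to OpenVINO IR.
from optimum.intel import OVModelForCausalLM

model = OVModelForCausalLM.from_pretrained("gpt2", export=True, load_in_8bit=True)
model.save_pretrained("gpt2_ov_int8")  # saves the weight-compressed IR model
```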

#### Export:

-It is possible to export your model to the [OpenVINO](https://docs.openvino.ai/2023.1/openvino_ir.html) IR format with the CLI:
+It is possible to export your model to the [OpenVINO IR](https://docs.openvino.ai/2024/documentation/openvino-ir-format.html) format with the CLI:

```plain
optimum-cli export openvino --model gpt2 ov_model
```
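
The exported model can then be loaded with the corresponding `OVModelForXxx` class; a minimal sketch for the `gpt2` export above, assuming the tokenizer was saved alongside the model:

```python
# Minimal sketch: load the OpenVINO IR model exported by the CLI command above and generate text.
from optimum.intel import OVModelForCausalLM
from transformers import AutoTokenizer

model = OVModelForCausalLM.from_pretrained("ov_model")
tokenizer = AutoTokenizer.from_pretrained("ov_model")
inputs = tokenizer("OpenVINO is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
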
4 changes: 2 additions & 2 deletions docs/source/index.mdx
@@ -21,7 +21,7 @@ limitations under the License.

[Intel Neural Compressor](https://www.intel.com/content/www/us/en/developer/tools/oneapi/neural-compressor.html) is an open-source library enabling the use of the most popular compression techniques such as quantization, pruning and knowledge distillation. It supports automatic accuracy-driven tuning strategies so that users can easily generate quantized models. Users can apply static, dynamic and quantization-aware training approaches while specifying an expected accuracy criterion. It also supports different weight pruning techniques, enabling the creation of pruned models that meet a predefined sparsity target.

-[OpenVINO](https://docs.openvino.ai/latest/index.html) is an open-source toolkit that enables high performance inference capabilities for Intel CPUs, GPUs, and special DL inference accelerators ([see](https://docs.openvino.ai/latest/openvino_docs_OV_UG_supported_plugins_Supported_Devices.html) the full list of supported devices). It is supplied with a set of tools to optimize your models with compression techniques such as quantization, pruning and knowledge distillation. Optimum Intel provides a simple interface to optimize your Transformers and Diffusers models, convert them to the OpenVINO Intermediate Representation (IR) format and run inference using OpenVINO Runtime.
+[OpenVINO](https://docs.openvino.ai) is an open-source toolkit that enables high performance inference capabilities for Intel CPUs, GPUs, and special DL inference accelerators ([see](https://docs.openvino.ai/2024/about-openvino/compatibility-and-support/supported-devices.html) the full list of supported devices). It is supplied with a set of tools to optimize your models with compression techniques such as quantization, pruning and knowledge distillation. Optimum Intel provides a simple interface to optimize your Transformers and Diffusers models, convert them to the OpenVINO Intermediate Representation (IR) format and run inference using OpenVINO Runtime.

<div class="mt-10">
<div class="w-full flex flex-col space-x-4 md:grid md:grid-cols-2 md:gap-x-5">
@@ -34,4 +34,4 @@ limitations under the License.
<p class="text-gray-700">Learn how to run inference with OpenVINO Runtime and to apply quantization, pruning and knowledge distillation on your model to further speed up inference.</p>
</a>
</div>
-</div>
+</div>
11 changes: 6 additions & 5 deletions docs/source/inference.mdx
@@ -13,7 +13,8 @@ Optimum Intel can be used to load optimized models from the [Hugging Face Hub](h

## Transformers models

-You can now easily perform inference with OpenVINO Runtime on a variety of Intel processors ([see](https://docs.openvino.ai/latest/openvino_docs_OV_UG_supported_plugins_Supported_Devices.html) the full list of supported devices).
+You can now easily perform inference with OpenVINO Runtime on a variety of Intel processors
+([see](https://docs.openvino.ai/2024/about-openvino/compatibility-and-support/supported-devices.html) the full list of supported devices).
For that, just replace the `AutoModelForXxx` class with the corresponding `OVModelForXxx` class.
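
For example, a minimal sketch of this substitution for a sequence classification model (the checkpoint name is illustrative):

```python
# Minimal sketch: swap AutoModelForSequenceClassification for OVModelForSequenceClassification.
# `export=True` converts the PyTorch checkpoint to OpenVINO IR on the fly.
from optimum.intel import OVModelForSequenceClassification
from transformers import AutoTokenizer, pipeline

model_id = "distilbert-base-uncased-finetuned-sst-2-english"
model = OVModelForSequenceClassification.from_pretrained(model_id, export=True)
tokenizer = AutoTokenizer.from_pretrained(model_id)
classifier = pipeline("text-classification", model=model, tokenizer=tokenizer)
print(classifier("He's a dreadful magician."))
```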

As shown in the table below, each task is associated with a class that can automatically load your model.
@@ -33,7 +34,7 @@ As shown in the table below, each task is associated with a class enabling to au

### Export

-It is possible to export your model to the [OpenVINO](https://docs.openvino.ai/2023.1/openvino_ir.html) IR format with the CLI:
+It is possible to export your model to the [OpenVINO IR](https://docs.openvino.ai/2024/documentation/openvino-ir-format.html) format with the CLI:

```bash
optimum-cli export openvino --model gpt2 ov_model
@@ -182,7 +183,7 @@ model.reshape(1,128)
model.compile()
```

-To run inference on Intel integrated or discrete GPU, use `.to("gpu")`. On GPU, models run in FP16 precision by default. (See [OpenVINO documentation](https://docs.openvino.ai/nightly/openvino_docs_install_guides_configurations_for_intel_gpu.html) about installing drivers for GPU inference).
+To run inference on Intel integrated or discrete GPU, use `.to("gpu")`. On GPU, models run in FP16 precision by default. (See [OpenVINO documentation](https://docs.openvino.ai/2024/get-started/configurations/configurations-intel-gpu.html) about installing drivers for GPU inference).

```python
# Static shapes speed up inference
@@ -471,15 +472,15 @@ image = refiner(prompt=prompt, image=image[None, :]).images[0]
```


-## Latent Consistency Models
+### Latent Consistency Models


| Task | Auto Class |
|--------------------------------------|--------------------------------------|
| `text-to-image` | `OVLatentConsistencyModelPipeline` |


-### Text-to-Image
+#### Text-to-Image

Here is an example of how you can load a Latent Consistency Model (LCM) from [SimianLuo/LCM_Dreamshaper_v7](https://huggingface.co/SimianLuo/LCM_Dreamshaper_v7) and run inference using OpenVINO:
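
A minimal sketch of such a pipeline, assuming the `OVLatentConsistencyModelPipeline` class listed above (the prompt and generation parameters are illustrative):

```python
# Minimal sketch: export an LCM checkpoint to OpenVINO IR on the fly and generate an image.
from optimum.intel import OVLatentConsistencyModelPipeline

pipeline = OVLatentConsistencyModelPipeline.from_pretrained("SimianLuo/LCM_Dreamshaper_v7", export=True)
prompt = "sailing ship in storm by Leonardo da Vinci"
image = pipeline(prompt, num_inference_steps=4, guidance_scale=8.0).images[0]
image.save("sailing_ship.png")
```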

4 changes: 2 additions & 2 deletions docs/source/installation.mdx
@@ -21,7 +21,7 @@ To install the latest release of 🤗 Optimum Intel with the corresponding requi
| Accelerator | Installation |
|:-----------------------------------------------------------------------------------------------------------------------|:-------------------------------------------------------------------|
| [Intel Neural Compressor (INC)](https://www.intel.com/content/www/us/en/developer/tools/oneapi/neural-compressor.html) | `pip install --upgrade-strategy eager "optimum[neural-compressor]"`|
-| [Intel OpenVINO](https://docs.openvino.ai/latest/index.html) | `pip install --upgrade-strategy eager "optimum[openvino,nncf]"` |
+| [Intel OpenVINO](https://docs.openvino.ai) | `pip install --upgrade-strategy eager "optimum[openvino]"` |

The `--upgrade-strategy eager` option is needed to ensure `optimum-intel` is upgraded to the latest version.

@@ -42,4 +42,4 @@ or to install from source including dependencies:
python -m pip install "optimum-intel[extras]"@git+https://github.com/huggingface/optimum-intel.git
```

-where `extras` can be one or more of `neural-compressor`, `openvino`, `nncf`.
+where `extras` can be one or more of `neural-compressor`, `openvino`, `nncf`.
2 changes: 1 addition & 1 deletion optimum/exporters/openvino/model_patcher.py
@@ -35,7 +35,7 @@ def patch_model_with_bettertransformer(model):
+ "[WARNING] For good performance with stateful models, transformers>=4.36.2 and PyTorch>=2.1.1 are required. "
f"This Python environment has Transformers {_transformers_version} and PyTorch {_torch_version}. "
"Consider upgrading PyTorch and Transformers, for example by running "
"`pip install --upgrade --upgrade-strategy eager optimum[openvino,nncf]`, and export the model again"
"`pip install --upgrade --upgrade-strategy eager optimum[openvino]`, and export the model again"
+ COLOR_RESET
)

2 changes: 1 addition & 1 deletion setup.py
@@ -40,7 +40,7 @@

EXTRAS_REQUIRE = {
"neural-compressor": ["neural-compressor>=2.2.0", "onnx", "onnxruntime<1.15.0"],
-"openvino": ["openvino>=2023.3", "onnx", "onnxruntime"],
+"openvino": ["openvino>=2023.3", "onnx", "onnxruntime", "nncf>=2.8.1"],
"openvino-tokenizers": ["openvino-tokenizers[transformers]"],
"nncf": ["nncf>=2.8.1"],
"ipex": ["intel-extension-for-pytorch", "onnx"],
2 changes: 2 additions & 0 deletions tests/generation/test_modeling.py
@@ -31,6 +31,7 @@
"gpt_neo": "hf-internal-testing/tiny-random-GPTNeoModel",
"mistral": "echarlaix/tiny-random-mistral",
"llama": "fxmarty/tiny-llama-fast-tokenizer",
+"llama2": "Jiqing/tiny_random_llama2",
"gpt_bigcode": "hf-internal-testing/tiny-random-GPTBigCodeModel",
}

@@ -54,6 +55,7 @@ class ModelingIntegrationTest(unittest.TestCase):
"gpt_neo",
"mistral",
"llama",
+"llama2",
# "gpt_bigcode",
)

2 changes: 2 additions & 0 deletions tests/ipex/test_inference.py
@@ -42,6 +42,7 @@
"gpt_neox": "hf-internal-testing/tiny-random-GPTNeoXForCausalLM",
"gpt_bigcode": "hf-internal-testing/tiny-random-GPTBigCodeModel",
"llama": "fxmarty/tiny-llama-fast-tokenizer",
+"llama2": "Jiqing/tiny_random_llama2",
"opt": "hf-internal-testing/tiny-random-OPTModel",
"mpt": "hf-internal-testing/tiny-random-MptForCausalLM",
}
@@ -66,6 +67,7 @@ class IPEXIntegrationTest(unittest.TestCase):
"gpt_neo",
# "gpt_bigcode",
"llama",
+"llama2",
"opt",
"mpt",
)
6 changes: 5 additions & 1 deletion tests/ipex/test_modeling.py
@@ -67,6 +67,7 @@
"gptj": "hf-internal-testing/tiny-random-GPTJModel",
"levit": "hf-internal-testing/tiny-random-LevitModel",
"llama": "fxmarty/tiny-llama-fast-tokenizer",
+"llama2": "Jiqing/tiny_random_llama2",
"marian": "sshleifer/tiny-marian-en-de",
"mbart": "hf-internal-testing/tiny-random-mbart",
"mistral": "echarlaix/tiny-random-mistral",
@@ -209,6 +210,7 @@ class IPEXModelForCausalLMTest(unittest.TestCase):
"gpt_neo",
"gpt_neox",
"llama",
+"llama2",
"mistral",
# "phi",
"mpt",
@@ -226,7 +228,9 @@ def test_compare_to_transformers(self, model_arch):
self.assertTrue(ipex_model.use_cache)
tokenizer = AutoTokenizer.from_pretrained(model_id)
tokens = tokenizer(
-"This is a sample", return_tensors="pt", return_token_type_ids=False if model_arch == "llama" else None
+"This is a sample",
+return_tensors="pt",
+return_token_type_ids=False if model_arch in ("llama", "llama2") else None,
)
position_ids = None
if model_arch.replace("_", "-") in MODEL_TYPES_REQUIRING_POSITION_IDS: