
[GPU]Move RMSFusion pass ahead of ConvertPrecision #172

Closed

Conversation

ceciliapeng2011

Move the RMSFusion pass ahead of ConvertPrecision; this lets a larger range of nodes (including RMS) run with compressed precision.

xufang-lisa and others added 30 commits August 28, 2024 03:22
…lkit#24417)

### Details:
 - *add new property for model cache encryption/decryption function*
- *encrypt/decrypt topology in CPU model cache if the callbacks are
provided.*

### Tickets:
 - *CVS-139600*

---------

Co-authored-by: Chen Peter <[email protected]>
…nvinotoolkit#26236)

A size_t underflow might happen for the second dimension as well, for
example when the layout is ndhwc.
Test cases were extended to cover the `ndhwc` layout as well

### Tickets:
 - 14877
…penvinotoolkit#26269)

### Details:
- Currently, a threading library such as TBB is provided by openvino::runtime
itself. This does not work well when multiple find_package(OpenVINO) calls are
used within a project with different `COMPONENTS`
- E.g. here openvinotoolkit/openvino.genai#794
we have to add the Threading component even for cases where it is not really
used, because a find_package within a nested subdirectory populates the
properties of openvino::runtime, which is found in an internal directory.
- Example of usage
openvinotoolkit/openvino_tokenizers#236
)

### Details:
- This PR fixes incorrect default buffers sizes configuration for static
model in sdpa_opt kernel

### Tickets:
 - [CVS-150773](https://jira.devtools.intel.com/browse/CVS-150773)
…upport gcc < 9.0 (openvinotoolkit#26190)

### Details:
- Trying to compile the NPU plugin with GCC 8.5 or a lower version resulted
in issues with unsupported intrinsics
 ``` c++
 <source>:6:17: error: '_mm_loadu_si64' was not declared in this scope
     __m128i a = _mm_loadu_si64((const __m128i *)(data));
                 ^~~~~~~~~~~~~~
<source>:6:17: note: suggested alternative: '_mm_loadl_epi64'
     __m128i a = _mm_loadu_si64((const __m128i *)(data));
```
 - technically the two might differ: _mm_loadu_si64 explicitly does not require data alignment, while _mm_loadl_epi64 leaves alignment unspecified
 - the generated asm was checked (gcc 14.1); the code differs, but tests loading memory at 8-bit steps showed no difference, so both functions appear to work on unaligned arrays
![image](https://github.com/user-attachments/assets/c70ea4c2-2ab2-487d-b611-83ffe1a34ad3)

  - NPUW unit tests will soon be available in OV with the ongoing PR openvinotoolkit#25780; meanwhile they were tested locally with the new intrinsics

### Tickets:
 - 136269

Co-authored-by: Dmitry Matveev <[email protected]>
…it#26267)

### Details:
- Fix an out-of-bounds access in the permute kernel
- It happened because the kernel loaded data with vload even at the boundary


### Tickets:
 - 150360
…ches the squeeze axis (openvinotoolkit#26277)

### Details:
- Do not apply crop optimization for squeeze if the crop axis matches
the squeeze axis
…oolkit#26268)

### Details:
- Apparently, `github.event.merge_group.base_ref` for merge group checks
does not resolve into the same thing as `github.base_ref` for pull
request checks. It is an issue since merge queue beta:
[1](https://github.com/orgs/community/discussions/40277)
- `github.event.merge_group.base_ref` resolves into `refs/heads/master`
while we need only the `master` part to construct the remote cache
directory.

### Tickets:
 - *150744*
### Details:
 - *item1*
 - *...*

### Tickets:
 - *ticket-id*

---------

Signed-off-by: Chen Peter <[email protected]>
### Details:
 - *Change log level to debug*
 - *...*

### Tickets:
 - *ticket-id*
)

### Details:
 - Use OpenVINO-provided flags to enable AVX2 for NPUW

### Tickets:
 - 136004
### Details:
 - To avoid cmake warnings when PDPD is not installed
### Details:
 - Fix data size in test

### Tickets:
 - [*CVS-148605*](https://jira.devtools.intel.com/browse/CVS-148605)
…notoolkit#26276)

### Details:
 - *Do not set shape_changed to true when the shape is not changed.*
- *Fix the case where the output layout does not have the updated data_padding
calculated by calc_output_layouts in update_shape*

### Tickets:
 - *149773*
### Details:
- Fix the performance issues in aot_autograd path where constants are
being treated as inputs.

### Tickets:
 - https://jira.devtools.intel.com/browse/CVS-139183
…r fold_multiply_const=true option (openvinotoolkit#26280)

### Details:
- The PR restores MarkDequantizationSubgraph transformation behavior to
the state before openvinotoolkit#25783.
This is required to avoid compressed zero-points constant folding during
model conversion.

### Tickets:
 - *[CVS-150686](https://jira.devtools.intel.com/browse/CVS-150686)*
…t#26288)

**Details:** This PR finalizes support for and allows inferring the JAX ViT model

**Tickets:** TBD

Signed-off-by: Kazantsev, Roman <[email protected]>
…t#26289)

### Details:
- Add prefix support for the PagedAttention operation via the existing
pa_sdpa_opt kernel by processing each subsequence's tokens in sequential
mode, one by one
- Moved responsibility for intermediate buffer reallocation from the
sdpa_opt kernel to the kv_cache_update kernel (they both use the same data,
but now one of these buffers can be reused by the pa_sdpa_opt kernel, so
everything was moved to one place)
…otoolkit#26016)

### Details:
 - share the same L0 context from backend to compiler
 - and perform zeDestroy(context) in backend only 

### Tickets:
 - *ticket-id*

---------

Co-authored-by: Xin Wang <[email protected]>
…penvinotoolkit#26251)

### Details:
- Bitwise shift operations now support the int16 and uint16 data types
 - added unit and functional tests
…vinotoolkit#26231)

### Details:
 - *item1*
 - *...*

### Tickets:
 - CVS-148548

---------

Signed-off-by: Min, Byung-il <[email protected]>
…yout optimizer (openvinotoolkit#25708)

### Details:
 - Allow Transpose+Matmul+[Transpose] fusion for static shapes
- Allow Transpose in `MoveEltwiseUpThroughDataMov` for specific
Transpose -> Eltwise -> MatMul case
- Change order of `MoveEltwiseUpThroughDataMov` and
`ConvertMatMulToFullyConnected` to simplify callback
- Removed custom code for similar fusion in layout optimizer and related
debug knob
…sure proper data type propagation (openvinotoolkit#26299)

### Details:
This patch adds Validate pass call after IncreasePositionIdsPrecision to
ensure proper data type propagation

With this change the accuracy of llama-3-8b INT8 (and probably other LLMs)
is restored to the expected level
Before:
```
| Tasks  |Version|Filter|n-shot|    Metric     |   |Value |   |Stderr|
|--------|------:|------|-----:|---------------|---|-----:|---|------|
|wikitext|      2|none  |     0|bits_per_byte  |↓  |0.6030|±  |N/A   |
|        |       |none  |     0|byte_perplexity|↓  |1.5189|±  |N/A   |
|        |       |none  |     0|word_perplexity|↓  |9.3472|±  |N/A   |
```

After:
```
| Tasks  |Version|Filter|n-shot|    Metric     |   |Value |   |Stderr|
|--------|------:|------|-----:|---------------|---|-----:|---|------|
|wikitext|      2|none  |     0|bits_per_byte  |↓  |0.5351|±  |N/A   |
|        |       |none  |     0|byte_perplexity|↓  |1.4490|±  |N/A   |
|        |       |none  |     0|word_perplexity|↓  |7.2664|±  |N/A   |
```

### Tickets:
 - [CVS-147653](https://jira.devtools.intel.com/browse/CVS-147653)
…envinotoolkit#26295)

### Details:
- Use recursive_mutex instead of mutex in the compilation context, because a
deadlock happens during context compilation (push_task() -> remove_keys() on a
single thread) when CPU_PINNING=ON

### Tickets:
 - 150220
### Details:
 - *Fix issue with not set decoder type rt_info*
 - *Remove frontend rt_info from model*

### Tickets:
 - *ticket-id*
ilya-lavrenov and others added 29 commits September 17, 2024 13:28
### Details:
 - *item1*
 - *...*

### Tickets:
 - *ticket-id*
…oolkit#26625)

Updates the requirements on [jax](https://github.com/google/jax) to
permit the latest version.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/google/jax/releases">jax's
releases</a>.</em></p>
<blockquote>
<h2>JAX release v0.4.33</h2>
<p>This is a patch release on top of jax 0.4.32, that fixes two bugs
found in that
release.</p>
<p>A TPU-only data corruption bug was found in the version of libtpu
pinned by
JAX 0.4.32, which manifested only if multiple TPU slices were present in
the
same job, for example, if training on multiple v5e slices.</p>
<p>This release fixes that issue by pinning a fixed version of
<code>libtpu-nightly</code>.</p>
<p>This release also fixes an inaccurate result for F64 tanh on CPU (<a
href="https://redirect.github.com/google/jax/issues/23590">#23590</a>).</p>
</blockquote>
</details>
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a
href="https://github.com/google/jax/blob/main/CHANGELOG.md">jax's
changelog</a>.</em></p>
<blockquote>
<h2>jax 0.4.33 (September 16, 2024)</h2>
<p>This is a patch release on top of jax 0.4.32, that fixes two bugs
found in that
release.</p>
<p>A TPU-only data corruption bug was found in the version of libtpu
pinned by
JAX 0.4.32, which manifested only if multiple TPU slices were present in
the
same job, for example, if training on multiple v5e slices.
This release fixes that issue by pinning a fixed version of
<code>libtpu</code>.</p>
<p>This release fixes an inaccurate result for F64 tanh on CPU (<a
href="https://redirect.github.com/google/jax/issues/23590">#23590</a>).</p>
<h2>jax 0.4.32 (September 11, 2024)</h2>
<p>Note: This release was yanked from PyPi because of a data corruption
bug on TPU.
See the 0.4.33 release notes for more details.</p>
<ul>
<li>
<p>New Functionality</p>
<ul>
<li>Added {func}<code>jax.extend.ffi.ffi_call</code> and
{func}<code>jax.extend.ffi.ffi_lowering</code>
to support the use of the new {ref}<code>ffi-tutorial</code> to
interface with custom
C++ and CUDA code from JAX.</li>
</ul>
</li>
<li>
<p>Changes</p>
<ul>
<li><code>jax_pmap_no_rank_reduction</code> flag is set to
<code>True</code> by default.
<ul>
<li>array[0] on a pmap result now introduces a reshape (use array[0:1]
instead).</li>
<li>The per-shard shape (accessable via jax_array.addressable_shards or
jax_array.addressable_data(0)) now has a leading (1, ...). Update code
that directly accesses shards accordingly. The rank of the
per-shard-shape
now matches that of the global shape which is the same behavior as jit.
This avoids costly reshapes when passing results from pmap into
jit.</li>
</ul>
</li>
<li><code>jax_enable_memories</code> flag is set to <code>True</code> by
default.</li>
<li>{mod}<code>jax.numpy</code> now supports v2023.12 of the Python
Array API Standard.
See {ref}<code>python-array-api</code> for more information.</li>
<li>Computations on the CPU backend may now be dispatched asynchronously
in
more cases. Previously non-parallel computations were always dispatched
synchronously. You can recover the old behavior by setting
<code>jax.config.update('jax_cpu_enable_async_dispatch',
False)</code>.</li>
<li>Added new {func}<code>jax.process_indices</code> function to replace
the
<code>jax.host_ids()</code> function that was deprecated in JAX
v0.2.13.</li>
<li>To align with the behavior of <code>numpy.fabs</code>,
<code>jax.numpy.fabs</code> has been
modified to no longer support <code>complex dtypes</code>.</li>
<li><code>jax.tree_util.register_dataclass</code> now checks that
<code>data_fields</code>
and <code>meta_fields</code> includes all dataclass fields with
<code>init=True</code>
and only them, if <code>nodetype</code> is a dataclass.</li>
<li>Several {mod}<code>jax.numpy</code> functions now have full
{class}<code>~jax.numpy.ufunc</code>
interfaces, including {obj}<code>~jax.numpy.add</code>,
{obj}<code>~jax.numpy.multiply</code>,
{obj}<code>~jax.numpy.bitwise_and</code>,
{obj}<code>~jax.numpy.bitwise_or</code>,
{obj}<code>~jax.numpy.bitwise_xor</code>,
{obj}<code>~jax.numpy.logical_and</code>,
{obj}<code>~jax.numpy.logical_and</code>, and
{obj}<code>~jax.numpy.logical_and</code>.</li>
</ul>
</li>
</ul>
<!-- raw HTML omitted -->
</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="https://github.com/google/jax/commit/80e1c94de63e7f89667cdf35f38d8fe298e97a50"><code>80e1c94</code></a>
Prepare for v0.4.33 release.</li>
<li><a
href="https://github.com/google/jax/commit/1594d2f30fdbfebf693aba4a2b264e4a3e52acc6"><code>1594d2f</code></a>
Prepare for v0.4.32 release.</li>
<li><a
href="https://github.com/google/jax/commit/ed849ff9e0576dcee2514741b5ffa951a94e20a8"><code>ed849ff</code></a>
Make sure to call the superclass' <strong>init</strong>() on a newly
created instance in P...</li>
<li><a
href="https://github.com/google/jax/commit/2bd1fdead81581db08ee84a0d1f82c407ccd6b11"><code>2bd1fde</code></a>
Relax test tolerance in pinv test to fix a CI failure on Windows
CPU.</li>
<li><a
href="https://github.com/google/jax/commit/e869a9d65e568e36e95940db302f94f9b7b973c4"><code>e869a9d</code></a>
Merge pull request <a
href="https://redirect.github.com/google/jax/issues/23415">#23415</a>
from kaixih:key_value_seq_lengths</li>
<li><a
href="https://github.com/google/jax/commit/ea68f4569c5474f20e52b96ab88c287ab843130a"><code>ea68f45</code></a>
Internal change</li>
<li><a
href="https://github.com/google/jax/commit/49dd6ed8d891ee6b7bbfcf7cc425382a7235556b"><code>49dd6ed</code></a>
Disable a pallas export compatibility test that fails on TPU v6e.</li>
<li><a
href="https://github.com/google/jax/commit/808003b4e29e878349192e0f63fa1a2454ace56b"><code>808003b</code></a>
Update users of jax.tree.map() to be more careful about how they handle
Nones.</li>
<li><a
href="https://github.com/google/jax/commit/e3c4b20fa04893ad986c3184387fbd3817f1515d"><code>e3c4b20</code></a>
[Pallas] Implement tiled and swizzled Memref loads for Mosaic GPU via
&quot;GPUBlo...</li>
<li><a
href="https://github.com/google/jax/commit/c659dc9a011bf8ff604a7e23f916920ff717288b"><code>c659dc9</code></a>
[Pallas] Disable win32 gpu_ops_test.</li>
<li>Additional commits viewable in <a
href="https://github.com/google/jax/compare/jaxlib-v0.1.32...jax-v0.4.33">compare
view</a></li>
</ul>
</details>
<br />


Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.

[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)

---

<details>
<summary>Dependabot commands and options</summary>
<br />

You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits
that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after
your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge
and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating
it. You can achieve the same result by closing it manually
- `@dependabot show <dependency name> ignore conditions` will show all
of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop
Dependabot creating any more for this major version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop
Dependabot creating any more for this minor version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop
Dependabot creating any more for this dependency (unless you reopen the
PR or upgrade to it yourself)


</details>

---------

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Ilya Lavrenov <[email protected]>
Co-authored-by: Roman Kazantsev <[email protected]>
### Details:
 - *item1*
 - *...*

### Tickets:
 - *ticket-id*
### Details:
- enable `CMAKE_COMPILE_WARNING_AS_ERROR` for _intel\_npu_ directory
(except _thirdparty_)
 - remove warning suppression for deprecated declarations
 - fix existing warnings

### Tickets:
 - *134706*
### Details:
 - *item1*
 - *...*

### Tickets:
 - *ticket-id*
…ply-by-one pattern (openvinotoolkit#26641)

**Details:** In a customer model, there is a sub-graph under ShapeOf that
is equivalent to multiplication by one. Eliminating it makes
LSTMSequence fusion possible

**Ticket:** 149687

Signed-off-by: Kazantsev, Roman <[email protected]>
…olkit#26631)

### Details:
 - the latest torch version available for x86 is 2.2.2
 
### Tickets:
 - *ticket-id*
…toolkit#26638)

### Details:
 - *Implement extension versions properly for each method*

### Tickets:
 - *EISW-61724*
…t#26177)

### Details:
 - The Constant `get_vector` now works correctly for low precisions.
- Initialize unused bits in the Constant buffer for low precisions to
avoid undefined values.

### Tickets:
 - CVS-149867
…oolkit#25997)

### Details:
- *This PR adds functional tests for NPUW launched with online
partitioning, mostly the same tests that were added for unpartitioned
NPUW, except for some interesting ones for folding and pipelining*
- *This PR also introduces one accuracy test which, however, is for now
checked on a model that is simple in terms of computations, not structure*

### Tickets:
 - *ticket-id*
…openvinotoolkit#26662)

### Details:
 - Enabled parallel execution
- Fixed tests to be compatible with the pytest-xdist plugin (pass the function
name instead of a reference)
- `os.path.dirname` caused the model to be saved in the `/out` directory instead
of `/out/{temp_dir}` because the path did not have a '/' at the end, so
`temp_dir` was treated like a file

### Tickets:
 - [None](openvinotoolkit#20920)
 
Without parallel execution
![Screenshot from 2024-09-18
14-15-16](https://github.com/user-attachments/assets/f1b00954-de59-445a-904f-5b13819c0971)

With parallel execution (8 cpu cores)
![Screenshot from 2024-09-18
14-32-48](https://github.com/user-attachments/assets/2fc144cc-f771-43aa-909b-f41dedc1ccca)
…t#26554)

Providing info about support for MXFP4 data format in quantization on
CPU. This PR addresses JIRA ticket no. 151042.
…otoolkit#26640)

### Details:
 - support jax.lax.ge and jax.lax.gt operation
 - create unit tests

### Tickets:
 - [None](openvinotoolkit#26572)

---------

Co-authored-by: Roman Kazantsev <[email protected]>
**Details:** Fix performance inefficiencies

**Ticket:** 123298

Signed-off-by: Kazantsev, Roman <[email protected]>
…openvinotoolkit#26501)

### Details:
- For an fp model, some convolutions may not be compressed to fp16,
depending on the transformation policy, and those convolutions may have a
fused node which is fp16. Then the convolution node's input data type
will be fp32 while the output data type is fp16. Convolution needs to support
this case.

### Tickets:
 - 147689
…6599)

### Details:
- Target pattern: FCs that are to be fused by the horizontal fusing pass and
have Add users which can be regarded as bias adds. If we fuse the
FCs as is, the fused pattern becomes fused_fc -> VariadicSplit -> Add, so
the Adds cannot be fused into the FCs.
- This PR sets such Add users as the FCs' bias inputs so that the fused
FC can handle them as a fused bias.

### Tickets:
 - CVS-151841
…kit#26660)

### Details:
- *Currently, `loops_to_split` in `MHAParallelWAOptimizer` are stored in an
unordered_map, so the element order is not deterministic. This sporadically
leads to a situation where the loop's last iteration has a work_amount larger
than the main body's increment, which might lead to failures*
- *In this PR, `loops_to_split` are stored in a vector, so loop
information updates are always applied to expanded loop infos in a
determined order: FIRST_ITER -> MAIN_BODY -> LAST_ITER*
- *Also, a corresponding assert is added to
`InsertSpecificIterations::get_decomposed_loop_work_amount` in order to
throw an exception at an early stage in case of an incorrect configuration.
This assert also allows the changes to be covered by the existing tests (some
of them fail if the assert is added but the fix is not applied)*

### Tickets:
 - *N/A*
### Details:
 - *Fix a failure to deserialize the RMS node when reading a model from the cache*
 - *...*

### Tickets:
 - *152740*
…oolkit#21414)

### Details:
 - `ShapeOf` preserve lower bound when upper is infinite

### Tickets:
 - [CVS-126430](https://jira.devtools.intel.com/browse/CVS-126430)
…kit#26383)

### Details:
- Fix for the failing GPU functional i16 test. The problem is that i16 input
is wrongly converted to f32 in the constant and parameter ops.
- Had to disable the i16 case for Deformable conv, which won't work with
this fix. The motivation is that Deformable conv on GPU supports only the
f16, f32 and int8 types; it does not support the i16 case, which was working
only due to an implicit type conversion that this PR changes.
### Details:
 - Add guidelines on how to test new JS API functionality
 - Add a guide on how to extend JS API functionality


### Tickets:
- [CVS-151489](https://jira.devtools.intel.com/browse/CVS-151489)
[CVS-151492](https://jira.devtools.intel.com/browse/CVS-151492)

---------

Co-authored-by: Tatiana Savina <[email protected]>
### Details:
 - *item1*
 - *...*

### Tickets:
 - *ticket-id*

Co-authored-by: Karol Blaszczak <[email protected]>