[Snippets][Review] Dynamic buffer #201

a-sidorova
Owner

No description provided.

sshlyapn and others added 6 commits May 27, 2024 06:11
### Details:
 - Added indirect inputs support for SDPA kernel
 - Added setter for causal flag for ScaledDotProductAttention operation
- Added `ov::intel_gpu::hint::enable_sdpa_optimization` to
`ov::supported_properties` list
- Removed unused `TARGET_SEQ_LEN_BLOCK_SIZE > 1` check from kernel for
single token processing
 - Minor refactoring
- Added `OV_GPU_EnableSDPA` debug option, which either forces the SDPA
kernel for any ScaledDotProductAttention operation _(=1)_ or
completely disables the SDPA kernel _(=0)_, ignoring the
`ov::intel_gpu::hint::enable_sdpa_optimization` property
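
The override behavior described for the debug option can be sketched as a tri-state check. This is only an illustration in Python, not the plugin's code: `resolve_sdpa_enabled` and its parameters are hypothetical names, and reading the option from an environment-style mapping is an assumption made for the sketch.

```python
import os
from typing import Mapping, Optional


def resolve_sdpa_enabled(hint_enabled: bool,
                         env: Optional[Mapping[str, str]] = None) -> bool:
    """Decide whether the SDPA kernel is used (hypothetical helper).

    hint_enabled stands in for the value of the
    ov::intel_gpu::hint::enable_sdpa_optimization property.
    """
    env = os.environ if env is None else env
    debug_value = env.get("OV_GPU_EnableSDPA")
    if debug_value == "1":
        return True   # debug option forces the SDPA kernel
    if debug_value == "0":
        return False  # debug option disables the SDPA kernel
    return hint_enabled  # option not set: the property decides
```

When the option is absent, the sketch falls back to the property value, so the debug option only matters when it is explicitly set.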

### Tickets:
 - *CVS-141213*
…ith TF framework (openvinotoolkit#24640)

Fix for TEST_DEVICE=CPU comparison of float16 infinite values

### Details:
- Introduced specialization for conversion float16->float32 in the CPU
plugin
 - xfail in layer tests is disabled for CPU device
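
To illustrate what a float16->float32 conversion has to preserve for infinite values, the sketch below widens half-precision bit patterns with Python's standard `struct` module. It is not the CPU plugin's specialization; `half_bits_to_float` and `widen_to_float32_bits` are hypothetical helpers.

```python
import struct

POS_INF_F16 = 0x7C00  # sign 0, exponent all ones, mantissa 0
NEG_INF_F16 = 0xFC00  # sign 1, exponent all ones, mantissa 0


def half_bits_to_float(bits: int) -> float:
    """Decode a 16-bit IEEE 754 half-precision pattern into a Python float."""
    return struct.unpack("<e", struct.pack("<H", bits))[0]


def widen_to_float32_bits(bits: int) -> int:
    """Return the float32 bit pattern holding the same value as the half."""
    value = half_bits_to_float(bits)
    return struct.unpack("<I", struct.pack("<f", value))[0]
```

A correct widening maps the half-precision infinities to the float32 infinities (`0x7F800000` and `0xFF800000`) rather than to large finite numbers.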

### Tickets:
 - *24245*
…kit#24691)

LNL does not support a subgroup size of 8. This needs to be checked before
compiling the fc_imad shape-agnostic kernel.

### Details:
- *Set subgroup size to 16 if simd8 is not supported in the target
device.*
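
The fallback can be sketched as a simple capability check. The function and parameter names below are hypothetical, and the list of supported SIMD sizes is assumed to come from a device query that is not shown here.

```python
from typing import Iterable


def choose_subgroup_size(supported_simd_sizes: Iterable[int],
                         preferred: int = 8,
                         fallback: int = 16) -> int:
    """Pick the preferred subgroup size, or the fallback if it is unsupported."""
    return preferred if preferred in set(supported_simd_sizes) else fallback
```

For a device that reports only sizes 16 and 32, the sketch returns 16; for one that also reports 8, it keeps 8.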
### Details:
 - Coverity fix: a line of code was not reachable
- Removed the dead line and added a different log message for each of the 3
possible returns

### Tickets:
 - E-125476
…otoolkit#24679)

Add a note about swap according to the comment
openvinotoolkit#24445 (comment)

---------

Co-authored-by: Tatiana Savina <[email protected]>
### Details:
 - *[CPU] [AARCH64] jit gelu erf*

### Tickets:
 - *CVS-138192*
olpipi and others added 11 commits May 27, 2024 10:48
### Details:
 - Remove legacy test util func
 - *...*

### Tickets:
 - [CVS-128261](https://jira.devtools.intel.com/browse/CVS-128261)
…ed functional test for the op. (openvinotoolkit#24611)

This is a follow up to openvinotoolkit#23955

### Details:
 - Added functional test for ROI Align Rotated
- Fixed a "bug" with a wrong batch index inside the cl kernel, revealed by
the functional test for ROI Align Rotated.

### Tickets:
 - *[141877](https://jira.devtools.intel.com/browse/CVS-141877)*
… env (openvinotoolkit#24494)

### Details:
 - *Fix broken url link*
 - *Remove unnecessary step in Linux build env*

### Tickets:
 - *N/A*
…inotoolkit#24690)

During the elimination of dependencies on the `beam_idx` input and
`ReadValue`(s), we replace them with new PA-related inputs and
sub-expressions that depend on the other remaining inputs. Such
replacements must guarantee that the old and new nodes have matching
shape and element type. Before this PR, a matching shape was not
guaranteed: a scalar was sometimes replaced by a node of rank 1, which
led to errors like `'start' input is not a scalar`. Now the shape is aligned.
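
The shape mismatch described above can be illustrated with plain shape tuples, where `()` is a scalar and `(1,)` is a rank-1 tensor with one element. This is a sketch of the idea only; `aligned_shape` is a hypothetical helper, not the transformation's code.

```python
from math import prod
from typing import Tuple

Shape = Tuple[int, ...]


def aligned_shape(old_shape: Shape, new_shape: Shape) -> Shape:
    """Return the shape a replacement node must be reshaped to.

    The replacement may only differ in rank, not in element count,
    so a rank-1 single-element node can stand in for a scalar after a reshape.
    """
    if prod(old_shape) != prod(new_shape):
        raise ValueError(f"cannot align {new_shape} to {old_shape}")
    return old_shape


def needs_reshape(old_shape: Shape, new_shape: Shape) -> bool:
    """True when the replacement's shape differs from the original node's."""
    return tuple(old_shape) != tuple(new_shape)
```

Here a replacement of shape `(1,)` for a scalar `()` is flagged by `needs_reshape` and aligned back to `()`, which is the kind of case that would otherwise trigger the `'start' input is not a scalar` error.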

---------

Co-authored-by: Ivan Tikhonov <[email protected]>
Co-authored-by: Ilya Lavrenov <[email protected]>
### Details:
Handled the case where the "Other" input to MatMul is 1D
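
For context, a common convention for a 1D second operand (NumPy-style matmul semantics) is to treat it as a column vector and then drop the temporary axis from the result. The sketch below computes the output shape under that convention; whether this PR follows exactly this rule is an assumption, and `matmul_output_shape` is a hypothetical helper that omits batch broadcasting.

```python
from typing import Sequence, Tuple


def matmul_output_shape(a_shape: Sequence[int],
                        b_shape: Sequence[int]) -> Tuple[int, ...]:
    """Output shape of A @ B where A has rank >= 2 and B has rank 1 or 2."""
    a, b = tuple(a_shape), tuple(b_shape)
    if len(a) < 2 or len(b) not in (1, 2):
        raise ValueError("sketch supports rank >= 2 for A and rank 1 or 2 for B")
    b_was_1d = len(b) == 1
    if b_was_1d:
        b = b + (1,)  # treat the vector as a column matrix [K, 1]
    if a[-1] != b[0]:
        raise ValueError(f"inner dimensions differ: {a[-1]} vs {b[0]}")
    out = a[:-1] + (b[1],)
    if b_was_1d:
        out = out[:-1]  # remove the temporary axis again
    return out
```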


### Tickets:
 - *CVS-141638*
### Details:
- During investigation I found that it hangs with the AUTO plugin but
passes when the CPU plugin is specified directly.

### Tickets:
 - CVS-141744
…24320)

### Details:
 - *Exception handling: exception message logging*

### Tickets:
- *A NotSupported exception can carry a message, so let's display it. For
example, this ticket would be clear in that case: CVS-139934*
 - *Part of CVS-142409*
### Details:
 - *Enable select 5d in cl kernel*
 - *Added basic unit test for select 5d*

### Tickets:
 - *140122*
### Details:
 - when tile_size < vector_size then JTIMES == 0
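
The condition is easy to reproduce if `JTIMES` is assumed to be the integer quotient of `tile_size` and `vector_size`; the original does not state the formula, so treat this as an assumption. `jtimes` is a hypothetical helper.

```python
def jtimes(tile_size: int, vector_size: int) -> int:
    """Number of whole vectors that fit in a tile (assumed definition)."""
    if tile_size <= 0 or vector_size <= 0:
        raise ValueError("sizes must be positive")
    return tile_size // vector_size
```

With this definition any tile smaller than the vector width yields zero iterations, which is the case the fix targets.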

### Tickets:
 - 142398
@a-sidorova a-sidorova force-pushed the feature/snippets/dynamism/buffer_utils_exprs branch 2 times, most recently from f0ae59f to 38b570f Compare May 28, 2024 06:46
ababushk and others added 3 commits May 28, 2024 07:48
…kit#24718)

### Details:
This PR moves Windows Conditional Compilation build to a self-hosted
runner in Azure with 32 cores and 128 GB of RAM
### Details:
- *Transferring RoPE operation and transformations for subsequent reuse
in GPU plugin*
@a-sidorova a-sidorova force-pushed the feature/snippets/dynamism/buffer_utils_exprs branch from 38b570f to 5c5e391 Compare May 28, 2024 08:54
chenhu-wang and others added 5 commits May 28, 2024 09:08
…penvinotoolkit#24704)

### Details:
 - *reduce post ops only support f32 precision*

### Tickets:
 - *CVS-142322*
 
### PR for Release:
 - *openvinotoolkit#24724*
### Details:
 - Port of openvinotoolkit#24703
- The issue that was present on the release branches does not affect the
master branch yet but it could if, for example, a new folder is added to
the cache path.

### Tickets:
 - *142372*
### Details:
 - *[CPU] [ARM] jit gelu tanh*

### Tickets:
 - *CVS-138293*
…penvinotoolkit#24705)

### Details:
- *To align the newly added behavior tests with other plugins,
`OVClassCompiledModelPropertiesDefaultTests` should not be run with the
`HETERO` device, as is done in the `intel_cpu` plugin:
https://github.com/openvinotoolkit/openvino/blob/master/src/plugins/intel_cpu/tests/functional/shared_tests_instances/behavior/compiled_model/properties.cpp#L63-L67*

### Tickets:
 - *E-120273*
itikhono and others added 4 commits May 30, 2024 10:33
…otoolkit#24379)

### Details:
 - SliceToStridedSlice was moved from MOC to Common transformations.
 - Updated RoPE fusion patterns
 - Added 2 new transformations: EliminateSlice, SliceSequenceToSingleSlice
- Updated the EliminateStridedSlice transformation to support the int32_max
case and deleted the duplicate transformation from CPU
 - Added new unit tests
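
One way to read the int32_max case: a slice whose end is the int32 maximum means "up to the end of the axis", so it can be a no-op even when the dimension is not known. The sketch below checks a single axis under that reading; it is an assumption about the intent, and `is_noop_slice` is a hypothetical helper, not the transformation's code.

```python
from typing import Optional

INT32_MAX = 2**31 - 1


def is_noop_slice(dim: Optional[int], begin: int, end: int, stride: int = 1) -> bool:
    """True if slicing one axis with [begin:end:stride] keeps every element.

    dim is None for a dynamic (unknown) dimension.
    """
    if begin != 0 or stride != 1:
        return False
    if end >= INT32_MAX:
        return True  # "slice to the end", valid for any dimension
    return dim is not None and end >= dim
```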

Tested locally with these models:

- [x] means that model compilation passed and all RoPE-related
transformations were applied in the same order
- [x] hf-internal-testing/tiny-random-StableLmForCausalLM
- [x] hf-internal-testing/tiny-random-FalconForCausalLM
- [x] hf-internal-testing/tiny-random-Starcoder2ForCausalLM
- [x] hf-internal-testing/tiny-random-LlamaForCausalLM
- [x] hf-internal-testing/tiny-random-GPTNeoXForCausalLM
- [x] hf-internal-testing/tiny-random-GPTJForCausalLM
- [x] hf-internal-testing/tiny-random-CodeGenForCausalLM
- [x] hf-internal-testing/tiny-random-MistralForCausalLM
- [x] hf-internal-testing/tiny-random-PhiForCausalLM
- [x] Qwen/Qwen1.5-7B
- [x] THUDM/chatglm3-6b
- [x] EleutherAI/gpt-neox-20b
- [x] google/gemma-2b-it
- [x] EleutherAI/gpt-j-6b
- [x] meta-llama/Meta-Llama-3-8B
- [x] mistralai/Mistral-7B-v0.1

### Tickets:
 - *CVS-126971*
…ns (openvinotoolkit#24771)

### Details:
Disable EliminateSqueeze/EliminateUnsqueeze passes in
SymbolicOptimizations

### Tickets:
 - *CVS-141814*

---------

Co-authored-by: Evgenya Nugmanova <[email protected]>
…lkit#24525)

### Details:
- *Merged `LoopEndStatic` and `LoopEndDynamic` into a single `LoopEnd` node
to avoid extra conditions in the code and to improve performance, since some
data pointer shifts might be known and compiled into JIT code*
- *Removed the dynamic aarch64 loop emitters since they don't work anyway
(CVS-141550)*
- *Added dynamism support to `IdentifyBuffers` and
`DefineBufferClusters`. The algorithm is not efficient, since we don't know
the exact values of data pointer shifts and cannot be sure that they will be
proportional at runtime. It should be implemented as a separate
feature based on some judgments, for example*

### Tickets:
 - *141268*


### Prerequisites:
- *openvinotoolkit#21922*
[Snippets] Renamed BufferID to BufferRegisterGroup

[Snippets] Changed allocation shape on size

[Snippets] Added Buffer cluster_ID

[Snippets][Tests] Fixed build insert_load_store test

[Snippets] Split SolveBufferMemory into static and dynamic logic

[Snippets] Rewrote ComputeBufferAllocationSize::get_allocation_size

[Snippets] Added dynamism support to InitBuffersDefault

[Snippets][Tests] Added tests for clusters

[Snippets] Added buffer_expressions to ComputeBufferAllocationSize

[Snippets] Added  to LoopInfo for splitted loops:

[Snippets] Removed copy from UpdateLoopInfo

[Snippets] Moved UpdateLoopInfo to RuntimeConfigurator

[Snippets] Add dynamic buffers support to Configurator

[Snippets] Fixed Reduce decomp: add shape infer for outputs

[snippets] Fixed broadcast_merge_dim in shape inference

[Snippets][CPU][Tests] Enabled dynamic Softmax tests

[Snippets] Removed useless function calculate_size

[Snippets][CPU][Tests] Enabled dynamic reduce test

[Snippets] Small fixes in solve_buffer_memory for dynamic nodes

[CPU][Snippets] Removed useless emitters LoadConvert and StoreConvert

[Snippets] Added missed consumers cloning

[Snippets][CPU] Added buffer offsets to call_args

[Snippets][CPU] Added dynamic offsets support to load and store emitters

[CPU][UnitTests] Fixed build

[Snippets][AArch64] Fixed build

[Snippets] Small fixes
@a-sidorova a-sidorova force-pushed the feature/snippets/dynamism/buffer_utils_exprs branch from 68e559a to fc39829 Compare May 31, 2024 05:35
@a-sidorova a-sidorova closed this Jun 3, 2024