[GPU] LSTMSequence and LSTMCell optimization #26767

Open
wants to merge 153 commits into
base: master

michal-miotk
Contributor

@michal-miotk michal-miotk commented Sep 24, 2024

Details:

  • Create a single primitive for lstm_sequence, which is faster than the previous approach that combined many primitives
  • Use oneDNN
  • Based on commit c99ddc0 from 25732

Tickets:

  • 146601

commit 232d272f11fbe65e82fa9787260a8b9d34b57d20
Author: michal-miotk <[email protected]>
Date:   Mon Jul 29 11:17:47 2024 +0000

    wip

commit e642ca3
Author: michal-miotk <[email protected]>
Date:   Sun Jul 28 22:08:24 2024 +0000

    wip

commit c6b74d3
Author: michal-miotk <[email protected]>
Date:   Fri Jul 26 14:10:26 2024 +0000

    wip

commit 0451429
Author: michal-miotk <[email protected]>
Date:   Thu Jul 25 20:35:11 2024 +0000

    wip3

commit 1164592
Author: michal-miotk <[email protected]>
Date:   Tue Aug 6 09:25:45 2024 +0000

    wip

commit 8b2c049
Author: michal-miotk <[email protected]>
Date:   Tue Aug 6 09:24:02 2024 +0000

    wip

commit 886b412
Author: michal-miotk <[email protected]>
Date:   Mon Aug 5 14:59:14 2024 +0000

    wip

commit 08fb207
Author: michal-miotk <[email protected]>
Date:   Sun Aug 4 20:21:38 2024 +0000

    wip, errors on half

commit 125884d
Author: michal-miotk <[email protected]>
Date:   Sat Aug 3 23:59:58 2024 +0000

    wip

commit af4f209
Author: michal-miotk <[email protected]>
Date:   Fri Aug 2 17:58:38 2024 +0000

    wip

commit 12626fc
Author: michal-miotk <[email protected]>
Date:   Fri Aug 2 10:52:15 2024 +0000

    wip

commit dfdd052
Author: michal-miotk <[email protected]>
Date:   Thu Aug 1 15:38:41 2024 +0000

    wip

commit 54ee912
Author: michal-miotk <[email protected]>
Date:   Thu Aug 1 11:01:55 2024 +0000

    only bfyx layout

commit 240fe4a
Author: michal-miotk <[email protected]>
Date:   Thu Aug 1 10:34:45 2024 +0000

    two outputs from prim

commit bc775be
Author: michal-miotk <[email protected]>
Date:   Wed Jul 31 22:13:14 2024 +0000

    wip

commit d1cfd60
Author: michal-miotk <[email protected]>
Date:   Wed Jul 31 22:07:06 2024 +0000

    wip

commit 7d18884
Author: michal-miotk <[email protected]>
Date:   Wed Jul 31 19:19:04 2024 +0000

    begin of handling reverse

commit 39f64af
Author: michal-miotk <[email protected]>
Date:   Wed Jul 31 15:37:06 2024 +0000

    betterbetter

commit 67b3c9a
Author: michal-miotk <[email protected]>
Date:   Wed Jul 31 13:12:39 2024 +0000

    better

commit 6ded5aa
Author: michal-miotk <[email protected]>
Date:   Wed Jul 31 10:12:31 2024 +0000

    wip

commit 1ccdacc
Author: michal-miotk <[email protected]>
Date:   Tue Jul 30 23:07:21 2024 +0000

    wip

commit ab1307c
Author: michal-miotk <[email protected]>
Date:   Tue Jul 30 22:00:50 2024 +0000

    test passed

commit bc65969
Author: michal-miotk <[email protected]>
Date:   Tue Jul 30 15:37:20 2024 +0000

    wip

commit 03cbf57
Author: michal-miotk <[email protected]>
Date:   Tue Jul 30 15:15:06 2024 +0000

    only 2 outputs

commit fd5f3dc
Author: michal-miotk <[email protected]>
Date:   Tue Jul 30 14:47:12 2024 +0000

    wip

commit 939d23c
Author: michal-miotk <[email protected]>
Date:   Tue Jul 30 11:34:56 2024 +0000

    wip

commit 2bb561f
Author: michal-miotk <[email protected]>
Date:   Tue Jul 30 09:28:03 2024 +0000

    added to binary buffer

commit 1ef83ff
Author: michal-miotk <[email protected]>
Date:   Mon Jul 29 22:30:57 2024 +0000

    not works

@p-durandin
Contributor

build_jenkins


namespace cldnn {

post_optimize_lstm_weights::post_optimize_lstm_weights(reorder_factory& rf_ref)
Contributor

Please make it a part of common post_optimize_weights pass

Contributor Author

done

// function which prepares given primitive for weights optimization
template<typename T>
void post_optimize_lstm_weights::optimize_lstm_weights(T& node, program& p) {
//auto offsets = get_weights_bias_offset(node);
Contributor

remove it

Contributor Author

done

}
}

void reorder_factory::get_weights_split(primitive_id input_id,
Contributor

IMO, reorder_factory class doesn't look like a good place for such kind of transformation. Could it be a part of post_optimize_weights pass?

Contributor Author

done

@@ -267,6 +267,9 @@ struct program {
const ExecutionConfig& config,
std::shared_ptr<ov::threading::IStreamsExecutor> task_executor,
bool is_internal);

static bool has_lstm(topology const& topology);
Contributor

Looks like it's not used.

Contributor Author

done

@@ -18,6 +18,9 @@ typedef std::map<primitive_id, std::shared_ptr<primitive>> topology_map;
struct topology {
public:
using ptr = std::shared_ptr<topology>;

bool lstm_present = false;
Contributor

Looks like a hack, to be honest. As far as I can see, the only usage of it is setting use_onednn = true if lstm_present, so I think you can just iterate over the primitives somewhere in program::init_graph() to check whether an rnn primitive is present.

Contributor Author

done
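
The suggestion above (scan the graph once instead of carrying an lstm_present flag on the topology) could look roughly like the sketch below. It is only an illustration: node_stub and contains_lstm are hypothetical stand-ins rather than the real cldnn classes, and the type name "lstm_cell" is assumed by analogy with lstm_seq.

```cpp
#include <algorithm>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-in for a program node; the real cldnn node type differs.
struct node_stub {
    std::string type;  // e.g. "lstm_seq", "lstm_cell" (assumed name), "convolution"
};

// Returns true if at least one node in the graph is an LSTM primitive.
// Null entries are skipped rather than dereferenced.
inline bool contains_lstm(const std::vector<std::shared_ptr<node_stub>>& nodes) {
    return std::any_of(nodes.begin(), nodes.end(),
                       [](const std::shared_ptr<node_stub>& n) {
                           return n && (n->type == "lstm_seq" || n->type == "lstm_cell");
                       });
}
```

A check like this runs once while the graph is being built, so no extra state has to be kept on the topology itself.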


template<typename ShapeType>
std::vector<layout> lstm_seq_inst::calc_output_layouts(lstm_seq_node const& node, kernel_impl_params const& impl_param) {
auto desc = impl_param.typed_desc<lstm_seq>();
Contributor

Not used

Contributor Author

done

//
#include "lstm_seq_inst.h"
#include "primitive_type_base.h"
#include "intel_gpu/runtime/error_handler.hpp"
Contributor

Please avoid using this header

Contributor Author

done

second_out_fmt = node.get_preferred_output_fmt(1);
third_out_fmt = node.get_preferred_output_fmt(2);
}
return {cldnn::layout{ShapeType{lstm_batch_size, 1, lstm_seq_length, lstm_hidden_size}, input_layout_x.data_type, first_out_fmt}, \
Contributor

IIRC, these are not the correct shapes for a bidirectional LSTM. The second dimension should probably be num_directions or something like that.
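
To make the shape concern concrete, here is a small sketch of the output shapes with a num_directions dimension. It assumes the layout described in the OpenVINO LSTMSequence operation specification (Y is [batch_size, num_directions, seq_length, hidden_size]; Ho and Co are [batch_size, num_directions, hidden_size]). The struct and helper names are hypothetical and not part of this PR.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Assumed output shapes of LSTMSequence:
//   Y  : [batch_size, num_directions, seq_length, hidden_size]
//   Ho : [batch_size, num_directions, hidden_size]
//   Co : [batch_size, num_directions, hidden_size]
struct lstm_seq_shapes {
    std::array<int64_t, 4> y;
    std::array<int64_t, 3> ho;
    std::array<int64_t, 3> co;
};

// Hypothetical helper: num_directions is 2 for a bidirectional LSTM, otherwise 1.
inline lstm_seq_shapes make_lstm_seq_shapes(int64_t batch, int64_t seq_len,
                                            int64_t hidden, bool bidirectional) {
    assert(batch > 0 && seq_len > 0 && hidden > 0);
    const int64_t num_directions = bidirectional ? 2 : 1;
    return {{batch, num_directions, seq_len, hidden},
            {batch, num_directions, hidden},
            {batch, num_directions, hidden}};
}
```

With the hard-coded 1 in the second position, as in the return statement under review, only the unidirectional case is described.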

Comment on lines 210 to 212
if (_engine.get_device_info().supports_immad) {
_config.set_property(ov::intel_gpu::use_onednn(true));
}
Contributor

I think this can be handled once in ExecutionConfig::apply_user_properties

Contributor Author

done

Comment on lines 66 to 71
#ifdef SEQUENCE
INPUT1_TYPE_VEC initial_block = READ_VEC(0, &initial_hidden_state[INPUT1_GET_INDEX(b, 0, j*VEC_SIZE, 0)]);
INPUT4_TYPE_VEC r_block = READ_VEC(0, &R[INPUT4_GET_INDEX(0, weight_idx, j*VEC_SIZE, 0)]);
#else
INPUT1_TYPE_VEC initial_block = READ_VEC(0, &initial_hidden_state[INPUT1_GET_INDEX(b, j*VEC_SIZE, 0, 0)]);
INPUT4_TYPE_VEC r_block = READ_VEC(0, &R[INPUT4_GET_INDEX(weight_idx, j*VEC_SIZE, 0, 0)]);
Contributor

I wonder if that kind of code can be unified for cell and sequence? Looks like the only difference is indexing, so maybe you can do something like this:

#ifdef SEQUENCE
#define GET_IN0_IDX(b, f, y) INPUT1_GET_INDEX(b, f, y, 0)
#else
#define GET_IN0_IDX(b, f, y) INPUT1_GET_INDEX(b, y, 0, 0)
#endif

INPUT1_TYPE_VEC initial_block = READ_VEC(0, &initial_hidden_state[GET_IN0_IDX(b, 0, j*VEC_SIZE)]);
...

Contributor Author

done
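
The indexing-macro idea from this thread can be tried outside the OpenCL kernel with a host-side sketch such as the one below. Everything except the GET_IN0_IDX pattern is an assumption: the INPUT1_GET_INDEX definition here is a stand-in with made-up dense strides (the real macro is generated for the kernel), and initial_state_offset is a hypothetical helper.

```cpp
#include <cstddef>

// Hypothetical dimensions of a dense 4D buffer, used only by the stand-in macro.
constexpr std::size_t kDim1 = 16, kDim2 = 16, kDim3 = 1;

// Stand-in for the generated INPUT1_GET_INDEX(b, f, y, x) macro.
#define INPUT1_GET_INDEX(b, f, y, x) \
    (((((b) * kDim1) + (f)) * kDim2 + (y)) * kDim3 + (x))

// One wrapper hides the cell/sequence difference, so call sites stay identical.
#ifdef SEQUENCE
#define GET_IN0_IDX(b, f, y) INPUT1_GET_INDEX(b, f, y, 0)
#else
#define GET_IN0_IDX(b, f, y) INPUT1_GET_INDEX(b, y, 0, 0)
#endif

// Offset of the j-th vector block of the initial hidden state for batch b.
inline std::size_t initial_state_offset(std::size_t b, std::size_t j, std::size_t vec_size) {
    return GET_IN0_IDX(b, 0, j * vec_size);
}
```

The call site is the same in both builds; only the macro body decides which dimension the hidden-state offset lands in.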

Labels
category: build OpenVINO cmake script / infra category: GPU OpenVINO GPU plugin category: IE Tests OpenVINO Test: plugins and common