Full SME(1) instruction support and STREAMING Groups #415

Open
wants to merge 44 commits into
base: dev
44 commits
cba8cff
Added STREAMING versions of relevant aarch64 instruction groups.
FinnWilkinson May 24, 2024
34c1153
Removed un-used macros from AArch64 Instruction decode.
FinnWilkinson May 28, 2024
687d2a9
Moved aarch64 getGroup logic to instruction_decode.
FinnWilkinson May 28, 2024
49fa390
Moved riscv getGroup logic to instruction_decode.
FinnWilkinson May 28, 2024
4d1acc9
Updated unit tests after changing getGroup logic.
FinnWilkinson May 28, 2024
d13b7cc
Added new AArch64 groups to model config and updated integration test.
FinnWilkinson May 28, 2024
60aeecc
Added streaming mode enabled helper functions.
FinnWilkinson May 28, 2024
89e6b6b
Added STREAMING group logic to instruction_decode, and logic to chang…
FinnWilkinson May 29, 2024
e671cc3
Fixed minor issues with new streaming groups and updated SME example …
FinnWilkinson May 30, 2024
813b013
Re-wrote checkStreamingGroup function.
FinnWilkinson May 30, 2024
4e7c429
Added unit tests for new AArch64 STREAMING groups functionality.
FinnWilkinson May 31, 2024
cae1005
Updated aarch64 groups diagram in docs.
FinnWilkinson May 31, 2024
8352b5a
Added SME instruction FMOPS (S and D) support and regression tests.
FinnWilkinson Aug 13, 2024
b7a991e
Added SME instruction SMOPA (S and D) support and regression tests.
FinnWilkinson Aug 13, 2024
a17b0fd
Added SME instruction SMOPS (S and D) support and regression tests.
FinnWilkinson Aug 13, 2024
e1d2e39
Added SME instructions UMOPA and UMOPS (S and D) support and regressi…
FinnWilkinson Aug 13, 2024
26adf0d
Fix jenkins build error.
FinnWilkinson Aug 14, 2024
377dd99
Added SME instructions SUMOPA and SUMOPS (S and D) support and regres…
FinnWilkinson Aug 14, 2024
7903d46
Updated SUMOPA and SUMOPS tests.
FinnWilkinson Aug 14, 2024
93c3b6c
Added SME instructions USMOPA and USMOPS (S and D) support and regres…
FinnWilkinson Aug 14, 2024
e12ccf1
Fix jenkins build error pt2.
FinnWilkinson Aug 14, 2024
d26ef3a
Implemented SME STR instruction and regression test.
FinnWilkinson Aug 14, 2024
3adc299
Fixed execution logic for vertical ST1D and ST1W SME stores.
FinnWilkinson Aug 14, 2024
e06387b
Implemented SME ST1B and ST1H (H and V) instruction logic.
FinnWilkinson Aug 14, 2024
4cfe0eb
Implemented SME LD1B and LD1H (H and V) instruction logic.
FinnWilkinson Aug 15, 2024
0a3fc93
Added SME LD1B and LD1H regression tests.
FinnWilkinson Aug 15, 2024
a713f44
Updated ST1D and ST1W SME regression tests.
FinnWilkinson Aug 15, 2024
fac70b5
Added SME ST1B and ST1H regression tests.
FinnWilkinson Aug 15, 2024
e906dd1
Implemented SME MOVA (Tile to Vec, horizontal) instructions and regre…
FinnWilkinson Aug 15, 2024
8c2a6bc
Implemented SME MOVA (Tile to Vec, vertical) instructions and regress…
FinnWilkinson Aug 15, 2024
532f9af
Implemented SME MOV (Tile to Vec, vertical and horizontal) instructio…
FinnWilkinson Aug 15, 2024
3b4de2e
Implemented SME MOVA/MOV (Vec to Tile, vertical and horizontal) instr…
FinnWilkinson Aug 16, 2024
fb58957
Implemented SME LDR instruction and regression tests.
FinnWilkinson Aug 16, 2024
c194858
Implemented SME ADDHA and ADDVA (S and D) instructions and regression…
FinnWilkinson Aug 19, 2024
7e5e32c
Updated ADDHA test to make more specific.
FinnWilkinson Aug 20, 2024
e664cc7
Corrected ADDVA execution logic.
FinnWilkinson Aug 20, 2024
a064e9b
Updated ADDVA test to make more specific.
FinnWilkinson Aug 20, 2024
ffed626
Added SME MOVA (tile to vec, vec to tile) Quad-word instructions and …
FinnWilkinson Aug 20, 2024
66e54fd
Implemented SME ST1Q and LD1Q (V and H) instructions and regression t…
FinnWilkinson Aug 28, 2024
762588b
Removed werror.
FinnWilkinson Sep 2, 2024
59d7887
NEON instruction logic fixes.
FinnWilkinson Oct 14, 2024
32948cf
Attended PR comments.
FinnWilkinson Oct 29, 2024
5945bae
Switched order of concatonation for NEON UMAXP instruction to match H…
FinnWilkinson Nov 4, 2024
e15f354
Fixed LD1W (into ZA, 32-bit) buffer overflow error.
FinnWilkinson Nov 8, 2024
51 changes: 49 additions & 2 deletions configs/a64fx_SME.yaml
@@ -77,15 +77,15 @@ Ports:
- INT_DIV_OR_SQRT
5:
Portname: EAGA
Instruction-Support:
Instruction-Group-Support:
- LOAD
- STORE_ADDRESS
- INT_SIMPLE_ARTH_NOSHIFT
- INT_SIMPLE_LOGICAL_NOSHIFT
- INT_SIMPLE_CMP
6:
Portname: EAGB
Instruction-Support:
Instruction-Group-Support:
- LOAD
- STORE_ADDRESS
- INT_SIMPLE_ARTH_NOSHIFT
@@ -95,10 +95,24 @@ Ports:
Portname: BR
Instruction-Group-Support:
- BRANCH
# Define example SME / SVE Streaming Mode units
8:
Portname: SME
Instruction-Group-Support:
- SME
9:
Portname: PR_S
Instruction-Group-Support:
- STREAMING_PREDICATE
10:
Portname: FLA_S
Instruction-Group-Support:
- STREAMING_SVE
11:
Portname: FLB_S
Instruction-Group-Support:
- STREAMING_SVE_SIMPLE
- STREAMING_SVE_MUL
Reservation-Stations:
0:
Size: 20
@@ -133,6 +147,13 @@ Reservation-Stations:
Dispatch-Rate: 1
Ports:
- SME
6:
Size: 40
Dispatch-Rate: 3
Ports:
- FLA_S
- FLB_S
- PR_S
Execution-Units:
0:
Pipelined: True
@@ -188,6 +209,24 @@ Execution-Units:
- INT_DIV_OR_SQRT
- FP_DIV_OR_SQRT
- SVE_DIV_OR_SQRT
9:
Pipelined: True
Blocking-Groups:
- INT_DIV_OR_SQRT
- FP_DIV_OR_SQRT
- SVE_DIV_OR_SQRT
10:
Pipelined: True
Blocking-Groups:
- INT_DIV_OR_SQRT
- FP_DIV_OR_SQRT
- SVE_DIV_OR_SQRT
11:
Pipelined: True
Blocking-Groups:
- INT_DIV_OR_SQRT
- FP_DIV_OR_SQRT
- SVE_DIV_OR_SQRT
Latencies:
0:
Instruction-Groups:
@@ -216,9 +255,11 @@ Latencies:
- SCALAR_SIMPLE
- VECTOR_SIMPLE_LOGICAL
- SVE_SIMPLE_LOGICAL
- STREAMING_SVE_SIMPLE_LOGICAL
- SME_SIMPLE_LOGICAL
- VECTOR_SIMPLE_CMP
- SVE_SIMPLE_CMP
- STREAMING_SVE_SIMPLE_CMP
- SME_SIMPLE_CMP
Execution-Latency: 4
Execution-Throughput: 1
@@ -232,21 +273,25 @@ Latencies:
- SCALAR_SIMPLE_CVT
- VECTOR_SIMPLE
- SVE_SIMPLE
- STREAMING_SVE_SIMPLE
- SME_SIMPLE
- FP_MUL
- SVE_MUL
- STREAMING_SVE_MUL
- SME_MUL
Execution-Latency: 9
Execution-Throughput: 1
7:
Instruction-Groups:
- SVE_DIV_OR_SQRT
- STREAMING_SVE_DIV_OR_SQRT
- SME_DIV_OR_SQRT
Execution-Latency: 98
Execution-Throughput: 98
8:
Instruction-Groups:
- PREDICATE
- STREAMING_PREDICATE
Execution-Latency: 3
Execution-Throughput: 1
9:
@@ -260,8 +305,10 @@ Latencies:
10:
Instruction-Groups:
- LOAD_SVE
- LOAD_STREAMING_SVE
- LOAD_SME
- STORE_ADDRESS_SVE
- STORE_ADDRESS_STREAMING_SVE
- STORE_ADDRESS_SME
Execution-Latency: 6
Execution-Throughput: 1
Binary file removed docs/sphinx/assets/instruction_groups.png
Binary file not shown.
Binary file modified docs/sphinx/assets/instruction_groups_AArch64.png
1 change: 0 additions & 1 deletion src/include/simeng/Register.hh
@@ -1,6 +1,5 @@
#pragma once
#include <cstdint>
#include <iostream>

namespace simeng {

6 changes: 6 additions & 0 deletions src/include/simeng/arch/aarch64/Architecture.hh
@@ -70,6 +70,12 @@ class Architecture : public arch::Architecture {
/** Returns the current value of SVCRval_. */
uint64_t getSVCRval() const;

/** Returns if SVE Streaming Mode is enabled. */
bool isStreamingModeEnabled() const;

/** Returns if the SME ZA Register is enabled. */
bool isZARegisterEnabled() const;

/** Update the value of SVCRval_. */
void setSVCRval(const uint64_t newVal) const;
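
The two helpers added in this hunk report fields of SVCR. In the Arm architecture, SVCR.SM is bit 0 and SVCR.ZA is bit 1, so a minimal sketch of the checks could look like the following. `SvcrState`, the mask names and `svcrExample` are hypothetical stand-ins for illustration, not SimEng's `Architecture` class, and the real implementation may differ.

```cpp
#include <cassert>
#include <cstdint>

// Bit positions of the SVCR fields as defined by the Arm architecture.
constexpr uint64_t SVCR_SM_MASK = 1ull << 0;  // Streaming SVE mode enabled
constexpr uint64_t SVCR_ZA_MASK = 1ull << 1;  // ZA storage enabled

// Hypothetical stand-in that only tracks the SVCR value.
class SvcrState {
 public:
  uint64_t getSVCRval() const { return svcrVal_; }
  void setSVCRval(uint64_t newVal) { svcrVal_ = newVal; }

  // True when the SM bit is set.
  bool isStreamingModeEnabled() const { return (svcrVal_ & SVCR_SM_MASK) != 0; }

  // True when the ZA bit is set.
  bool isZARegisterEnabled() const { return (svcrVal_ & SVCR_ZA_MASK) != 0; }

 private:
  uint64_t svcrVal_ = 0;
};

// Example: enabling only streaming mode leaves ZA disabled.
inline void svcrExample() {
  SvcrState state;
  state.setSVCRval(SVCR_SM_MASK);
  assert(state.isStreamingModeEnabled());
  assert(!state.isZARegisterEnabled());
}
```

Keeping the masks as named constants means the two queries stay independent: streaming mode and ZA can each be enabled without the other.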

9 changes: 9 additions & 0 deletions src/include/simeng/arch/aarch64/Instruction.hh
@@ -370,6 +370,12 @@ class Instruction : public simeng::Instruction {
* processing this instruction. */
InstructionException getException() const;

/** Checks whether the current SVE Streaming Mode status is different to when
* this instruction was first decoded, and updates the instruction group
* accordingly if required.
* Returns TRUE if the group was updated, FALSE otherwise. */
bool checkStreamingGroupAndUpdate();

private:
/** Process the instruction's metadata to determine source/destination
* registers. */
@@ -451,6 +457,9 @@ class Instruction : public simeng::Instruction {
* the `InsnType` namespace allowing each bit to represent a unique
* identifier such as `isLoad` or `isMultiply` etc. */
uint32_t instructionIdentifier_ = 0;

/** The instruction group this instruction belongs to. */
uint16_t instructionGroup_ = InstructionGroups::NONE;
};

} // namespace aarch64
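
One way to picture `checkStreamingGroupAndUpdate()` is as a remapping between the SVE block of group IDs and the STREAMING_SVE block. The sketch below rests on several assumptions: every SVE group has a streaming counterpart exactly 14 IDs higher (this holds for the constants listed in this PR's InstructionGroups.hh), `SVE` is 52 (inferred from `LOAD_SVE = 62` and the 14-entry layout), and the streaming state is passed in as a parameter. `groupForMode` and `SketchInstruction` are hypothetical names; the real member function takes no argument and its body is not shown on this page.

```cpp
#include <cassert>
#include <cstdint>

namespace sketch {
// Values taken from the InstructionGroups.hh diff in this PR.
constexpr uint16_t STORE_SVE = 65;
constexpr uint16_t STREAMING_SVE = 66;
constexpr uint16_t STORE_STREAMING_SVE = 79;
constexpr uint16_t PREDICATE = 94;
constexpr uint16_t STREAMING_PREDICATE = 95;
// Assumption: each block holds 14 entries, so SVE sits 13 below STORE_SVE.
constexpr uint16_t BLOCK_SIZE = 14;
constexpr uint16_t SVE = STORE_SVE - (BLOCK_SIZE - 1);  // 52

// Returns the group matching the requested mode; unrelated groups pass through.
inline uint16_t groupForMode(uint16_t group, bool streaming) {
  if (streaming) {
    if (group >= SVE && group <= STORE_SVE)
      return static_cast<uint16_t>(group + BLOCK_SIZE);
    if (group == PREDICATE) return STREAMING_PREDICATE;
  } else {
    if (group >= STREAMING_SVE && group <= STORE_STREAMING_SVE)
      return static_cast<uint16_t>(group - BLOCK_SIZE);
    if (group == STREAMING_PREDICATE) return PREDICATE;
  }
  return group;
}

// Hypothetical instruction holding only its group.
struct SketchInstruction {
  uint16_t instructionGroup_;

  // Returns true if the group was updated, false otherwise.
  bool checkStreamingGroupAndUpdate(bool streamingEnabled) {
    const uint16_t updated = groupForMode(instructionGroup_, streamingEnabled);
    if (updated == instructionGroup_) return false;
    instructionGroup_ = updated;
    return true;
  }
};
}  // namespace sketch
```

With this layout, an instruction decoded as `LOAD_SVE` (62) would move to `LOAD_STREAMING_SVE` (76) once streaming mode is enabled, and back again when it is disabled.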
114 changes: 87 additions & 27 deletions src/include/simeng/arch/aarch64/InstructionGroups.hh
@@ -4,7 +4,33 @@ namespace simeng {
namespace arch {
namespace aarch64 {

/** The IDs of the instruction groups for AArch64 instructions. */
/** The IDs of the instruction groups for AArch64 instructions.
* Each new group must contain 14 entries to ensure correct group assignment and
* general functionality.
* Their order must be as follows:
* - BASE
* - BASE_SIMPLE
* - BASE_SIMPLE_ARTH
* - BASE_SIMPLE_ARTH_NOSHIFT
* - BASE_SIMPLE_LOGICAL
* - BASE_SIMPLE_LOGICAL_NOSHIFT
* - BASE_SIMPLE_CMP
* - BASE_SIMPLE_CVT
* - BASE_MUL
* - BASE_DIV_OR_SQRT
* - LOAD_BASE
* - STORE_ADDRESS_BASE
* - STORE_DATA_BASE
* - STORE_BASE
*
* An exception to the above is "Parent" groups which do not require the LOAD_*
* or STORE_* groups.
* "Parent" groups allow for easier grouping of similar groups that may have
* identical execution latencies, ports, etc. For example, FP is the parent
* group of SCALAR and VECTOR.
* In simulation, an instruction's allocated group will never be a "Parent"
* group; they are only used to simplify config file creation and management.
*/
namespace InstructionGroups {
const uint16_t INT = 0;
const uint16_t INT_SIMPLE = 1;
@@ -72,37 +98,53 @@ const uint16_t LOAD_SVE = 62;
const uint16_t STORE_ADDRESS_SVE = 63;
const uint16_t STORE_DATA_SVE = 64;
const uint16_t STORE_SVE = 65;
const uint16_t PREDICATE = 66;
const uint16_t LOAD = 67;
const uint16_t STORE_ADDRESS = 68;
const uint16_t STORE_DATA = 69;
const uint16_t STORE = 70;
const uint16_t BRANCH = 71;
const uint16_t SME = 72;
const uint16_t SME_SIMPLE = 73;
const uint16_t SME_SIMPLE_ARTH = 74;
const uint16_t SME_SIMPLE_ARTH_NOSHIFT = 75;
const uint16_t SME_SIMPLE_LOGICAL = 76;
const uint16_t SME_SIMPLE_LOGICAL_NOSHIFT = 77;
const uint16_t SME_SIMPLE_CMP = 78;
const uint16_t SME_SIMPLE_CVT = 79;
const uint16_t SME_MUL = 80;
const uint16_t SME_DIV_OR_SQRT = 81;
const uint16_t LOAD_SME = 82;
const uint16_t STORE_ADDRESS_SME = 83;
const uint16_t STORE_DATA_SME = 84;
const uint16_t STORE_SME = 85;
const uint16_t ALL = 86;
const uint16_t NONE = 87;
const uint16_t STREAMING_SVE = 66;
const uint16_t STREAMING_SVE_SIMPLE = 67;
const uint16_t STREAMING_SVE_SIMPLE_ARTH = 68;
const uint16_t STREAMING_SVE_SIMPLE_ARTH_NOSHIFT = 69;
const uint16_t STREAMING_SVE_SIMPLE_LOGICAL = 70;
const uint16_t STREAMING_SVE_SIMPLE_LOGICAL_NOSHIFT = 71;
const uint16_t STREAMING_SVE_SIMPLE_CMP = 72;
const uint16_t STREAMING_SVE_SIMPLE_CVT = 73;
const uint16_t STREAMING_SVE_MUL = 74;
const uint16_t STREAMING_SVE_DIV_OR_SQRT = 75;
const uint16_t LOAD_STREAMING_SVE = 76;
const uint16_t STORE_ADDRESS_STREAMING_SVE = 77;
const uint16_t STORE_DATA_STREAMING_SVE = 78;
const uint16_t STORE_STREAMING_SVE = 79;
const uint16_t SME = 80;
const uint16_t SME_SIMPLE = 81;
const uint16_t SME_SIMPLE_ARTH = 82;
const uint16_t SME_SIMPLE_ARTH_NOSHIFT = 83;
const uint16_t SME_SIMPLE_LOGICAL = 84;
const uint16_t SME_SIMPLE_LOGICAL_NOSHIFT = 85;
const uint16_t SME_SIMPLE_CMP = 86;
const uint16_t SME_SIMPLE_CVT = 87;
const uint16_t SME_MUL = 88;
const uint16_t SME_DIV_OR_SQRT = 89;
const uint16_t LOAD_SME = 90;
const uint16_t STORE_ADDRESS_SME = 91;
const uint16_t STORE_DATA_SME = 92;
const uint16_t STORE_SME = 93;
const uint16_t PREDICATE = 94;
const uint16_t STREAMING_PREDICATE = 95;
const uint16_t LOAD = 96;
const uint16_t STORE_ADDRESS = 97;
const uint16_t STORE_DATA = 98;
const uint16_t STORE = 99;
const uint16_t BRANCH = 100;
const uint16_t ALL = 101;
const uint16_t NONE = 102;
} // namespace InstructionGroups

/** The number of aarch64 instruction groups. */
static constexpr uint8_t NUM_GROUPS = 88;
static constexpr uint8_t NUM_GROUPS = 103;

const std::unordered_map<uint16_t, std::vector<uint16_t>> groupInheritance_ = {
{InstructionGroups::ALL,
{InstructionGroups::INT, InstructionGroups::FP, InstructionGroups::SVE,
InstructionGroups::PREDICATE, InstructionGroups::SME,
InstructionGroups::STREAMING_SVE, InstructionGroups::SME,
InstructionGroups::PREDICATE, InstructionGroups::STREAMING_PREDICATE,
InstructionGroups::LOAD, InstructionGroups::STORE,
InstructionGroups::BRANCH}},
{InstructionGroups::INT,
@@ -176,6 +218,19 @@ const std::unordered_map<uint16_t, std::vector<uint16_t>> groupInheritance_ = {
{InstructionGroups::SVE_SIMPLE_ARTH_NOSHIFT}},
{InstructionGroups::SVE_SIMPLE_LOGICAL,
{InstructionGroups::SVE_SIMPLE_LOGICAL_NOSHIFT}},
{InstructionGroups::STREAMING_SVE,
{InstructionGroups::STREAMING_SVE_SIMPLE,
InstructionGroups::STREAMING_SVE_DIV_OR_SQRT,
InstructionGroups::STREAMING_SVE_MUL}},
{InstructionGroups::STREAMING_SVE_SIMPLE,
{InstructionGroups::STREAMING_SVE_SIMPLE_ARTH,
InstructionGroups::STREAMING_SVE_SIMPLE_LOGICAL,
InstructionGroups::STREAMING_SVE_SIMPLE_CMP,
InstructionGroups::STREAMING_SVE_SIMPLE_CVT}},
{InstructionGroups::STREAMING_SVE_SIMPLE_ARTH,
{InstructionGroups::STREAMING_SVE_SIMPLE_ARTH_NOSHIFT}},
{InstructionGroups::STREAMING_SVE_SIMPLE_LOGICAL,
{InstructionGroups::STREAMING_SVE_SIMPLE_LOGICAL_NOSHIFT}},
{InstructionGroups::SME,
{InstructionGroups::SME_SIMPLE, InstructionGroups::SME_DIV_OR_SQRT,
InstructionGroups::SME_MUL}},
@@ -189,11 +244,11 @@ const std::unordered_map<uint16_t, std::vector<uint16_t>> groupInheritance_ = {
{InstructionGroups::LOAD,
{InstructionGroups::LOAD_INT, InstructionGroups::LOAD_SCALAR,
InstructionGroups::LOAD_VECTOR, InstructionGroups::LOAD_SVE,
InstructionGroups::LOAD_SME}},
InstructionGroups::LOAD_STREAMING_SVE, InstructionGroups::LOAD_SME}},
{InstructionGroups::STORE,
{InstructionGroups::STORE_INT, InstructionGroups::STORE_SCALAR,
InstructionGroups::STORE_VECTOR, InstructionGroups::STORE_SVE,
InstructionGroups::STORE_SME}},
InstructionGroups::STORE_STREAMING_SVE, InstructionGroups::STORE_SME}},
{InstructionGroups::STORE_INT,
{InstructionGroups::STORE_ADDRESS_INT, InstructionGroups::STORE_DATA_INT}},
{InstructionGroups::STORE_SCALAR,
@@ -204,17 +259,22 @@ const std::unordered_map<uint16_t, std::vector<uint16_t>> groupInheritance_ = {
InstructionGroups::STORE_DATA_VECTOR}},
{InstructionGroups::STORE_SVE,
{InstructionGroups::STORE_ADDRESS_SVE, InstructionGroups::STORE_DATA_SVE}},
{InstructionGroups::STORE_STREAMING_SVE,
{InstructionGroups::STORE_ADDRESS_STREAMING_SVE,
InstructionGroups::STORE_DATA_STREAMING_SVE}},
{InstructionGroups::STORE_SME,
{InstructionGroups::STORE_ADDRESS_SME, InstructionGroups::STORE_DATA_SME}},
{InstructionGroups::STORE_ADDRESS,
{InstructionGroups::STORE_ADDRESS_INT,
InstructionGroups::STORE_ADDRESS_SCALAR,
InstructionGroups::STORE_ADDRESS_VECTOR,
InstructionGroups::STORE_ADDRESS_SVE,
InstructionGroups::STORE_ADDRESS_STREAMING_SVE,
InstructionGroups::STORE_ADDRESS_SME}},
{InstructionGroups::STORE_DATA,
{InstructionGroups::STORE_DATA_INT, InstructionGroups::STORE_DATA_SCALAR,
InstructionGroups::STORE_DATA_VECTOR, InstructionGroups::STORE_DATA_SVE,
InstructionGroups::STORE_DATA_STREAMING_SVE,
InstructionGroups::STORE_DATA_SME}}};

} // namespace aarch64
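
Because `groupInheritance_` maps each parent group to its direct children, a config entry such as `STREAMING_SVE` can be expanded into every group it covers with a short recursive walk. The sketch below is an illustration only: `expandGroup`, `streamingSveInheritance` and `expandExample` are hypothetical names, the map holds just the STREAMING_SVE entries from this diff as numeric IDs, and SimEng's own expansion code is not shown on this page.

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <unordered_map>
#include <vector>

using GroupMap = std::unordered_map<uint16_t, std::vector<uint16_t>>;

// Collects `group` and every group reachable from it through `inheritance`.
inline void expandGroup(const GroupMap& inheritance, uint16_t group,
                        std::set<uint16_t>& out) {
  if (!out.insert(group).second) return;  // already visited
  const auto it = inheritance.find(group);
  if (it == inheritance.end()) return;  // leaf group, no children
  for (uint16_t child : it->second) expandGroup(inheritance, child, out);
}

// The STREAMING_SVE subtree from the diff, written with numeric IDs.
inline GroupMap streamingSveInheritance() {
  return {
      {66, {67, 75, 74}},      // STREAMING_SVE -> SIMPLE, DIV_OR_SQRT, MUL
      {67, {68, 70, 72, 73}},  // SIMPLE -> ARTH, LOGICAL, CMP, CVT
      {68, {69}},              // SIMPLE_ARTH -> SIMPLE_ARTH_NOSHIFT
      {70, {71}},              // SIMPLE_LOGICAL -> SIMPLE_LOGICAL_NOSHIFT
  };
}

// Example: STREAMING_SVE covers IDs 66 to 75 but none of the load/store groups.
inline void expandExample() {
  std::set<uint16_t> covered;
  expandGroup(streamingSveInheritance(), 66, covered);
  assert(covered.size() == 10);
  assert(covered.count(76) == 0);  // LOAD_STREAMING_SVE belongs to LOAD
}
```

The visited check keeps the walk safe even if a group were listed under more than one parent, as the store groups are in the full map.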
14 changes: 12 additions & 2 deletions src/include/simeng/arch/aarch64/helpers/neon.hh
@@ -568,9 +568,14 @@ RegisterValue vecUMaxP(srcValContainer& sourceValues) {
const T* n = sourceValues[0].getAsVector<T>();
const T* m = sourceValues[1].getAsVector<T>();

// Concatenate the vectors
Contributor:

Have you double-checked the ordering of the concatenation? I ran it on Ookami and I think these may be the wrong way round, but it's worth double-checking in case I've made a mistake.

Contributor (Author):

As per the spec:

This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register...

i.e. N is concatenated onto the end of M (M:N)

Contributor:

I think with "what the spec says" vs "observed values", the latter should probably be taken as the truth. So it's worth someone else double-checking that the values I've observed do go against what the spec says

Contributor (Author):

This is very odd and confusing... I've also checked on Ookami and Isambard-AI with the following asm program:

    movi v0.16b, #0
    movi v1.16b, #1
    movi v2.16b, #2

    umaxp v0.16b, v1.16b, v2.16b

    mov w12, v0.s[0]
    mov w13, v0.s[3]

Which after executing yields the following:

  • v0.b = {1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2}
  • v0.s = {16843009, 16843009, 33686018, 33686018}

This means the concatenation is v1:v2, NOT v2:v1.
Using the final two instructions, I double-checked that gdb doesn't display vector registers "in reverse" (i.e. that the left-most element is in fact v0[0] and not v0[15]). Their results were:

  • w12 = 16843009
  • w13 = 33686018

So yes, on hardware the concatenation is seemingly vn:vm.

However, the spec and its pseudocode for UMAXP don't align with this... From this page, the pseudocode is as follows:

CheckFPAdvSIMDEnabled64();
constant bits(datasize) operand1 = V[n, datasize];
constant bits(datasize) operand2 = V[m, datasize];
bits(datasize) result;
constant bits(2*datasize) concat = operand2:operand1;
integer element1;
integer element2;
integer max;

for e = 0 to elements-1
    element1 = UInt(Elem[concat, 2*e, esize]);
    element2 = UInt(Elem[concat, (2*e)+1, esize]);
    max = Max(element1, element2);
    Elem[result, e, esize] = max<esize-1:0>;

V[d, datasize] = result;

Where it is clear that the concatenation according to this is vm:vn....

In this instance, we should probably go with hardware. But it is quite annoying that the spec doesn't align with hardware on this, and that updating our code in line with the spec still fixed the issue that was occurring!

T temp[2 * I];
memcpy(temp, n, sizeof(T) * I);
// Pointer arithmetic on T* counts elements, not bytes, so the second half starts at temp + I
memcpy(temp + I, m, sizeof(T) * I);
// Compare each adjacent pair of elements
T out[I];
for (int i = 0; i < I; i++) {
out[i] = std::max(n[i], m[i]);
out[i] = std::max(temp[2 * i], temp[2 * i + 1]);
}
return {out, 256};
}
@@ -585,9 +590,14 @@ RegisterValue vecUMinP(srcValContainer& sourceValues) {
const T* n = sourceValues[0].getAsVector<T>();
const T* m = sourceValues[1].getAsVector<T>();

// Concatenate the vectors
T temp[2 * I];
memcpy(temp, m, sizeof(T) * I);
// Pointer arithmetic on T* counts elements, not bytes, so the second half starts at temp + I
memcpy(temp + I, n, sizeof(T) * I);

T out[I];
for (int i = 0; i < I; i++) {
out[i] = std::min(n[i], m[i]);
out[i] = std::min(temp[2 * i], temp[2 * i + 1]);
}
return {out, 256};
}
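
The pairwise operation discussed in the UMAXP review thread can be sketched independently of SimEng's helper types. Note that in Arm pseudocode `a:b` places `b` in the low-order bits and element 0 is the least significant element, so `operand2:operand1` puts the elements of Vn at the low indices; read that way, the pseudocode appears consistent with the hardware result quoted in the thread. The sketch below assumes that ordering (first source in the low half). `pairwiseMax` and `pairwiseMaxExample` are hypothetical names, and `std::array` stands in for `RegisterValue`.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Pairwise maximum over a concatenated vector whose low half holds `first`
// and whose high half holds `second`.
template <typename T, std::size_t I>
std::array<T, I> pairwiseMax(const std::array<T, I>& first,
                             const std::array<T, I>& second) {
  std::array<T, 2 * I> concat{};
  // Offsets are counted in elements, so the second half begins at index I.
  std::copy(first.begin(), first.end(), concat.begin());
  std::copy(second.begin(), second.end(), concat.begin() + I);

  std::array<T, I> out{};
  for (std::size_t i = 0; i < I; i++) {
    // Each result element is the larger of one adjacent pair.
    out[i] = std::max(concat[2 * i], concat[2 * i + 1]);
  }
  return out;
}

// Mirrors the byte-wise test from the thread: sixteen 1s paired with sixteen 2s.
inline void pairwiseMaxExample() {
  std::array<uint8_t, 16> ones;
  std::array<uint8_t, 16> twos;
  ones.fill(1);
  twos.fill(2);
  const auto out = pairwiseMax(ones, twos);
  for (std::size_t i = 0; i < 8; i++) assert(out[i] == 1);
  for (std::size_t i = 8; i < 16; i++) assert(out[i] == 2);
}
```

Using element-indexed copies rather than byte offsets keeps the same code valid for every element width, which matters once `T` is wider than one byte.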
3 changes: 3 additions & 0 deletions src/include/simeng/arch/riscv/Instruction.hh
@@ -252,6 +252,9 @@ class Instruction : public simeng::Instruction {
* the `InsnType` namespace allowing each bit to represent a unique
* identifier such as `isLoad` or `isMultiply` etc. */
uint16_t instructionIdentifier_ = 0;

/** The instruction group this instruction belongs to. */
uint16_t instructionGroup_ = InstructionGroups::NONE;
};

} // namespace riscv