possibly add support for multi-core CXUs and big.LITTLE configurations of multi-instance CXUs #1
Comments
OK, that is more expansive than I had been thinking, which was more like a more closely coupled CX extension, with state needing to be context switched - much like an FPU, which has FCSR and FREG state. The other part of composable is when there are two CX extensions, and both are used within an application. Context switching needs to be visible to the application in that case, and that sounds painful, unless there is a way to have multiple CX extensions "active" at the same time, which means separate state (which is easy - that state will be separate or shared & visible in any case) and separate opcodes and identification and enabling (which sounds beyond the scope of this) |
[resent, fixing a key late night typo] When the CX TG undertakes its planning milestone, as with many other work scoping decisions, we may decide to support this scenario, or not. The first sentence of the first comment is misleading and will confuse newcomers. It disregards and thus abandons the basis spec abstractions and terminology. In particular, per the basis spec (§1.1, §1.6, §2.1) a CXU may implement multiple CXs (plural), and the CX (not the CXU) is the extension ISA contract uniquely identified by a GUID. Overlooking this sentence, the issue request is clear. I propose to frame it this way for the CX TG planning milestone:
There are good use cases for (1) including greater throughput, greater capacity (number and/or size of state contexts), and isolation. I think (2) is less compelling but after supporting (1), (2)'s marginal impact is on OSs that must handle more complicated CX Maps and different CX state context blob sizes. This degree of dynamic CX-agnostic state context management, and more, is already anticipated in the basis spec. In terms of the basis spec, supporting (1) and/or (2) has a minor impact, but what impact it has goes up and down the stack.
(CXU*: (newcomers, please disregard): the basis spec uses CXU_ID to identify not a CXU core, but rather a CXU core's implementation of a specific CX. Since a single CXU core may implement A1CX and A2CX, it is assigned two different CXU_IDs, and the CXU interconnect is configured to route requests for either CXU_ID to that one CXU core. To better clarify this, the basis spec has four pending renames: 1) rename CX_ID to CX_GUID; 2) rename CXU_ID to CX_ID; 3) rename mcx_selector.cxu_id to mcx_selector.cx_id; 4) rename CXU-LI port req_cxu to req_cx. Then the CX API's discovery service maps a requested CX_GUID (globally unique) to a CX_ID (local), if present, or perhaps per this Issue to one of several such CX_IDs. CX_IDs then appear in CX selector values such as mcx_selector.cx_id, which is conveyed on CXU-LI as port req_cx.) It is an open question whether and to what extent (itself reusable and composable) CX library software might cx_open (discover and request access to) a CX but with specific performance or capacity limit hints. On the other hand, I think it would be a mistake to promote fragile CX library coding practices such as providing a cx_open facility to request an explicit CXU implementation or CXU implementation version. |
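To make the discovery mapping in the note above concrete, here is a minimal sketch, using the post-rename terminology (CX_GUID global, CX_ID local). Everything in it is an assumption for illustration: `CxMapEntry`, `discover`, and this `cx_open` signature are hypothetical and are not the basis spec's API, and choosing the first matching CX_ID is only a placeholder policy.

```python
from dataclasses import dataclass
from typing import List, Optional
from uuid import UUID


@dataclass(frozen=True)
class CxMapEntry:
    """One row of a hypothetical system CX map."""
    cx_guid: UUID  # globally unique identifier of the CX contract
    cx_id: int     # system-local identifier, as carried in mcx_selector.cx_id


def discover(cx_map: List[CxMapEntry], cx_guid: UUID) -> List[int]:
    """Return every local CX_ID that implements the requested CX_GUID."""
    return [entry.cx_id for entry in cx_map if entry.cx_guid == cx_guid]


def cx_open(cx_map: List[CxMapEntry], cx_guid: UUID) -> Optional[int]:
    """Hypothetical open: pick one CX_ID for the CX, or None if absent.

    The caller names only the contract (GUID), never a specific CXU
    implementation or version.
    """
    candidates = discover(cx_map, cx_guid)
    return candidates[0] if candidates else None


# Illustrative GUIDs only; real CX GUIDs are assigned by the CX authors.
A1CX = UUID("00000000-0000-0000-0000-0000000000a1")
A2CX = UUID("00000000-0000-0000-0000-0000000000a2")

# One CXU core implementing both A1CX and A2CX (CX_IDs 0 and 1),
# plus a second instance that implements A1CX only (CX_ID 2).
EXAMPLE_MAP = [CxMapEntry(A1CX, 0), CxMapEntry(A2CX, 1), CxMapEntry(A1CX, 2)]
```

With a map like `EXAMPLE_MAP`, `discover` returns two CX_IDs for A1CX, which is the "one of several such CX_IDs" case this Issue raises.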
Hi Allen, I apologize but I am not sure I understand "context switching needs to be visible to the application in that case". The basis spec proposes a way to provide application-transparent, CX-agnostic OS context save/restore of any number of CXs and CX state contexts used within the same application. Perhaps we can discuss this question back on the TG list. |
It is possibly my misapprehension that the set of opcodes used by a CXU is
not unique, that they were pre-allocated and a CXU is free to use them.
If that is the case, then a context switch (effectively "detaching" CXU-A
opcodes, and "attaching" CXU-B) is required to move from CXU to CXU.
That is not invisible to the application; it must explicitly switch if that
were the case.
If CXUs never share opcodes (and I don't see how they could avoid it), then you
wouldn't need the CX extension at all; you'd just define a custom opcode.
So there is something here I'm not understanding.
…On Fri, May 31, 2024 at 3:34 PM Jan Gray ***@***.***> wrote:
Hi Allen,
"The other part of composable is when there are two CX extensions, and
both are used within an application. Context switching needs to be visible
to the application in that case, and that sounds painful, unless there is a
way to have multiple CX extensions "active" at the same time, which means
separate state (which is easy - that state will be separate or shared &
visible in any case) and separate opcodes and identification and enabling
(which sounds beyond the scope of this)"
I apologize but I am not sure I understand "context switching needs to be
visible to the application in that case". The basis spec proposes a way to
provide application transparent, CX-agnostic OS context save/restore of any
number of CXs and CX state contexts used within the same application.
Perhaps we can discuss this question back on the TG list.
|
Any given CXU has a single GUID that uniquely identifies its contract, i.e. the precise new instructions that it implements. However, some use cases may demand having multiple copies of this CXU within a system, accessible by a single hart or shared by multiple harts. Further, each instance of that CXU may differ in its internal configuration, e.g. amount of state, size/speed of execution engine, etc.
Below, I'll often use the term multi-core CXUs, but I really mean multiple instances (either replication of the same configuration, or replication each with a unique configuration/properties) of a single CXU logic module. It is entirely possible that a CXU internally has multiple execution cores, but that is not what is intended by the discussion in this Issue.
One way to reflect a multi-instance CXU situation might be to present it as a single CXU reached through several different state_ids (using the aggregate of the instances' state contexts). However, the hardware (CXU-LI and its switch) and the system map would then need to change how the state_id and cxu_id fields are interpreted/implemented within the system.
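As a rough illustration of the "aggregate" idea, the sketch below flattens the state contexts of several instances into one state_id space and routes an aggregate state_id back to an (instance, local state_id) pair. This is only one possible interpretation, not something the basis spec defines; `build_offsets` and `route_state_id` are hypothetical names, and the per-instance context counts are made up.

```python
from bisect import bisect_right
from itertools import accumulate
from typing import List, Tuple


def build_offsets(contexts_per_instance: List[int]) -> List[int]:
    """Prefix sums: offsets[i] is the first aggregate state_id of instance i."""
    return [0] + list(accumulate(contexts_per_instance))


def route_state_id(offsets: List[int], state_id: int) -> Tuple[int, int]:
    """Map an aggregate state_id to (instance index, instance-local state_id)."""
    if not 0 <= state_id < offsets[-1]:
        raise ValueError(f"state_id {state_id} out of range")
    # Find the last instance whose first aggregate state_id is <= state_id.
    instance = bisect_right(offsets, state_id) - 1
    return instance, state_id - offsets[instance]


# Three instances with 4, 2 and 1 state contexts (assumed values).
OFFSETS = build_offsets([4, 2, 1])
```

In hardware terms, a lookup like `route_state_id` is the kind of reinterpretation the CXU-LI switch would have to perform, which is why the paragraph above says both the switch and the system map would need to change.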
Once we go multi-instance CXU, it may also become tempting to support big.LITTLE configurations. This matters where the internal CXU implementation can scale independently of the CXU contract. For example, in the RISC-V vector spec, the implementation width and maximum vector length (VLEN) can vary while still implementing the same contract. The system map should somehow capture details such as the "size", "scale", or performance level of each CXU instance, so that (a) the scheduler knows the instances differ when making scheduling decisions, and (b) an API can request a preference to be scheduled on a "faster core".
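A small sketch of what such a system map entry and a preference-based selection could look like follows. The field names (`perf_level`, `state_contexts`) and the `pick_instance` helper are assumptions made up for illustration; nothing here is defined by the basis spec, and the preference is treated as a hint rather than a request for a specific implementation.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class CxuInstance:
    """Hypothetical system map entry for one instance of a CXU."""
    cx_id: int           # system-local identifier of this instance
    perf_level: int      # assumed relative scale: larger means faster
    state_contexts: int  # number of state contexts this instance holds


def pick_instance(instances: List[CxuInstance],
                  prefer_fast: bool = True,
                  min_contexts: int = 0) -> Optional[CxuInstance]:
    """Choose an instance by hint: fastest (or slowest) with enough contexts."""
    candidates = [i for i in instances if i.state_contexts >= min_contexts]
    if not candidates:
        return None
    chooser = max if prefer_fast else min
    return chooser(candidates, key=lambda inst: inst.perf_level)


# A "big" instance and two "LITTLE" instances of the same contract.
INSTANCES = [
    CxuInstance(cx_id=0, perf_level=4, state_contexts=2),
    CxuInstance(cx_id=1, perf_level=1, state_contexts=8),
    CxuInstance(cx_id=2, perf_level=1, state_contexts=1),
]
```

Here a caller that asks for a fast instance gets `cx_id=0`, while a caller that needs at least four state contexts falls back to `cx_id=1`; the caller still never names a particular implementation.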
There are some further problems to resolve with the big.LITTLE concept. For example, RVV does not support live migration among big.LITTLE cores with different VLEN. However, live migration between RVV instances of different execution widths (but the same VLEN) is possible. This would probably fall under the rules of composability / live migration.
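The migration rule described above can be sketched as a simple compatibility check. The `RvvInstance` type and its `datapath_width` field are hypothetical names for illustration; the only rule encoded is the one stated here, namely that VLEN must match while execution width may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RvvInstance:
    """Hypothetical description of one RVV-like instance."""
    vlen: int            # architectural vector register length in bits
    datapath_width: int  # execution width in bits (implementation detail)


def can_live_migrate(src: RvvInstance, dst: RvvInstance) -> bool:
    """Live migration is allowed only when the architectural VLEN matches.

    Execution width is not architecturally visible state, so it may differ.
    """
    return src.vlen == dst.vlen


big = RvvInstance(vlen=256, datapath_width=256)
little = RvvInstance(vlen=256, datapath_width=64)
wide = RvvInstance(vlen=512, datapath_width=256)
```

Under these assumptions `big` and `little` can exchange work, but neither can migrate to or from `wide`.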