
possibly add support for multi-core CXUs and big.LITTLE configurations of multi-instance CXUs #1

Open
ubc-guy opened this issue May 30, 2024 · 4 comments

Comments

@ubc-guy

ubc-guy commented May 30, 2024

Any given CXU has a single GUID that uniquely identifies its contract, i.e., the precise set of new instructions that it implements. However, some use cases may demand having multiple copies of this CXU within a system, accessible by a single hart or shared by multiple harts. Further, each instance of that CXU may differ in its internal configuration, e.g., amount of state, size/speed of the execution engine, etc.

Below, I'll often use the term multi-core CXUs, but I really mean multiple instances (either replication of the same configuration, or replication each with a unique configuration/properties) of a single CXU logic module. It is entirely possible that a CXU internally has multiple execution cores, but that is not what is intended by the discussion in this Issue.

One way to reflect a multi-instance CXU situation might be to treat it as a single CXU accessed through several different state_ids (using the aggregate). However, the hardware (CXU-LI and its switch) and the system map would then need to change how the state_id and cxu_id fields are interpreted/implemented within the system.

Once we go multi-instance CXU, it may also become tempting to support big.LITTLE configurations. This matters where the internal CXU implementation can scale independently of the CXU contract. For example, in the RISC-V vector spec, the implementation width and maximum vector length (VLEN) can vary while still implementing the same CXU contract. The system map should somehow capture details such as the "size", "scale", or performance level of each CXU instance, allowing the scheduler to (a) know that the instances differ when making scheduling decisions, and (b) expose an API through which software can request preference to be scheduled to a "faster core".
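To make the idea concrete, here is a minimal sketch of what a system-map entry carrying a per-instance performance level might look like, and how a scheduler could use it. All names here (`cx_map_entry`, `perf_level`, `pick_fastest`) are hypothetical illustrations, not part of the basis spec:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical system-map entry: each instance of the same CX contract
 * (same GUID) carries a perf_level so a scheduler can tell big from
 * LITTLE instances. Field names are illustrative only. */
typedef struct {
    uint64_t cx_guid_hi, cx_guid_lo; /* CX contract GUID (same across instances) */
    uint16_t cxu_id;                 /* system-local instance id */
    uint8_t  perf_level;             /* relative performance; higher = faster (assumed) */
    uint16_t vlen;                   /* e.g. max vector length; may differ per instance */
} cx_map_entry;

/* Pick the fastest instance implementing a given contract. */
static int pick_fastest(const cx_map_entry *map, int n,
                        uint64_t guid_hi, uint64_t guid_lo) {
    int best = -1;
    for (int i = 0; i < n; i++) {
        if (map[i].cx_guid_hi == guid_hi && map[i].cx_guid_lo == guid_lo &&
            (best < 0 || map[i].perf_level > map[best].perf_level))
            best = i;
    }
    return best; /* index into map, or -1 if the CX is absent */
}
```

This is only meant to show the scheduling-visibility point from (a) and (b) above; the real schema would be decided by the TG.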

There are some further problems to resolve with the big.LITTLE concept. For example, RVV does not support live migration among big.LITTLE cores with different VLEN. However, live migration between RVV instances of different execution widths (but the same VLEN) is possible. This would probably fall under the rules of composability / live migration.

@allenjbaum

OK, that is more expansive than I had been thinking, which was more like a more closely coupled CX extension, with state needing to be context switched - much like an FPU which has FCSR and FREG state.

The other part of composable is when there are two CX extensions and both are used within an application. Context switching needs to be visible to the application in that case, and that sounds painful, unless there is a way to have multiple CX extensions "active" at the same time. That means separate state (which is easy - that state will be separate or shared & visible in any case) and separate opcodes, identification, and enabling (which sounds beyond the scope of this).

@jangray

jangray commented May 31, 2024

[resent, fixing a key late night typo]

When the CX TG undertakes its planning milestone, as with many other work scoping decisions, we may decide to support this scenario, or not.

The first sentence of the first comment is misleading and will confuse newcomers. It disregards, and thus abandons, the basis spec's abstractions and terminology. In particular, per the basis spec (§1.1, §1.6, §2.1), a CXU may implement multiple CXs (plural), and the CX (not the CXU) is the extension ISA contract uniquely identified by a GUID.

Setting that sentence aside, the issue's request is clear. I propose to frame it this way for the CX TG planning milestone:

  1. Can a system be configured with multiple identical CXU cores that each implement the same CX?
  2. Can a system be configured with multiple different CXU cores that each implement the same CX?

There are good use cases for (1) including greater throughput, greater capacity (number and/or size of state contexts), and isolation. I think (2) is less compelling but after supporting (1), (2)'s marginal impact is on OSs that must handle more complicated CX Maps and different CX state context blob sizes. This degree of dynamic CX-agnostic state context management, and more, is already anticipated in the basis spec.

In terms of the basis spec, supporting (1) and/or (2) has a minor impact, but what impact it has goes up and down the stack.

  1. As before, a CX selector (§2.1) value differentiates this CXU* from that CXU* via its .cxu_id field.
  2. Assuming two CXUs implementing a common CX are thus treated as two distinct CXUs, the basis spec's proposed Mux and Switch CXUs already route CXU requests by cxu_id / req_cxu. No need to contemplate routing on .state_id / req_state.
  3. Then, up the stack, the CX API's discovery API, and the CX topology info (CX Map) that powers it, must handle mapping from the canonical CX_ID (GUID) to some system-specific CXU_ID (small integer), where the correspondence is not 1:1 but 1:many. Not a big deal. The hard work is in getting the CX Map schema right.
  4. For OS context switches, having different implementations of a CX in one system need not change anything about how a CX state context is managed, saved, or restored, since the OS should not try to move a state context from one CXU to another. Yes, this does not address big.LITTLE CXU migration. Personally, I'm fine with that.
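The 1:many mapping in point (3) can be sketched as a discovery lookup that fills an array of CXU_IDs rather than returning a single id. This is a hypothetical illustration of the shape of the problem; `cx_map_row` and `lookup_cxu_ids` are invented names, not the CX API:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical CX Map row: with multi-instance CXUs, one canonical
 * CX GUID may map to several system-local CXU_IDs. */
typedef struct {
    uint64_t cx_guid; /* canonical contract id (truncated to 64 bits for the sketch) */
    uint16_t cxu_id;  /* system-local id used in CX selector values */
} cx_map_row;

/* Collect every CXU_ID whose entry implements the requested CX. */
static size_t lookup_cxu_ids(const cx_map_row *map, size_t n, uint64_t guid,
                             uint16_t *out, size_t out_cap) {
    size_t found = 0;
    for (size_t i = 0; i < n; i++)
        if (map[i].cx_guid == guid && found < out_cap)
            out[found++] = map[i].cxu_id;
    return found; /* number of instances implementing this CX */
}
```

A caller would then pick one of the returned CXU_IDs to place in a CX selector value; which one it picks is exactly the scheduling question this Issue raises.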

(CXU*: (newcomers, please disregard): the basis spec uses CXU_ID to identify not a CXU core, but rather a CXU core's implementation of a specific CX. Since a single CXU core may implement A1CX and A2CX, it is assigned two different CXU_IDs, and the CXU interconnect is configured to route requests for either CX_ID to that one CXU core. To better clarify this, the basis spec has four pending renames: 1) rename CX_ID to CX_GUID; 2) rename CXU_ID to CX_ID; 3) rename mcx_selector.cxu_id to mcx_selector.cx_id; 4) rename CXU-LI port req_cxu to req_cx. Then CX API's discovery service maps a requested CX_GUID (globally unique) to a CX_ID (local), if present, or perhaps per this Issue to one of several such CX_IDs. CX_IDs then appear in CX selector values such as mcx_selector.cx_id, which is conveyed on CXU-LI as port req_cx.)

It is an open question whether, and to what extent, (itself reusable and composable) CX library software might cx_open (discover and request access to) a CX with specific performance or capacity hints. On the other hand, I think it would be a mistake to promote fragile CX library coding practices, such as providing a cx_open facility to request an explicit CXU implementation or CXU implementation version.
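The distinction between a portable hint and a fragile explicit-implementation request can be illustrated with a small sketch. The hint enum and selection helper below are invented for illustration; only `cx_open` is a name from the discussion above:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical open-time hints: a library expresses a *preference*
 * ("faster", "more state capacity") without naming a concrete CXU
 * implementation or version, so the code stays portable. */
typedef enum {
    CX_HINT_NONE,
    CX_HINT_PREFER_FAST,
    CX_HINT_PREFER_CAPACITY
} cx_open_hint;

typedef struct {
    uint16_t cxu_id;      /* system-local instance id */
    uint8_t  perf_level;  /* assumed relative performance */
    uint32_t state_bytes; /* assumed state context capacity */
} cx_instance;

/* Choose among discovered instances of one CX according to the hint. */
static int cx_choose(const cx_instance *inst, int n, cx_open_hint hint) {
    int best = 0;
    for (int i = 1; i < n; i++) {
        if (hint == CX_HINT_PREFER_FAST &&
            inst[i].perf_level > inst[best].perf_level)
            best = i;
        else if (hint == CX_HINT_PREFER_CAPACITY &&
                 inst[i].state_bytes > inst[best].state_bytes)
            best = i;
    }
    return n > 0 ? best : -1;
}
```

Note the hint never mentions a CXU_ID or implementation version, which is precisely what makes it non-fragile across systems.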

@jangray

jangray commented May 31, 2024

Hi Allen,
"The other part of composable is when there are two CX extensions, and both are used within an application. Context switching needs to be visible to the application in that case, and that sounds painful, unless there is a way to have multiple CX extensions "active" at the same time, which means separate state (which is easy - that state will be separate or shared & visible in any case) and separate opcodes and identification and enabling (which sounds beyond the scope of this)"

I apologize, but I am not sure I understand "context switching needs to be visible to the application in that case". The basis spec proposes a way to provide application-transparent, CX-agnostic OS context save/restore of any number of CXs and CX state contexts used within the same application. Perhaps we can discuss this question back on the TG list.

@allenjbaum

allenjbaum commented Jun 5, 2024 via email
