[storage] Remove NVMe parameters from front-end that don't belong #206

Merged (1 commit) on Nov 17, 2022
6 changes: 0 additions & 6 deletions storage/v1alpha1/autogen.md
@@ -973,7 +973,6 @@ Back End (network-facing) APIs. NVMe/TCP and NVMe/RoCEv2 protocols are covered b
| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| subsystem_id | [opi_api.common.v1.ObjectKey](#opi_api-common-v1-ObjectKey) | | |
- | controller_id | [opi_api.common.v1.ObjectKey](#opi_api-common-v1-ObjectKey) | | |
| page_size | [int32](#int32) | | |
| page_token | [string](#string) | | |

@@ -1141,16 +1140,11 @@ Back End (network-facing) APIs. NVMe/TCP and NVMe/RoCEv2 protocols are covered b
| ----- | ---- | ----- | ----------- |
| id | [opi_api.common.v1.ObjectKey](#opi_api-common-v1-ObjectKey) | | namespace's unique key |
| subsystem_id | [opi_api.common.v1.ObjectKey](#opi_api-common-v1-ObjectKey) | | subsystem for this namespace |
- | controller_id | [opi_api.common.v1.ObjectKey](#opi_api-common-v1-ObjectKey) | | key of the PCIe controller object that will host this namespace. |
| host_nsid | [int32](#int32) | | NSID present to the host by the NVMe PCIe controller. If not provided, then the controller will assign an unused NSID within the max namespace range - auto assigned nsid may not work for live migration |
| block_size | [int64](#int64) | | Block size in bytes, must be power of 2 and must be less than the max io size supported. Typically tested values are 512, and 4k. |
| blocks_count | [int64](#int64) | | Size/Capacity of the namespace in blocks, size in bytes will be BlockSize x NumBlocks. |
| nguid | [string](#string) | | Globally unique identifier for the namespace |
| eui64 | [int64](#int64) | | 64bit Extended unique identifier for the namespace mandatory if guid is not specified |
| uuid | [opi_api.common.v1.Uuid](#opi_api-common-v1-Uuid) | | Globally unique identifier for the namespace |
| volume_id | [opi_api.common.v1.ObjectKey](#opi_api-common-v1-ObjectKey) | | The back/middle-end volume to back this namespace. |
- | optimal_write_size | [int32](#int32) | | optimal write size hint to host driver. Host IO stack may use this to regulate IO size. Must be a multiple of the preferred write granularity. Must not exceed the controller maximum IO size value configured in the nvme agent config file. |
- | pref_write_granularity | [int32](#int32) | | preferred write granularity hint to the host driver. Host IO stack may use this to align IO sizes to the write granularity for optimum performance. |



27 changes: 2 additions & 25 deletions storage/v1alpha1/frontend_nvme_pcie.proto
@@ -195,23 +195,12 @@ message NVMeNamespaceSpec {
// subsystem for this namespace
common.v1.ObjectKey subsystem_id = 2;

- // key of the PCIe controller object that will host this namespace.
- common.v1.ObjectKey controller_id = 3;
Contributor:
So the namespace is associated with the controller via subsystem_id?

Member (@glimchb):
see: mrvl_nvm_ctrlr_attach_ns uses Ctrl_id here (PDF)

Question: how do we accommodate this? We could create new methods in OPI for attach/detach ns (#177), or we could associate the controller with the NS in a different way. Thoughts?

Author (@benlwalker):
In #177 I was proposing that we do not need to support attaching namespaces to individual controllers (for the NVMe front-end). We only need to support attaching namespaces to subsystems. That means mrvl_nvm_ctrlr_attach_ns isn't ever used (you use mrvl_nvm_subsys_alloc_ns instead)

Attaching a namespace to a subsystem, rather than to a controller, is the "normal" thing to do in NVMe. Attaching a namespace to just one controller is the "weird corner case you can technically do but probably shouldn't" case, and it's likely best modeled as making a subsystem-attached namespace "visible" to a controller, rather than attaching it to a controller outright. This visibility configuration is what the NVM attach/detach namespace commands do in the spec - the namespace is already attached to the subsystem and those commands set up visibility rules for a given controller.

From our past discussions, I'm worried that this particular point isn't well understood and that most people are thinking the normal thing to do is to attach a namespace to a controller. Are we aligned on this particular point or does this need some dedicated discussion?
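To make the distinction concrete, here is a minimal, hypothetical sketch (message and field names are invented for illustration; this is not the API in this PR, nor necessarily what #177 proposes) of modeling per-controller visibility as a separate operation on a namespace that is already attached to a subsystem, mirroring the NVMe attach-namespace admin command:

// Hypothetical sketch, not part of this PR. The namespace already
// belongs to a subsystem; this request only makes it visible to one
// specific controller (the "weird corner case" described above).
message AttachNVMeNamespaceRequest {
  // namespace to expose; must already be attached to the subsystem
  common.v1.ObjectKey namespace_id = 1;
  // controller that gains visibility of this namespace
  common.v1.ObjectKey controller_id = 2;
}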

Member (@glimchb):
I understand, just a few more questions. Thanks for the patience @benlwalker.
So if we only have a single subsystem for all the emulated NVMe devices, and if the NS is a property of the subsystem, like you say, then every VF/PF we create will see ALL namespaces at once? I don't like to be forced to create multiple subsystems... for example for multipathing purposes...

Contributor:
Thanks @benlwalker for describing the 'attach' semantics.
As @glimchb pointed out, with this change we can work around it by creating a subsystem for every controller that needs to be associated with a namespace. It is a bit of extra work, but will it run into something you could do otherwise and now you can't?

Author (@benlwalker):
We discussed a bit on Slack but to recap

> So if we only have a single subsystem for all the emulated NVMe devices, and if the NS is a property of the subsystem, like you say, then every VF/PF we create will see ALL namespaces at once?

This is a true statement, but you would not want to create a single subsystem for all controllers in any design. I've asked for use cases where you would want to do that and I haven't heard any. I think we need to get to some consensus that using a single subsystem for all controllers is a design mistake. Maybe on the call today or here we can try to make some forward progress.

> As @glimchb pointed out, with this change we can work around it by creating a subsystem for every controller that needs to be associated with a namespace. It is a bit of extra work, but will it run into something you could do otherwise and now you can't?

I'm not very confident I understand what you're saying here. But yes, for a multitude of reasons you do want to create a subsystem per tenant, which is usually per controller. It's the only way to support live migration, provide the right performance isolation guarantees (against resets mostly), not leak information about other tenants, and get the NN value suitable for booting.
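As a sketch of that per-tenant layout (message names from this proto file; key values invented for illustration, assuming ObjectKey carries a string value), each tenant gets its own subsystem, and controllers and namespaces hang off it:

// Illustrative only: one subsystem per tenant. Tenant B's controller
// can never see tenant A's namespace, resets are isolated per
// subsystem, and each tenant's NN reflects only its own namespaces.
// Tenant A
NVMeSubsystemSpec  { id { value: "subsys-a" } }
NVMeControllerSpec { id { value: "ctrl-a" } subsystem_id { value: "subsys-a" } }
NVMeNamespaceSpec  { id { value: "ns-a" } subsystem_id { value: "subsys-a" } }
// Tenant B
NVMeSubsystemSpec  { id { value: "subsys-b" } }
NVMeControllerSpec { id { value: "ctrl-b" } subsystem_id { value: "subsys-b" } }
NVMeNamespaceSpec  { id { value: "ns-b" } subsystem_id { value: "subsys-b" } }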

Member (@glimchb):
I agree that between tenants you want to create separate subsystems. But inside the same tenant? A single one? Several?

I think maybe @jainvipin is saying that creating many subsystems is a heavy burden on DPU resources, so you might hit a limit...?

The use case we talked about is multipathing with 2 ports, where you need 2 devices instead of 4 presented to the host...

Contributor @jainvipin (Nov 17, 2022):
@glimchb - I am okay configuring multiple subsystems so that multiple controllers refer to different namespaces. This way, removing the reference from Controller to namespace is not a problem.
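For the 2-port multipathing case above, a similar sketch (values invented) suggests no controller_id on the namespace is needed: both port controllers join the same subsystem and therefore both expose the same namespace, so the host sees one device reachable over two paths:

// Illustrative only: multipath within one tenant. Both controllers sit
// in the same subsystem, so the single namespace is visible on both
// ports: 2 paths to 1 device, not 2 (or 4) separate devices.
NVMeSubsystemSpec  { id { value: "subsys-0" } }
NVMeControllerSpec { id { value: "ctrl-port0" } subsystem_id { value: "subsys-0" } }
NVMeControllerSpec { id { value: "ctrl-port1" } subsystem_id { value: "subsys-0" } }
NVMeNamespaceSpec  { id { value: "ns-0" } subsystem_id { value: "subsys-0" } }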


// NSID present to the host by the NVMe PCIe controller.
// If not provided, then the controller will assign an unused NSID
// within the max namespace range - auto assigned nsid may not work
// for live migration
int32 host_nsid = 4;

// Block size in bytes, must be power of 2 and must be less than the max
// io size supported. Typically tested values are 512, and 4k.
int64 block_size = 5;

// Size/Capacity of the namespace in blocks, size in bytes will
// be BlockSize x NumBlocks.
int64 blocks_count = 6;

// Globally unique identifier for the namespace
string nguid = 7;

@@ -224,17 +213,6 @@ message NVMeNamespaceSpec {

// The back/middle-end volume to back this namespace.
common.v1.ObjectKey volume_id = 10;

- // optimal write size hint to host driver. Host IO stack may use
- // this to regulate IO size. Must be a multiple of the preferred write
- // granularity. Must not exceed the controller maximum IO size value
- // configured in the nvme agent config file.
- int32 optimal_write_size = 11;
-
- // preferred write granularity hint to the host driver. Host IO
- // stack may use this to align IO sizes to the write granularity for
- // optimum performance.
- int32 pref_write_granularity = 12;
}

message NVMeNamespaceStatus {
@@ -329,9 +307,8 @@ message UpdateNVMeNamespaceRequest {

message ListNVMeNamespaceRequest {
common.v1.ObjectKey subsystem_id = 1;
- common.v1.ObjectKey controller_id = 2;
- int32 page_size = 3;
- string page_token = 4;
+ int32 page_size = 2;
+ string page_token = 3;
}

message ListNVMeNamespaceResponse {
573 changes: 187 additions & 386 deletions storage/v1alpha1/gen/cpp/frontend_nvme_pcie.pb.cc

Large diffs are not rendered by default.
