Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add the split configuration proposal #106

Open
wants to merge 1 commit into
base: main
Choose a base branch
from
Open
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
35 changes: 35 additions & 0 deletions proposals/split-configuration.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,35 @@
# Split Discovery related and Workload related Configuration elements

Currently, Akri's Configuration object contains both elements regarding the discovery of devices (`discoveryDetails`, `brokerProperties`, `discoveryProperties` and `capacity`), and elements regarding the "brokers" that are to be scheduled (`brokerSpec`, `instanceServiceSpec`, `configurationServiceSpec`).
This blurs the responsibility of the resources and make their management more complex.

This proposal divides the `Configuration` object in two that are described in the following sections.

## Note about the scope of objects

The `Configuration` and `Instance` objects are currently namespaced, this makes a lot of sense as the "brokers" will obviously be namespaced and as the `Instances` are linked to their `Configuration` they must share its namespace.
However, when splitting the `Configuration` in two, it makes sense to reconsider that choice as devices are not bound to a namespace per se. This proposal then makes the device description object cluster wide, as well as the `Instance` object.

## Note about naming

The objects, as well as our documentation, currently use the term "broker" to refer to the Pod or Job that get scheduled on every node for every discovered device.
This is ill-named as it can be much more than a mediator, in this proposal, when referring to the new objects, the term "workload" will be used.

## Description of the devices to discover: `DiscoveryConfiguration`

The `DiscoveryConfiguration` object is a cluster-wide object with all information needed to discover devices:

- The name of the discovery handler: `discoveryHandlerName`
- A string that a Discovery Handler knows how to parse to obtain necessary discovery details: `discoveryDetails`
- A set of extra properties the Discovery Handler may need, that can be pulled from `ConfigMaps` or `Secrets` at runtime: `discoveryProperties`
- The number of slots for a device instance: `capacity`
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i wonder if capacity isn't a workload property? in my mind, it impacts how many workloads can exist at once more than being a reflection on anything discovered.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The workloadConfiguration describes the workload akri is creating by itself, the capacity describes how many simultaneous users a given device can handle, be it one akri scheduled or one the user scheduled with some other way.
So I believe the capacity is a property of a device in that way, I even wonder if we shouldn't have the capacity in the Instance somehow.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This also makes sense for responsibility distribution, as the "agent" is currently responsible for managing the slots and it should be the one to manage the Discovery Configuration.

The "controller" on the other hand will manage the Workload Configuration.

The Instances will get created by the "agent" and read by the "controller".

I'll add a paragraph in the proposal about this.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I even wonder if we shouldn't have the capacity in the Instance somehow

historically, the Configuration was the place for a user to configure an Instance. if we are splitting Configuration into Discovery and Workload, i think capacity belongs in Workload. If we are introducing a third split (Discovery, Workload, and Instance), i can see it fitting in either Workload or Instance.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To me the DiscoveryConfiguration is all about getting Instances created and configured and the WorkloadConfiguration is all about creating the "workload" (or as it is currently known "broker").

This split has multiple ideas behind it:

  • you can use akri with "manual" workload scheduling (as per Requesting Resources) without a WorkloadConfiguration.
  • we get rid of the namespaces for Instances (as it doesn't make sense since the resource is exposed and usable from any namespace as it is exposed by the node resource)

If we have the capacity in the WorkloadConfiguration, then the agent will need to read a WorkloadConfiguration to be able to create the DevicePlugin, so it means you must have a 1:1 relationship between a WorkloadConfiguration and a DiscoveryConfiguration.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am more aligned with adding capacity to the DiscoveryConfiguration. To me, using the WorkloadConfiguration is an add on to akri and not everyone will use it -- they may apply their own deployments. On the other hand, discovering and specifying a capacity for a device is essential to Akri and should belong in discovery. It is also something that is managed through the device plugin so it should fit in the Agent's control loop which is the discovery one.

- A set of extra properties that will get added to the `Instance` properties and forwarded to workloads using the device: `extraInstancesProperties`
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if this should exist in the workload configuration, since these are environment variables that are set in every workload that uses the device. For example, setting frame rate for a udev camera here doesn't quite make sense

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These properties are stored at the device plugin level and managed by the agent, I think it can make sense to want to add information about a udev device that is unknown to the discovery handler, but of use to any consumer of the device: e.g. one craft a udev query to select modbus adapters and knows the device behind these uses address 0x42 and this is not discovered by the udev DH, so the DiscoveryConfiguration writer wants this information to be carried to the consumers of the discovered devices.

In the workload configuration, one can just add any additional property through pod/job/whatever env list.


## Description of the kubernetes objects to create: `WorkloadConfiguration`

The `WorkloadConfiguration` object is a namespaced object that will contain the following properties:

- The name of the `DiscoveryConfiguration` whose `Instances` shall trigger the scheduling of the resources described in this `WorkloadConfiguration`: `discoverySelector`
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we using the term discovery selector instead of "discovery handler"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

well here we do not point to a discovery handler, but to a discovery configuration, so the full name should be discoveryConfigurationSelector, I shortened it to discoverySelector to make it simpler.

- The spec part of the workload to schedule: `workloadSpec`
- The optional spec part of the `Service` to spawn pointing to all workload scheduled by this `WorkloadConfiguration`: `configurationServiceSpec`
- The optional spec part of a `Service` to spawn for every `Instance` that triggers this `WorkloadConfiguration`: `instanceServiceSpec`