Unconfigured lifecycle state management #47

jginesclavero · 2020-09-29T08:07:08Z

Hi again @norro!

Yesterday I had a meeting with @chcorbato, and we talked about the case where a lifecycle node transitions to ErrorProcessing.
According to the documentation and the lifecycle node diagrams, a node that has an error transitions to ErrorProcessing. Then, depending on the result of that processing, it goes to either the Finalized or the Unconfigured state. Do you think system_modes should manage the Unconfigured state of lifecycle nodes? That would cover both this situation and start-up, where the nodes are also in the Unconfigured state.

Thank you!


chcorbato · 2020-09-30T06:44:26Z

> Hi again @norro!
>
> Yesterday I had a meeting with @chcorbato, and we talked about the case where a lifecycle node transitions to ErrorProcessing.
> According to the documentation and the lifecycle node diagrams, a node that has an error transitions to ErrorProcessing. Then, depending on the result of that processing, it goes to either the Finalized or the Unconfigured state. Do you think system_modes should manage the Unconfigured state of lifecycle nodes? That would cover both this situation and start-up, where the nodes are also in the Unconfigured state.
>
> Thank you!

This is in the context of our exemplary case of the laser_driver error. We want to elaborate on the layered approach we discussed in the last MROS meeting. This is how I interpret our desired design (please comment if something is not correct or clear):

1. First, the laser_driver code for handling errors tries to recover from the error in the ErrorProcessing transition state.

(From here on it is a related but different issue, #48.)

2. If it does not succeed (I guess that means the node does not transition to Active), the ModeManager tries to recover from the error using the feature/rules. For this, @jginesclavero is adding a rule in the SystemModes file of our system.
3. If there is no rule, or there is one but the alternative MODE(s) of the laser_driver are still not reached after applying it, the ModeManager reports to the Metacontroller that the corresponding (sub)system(s) MODE(s) are not reachable.

(see issue for the continuation of the handling of errors at the higher layers)
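The layering described above can be sketched as a simple cascade. This is only an illustration of the intended order of responsibility, not system_modes code: `handle_error`, `Outcome`, and the two callables are hypothetical names, and each callable stands in for "try to recover, return whether the node reached a usable state".

```python
from enum import Enum
from typing import Callable, Optional


class Outcome(Enum):
    RECOVERED_BY_NODE = "node recovered in ErrorProcessing"
    RECOVERED_BY_RULE = "mode manager rule reached an alternative mode"
    ESCALATED = "mode not reachable, higher layer must act"


def handle_error(
    node_recovers: Callable[[], bool],
    rule: Optional[Callable[[], bool]],
) -> Outcome:
    """Walk the layers in order and stop at the first one that succeeds."""
    # Layer 1: the node's own error handling (hypothetical callable).
    if node_recovers():
        return Outcome.RECOVERED_BY_NODE
    # Layer 2: a rule from the SystemModes file, if one is defined.
    if rule is not None and rule():
        return Outcome.RECOVERED_BY_RULE
    # Layer 3: no rule, or the rule did not reach an alternative mode.
    return Outcome.ESCALATED


# Example: the node cannot recover and no rule is defined.
print(handle_error(lambda: False, None).name)
```

Whether the third layer is an active report or something the higher layer has to find out for itself is a design choice the sketch leaves open.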

norro · 2020-09-30T07:16:51Z

I agree with 1. and 2.
However, the mode manager will not actively report that a certain mode is not available. With #43, though, the meta control will be able to get the information about which modes are available.

This is also a question of timing, for the following reason: any state/mode transition will take some time (milliseconds to seconds, maybe), even in the normal, non-failure case. So it is not entirely clear when someone (the mode manager? metacontrol?) should decide that a transition or rule didn't work out and that other actions have to be taken. I think this kind of decision, i.e. how long to wait for a node to recover or a rule to take effect, is best placed in the metacontrol, since it is probably task-specific.
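To make the timing question concrete, here is a minimal sketch of what "wait, then decide" could look like on the metacontrol side. Everything in it is an assumption for illustration: `wait_for_mode` and `get_mode` are hypothetical names, and in a real system the current mode would come from whatever interface the mode manager exposes rather than from a plain callable.

```python
import time
from typing import Callable


def wait_for_mode(
    get_mode: Callable[[], str],
    target: str,
    timeout_s: float,
    poll_s: float = 0.05,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Return True if `target` is observed before the deadline, else False."""
    deadline = clock() + timeout_s
    while True:
        # Check the mode first so an already-reached target succeeds at once.
        if get_mode() == target:
            return True
        # The task-specific decision: give up once the deadline has passed.
        if clock() >= deadline:
            return False
        sleep(poll_s)
```

The clock and sleep functions are parameters only so the deadline logic can be exercised without real waiting; the task-specific part is the value passed as `timeout_s`.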

chcorbato · 2020-09-30T08:02:22Z

> This is also a question of timing, for the following reason: any state/mode transition will take some time (milliseconds to seconds, maybe), even in the normal, non-failure case. So it is not entirely clear when someone (the mode manager? metacontrol?) should decide that a transition or rule didn't work out and that other actions have to be taken. I think this kind of decision, i.e. how long to wait for a node to recover or a rule to take effect, is best placed in the metacontrol, since it is probably task-specific.

Very good point indeed; so far we are not accounting for timing issues.
How do we include timing constraints for node management? These could be considered metacontrol requirements for the robotic application:

- How should these requirements be defined? Language, relation to MROS metamodel @darkobozhinoski and ontology @rsanz @estherag
- Where should they be defined?

I think we should have a discussion about this at the next meeting @darkobozhinoski, ideally with the input of all ROS developers/architects in MROS
@gavanderhoorn @marioney @wasowski @fmrico @jginesclavero @lbajo @ralph-lange

rsanz · 2020-10-06T08:38:02Z

This implies the incorporation of some timestamping and temporal [interval] reasoning. We can incorporate some concepts from e.g. UML2 or UML MARTE.

chcorbato · 2020-10-20T07:58:08Z

> However, the mode manager will not actively report that a certain mode is not available. With #43, though, the meta control will be able to get the information about which modes are available.

I agree. So the current design proposal is that the Mode Manager just informs about available and reachable modes, and Metacontrol is responsible for inferring from that whether reconfiguration actions succeeded.
See below for how to model that reasoning.

> This is also a question of timing, for the following reason: any state/mode transition will take some time (milliseconds to seconds, maybe), even in the normal, non-failure case. So it is not entirely clear when someone (the mode manager? metacontrol?) should decide that a transition or rule didn't work out and that other actions have to be taken. I think this kind of decision, i.e. how long to wait for a node to recover or a rule to take effect, is best placed in the metacontrol, since it is probably task-specific.
>
> Very good point indeed; so far we are not accounting for timing issues.
>
> How do we include timing constraints for node management? These could be considered metacontrol requirements for the robotic application:
>
> How should these requirements be defined? Language, relation to MROS metamodel @darkobozhinoski and ontology @rsanz @estherag
>
> This implies the incorporation of some timestamping and temporal [interval] reasoning. We can incorporate some concepts from e.g. UML2 or UML MARTE.

@rsanz can you point to the specific concepts?
I think we need to specify some modelling requirements (see below) to evaluate which concepts we need.

> Where should they be defined?
> I think we should have a discussion about this on the next meeting @darkobozhinoski, ideally with the input of all ROS developers/architects in MROS
> @gavanderhoorn @marioney @wasowski @fmrico @jginesclavero @lbajo @ralph-lange

Modelling reqs for reconfiguration actions and timing

Metacontroller needs info on how long a reconfiguration action can take, to decide its success or failure. This depends on:
- type of reconfiguration action: mode change, re-mapping, ~~deploy node~~ (we decided all nodes would be deployed, req for mode manager)
- the node/subsystem reconfigured

We could provide this information in the MROS model of the system (Darko's metamodel), as we are doing with the QAs, but I think it is more related to the specific software components than to the application logic.

We could define default values in the MROS metacontroller to assume when no info is provided.
E.g. assume a node mode change takes up to 2 s, and a subsystem mode change can take up to 5 s.
@jginesclavero @lbajo @marioney @fmrico what numbers are reasonable for navigation2 nodes?
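A rough sketch of how such defaults with per-component overrides could be looked up. The names (`ReconfigTimeouts`, `timeout_for`, the action-type strings) are hypothetical and not part of any MROS or system_modes API; the 2 s and 5 s values are the example defaults proposed above, and the `laser_driver` override is made up for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple

# Example defaults from the proposal above, in seconds.
DEFAULT_TIMEOUTS_S: Dict[str, float] = {
    "node_mode_change": 2.0,
    "subsystem_mode_change": 5.0,
}


@dataclass
class ReconfigTimeouts:
    """Timeout lookup keyed by action type, with optional per-target overrides."""

    defaults: Dict[str, float] = field(
        default_factory=lambda: dict(DEFAULT_TIMEOUTS_S)
    )
    # (action type, node or subsystem name) -> timeout in seconds
    overrides: Dict[Tuple[str, str], float] = field(default_factory=dict)

    def timeout_for(self, action: str, target: str) -> float:
        key = (action, target)
        if key in self.overrides:
            return self.overrides[key]
        # Fall back to the default for this type of reconfiguration action.
        return self.defaults[action]


# Hypothetical override for one node; every other node uses the default.
timeouts = ReconfigTimeouts(
    overrides={("node_mode_change", "laser_driver"): 3.5}
)
print(timeouts.timeout_for("node_mode_change", "laser_driver"))
print(timeouts.timeout_for("subsystem_mode_change", "navigation"))
```

Keying the overrides by both action type and target mirrors the two factors listed above; where those override values are stored (system model or component description) is the open question.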

jginesclavero changed the title ~~Unconfigure lifecycle state management~~ Unconfigured lifecycle state management Sep 29, 2020

chcorbato mentioned this issue Sep 30, 2020

Layered handling of node and (sub-)system errors #48

Open

norro closed this as completed Apr 9, 2021
