Migrating a Running State Chart to a New Version of the State Machine #1338
Replies: 2 comments
This isn't really a solution, just my thoughts on this (not so easy) problem:
I think this is part of the key to solving this, but there's more to it than that.

The idea of "versioning" state machines is an interesting problem, and I found some prior art/exploration:

We can use the example state machines for discussion purposes:
Let's assume that the following events happened:
If we were to "replay" the first two events on Version 2, then the

Here are my thoughts, which might be wrong...

If we consider that the states represent some process where prerequisite data is collected in order to successfully complete a transaction, we can determine whether a previous state should be allowed to be "grandfathered" in. For example, if a precondition of
is different than what is expected if we were to traverse a path to the

{
- approval_code: ...,
  payment_details: { ... },
  shipment_details: { ... }
}

So we know that we can't just "jump" to the

We should replay the first two events on version 2, but have a way to defer events that aren't accepted in a state, such as the
Some sort of action should be elevated to the "client-side" of this, saying something like "an approval code is required". In XState, this can be handled with a wildcard transition (not sure if BPMN has the same notion as wildcard events/transitions):

on: {
  approve: {
    target: 'awaiting_payment',
    actions: 'recall' // recall all deferred events
  },
  '*': { actions: ['defer', 'notifyApprovalRequired'] }
}

So once the

Another thought: in theory, we can compare both versions and generate a diff between them, since they're just directed graphs:
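
A minimal sketch of what such a diff might look like, assuming each version is reduced to a plain `{ state: { event: target } }` map. The state and event names below (`created`, `pay`, `awaiting_shipment`, and so on) are hypothetical stand-ins, not the actual example machines, and `edges` / `diffMachines` are illustrative helpers rather than anything XState provides:

```javascript
// Hypothetical, simplified machine definitions: { state: { event: targetState } }.
// All names are illustrative assumptions, not taken from the original diagrams.
const v1 = {
  created: { pay: "awaiting_shipment" },
  awaiting_shipment: { ship: "shipped" },
  shipped: {},
};

const v2 = {
  created: { approve: "awaiting_payment" },
  awaiting_payment: { pay: "awaiting_shipment" },
  awaiting_shipment: { ship: "shipped" },
  shipped: {},
};

// Flatten a definition into a set of "source --event--> target" edge strings.
function edges(machine) {
  const result = new Set();
  for (const [source, transitions] of Object.entries(machine)) {
    for (const [event, target] of Object.entries(transitions)) {
      result.add(`${source} --${event}--> ${target}`);
    }
  }
  return result;
}

// Compare two versions as directed graphs: which nodes and edges differ?
function diffMachines(oldMachine, newMachine) {
  const oldStates = Object.keys(oldMachine);
  const newStates = Object.keys(newMachine);
  const oldEdges = edges(oldMachine);
  const newEdges = edges(newMachine);
  return {
    addedStates: newStates.filter((s) => !oldStates.includes(s)),
    removedStates: oldStates.filter((s) => !newStates.includes(s)),
    addedTransitions: [...newEdges].filter((e) => !oldEdges.has(e)),
    removedTransitions: [...oldEdges].filter((e) => !newEdges.has(e)),
  };
}

const diff = diffMachines(v1, v2);
console.log(diff);
```

In this toy case the diff reports one added state and one removed transition out of `created`, which is the kind of signal that could flag in-flight instances needing attention. A real machine with nested or parallel states would need a richer graph representation than this flat map.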
If there are no

This can be done automatically with some tooling. I'll have to give it some more thought.

Other links I found:
Very interesting example! Thanks for pointing out these issues, better to confront them now than when we're in the crux of needing to perform one of these migrations with many, many actively running machines! I haven't gotten a chance yet to look through the links you posted, will take a look. Figured I'd give a first stab at the problem with just my own common sense as best as I can muster it.

Problem Statement: You want to introduce an approval phase in your ordering workflow. How does this impact existing machines?

From a product perspective it seems like you could take a couple different strategies:
I drew these diagrams in Lucid Chart to try to figure out what I would do in this scenario:

Having this intermediate machine with a deprecated transition would essentially implement product strategy 1. Off-hand that seems like the approach I would imagine product people taking. It is simpler and consistent with the existing behavior.

However, one could imagine a scenario where the skipping of the approval step is actually a really bad bug and we don't want to allow any more orders to proceed without approval. This is conceptually similar to the idea of deferred events.

One thing that came out of this exercise is that it underlines the conviction that event stream as source of truth is a sane approach. You cannot go back and say that the order of events wasn't originally ["created", "payment received"]; one way or another you have to live with that reality and compensate for it.

In both instances, my instinct was to introduce an "in between" state machine for the breaking change between version 1 and version 2. This would operate until all machines that still have the legacy event order have run to completion. In the latter, this was much more complicated, involving creating a superstate that captures history and allows routing back to the approval step and then resuming where you left off. Furthermore, this implies that there must be an upgrade batch processing job where the "request approval" event is sent to all in-progress machines that have not completed. It might require that event processing be queued while the batch job completes.

Obviously this is all much more complicated; however, the appeal to me is that it is explicit. However, if state machines run for a very long time, then the further problem arises of having to build on top of these intermediate machines that can never be cleaned up. Also, this migration approach raises the question of "when can you clean up the intermediate states and transitions?"
You would have to have some way of identifying event streams that are associated with an old machine implementation. Perhaps a concept of machine version that would be captured with the event stream would make this relatively simple.

The compatibility of two machine versions with respect to a given event stream is an interesting concept. If that could be asserted automatically, or even manually with a reasonable number of unit tests, it could be very helpful in being able to make changes with confidence.

(Incidentally, I will have to look more at this defer / recall pattern. I ran into a situation that sounds very similar in a machine design that is still in progress.)
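
For anyone else looking at the defer / recall pattern mentioned above, here is a rough library-free sketch of how it might behave when a legacy event order is replayed against a version that requires approval first. Everything here is an assumption for illustration: the state and event names, the `defer` / `recall` flags, and the `createRunner` helper are hand-rolled and are not XState built-ins.

```javascript
// Hypothetical "version 2" order machine. State and event names are
// illustrative; `defer` and `recall` are hand-rolled flags, not XState features.
const machineV2 = {
  initial: "awaiting_approval",
  states: {
    awaiting_approval: {
      on: {
        approve: { target: "awaiting_payment", recall: true },
        "*": { defer: true }, // any other event waits until approval
      },
    },
    awaiting_payment: { on: { pay: { target: "awaiting_shipment" } } },
    awaiting_shipment: { on: { ship: { target: "shipped" } } },
    shipped: { on: {} },
  },
};

function createRunner(machine) {
  let state = machine.initial;
  const deferred = []; // events parked until a recall
  const notifications = []; // what would be surfaced to the client

  function send(event) {
    const handlers = machine.states[state].on;
    const transition = handlers[event.type] ?? handlers["*"];
    if (!transition) return; // event not accepted in this state: ignored
    if (transition.defer) {
      deferred.push(event);
      notifications.push(`approval required before "${event.type}"`);
      return;
    }
    state = transition.target;
    if (transition.recall) {
      // Re-send the deferred events in their original order.
      const queued = deferred.splice(0, deferred.length);
      queued.forEach((queuedEvent) => send(queuedEvent));
    }
  }

  return {
    send,
    getState: () => state,
    getDeferred: () => [...deferred],
    getNotifications: () => [...notifications],
  };
}

const runner = createRunner(machineV2);
runner.send({ type: "pay" }); // legacy order: payment arrives before approval
console.log(runner.getState()); // still "awaiting_approval", payment is deferred
runner.send({ type: "approve" }); // recall re-sends the deferred payment
console.log(runner.getState()); // "awaiting_shipment"
```

The deferred queue becomes part of the instance's state here, so in an event-sourced setup it would presumably have to be rebuilt by replay rather than persisted separately.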
We are investigating xstate for use primarily in the back-end in a cloud environment. Our use-cases are somewhat like business process workflows, so we are also looking at a workflow engine called Zeebe (as it is specifically focused on running BPMN 2.0 workflows in internet scale service environments). However, having worked with State Charts in designing reactive behavior... at this point I find State Charts to be more appealing. They are both simpler and more flexible. Furthermore, the apparent lack of a solution to this same question in the Zeebe forum gives me pause.
My question is, is there any community experience with xstate of having long-running state machines where over time the implementation of the state machine changes:
The question is, if we have many state machines in flight at any given time, how can we evolve the state machines themselves over time and pick up these changes, without having to wait for all the old state machines that were launched based on old implementations to simply run to completion?
Based on some prior discussion with this community (thanks for your input!) I have begun thinking of this in terms of Event Sourcing. Essentially, if instead of persisting machine state, I persist the sequence of events with their payloads that led to the current state, then the "current state" of the machine becomes kind of ephemeral. It can always be reconstructed by replaying events against the current implementation of the state machine. (Though this may require custom runner logic to skip over actions and invokes when rehydrating a machine). Things that result from this:
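
To make the replay idea concrete, here is a minimal sketch of what I mean, written without xstate so it stands alone. The machine shape, the event and action names, and the `step` / `rehydrate` helpers are all hypothetical; the point is only that side effects are skipped while rehydrating and run again for live events:

```javascript
// Hypothetical current machine definition: { state: { event: { target, action } } }.
// Names are illustrative; `action` stands in for any side effect (email, charge, ...).
const currentMachine = {
  initial: "created",
  states: {
    created: { pay: { target: "awaiting_shipment", action: "sendReceipt" } },
    awaiting_shipment: { ship: { target: "shipped", action: "notifyCustomer" } },
    shipped: {},
  },
};

// Apply one event. Side effects are recorded only when not rehydrating.
function step(machine, state, event, { rehydrating, effects }) {
  const transition = machine.states[state][event.type];
  if (!transition) return state; // event not accepted in this state
  if (!rehydrating && transition.action) {
    effects.push({ action: transition.action, payload: event.payload });
  }
  return transition.target;
}

// Rebuild the "current state" purely from the persisted event log.
function rehydrate(machine, eventLog) {
  const effects = []; // stays empty: replay must not repeat side effects
  const state = eventLog.reduce(
    (s, e) => step(machine, s, e, { rehydrating: true, effects }),
    machine.initial
  );
  return { state, effects };
}

const eventLog = [{ type: "pay", payload: { amount: 42 } }];
const restored = rehydrate(currentMachine, eventLog);
console.log(restored.state); // "awaiting_shipment"

// A live event after rehydration does trigger its action.
const liveEffects = [];
const next = step(currentMachine, restored.state, { type: "ship" }, {
  rehydrating: false,
  effects: liveEffects,
});
console.log(next, liveEffects);
```

Because the state is derived from the log each time, swapping in a new machine definition only changes how the same events are interpreted; whether the old log still makes sense under the new definition is exactly the open question.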
Does anyone have any relevant experience to share regarding either evolving the implementation of long-running machines, or using xstate with Event Sourcing inspired machine runners? Or does anyone have alternative solutions to these same problems? It would be very encouraging to hear about prior exploration in this area.
Thanks!