-
Notifications
You must be signed in to change notification settings - Fork 6
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactor: SurveyCanceller per survey #2337
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Aside from some small comments it looks good to me! ✨
The only "concern" that I have is that I tested locally, and cancelling a respondent was failing due to a data error (I have a particular state of db right now). The thing is that the failure would break the canceller process, so it would keep on starting the process and failing again and again.
The fail was generated by the SurveyCanceller.cancel_respondent
method (particularly the Session.cancel()
call)
Perhaps we should add a try-rescue statement to wrap the Session
methods or the whole cancel_respondent
method, so don't break the process and allow it to keep of cancelling other respondents
:ignore -> | ||
:ignore | ||
end | ||
def surveys_cancelling() do |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this method belongs to SurveyCancellerSupervisor
- since is the only one using it and the SurveyCanceller
its designed to only deal with one survey - IMHO it can be confusing to have this multi-survey method here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not really sure about where to place this.
I hear what you say, but on the other hand I like the fact that the Supervisor doesn't know the DB or the Surveys at all - it asks what has to be done, and makes sure it gets done.
The fact that the Supervisor is difficult to test (evidenced by the simulate_survey_canceller_start()
helper) tells there's room for improvement, but I don't think this specific issue is one of those.
The Supervisor API expects a set of task descriptions to start, not the actual tasks already started - as we were sending it. This is still broken, since the consumers can't subscribe to the producer if it hasn't started yet - and the app crashes. See #2147
Instead of using consumers and producers, upon app startup we launch a single SurveyCanceller process per each survey. We still have to change the "online" behaviour (when we start cancelling a survey while the app is already started). See #2147
Instead of the old consumers and producers. See #2147
Instead of having a producer and consumers, have a single actor per Survey. Still have to fix/refactor the canceller tests. See #2147
Which may not necessarily be about SurveyCanceller, but this will do for the time being. See #2147
The exit_code and reason are already being set by the SurveyAction See #2337
We've dropped the "respondents" part of the name long ago, and it's part of the runtime actually. See #2337
See #2337
4c049d5
to
c96c29b
Compare
Looks like the log entries are getting stored in different order in the CI, which is probably not really an issue. This commit changes how we check for the last disposition change in the tests, so we avoid asserting on a "response" entry log instead. See #2337
Doesn't really have to do with the current PR, but are failing anyway
If cancelling a respondent raises an error, we log the issue in Sentry and continue cancelling the rest of them. We keep the survey Cancelling so we wait for human intervention. See #2337
The logic for retrying cancelling failing respondents could be improved, but maybe YAGNI so let's start like this first. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Niiiiiice ✨ 🙌 ✨
I left some 💅 nitpicking comments, but of course they are not mandatory
|> Repo.update!() | ||
rescue | ||
error -> | ||
Logger.error(error, __STACKTRACE__, "Error cancelling respondent #{respondent.id}'s session") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💅 nitpick
"Error cancelling respondent #{respondent.id}'s session
the 's doesn't make sense there - respondent.id is a number
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was trying to write "cancelling respondent's session", but wanted to specify their ID. So something like Respondent #25's session
. Not sure how you write that.
The function effectively marks the survey as terminated - the cancellation is the whole process of managing respondents + the survey. See #2147 See #2337 Co-authored-by: Ana Pérez Ghiglia <[email protected]>
The tests were unnecessarily coupled to the Supervisor. We still have lots of improvements to do, but this should make the tests pass. See #2337 See #2147 Co-authored-by: Ana Pérez Ghiglia <[email protected]>
* Fix unrelated typing warning Was laying there since #1842 * Refactor API preparing to receive more survey opts Will need this to duplicate a survey. See #2351 * Add endpoint to duplicate surveys I still have to check if it works with already-running surveys, and panel surveys and other configurations. See #2351 * Expose survey duplicate feature from UI See #2351 * Prevent duplicating Panel Surveys See #2351 * Test duplication of running surveys See #2351 * Test duplicate surveys don't have exit status See #2351 * Fix: SurveyCanceller tests avoid the Supervisor The tests were unnecessarily coupled to the Supervisor. We still have lots of improvements to do, but this should make the tests pass. See #2337 See #2147 Co-authored-by: Ana Pérez Ghiglia <[email protected]> * Fix: add Flow types to new action See #2351 * Don't over-engineer the double-click prevention * Test copying standard quex * Fix running survey duplication test * Comment how quota_buckets_definitions work * Extract common survey creation behaviour See #2351 See #2354 --------- Co-authored-by: Ana Pérez Ghiglia <[email protected]>
The old model of a Supervisor + a Canceller that only started Producers and Consumers was not only overly complex, but also buggy (it failed upon app startup - see #2147).
This refactor makes the Supervisor start a single Canceller process per survey that has to be cancelled.
This model should pave the way to turn the Canceller into a SurveyBroker that manages the survey's complete lifecycle in #2331.
Fixes #2147