Refactor: SurveyCanceller per survey #2337

matiasgarciaisaia · 2024-05-03T15:29:33Z

The old model of a Supervisor + a Canceller that only started Producers and Consumers was not only overly complex, but also buggy (it failed upon app startup - see #2147).

This refactor makes the Supervisor start a single Canceller process per survey that has to be cancelled.

This model should pave the way to turn the Canceller into a SurveyBroker that manages the survey's complete lifecycle in #2331.

Fixes #2147

anaPerezGhiglia

Aside from some small comments it looks good to me! ✨

The only "concern" that I have is that I tested locally, and cancelling a respondent was failing due to a data error (I have a particular state of db right now). The thing is that the failure would break the canceller process, so it would keep on starting the process and failing again and again.
The fail was generated by the SurveyCanceller.cancel_respondent method (particularly the Session.cancel() call)
Perhaps we should add a try-rescue statement to wrap the Session methods or the whole cancel_respondent method, so don't break the process and allow it to keep of cancelling other respondents

lib/ask/survey_respondents_canceller.ex

lib/ask/runtime/survey_canceller_supervisor.ex

lib/ask/survey_respondents_canceller.ex

anaPerezGhiglia · 2024-05-06T18:25:18Z

lib/ask/survey_respondents_canceller.ex

-      :ignore ->
-        :ignore
-    end
+  def surveys_cancelling() do


I think this method belongs to SurveyCancellerSupervisor - since is the only one using it and the SurveyCanceller its designed to only deal with one survey - IMHO it can be confusing to have this multi-survey method here

I'm not really sure about where to place this.

I hear what you say, but on the other hand I like the fact that the Supervisor doesn't know the DB or the Surveys at all - it asks what has to be done, and makes sure it gets done.

The fact that the Supervisor is difficult to test (evidenced by the simulate_survey_canceller_start() helper) tells there's room for improvement, but I don't think this specific issue is one of those.

lib/ask/runtime/survey_canceller_supervisor.ex

lib/ask/survey_respondents_canceller.ex

The Supervisor API expects a set of task descriptions to start, not the actual tasks already started - as we were sending it. This is still broken, since the consumers can't subscribe to the producer if it hasn't started yet - and the app crashes. See #2147

Instead of using consumers and producers, upon app startup we launch a single SurveyCanceller process per each survey. We still have to change the "online" behaviour (when we start cancelling a survey while the app is already started). See #2147

Instead of the old consumers and producers. See #2147

Instead of having a producer and consumers, have a single actor per Survey. Still have to fix/refactor the canceller tests. See #2147

Which may not necessarily be about SurveyCanceller, but this will do for the time being. See #2147

See #2337

The exit_code and reason are already being set by the SurveyAction See #2337

We've dropped the "respondents" part of the name long ago, and it's part of the runtime actually. See #2337

See #2337

Looks like the log entries are getting stored in different order in the CI, which is probably not really an issue. This commit changes how we check for the last disposition change in the tests, so we avoid asserting on a "response" entry log instead. See #2337

Doesn't really have to do with the current PR, but are failing anyway

If cancelling a respondent raises an error, we log the issue in Sentry and continue cancelling the rest of them. We keep the survey Cancelling so we wait for human intervention. See #2337

matiasgarciaisaia · 2024-05-10T20:41:58Z

The logic for retrying cancelling failing respondents could be improved, but maybe YAGNI so let's start like this first.

anaPerezGhiglia

Niiiiiice ✨ 🙌 ✨

I left some 💅 nitpicking comments, but of course they are not mandatory

anaPerezGhiglia · 2024-05-13T12:56:49Z

lib/ask/runtime/survey_canceller.ex

+      |> Repo.update!()
+    rescue
+      error ->
+        Logger.error(error, __STACKTRACE__, "Error cancelling respondent #{respondent.id}'s session")


💅 nitpick

"Error cancelling respondent #{respondent.id}'s session

the 's doesn't make sense there - respondent.id is a number

I was trying to write "cancelling respondent's session", but wanted to specify their ID. So something like Respondent #25's session. Not sure how you write that.

lib/ask/runtime/survey_canceller.ex

test/ask/runtime/survey_broker_test.exs

test/ask/survey_canceller_test.exs

The function effectively marks the survey as terminated - the cancellation is the whole process of managing respondents + the survey. See #2147 See #2337 Co-authored-by: Ana Pérez Ghiglia <[email protected]>

See #2147 See #2337

Since it's shared in at least two different test modules. See #2147 See #2337

The tests were unnecessarily coupled to the Supervisor. We still have lots of improvements to do, but this should make the tests pass. See #2337 See #2147 Co-authored-by: Ana Pérez Ghiglia <[email protected]>

* Fix unrelated typing warning Was laying there since #1842 * Refactor API preparing to receive more survey opts Will need this to duplicate a survey. See #2351 * Add endpoint to duplicate surveys I still have to check if it works with already-running surveys, and panel surveys and other configurations. See #2351 * Expose survey duplicate feature from UI See #2351 * Prevent duplicating Panel Surveys See #2351 * Test duplication of running surveys See #2351 * Test duplicate surveys don't have exit status See #2351 * Fix: SurveyCanceller tests avoid the Supervisor The tests were unnecessarily coupled to the Supervisor. We still have lots of improvements to do, but this should make the tests pass. See #2337 See #2147 Co-authored-by: Ana Pérez Ghiglia <[email protected]> * Fix: add Flow types to new action See #2351 * Don't over-engineer the double-click prevention * Test copying standard quex * Fix running survey duplication test * Comment how quota_buckets_definitions work * Extract common survey creation behaviour See #2351 See #2354 --------- Co-authored-by: Ana Pérez Ghiglia <[email protected]>

matiasgarciaisaia requested a review from anaPerezGhiglia May 3, 2024 15:29

anaPerezGhiglia approved these changes May 6, 2024

View reviewed changes

matiasgarciaisaia added 11 commits May 9, 2024 13:38

Refactor: SurveyCanceller per survey

74d07ab

Instead of using consumers and producers, upon app startup we launch a single SurveyCanceller process per each survey. We still have to change the "online" behaviour (when we start cancelling a survey while the app is already started). See #2147

WIP - Using new SurveyCanceller from controller

8dbe8f8

Instead of the old consumers and producers. See #2147

WIP: Simplify SurveyCanceller per survey

8919f15

Instead of having a producer and consumers, have a single actor per Survey. Still have to fix/refactor the canceller tests. See #2147

Fix SurveyCanceller tests

35d575f

Which may not necessarily be about SurveyCanceller, but this will do for the time being. See #2147

Stop loading the Project for cancelling a survey

fab8f76

See #2337

Don't override exit code and reason after cancel

ece0a25

The exit_code and reason are already being set by the SurveyAction See #2337

Fix SurveyCanceller package and path

eb87acc

We've dropped the "respondents" part of the name long ago, and it's part of the runtime actually. See #2337

Move cancel responsibility to Supervisor

25bcb33

See #2337

Fix imports

e46eb19

See #2337

Document SurveyCancellerSupervisor's behaviour

c96c29b

See #2337

matiasgarciaisaia force-pushed the fix/2147-survey-canceller-startup branch from 4c049d5 to c96c29b Compare May 9, 2024 18:18

matiasgarciaisaia added 3 commits May 9, 2024 15:48

Fix: SurveyBroker tests with log entry order

03be2a3

Doesn't really have to do with the current PR, but are failing anyway

Make canceller resilient

58b539f

If cancelling a respondent raises an error, we log the issue in Sentry and continue cancelling the rest of them. We keep the survey Cancelling so we wait for human intervention. See #2337

matiasgarciaisaia requested a review from anaPerezGhiglia May 10, 2024 20:41

anaPerezGhiglia approved these changes May 13, 2024

View reviewed changes

matiasgarciaisaia and others added 3 commits May 13, 2024 16:16

Rename internal function

863e2df

The function effectively marks the survey as terminated - the cancellation is the whole process of managing respondents + the survey. See #2147 See #2337 Co-authored-by: Ana Pérez Ghiglia <[email protected]>

Document why wait for that many messages in test

a0e53c4

See #2147 See #2337

Extract assertion as test helper

9dc98c3

Since it's shared in at least two different test modules. See #2147 See #2337

matiasgarciaisaia merged commit 827cb10 into main May 17, 2024
2 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Refactor: SurveyCanceller per survey #2337

Refactor: SurveyCanceller per survey #2337

matiasgarciaisaia commented May 3, 2024 •

edited

Loading

anaPerezGhiglia left a comment

anaPerezGhiglia May 6, 2024

matiasgarciaisaia May 9, 2024

matiasgarciaisaia commented May 10, 2024

anaPerezGhiglia left a comment

anaPerezGhiglia May 13, 2024

matiasgarciaisaia May 13, 2024

Refactor: SurveyCanceller per survey #2337

Refactor: SurveyCanceller per survey #2337

Conversation

matiasgarciaisaia commented May 3, 2024 • edited Loading

anaPerezGhiglia left a comment

Choose a reason for hiding this comment

anaPerezGhiglia May 6, 2024

Choose a reason for hiding this comment

matiasgarciaisaia May 9, 2024

Choose a reason for hiding this comment

matiasgarciaisaia commented May 10, 2024

anaPerezGhiglia left a comment

Choose a reason for hiding this comment

anaPerezGhiglia May 13, 2024

Choose a reason for hiding this comment

matiasgarciaisaia May 13, 2024

Choose a reason for hiding this comment

matiasgarciaisaia commented May 3, 2024 •

edited

Loading