
Concurrency testing using a forking approach #222

Merged: 14 commits into zendesk:master on Mar 29, 2021

Conversation

@Alexander-Blair (Contributor) commented on Feb 23, 2021

The issue: #188

There was some discussion of a previous implementation approach in #221; this PR is an initial draft to continue that conversation.

Adds a ConcurrentRunner, which uses forking to allow concurrent processing:

  • Adds the code in a way that does not affect existing APIs (you never know what your users will decide to depend on)
  • Includes quite a big overhaul of the existing integration test so that we have a fresh topic for each spec, and we also clean up the topics at the end.

Outstanding question(s):

  • Should there be a hard limit on max concurrency?
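
For readers skimming the PR, here is a minimal sketch of the forking approach being described, not the PR's actual ConcurrentRunner: the parent forks one process per worker, keeps the read end of a pipe for each child (it hits EOF when that child exits), and waits on those readers with IO.select. The worker count and the sleep standing in for consumer work are placeholders.

worker_count = 4 # placeholder; the PR makes this configurable

children = worker_count.times.map do
  reader, writer = IO.pipe

  pid = fork do
    reader.close
    # Each child would run its own consumer here (e.g. a Racecar runner);
    # a short sleep stands in for that work so the example terminates.
    sleep(rand(1..3))
  end

  writer.close # the parent keeps only the reader; it hits EOF when the child exits
  { pid: pid, reader: reader }
end

# Block until at least one child exits and its end of the pipe is closed.
IO.select(children.map { |child| child[:reader] })

# Ask the remaining children to shut down, then reap them all.
children.each do |child|
  Process.kill("TERM", child[:pid])
rescue Errno::ESRCH
  # this child has already exited
end
Process.waitall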

@dasch (Contributor) commented on Feb 23, 2021

Try rebasing on master to get the tests working again. Also, would you like to extract the improvements to the test suite to a separate PR and get that merged first?

@Alexander-Blair (Author) replied:

Very sensible idea! 👍 #225

@Alexander-Blair force-pushed the concurrency-testing-forking branch 3 times, most recently from cd5feab to 6f44b73 (February 24, 2021 09:44)
@dasch (Contributor) left a comment:

Getting close! Can you also rebase on master to see if that fixes CI?

# which case there is nothing more to do.
readable_io = IO.select(readers)

first_read = (readable_io.first & readers).first.read
@dasch (Contributor) commented on this line:

Can you add a comment to this line? It's not obvious what's being done.
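
As an aside for readers of this thread, here is one possible annotation of that line, based only on IO.select's documented return value; it is not the comment that eventually landed:

# IO.select(readers) blocks until at least one IO is ready, returning a
# three-element array of [readable, writable, errored] IO lists. Since only
# readers were passed, readable_io.first is the subset of pipe readers that
# are ready, i.e. whose worker has exited (or written something).
# Intersecting with `readers` defensively restricts that subset to the readers
# this runner created, and .first.read drains the first one, returning whatever
# the exiting worker wrote ("" if it simply exited).
readable_io = IO.select(readers)

first_read = (readable_io.first & readers).first.read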

@@ -153,6 +153,9 @@ class Config < KingKonf::Config
desc "Whether to boot Rails when starting the consumer"
boolean :without_rails, default: false

desc "Maximum number of threads to run the application with. Each will spawn its own consumer"
integer :max_concurrency, default: 1
@dasch (Contributor) commented:

This should be changed to processes. Also, I think we should call this parallel rather than concurrent now that we're not using threads. Alternatively, we could perhaps look at what the configs are called for Unicorn and Resque?

@dasch (Contributor) followed up:

Perhaps "workers" is better. Also – it's not really a max, is it? It's the actual number of workers that will be spun up.

@Alexander-Blair (Author) replied:

Yeah, true. What needs to be clear, though, is that the number of workers * running instances should be less than or equal to the number of partitions in the Kafka topics. Especially if people are running locally and are disappointed when throughput doesn't improve on a single-partition topic 😄

@Alexander-Blair force-pushed the concurrency-testing-forking branch 12 times, most recently from 1488db5 to d21e3ea (February 25, 2021 13:34)
@dasch (Contributor) commented on Feb 26, 2021

One of the workers somehow receives a shutdown signal - in this case,
we do the same as above, initiating termination and waiting for all of
the workers to shut down

Is there a case to be made for restarting that worker instead, or does that expand the state space too much?

ready_readers.each(&:close)

# Recursively wait for the remaining readers
wait_for_exit(remaining_readers - ready_readers)
@dasch (Contributor) commented on these lines:

This risks blowing the stack if we go too deep. Probably iterate instead, e.g. until remaining_readers.empty? ... end and update a variable with the remaining readers.
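
A minimal sketch of the iterative form being suggested, reusing the names from the snippet above; this is illustrative rather than the code that was merged:

def wait_for_exit(readers)
  remaining_readers = readers.dup

  until remaining_readers.empty?
    # Block until at least one reader is ready, meaning its worker has exited.
    ready_readers, = IO.select(remaining_readers)

    ready_readers.each(&:close)
    remaining_readers -= ready_readers
  end
end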

@Alexander-Blair (Author) replied:

Indeed, this wasn't needed in the end 👍

@Alexander-Blair force-pushed the concurrency-testing-forking branch 3 times, most recently from ed7dd10 to 5d0a3d6 (February 27, 2021 11:43)
@Alexander-Blair (Author) commented:

Is there a case to be made for restarting that worker instead, or does that expand the state space too much?

I'd maybe like to have this running as an experimental feature, and see whether 'quiet' deaths of workers are a common occurrence. From what I can see, exceptions (aside from on startup) are rescued within the Racecar::Runner itself and baked into the retry capabilities. Perhaps if it turns out the consumers can die on their own (and it turns out to be somewhat common), we can explore restart capabilities.

@Alexander-Blair force-pushed the concurrency-testing-forking branch 3 times, most recently from 0c921d7 to d90b989 (March 3, 2021 11:07)
@dasch (Contributor) left a comment:

Looks good, ready for beta testing at least 👍

Can you fully document in the README and perhaps the code as well, and also add an entry to the changelog?

@dasch (Contributor) left a comment:

Darn close! Can you make the change to the code and also add an entry in the changelog?

start_from_beginning: true,
max_bytes_per_partition: 1048576,
additional_config: {},
parallel_workers: nil
@dasch (Contributor) commented on these lines:

I don't think it makes sense to make this configuration part of the subscribes_to method – consumer classes can subscribe to multiple topics, but the parallelism is tied to the consumer, not the topic. The class level accessor should suffice; class MyConsumer < ...; self.parallel_workers = 5; end.
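
Spelled out, the suggestion looks roughly like this (illustrative: MyConsumer and the topic names are placeholders; parallel_workers is the accessor name under discussion in this thread):

class MyConsumer < Racecar::Consumer
  subscribes_to "some-topic", "another-topic"

  # Parallelism belongs to the consumer class rather than to any single topic
  # subscription, so it is set as a class-level attribute.
  self.parallel_workers = 5

  def process(message)
    # handle the message
  end
end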

@dasch (Contributor) left a comment:

Looks great! Thanks a bunch for putting in all the work!

I can see that the branch needs to be rebased, but after that I'll happily merge :D

Alex Blair added 12 commits on March 29, 2021 at 13:16. From the commit messages:

  • When running the specs, failing to close the producer before forking was causing a consistent segfault when running via Docker (though not on a MacBook!). Upon further investigation, it turned out that:
    - A producer was created in test 1 (without spinning up parallel workers). The producer was not closed, and remained alive in the main RSpec process.
    - A test was then run which did include parallelism, therefore forking the main RSpec process one or more times. At the end of the test, the main process and the forked processes all tried to call the producer's finalizer, resulting in bad times. (See the sketch after these notes.)
  • Each worker will register as its own member of the Kafka consumer group, and act independently.
  • The number of parallel workers is specified per consumer, as this makes more sense than a global configuration option.
  • When running separate instances of the same app, the workers often end up with the same process id, so it's not particularly informative on its own.
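
Picking up the first commit note: one way to guarantee that a producer created by an earlier, non-forking example is never inherited by later forked workers is to close it in the example that created it. A sketch under assumptions (RSpec, and the rdkafka producer API that Racecar 2.x builds on; the describe block, topic, and broker address are placeholders, not the PR's actual specs):

RSpec.describe "producing without parallel workers" do
  let(:producer) do
    Rdkafka::Config.new("bootstrap.servers" => "localhost:9092").producer
  end

  # Close the producer in the example that created it, so later examples that
  # fork workers never inherit a live producer whose finalizer every forked
  # process would otherwise try to run at exit.
  after { producer.close }

  it "delivers a message" do
    producer.produce(topic: "test-topic", payload: "hello").wait
  end
end
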
@dasch merged commit 1c34161 into zendesk:master on Mar 29, 2021
@Alexander-Blair deleted the concurrency-testing-forking branch on March 29, 2021 13:55