Multiple worker support #2036

@habdelra habdelra commented Jan 13, 2025

This PR enables multiple worker support. It introduces a new mechanism, worker-manager, that spins up the desired number of workers in each container. The worker-manager also tracks when all the workers are ready, using a very simple TCP server that the realm server leverages in the test environment. Additionally, the worker-manager tags stdout and stderr with each worker's PID so that log messages from multiple workers can be correlated correctly.
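
To illustrate the two worker-manager duties described above (PID-tagged output and readiness tracking), here is a minimal TypeScript sketch. It is not the code from this PR: the `[worker <pid>]` prefix, the `ReadinessTracker` class, and the one-line `ready` / `not-ready` reply are all assumptions made for the example.

```typescript
import { createServer, Server } from "node:net";

// Prefix each non-empty line of worker output with the worker's PID.
// Assumption: the "[worker <pid>]" format is illustrative only. A real
// implementation would also buffer partial lines that span two chunks.
export function tagLines(pid: number, chunk: string): string {
  return chunk
    .split("\n")
    .filter((line) => line.length > 0)
    .map((line) => `[worker ${pid}] ${line}`)
    .join("\n");
}

// Hypothetical helper that records which workers have reported ready.
export class ReadinessTracker {
  private ready = new Set<number>();
  private expected: number;

  constructor(expected: number) {
    this.expected = expected;
  }

  markReady(pid: number): void {
    this.ready.add(pid);
  }

  get allReady(): boolean {
    return this.ready.size >= this.expected;
  }
}

// Hypothetical protocol: every client that connects receives a single
// status line and the connection is then closed.
export function createReadinessServer(tracker: ReadinessTracker): Server {
  return createServer((socket) => {
    socket.end(tracker.allReady ? "ready\n" : "not-ready\n");
  });
}
```

In this sketch a test harness would call `createReadinessServer(tracker).listen(port)` on a port of its choosing and poll until it reads `ready`. Keeping the tracker separate from the socket code lets the readiness logic be checked without opening a port.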

The start:all script spins up the workers automatically. Each realm server has a corresponding worker-manager. However, you can also use start:worker-development to spin up the workers manually if you'd like (which is paired with the start:development script for starting the development realm server). The WORKER_COUNT environment variable is used to indicate how many workers the worker-manager should spin up. By default, when this environment variable is not specified, only a single worker is spun up by the worker-manager.
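
The `WORKER_COUNT` default described above could be resolved along these lines. This is a sketch under stated assumptions, not the PR's implementation: the function name `resolveWorkerCount` is hypothetical, and rejecting invalid values with an error is a choice made for the example (the PR only states that an unset variable yields a single worker).

```typescript
// Resolve the number of workers from an environment-like object.
// Unset or empty WORKER_COUNT falls back to 1, as the PR describes.
// Assumption: invalid values throw; the PR does not say how they are handled.
export function resolveWorkerCount(
  env: Record<string, string | undefined>,
): number {
  const raw = env.WORKER_COUNT;
  if (raw === undefined || raw.trim() === "") {
    return 1;
  }
  const count = Number(raw);
  if (!Number.isInteger(count) || count < 1) {
    throw new Error(`WORKER_COUNT must be a positive integer, got "${raw}"`);
  }
  return count;
}

// Typical call site in a Node.js process:
//   const workerCount = resolveWorkerCount(process.env);
```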

Finally, this PR also updates the deploy scripts to deploy the workers into their own ECS container. Currently we are only deploying a single worker per ECS container. As soon as we confirm everything is working properly, we'll increase the worker count in our hosted environments in a subsequent PR.

This PR is paired with the infra PR https://github.com/cardstack/infra/pull/568

TODO:

  • Test deploy in staging

github-actions bot commented Jan 13, 2025

Host Test Results

    1 files  ±0      1 suites  ±0   21m 47s ⏱️ -20s
728 tests ±0  726 ✔️ ±0  2 💤 ±0  0 ±0 
733 runs  ±0  731 ✔️ ±0  2 💤 ±0  0 ±0 

Results for commit 348026b. ± Comparison against base commit 0b30125.

♻️ This comment has been updated with latest results.

@habdelra habdelra marked this pull request as ready for review January 13, 2025 23:39
@habdelra habdelra requested a review from a team January 13, 2025 23:40

@lukemelia lukemelia left a comment

Looks good. I notice that we are identifying the worker by pid in logging. Might it also be useful to include the hostname?

@habdelra

> Looks good. I notice that we are identifying the worker by pid in logging. Might it also be useful to include the hostname?

I think that will be moot, since CloudWatch is what actually renders the logs, and CloudWatch already takes care of that for us.


@jurgenwerk jurgenwerk left a comment

Looks good! Perhaps we can consider adding more managerial logic to the manager down the road, for example respawn the worker in case it dies.

@habdelra

> Looks good! Perhaps we can consider adding more managerial logic to the manager down the road, for example respawn the worker in case it dies.

That’s a good idea

@habdelra

> Looks good! Perhaps we can consider adding more managerial logic to the manager down the road, for example respawn the worker in case it dies.

I like this idea so much I'm adding it to this PR (it's a really simple update).
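
For readers following along, respawning a worker when it dies can be sketched roughly as below. This is an illustration only, not the change made in this PR: `superviseWorker`, the `WorkerHandle` shape (a hypothetical minimal subset of what Node's `child_process.spawn` returns), the restart cap, and the rule that a clean exit (code 0) is not restarted are all assumptions.

```typescript
// Hypothetical minimal view of a child-process handle.
export interface WorkerHandle {
  pid?: number;
  once(event: "exit", listener: (code: number | null) => void): unknown;
}

// Start a worker and start a replacement whenever it exits abnormally.
// Assumptions: exit code 0 means an intentional shutdown, and restarts
// are capped so a worker that crashes on startup cannot loop forever.
export function superviseWorker(
  start: () => WorkerHandle,
  maxRestarts = 5,
): void {
  let restarts = 0;

  const launch = (): void => {
    const worker = start();
    worker.once("exit", (code) => {
      if (code === 0 || restarts >= maxRestarts) {
        return;
      }
      restarts += 1;
      launch();
    });
  };

  launch();
}
```

Passing the `start` function in, rather than calling `spawn` directly, keeps the restart policy independent of how a worker process is actually launched.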

@habdelra habdelra merged commit f905812 into main Jan 14, 2025
50 checks passed