Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
1 parent
34b1db8
commit 093525d
Showing
3 changed files
with
15 additions
and
45 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -12,49 +12,36 @@ The architecture contains 4 components: | |
|
||
Note that Reacher provides the same Docker image `reacherhq/backend` which can act as both a **Worker** and a **HTTP server**. | ||
|
||
<figure><img src="../.gitbook/assets/Screenshot 2024-11-27 at 14.43.50.png" alt=""><figcaption><p>Reacher architecture for scaling</p></figcaption></figure> | ||
<figure><img src="../.gitbook/assets/Screenshot 2024-11-30 at 15.33.27.png" alt=""><figcaption><p>Reacher queue architecture</p></figcaption></figure> | ||
|
||
With this architecture, it's possible to horizontally scale the number of workers, while making sure that the individual IPs don't get blacklisted. To do so, we propose to start with two types of workers. | ||
With this architecture, it's possible to horizontally scale the number of workers. However, spawning too many workers at once can get IPs blacklisted, so we need to configure the concurrency and throttling parameters below. | ||
|
||
### Shared Configuration between both workers | ||
### Worker Configuration | ||
|
||
To enable the above worker architecture, set the following parameters in [reacher-configuration-v0.10.md](../advanced/migrations/reacher-configuration-v0.10.md "mention"):  | ||
To enable the above worker architecture without getting blacklisted, we need to set some parameters in [reacher-configuration-v0.10.md](../advanced/migrations/reacher-configuration-v0.10.md "mention"): | ||
|
||
* `worker.enable`: true | ||
* `worker.rabbitmq.url`: Points to the URL of the RabbitMQ instance. | ||
* `worker.postgres.db_url`: A Postgres database to store the email verification results. | ||
|
||
### 1st worker type: SMTP worker using Proxy | ||
Since spawning workers (generally on cloud providers) doesn't guarantee that a worker is assigned a reputable IP, we propose configuring all workers to use a proxy. Proxies are generally priced per IP per month; we recommend buying one IP for every 10000 email verifications you do per day. | ||
|
||
These workers will consume all emails that should be verified through SMTP. Currently, this includes all emails, except Hotmail B2C and Yahoo emails, which are best verified using a headless navigator. Since maintaining IP addresses is hard, we recommend using a proxy, see [proxies.md](proxies.md "mention"). | ||
* `worker.proxy.{host,port}`: Set a proxy to route all SMTP requests through. You can optionally pass in `username` and `password` if required. | ||
|
||
Assuming your proxy has `N` available IP addresses, we recommend spawning the same number `N` of workers, each with the config below: | ||
We also recommend values for the concurrency and throttling parameters. These parameters help keep the proxy's IPs in good standing. | ||
|
||
* `worker.rabbitmq.queues`: `["check.gmail","check.hotmailb2b","everything_else"]`. The SMTP workers will listen to these queues. | ||
* `worker.proxy.{host,port}`: Set a proxy to route all SMTP requests through. You can optionally pass in `username` and `password` if required. | ||
* `worker.rabbitmq.concurrency`: 10. | ||
* `worker.throttle.max_requests_per_minute`: 100. | ||
* `worker.throttle.max_requests_per_day`: 10000. This is the recommended number of verifications per IP per day. Assuming there are `N` IP addresses and `N` workers, each worker should perform 10000 verifications per day. | ||
* `worker.rabbitmq.concurrency`: 5. Each worker can process 5 emails at a time. | ||
* `worker.throttle.max_requests_per_minute`: 60. If this value is too high, the recipient SMTP server might see sudden spikes of email verifications, resulting in an IP blacklist. | ||
* `worker.throttle.max_requests_per_day`: 10000. This is the recommended number of verifications per IP per day. Assuming our proxy has `N` IP addresses and we run `N` workers, each worker will perform 10000 verifications per day on average. | ||
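
To see how the two throttle values interact, here is a minimal Python sketch. It is not Reacher code: the `ThrottleConfig` class and both helper functions are hypothetical names used only for illustration, and the numbers are the recommended values from the list above. It shows that with 60 requests per minute, the daily cap of 10000 is the limit that is reached first.

```python
from dataclasses import dataclass

MINUTES_PER_DAY = 24 * 60


@dataclass(frozen=True)
class ThrottleConfig:
    """Hypothetical mirror of the worker settings listed above."""
    concurrency: int
    max_requests_per_minute: int
    max_requests_per_day: int


def minutes_to_exhaust_daily_budget(cfg: ThrottleConfig) -> float:
    """Minutes of work at the full per-minute rate before the daily cap is hit."""
    return cfg.max_requests_per_day / cfg.max_requests_per_minute


def binding_limit(cfg: ThrottleConfig) -> str:
    """Return which throttle stops a continuously busy worker first."""
    per_minute_ceiling = cfg.max_requests_per_minute * MINUTES_PER_DAY
    return "per_day" if cfg.max_requests_per_day < per_minute_ceiling else "per_minute"


# Recommended values from the list above (assumed to apply per worker).
recommended = ThrottleConfig(
    concurrency=5,
    max_requests_per_minute=60,
    max_requests_per_day=10_000,
)

if __name__ == "__main__":
    print(binding_limit(recommended))
    print(round(minutes_to_exhaust_daily_budget(recommended), 1))
```

Under these assumptions, a worker running flat out would use up its daily budget in under three hours, so the per-minute limit mainly smooths out bursts while the per-day limit bounds the total volume per IP.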
|
||
You can scale up the number `N` as much as you need. Remember, the rule of thumb is 10000 verifications per IP per day. For example, if you're aiming for 10 million verifications per month, we recommend 33 or 34 IPs. | ||
You can scale up the number `N` as much as you need, by buying more IPs and spawning more workers. Remember, the rule of thumb is 10000 verifications per IP per day. For example, if you're aiming for 10 million verifications per month, we recommend buying 33 or 34 IPs: | ||
|
||
``` | ||
10,000,000 emails per month / 30 ≈ 333,333 emails per day / 10000 ≈ 33.3, so 33 or 34 IPs | ||
``` | ||
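
The same rule of thumb can be written as a small helper. This is only an illustrative Python sketch: `ips_needed` is a hypothetical function name, and the 30-day month and the 10000 verifications per IP per day are the assumptions stated in the text, not values read from Reacher.

```python
import math


def ips_needed(
    verifications_per_month: int,
    per_ip_per_day: int = 10_000,
    days_per_month: int = 30,
) -> int:
    """Number of proxy IPs (and workers) for a monthly volume, rounded up."""
    if verifications_per_month < 0:
        raise ValueError("verifications_per_month must be non-negative")
    monthly_capacity_per_ip = per_ip_per_day * days_per_month
    return math.ceil(verifications_per_month / monthly_capacity_per_ip)


if __name__ == "__main__":
    # 10 million verifications per month, as in the example above.
    print(ips_needed(10_000_000))
```

Rounding up gives the conservative end of the "33 or 34" range, which keeps every IP at or below the daily rule of thumb.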
|
||
Refer to [reacher-configuration-v0.10.md](../advanced/migrations/reacher-configuration-v0.10.md "mention") to see how to set these settings. | ||
|
||
### 2nd worker type: Headless worker | ||
|
||
These workers will consume all emails that are best verified using a headless browser. The idea behind this verification method is to spawn a headless browser that will navigate to the email provider's password recovery page, and parse the website's response to inputting emails. This method currently works well for Hotmail and Yahoo emails. | ||
|
||
To spawn such a worker, provide the config: | ||
|
||
* `worker.rabbitmq.queues`: `["check.hotmailb2c","check.yahoo"]`. These are the emails that are best verified using headless. | ||
* `worker.throttle.max_requests_per_minute`: 100 | ||
|
||
Refer to [reacher-configuration-v0.10.md](../advanced/migrations/reacher-configuration-v0.10.md "mention") to see how to set these settings. | ||
|
||
## Understanding the architecture with Docker Compose | ||
|
||
We do not recommend using Docker Compose for a high-volume production setup. However, for understanding the architecture, the different Docker images, as well as how to configure the workers, this [`docker_compose.yaml`](../../docker-compose.yaml) file can be useful. | ||
|
@@ -64,4 +51,4 @@ We do not recommend using Docker Compose for a high-volume production setup. How | |
Contact [[email protected]](https://app.gitbook.com/u/F1LnsqPFtfUEGlcILLswbbp5cgk2 "mention") if you have more questions about this architecture, such as: | ||
|
||
* deploying on Kubernetes (Ansible playbook, Pulumi) | ||
* more specialized workers (e.g. Gmail and Hotmail B2B workers can be separated) | ||
* more specialized workers (e.g. some workers doing headless verification only, others doing SMTP only) |