SPIKE: Set default index shards to 1 #314
Needs some infra work to expose the number of nodes, ideally. Should be a constant called …
@mikelittle we converted this issue to a spike and estimated it at 5 SP.
The changes outlined here would only take effect the next time content is reindexed. If indexing is failing because we have reached the shard limit, this should solve or at least improve that on reindex, and a reindex is needed anyway.
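For context on why a reindex is unavoidable: `number_of_shards` is a static index setting in Elasticsearch, applied only at index-creation time, so the content has to be copied into a freshly created index. A minimal sketch of the two request bodies involved (index names are illustrative, not from this project):

```python
import json

# Settings payload for the replacement index: a single primary shard
# (the proposed default) plus one replica for resilience. This is the
# body of a PUT request to the new index.
new_index_settings = {
    "settings": {
        "index": {
            "number_of_shards": 1,
            "number_of_replicas": 1,
        }
    }
}

# Body for a POST to the _reindex endpoint, copying documents from the
# old index into the new one. "post-index"/"post-index-v2" are
# hypothetical names.
reindex_body = {
    "source": {"index": "post-index"},
    "dest": {"index": "post-index-v2"},
}

print(json.dumps(new_index_settings, indent=2))
```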
We may need a dedicated primary node, but do we have any instances with more than 2 nodes? Will we? I forgot I'd noted this: "Shard count is a number divisible by the number of ES data nodes". I had meant to also suggest adding an environment variable for the number of nodes in a stack. Alternatively, the node count could be derived and stored by a background task, or by a request run prior to reindexing, to set the default and round the target value to the nearest divisible number.
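The rounding rule described above could be sketched as follows (a hypothetical helper, not part of ElasticPress; the function name is illustrative):

```python
def round_to_divisible(target_shards: int, data_nodes: int) -> int:
    """Round a target shard count to the nearest positive multiple of
    the number of ES data nodes, so shards distribute evenly.

    Ties round up; the result is never below one shard per node.
    """
    if data_nodes < 1:
        raise ValueError("data_nodes must be at least 1")
    # Integer nearest-multiple rounding, clamped to a minimum of 1x.
    multiple = max(1, (target_shards + data_nodes // 2) // data_nodes)
    return multiple * data_nodes
```

For example, a target of 5 shards on a 2-node stack would round to 6, and a target of 1 shard would round up to 2 so every data node holds a primary.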
The main trade-off will be ingestion versus read speed. More shards allow for parallel ingestion; fewer shards allow for faster searching. I would propose the following approach to testing:
The approach can be run and worked out locally, but it will need to be run on a multi-node test stack in the cloud too. Parallel ingestion may not be that beneficial, as we sometimes hit memory usage as a limiting factor, and apart from reindexing, ingestion doesn't need to be quick. There are other ways to manage ingestion memory issues too, such as adding a short …
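One generic way to bound ingestion memory, regardless of shard count, is to send documents in fixed-size batches with a short pause between bulk requests. A hedged sketch, not ElasticPress code; `send` stands in for a real bulk-request call:

```python
import time
from typing import Callable, Iterable, Iterator, List


def chunked(docs: Iterable[dict], size: int) -> Iterator[List[dict]]:
    """Yield docs in fixed-size batches to cap per-request memory."""
    batch: List[dict] = []
    for doc in docs:
        batch.append(doc)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def bulk_index(
    docs: Iterable[dict],
    send: Callable[[List[dict]], None],
    batch_size: int = 500,
    pause: float = 0.1,
) -> None:
    # send() is a placeholder for a real bulk request (e.g. POST /_bulk);
    # the pause between batches eases memory and cluster pressure.
    for batch in chunked(docs, batch_size):
        send(batch)
        time.sleep(pause)
```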
The default number of shards in ElasticPress matches the old Elasticsearch default of 5 shards. It's rare for our shards to reach the kind of size at which they really need to be split, and they can still have at least one replica each for resilience.
Fewer shards generally improve search time, but lower data ingestion speed, as bulk indexing requests cannot be parallelised.
This seems like an appropriate trade-off for site search.
Acceptance criteria:
Answer these questions before going further with the original acceptance criteria.
Original acceptance criteria