
SPIKE: Set default index shards to 1 #314

Open · 9 tasks
roborourke opened this issue Nov 8, 2021 · 4 comments
Labels
should have — Should be done, medium priority for now · spike — An investigation needed in order to refine & estimate a story

Comments

roborourke (Contributor) commented Nov 8, 2021

The default number of shards in ElasticPress matches the old Elasticsearch default of 5 shards. It's rare our shards reach the kind of size that they really need to be split, and they can still have at least one replica for resilience.

Fewer shards generally improve search speed, but lower data ingestion throughput, as bulk indexing requests cannot be parallelised across shards.

This seems like an appropriate trade off for site search.
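As a concrete sketch of the proposed change, the index settings would move from five primaries to one, keeping a single replica. The payload below uses standard Elasticsearch index setting names; it is an illustration only, not ElasticPress's actual configuration code:

```python
import json

# Hypothetical settings payload illustrating the proposed defaults:
# one primary shard, one replica. The setting names are standard
# Elasticsearch index settings; how ElasticPress applies them is not
# shown here.
proposed_settings = {
    "settings": {
        "index": {
            "number_of_shards": 1,    # down from the old default of 5
            "number_of_replicas": 1,  # keep one replica for resilience
        }
    }
}

print(json.dumps(proposed_settings, indent=2))
```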

Acceptance criteria:
Answer these questions before going further with the original acceptance criteria:

  • What happens to existing clusters?
  • How can we make sure this will work correctly with 3+ nodes?
  • How should we measure the performance?

Original acceptance criteria

  • Default shard count is 1
  • Default shard replica count is 1
  • Default shard count can be configured
  • Shard count config option is documented with guidance on changing it
  • Shard count is a number divisible by the number of ES data nodes
  • Simulate with 3+ nodes
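The last two criteria could be sketched as a small helper that rounds a target shard count to the nearest multiple of the data-node count. round_shard_count is a hypothetical name for illustration, not existing ElasticPress code:

```python
def round_shard_count(target: int, node_count: int) -> int:
    """Round `target` to the nearest multiple of `node_count`.

    Hypothetical helper illustrating the "shard count is a number
    divisible by the number of ES data nodes" acceptance criterion.
    Never returns fewer shards than the node count itself.
    """
    if target < 1 or node_count < 1:
        raise ValueError("target and node_count must be at least 1")
    return max(node_count, round(target / node_count) * node_count)
```

For example, a target of 5 shards on a 3-node cluster would round to 6.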
roborourke (Contributor, Author) commented

Ideally this needs some infra work to expose the number of nodes. It should be a constant called ES_NODE_COUNT.
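A minimal sketch of how code might consume such a constant, shown here in Python as an environment variable with a fallback. ES_NODE_COUNT is the proposed name; the fallback default of 1 is an assumption:

```python
import os

def es_node_count(default: int = 1) -> int:
    """Read the proposed ES_NODE_COUNT value from the environment,
    falling back to `default` when it is unset or invalid."""
    try:
        value = int(os.environ.get("ES_NODE_COUNT", ""))
    except ValueError:
        return default
    return value if value >= 1 else default
```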

@roborourke roborourke added the to refine Issues needing refinement. label Dec 13, 2022

@veselala veselala added the spike An investigation needed in order to refine & estimate a story label Dec 14, 2022
veselala commented:

@mikelittle we converted this issue to a spike and estimated it at 5 SP.

@veselala veselala added should have Should be done, medium priority for now and removed to refine Issues needing refinement. labels Dec 14, 2022
@veselala veselala changed the title Set default index shards to 1 SPIKE:Set default index shards to 1 Dec 14, 2022
roborourke (Contributor, Author) commented Dec 14, 2022

What happens to existing clusters

The changes outlined here would only take effect the next time content is reindexed. If indexing is failing due to reaching the shard limit, this should hopefully solve or at least improve that on reindex, since a reindex is needed anyway.

How can we make sure this will work correctly with 3+ nodes

We may need a dedicated primary node, but do we have any instances with more than 2 nodes? Will we? I forgot I'd noted this: "Shard count is a number divisible by the number of ES data nodes". I had meant to also suggest adding an environment variable for the number of nodes in a stack. Alternatively, it could be derived and stored by a background task, or by a request run prior to reindexing, to set the default and round the target value to the nearest divisible number.

GET _cat/nodes will list available nodes.
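A sketch of deriving the data-node count from that endpoint, assuming explicit headers are requested via GET _cat/nodes?h=node.role,name so the role column comes first. The sample output below is fabricated for illustration:

```python
def count_data_nodes(cat_nodes_output: str) -> int:
    """Count data nodes in the text output of
    GET _cat/nodes?h=node.role,name.

    With explicit headers, each line starts with the node.role string;
    data nodes carry 'd' among their role letters.
    """
    return sum(
        1
        for line in cat_nodes_output.strip().splitlines()
        if line.split() and "d" in line.split()[0]
    )

# Fabricated sample: two data nodes and one master/ingest-only node.
sample = "dilm node-1\ndilm node-2\nilm node-3\n"
```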

How should we measure the performance

The main trade-off will be ingestion versus read speed. More shards allow for parallel ingestion; fewer shards allow for faster searching. I would propose the following approach to testing:

  • use a substantial dataset, e.g. 100k posts across 100 subsites
  • benchmark indexing speed with 1, 2 and the current default of 5 shards
  • benchmark 1,000 search requests with 1, 2 and the current default of 5 shards
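The benchmark steps above could be driven by a small timing harness; the callables here are placeholders for real indexing and search client calls, so this is a sketch rather than a finished tool:

```python
import time
from typing import Callable, Dict, List


def benchmark(operation: Callable[[], None], runs: int) -> float:
    """Total wall-clock seconds for `runs` invocations of `operation`."""
    start = time.perf_counter()
    for _ in range(runs):
        operation()
    return time.perf_counter() - start


def compare_shard_counts(
    make_operation: Callable[[int], Callable[[], None]],
    shard_counts: List[int],
    runs: int,
) -> Dict[int, float]:
    """Time the same operation against an index built with each shard
    count, e.g. shard_counts=[1, 2, 5] with runs=1000 for searches."""
    return {n: benchmark(make_operation(n), runs) for n in shard_counts}
```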

The approach can be worked out locally, but will need to be run on a multi-node test stack in the cloud too.

Parallel ingestion may not be that beneficial, as we sometimes hit memory usage as a limiting factor, and apart from reindexing, ingestion doesn't need to be quick. There are other ways to manage ingestion memory issues too, such as adding a short sleep or wait between requests. It will be interesting to see what the results are.
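The sleep-between-requests idea could look roughly like this; throttled_bulk_index and send_batch are hypothetical names, and real code would call the Elasticsearch bulk API inside send_batch:

```python
import time
from typing import Callable, Iterable, List


def throttled_bulk_index(
    documents: Iterable[dict],
    send_batch: Callable[[List[dict]], None],
    batch_size: int = 500,
    pause_seconds: float = 0.0,
) -> int:
    """Send documents in fixed-size batches, optionally sleeping
    between requests to limit memory and cluster pressure.
    Returns the number of batches sent."""
    batch: List[dict] = []
    batches_sent = 0
    for doc in documents:
        batch.append(doc)
        if len(batch) >= batch_size:
            send_batch(batch)
            batches_sent += 1
            batch = []
            if pause_seconds:
                time.sleep(pause_seconds)
    if batch:  # flush the final partial batch
        send_batch(batch)
        batches_sent += 1
    return batches_sent
```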
