Description of the new feature / enhancement
Rebuilding the entire index takes a long time because every manifest must be fully parsed and indexed, even though many of these manifests may not have changed since the last rebuild. With nearly 60,000 manifests, it would be beneficial to have some way of doing a partial rebuild.
Proposed technical implementation details
When a rebuild is performed, a copy of the manifests and the indexes could be saved to a storage blob as a gzip archive. When the next rebuild is performed, this archive could be downloaded and expanded, and the indexes loaded into memory the same way the publishing pipeline loads them. Then, instead of rebuilding the index from scratch, each manifest could be compared against its cached copy. If a manifest has changed, the index is updated based on the diff between the old and new manifest files; if it has not changed, the index does not need to be updated. Once all the manifests have been processed, the new indexes can be published, and a copy of the manifests and indexes can be saved as the cache for the next rebuild.
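A minimal Python sketch of the comparison step, assuming the cached and current manifests are both available in memory as a mapping from relative path to raw file contents. Changes are detected with SHA-256 content hashes, and a changed manifest is simply re-parsed and re-indexed as a whole rather than patched from a field-level diff. The `parse` callable stands in for whatever the real indexer does, and the dict-based index is hypothetical. The sketch also handles manifests that were added or deleted since the last rebuild, which the proposal does not spell out.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Callable


def digest(content: bytes) -> str:
    """Content hash used to decide whether a manifest changed."""
    return hashlib.sha256(content).hexdigest()


@dataclass
class ManifestDiff:
    added: list[str] = field(default_factory=list)
    changed: list[str] = field(default_factory=list)
    removed: list[str] = field(default_factory=list)
    unchanged: int = 0


def diff_manifests(cached: dict[str, bytes], current: dict[str, bytes]) -> ManifestDiff:
    """Classify every manifest by comparing it to the copy saved at the last rebuild.

    Both mappings go from a manifest's relative path to its raw file contents.
    """
    result = ManifestDiff()
    cached_hashes = {path: digest(data) for path, data in cached.items()}
    for path in sorted(current):
        old_hash = cached_hashes.get(path)
        if old_hash is None:
            result.added.append(path)  # not present in the cache
        elif old_hash != digest(current[path]):
            result.changed.append(path)  # contents differ from the cached copy
        else:
            result.unchanged += 1  # identical: the index entry can be kept as-is
    result.removed = sorted(set(cached) - set(current))
    return result


def apply_diff(
    index: dict[str, object],
    current: dict[str, bytes],
    diff: ManifestDiff,
    parse: Callable[[bytes], object],
) -> None:
    """Touch only the index entries of manifests that were added, changed, or removed."""
    for path in diff.added + diff.changed:
        index[path] = parse(current[path])  # hypothetical stand-in for re-indexing
    for path in diff.removed:
        index.pop(path, None)
```

With most of the ~60,000 manifests unchanged, only the paths in `added` and `changed` pay the parsing cost; everything else is skipped after a hash comparison.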
Of course, the pipelines will still need an option to perform a full rebuild when necessary, but adding a caching layer could significantly reduce rebuild time by starting from the last known-good index.
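The caching layer itself could be as simple as one gzip-compressed tar archive holding the manifests and index files from the last rebuild. This sketch assumes a local file path; uploading the archive to and downloading it from the storage blob is left out, and the entry names used in the example (such as `manifests/...` and `index/...`) are illustrative assumptions. A missing archive is treated as an empty cache.

```python
import io
import tarfile
from pathlib import Path


def save_cache(archive: Path, files: dict[str, bytes]) -> None:
    """Write the manifests and index files into a single .tar.gz archive."""
    with tarfile.open(str(archive), "w:gz") as tar:
        for name in sorted(files):
            data = files[name]
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))


def load_cache(archive: Path) -> dict[str, bytes]:
    """Read the archive back into memory; an absent cache yields an empty dict."""
    if not archive.exists():
        return {}
    files: dict[str, bytes] = {}
    with tarfile.open(str(archive), "r:gz") as tar:
        for member in tar.getmembers():
            if not member.isfile():
                continue  # skip directories, links, and other special entries
            extracted = tar.extractfile(member)
            if extracted is not None:
                files[member.name] = extracted.read()
    return files
```

Reading members straight into memory, rather than extracting them to disk, avoids writing archive paths to the filesystem and feeds directly into a path-to-contents comparison.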
With this caching strategy, it could also be beneficial to perform a rebuild on a regular cadence (every 3 months?) to help ensure a well-maintained cache.
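If the periodic rebuild is read as a full rebuild from scratch, the choice between a full and an incremental run could come down to three conditions: an explicit override, a missing cache, or a cache older than the cadence. The 90-day interval mirrors the tentative "every 3 months?" and is an assumption, as are the function and parameter names.

```python
from datetime import datetime, timedelta
from typing import Optional

# Assumed cadence based on the tentative "every 3 months?"; not a decided value.
FULL_REBUILD_INTERVAL = timedelta(days=90)


def choose_rebuild_mode(
    force_full: bool,
    cache_created: Optional[datetime],
    now: datetime,
    interval: timedelta = FULL_REBUILD_INTERVAL,
) -> str:
    """Return "full" or "incremental" for the next rebuild (hypothetical helper)."""
    if force_full:
        return "full"  # explicit pipeline option to rebuild from scratch
    if cache_created is None:
        return "full"  # no cache yet, e.g. first run or the cache was lost
    if now - cache_created >= interval:
        return "full"  # periodic refresh so the cache cannot drift indefinitely
    return "incremental"
```

Either way, the run ends by saving a fresh cache, so a full rebuild also resets the age used for the next decision.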
I'll leave that up to @denelon, but since index creation is part of the CLI implementation, I opted to put it here, mostly for planning purposes within the team, especially since the rebuild pipeline isn't typically run as a regular part of verification/publishing.