chore: limit the amount of context data we parse #684
We have too much data in the context_base table, so performance is poor. Data volume is increasing over time: the last 6 months hold more data than everything before them. This is likely because more users are on newer versions of Meltano that send our rich unstructured events, and because overall usage has grown.
I manually truncated the context_base incremental table to remove all data from before this year, and made a backup table of the original. The context_base table is transient but the backup is not, so the processed historical data will be properly persisted if we ever need it. Since context_base will continue to grow and we'll have to manually prune it periodically, I created this PR, which limits all downstream tables to filter for only the last 6 months of data, so their performance should stay relatively static even as the base table grows. A rough sketch of both steps is below.
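For illustration only, a minimal sketch of what these two steps could look like in Snowflake SQL. The event_created_at column and the context_base_backup name are assumptions rather than the actual schema, and the real downstream models may apply the filter differently.

```sql
-- One-off manual prune (already done by hand, not part of this PR).
-- Hypothetical names: event_created_at and context_base_backup are
-- assumptions about the schema.

-- Back up the full table first; a plain CREATE TABLE is permanent
-- (non-transient), so the backup persists even though context_base
-- is transient.
create table if not exists context_base_backup as
select * from context_base;

-- Remove everything from before the current year.
delete from context_base
where event_created_at < date_trunc('year', current_date);
```

And the recurring part this PR adds: each downstream table selects only a rolling 6-month window, so its input size stays roughly constant as context_base grows.

```sql
-- Hypothetical shape of the filter added to each downstream model.
select *
from context_base
where event_created_at >= dateadd('month', -6, current_date);
```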