Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add workflow collections #2569

Merged
merged 9 commits into from
Oct 15, 2024
Merged

Add workflow collections #2569

merged 9 commits into from
Oct 15, 2024

Conversation

jyeshe
Copy link
Member

@jyeshe jyeshe commented Oct 9, 2024

Description

This PR adds workflow collections which are scoped to projects and provide similar functionality to Redis HSET and HGET (named hashes operations). It will be used not only to share mutable data between jobs on downstream but also shared among multiple workflows and it will be the responsibility of the programmer to use it without conflicts between workflows.

EDIT: The PRs are "merged" and it also includes functionalities similar to HGETALL and HSCAN allowing a query by pattern using a cursor (and respecting a limit).

Closes #2550
Closes #2551
Closes #2552

Validation steps

There is not UI using it yet but a complete test coverage is provided.

AI Usage

Please disclose how you've used AI in this work (it's cool, we just want to know!):

  • Code generation (copilot but not intellisense)
  • Learning or fact checking
  • Strategy / design
  • Optimisation / refactoring
  • Translation / spellchecking / doc gen
  • Other
  • I have not used AI

You can read more details in our Responsible AI Policy

Pre-submission checklist

  • I have performed a self-review of my code.
  • I have implemented and tested all related authorization policies. (e.g., :owner, :admin, :editor, :viewer)
  • I have updated the changelog.
  • I have ticked a box in "AI usage" in this PR

These are named and shared key-value storages
add :collection_id,
references(:collections, on_delete: :delete_all, type: :binary_id, null: false)

timestamps()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you change this to timestamps(type: :naive_datetime_usec) and the models as well.

Probably a good idea to start with sub-second resolution on these.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

timestamps()
end

create unique_index(:collections_items, [:collection_id, :key])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice one, do you think we should put an index on updated_at and inserted_at as well?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @stuartc I would like to play a bit more with the BRIN indexes because I am trying to avoid adding more cost to the job execution. May we add it during the "query by filter" PR? Will have better stats.

Copy link

codecov bot commented Oct 9, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 90.47%. Comparing base (534c6f5) to head (e276fbd).
Report is 14 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2569      +/-   ##
==========================================
+ Coverage   90.44%   90.47%   +0.02%     
==========================================
  Files         316      319       +3     
  Lines       10926    10955      +29     
==========================================
+ Hits         9882     9911      +29     
  Misses       1044     1044              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@jyeshe jyeshe requested a review from stuartc October 9, 2024 14:37
@jyeshe jyeshe self-assigned this Oct 9, 2024
@jyeshe jyeshe marked this pull request as ready for review October 9, 2024 16:23
@jyeshe jyeshe force-pushed the collections branch 3 times, most recently from d1cf44a to f3872e9 Compare October 9, 2024 17:31
@jyeshe jyeshe marked this pull request as draft October 10, 2024 09:44
@jyeshe jyeshe force-pushed the collections branch 2 times, most recently from bf0f894 to cf579c0 Compare October 10, 2024 13:50
@josephjclark
Copy link
Contributor

Thank you for the AI disclosure @jyeshe

@jyeshe jyeshe force-pushed the collections branch 3 times, most recently from 1c4c60f to d160c17 Compare October 14, 2024 16:30
@jyeshe jyeshe force-pushed the collections branch 2 times, most recently from 7c331ed to 9d6bca4 Compare October 15, 2024 08:04
@jyeshe jyeshe marked this pull request as ready for review October 15, 2024 09:57
@stuartc stuartc changed the base branch from main to feature/collections October 15, 2024 13:04
@stuartc stuartc merged commit 5dc12d5 into feature/collections Oct 15, 2024
6 of 7 checks passed
@stuartc stuartc deleted the collections branch October 15, 2024 13:08
jyeshe added a commit that referenced this pull request Oct 22, 2024
* Add workflow collections

These are named and shared key-value storages

* Increase timestamp precision

* Use single return patter on same get function

* Optimize all ops once the names are stable

There is no user update on collection name

* Changelog and formatting

* Add stream_all allowing equivalent queries to HGETALL and HSCAN

* Add stream_match for wildcard prefix queries

* Add pg_trigram and GIN index on the key to allow multi wildcard
jyeshe added a commit that referenced this pull request Oct 22, 2024
* Add workflow collections

These are named and shared key-value storages

* Increase timestamp precision

* Use single return patter on same get function

* Optimize all ops once the names are stable

There is no user update on collection name

* Changelog and formatting

* Add stream_all allowing equivalent queries to HGETALL and HSCAN

* Add stream_match for wildcard prefix queries

* Add pg_trigram and GIN index on the key to allow multi wildcard
stuartc pushed a commit that referenced this pull request Oct 22, 2024
* Add workflow collections

These are named and shared key-value storages

* Increase timestamp precision

* Use single return patter on same get function

* Optimize all ops once the names are stable

There is no user update on collection name

* Changelog and formatting

* Add stream_all allowing equivalent queries to HGETALL and HSCAN

* Add stream_match for wildcard prefix queries

* Add pg_trigram and GIN index on the key to allow multi wildcard
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

Get Collection by wildcards Get Collection item by key Get all Collection items
3 participants