Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

allow non-string keys #5

Open
wants to merge 4 commits into
base: main
Choose a base branch
from
Open

allow non-string keys #5

wants to merge 4 commits into from

Conversation

ldanilek
Copy link
Collaborator

enable ShardedCounter keys to be any type, not just strings. The type can be enforced in Typescript.

Added to readme.

Added example and test of nested counters, as requested in https://discord.com/channels/1019350475847499849/1298765828862247013/1298765828862247013


By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

@ldanilek ldanilek requested a review from ianmacartney October 29, 2024 14:44
Copy link
Contributor

@ianmacartney ianmacartney left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have some concerns here

  • It takes away the dynamic option (e.g. passing in an unknown key to get default behavior)
  • if it isn't a string you can't specify a number of shards
  • you can't fetch all the keys for a prefix / subset, so it's not any more powerful than using JSON.stringify(key) as the key

I'd rather make a dedicated NestedCounter that has some prefix and stores under the hood as an array of strings, able to fetch all data for a prefix. Then you could even call something like

const collection = new NestedCounter<Id<"teams">>(components.counter, { shards: 10 });
const teamCounter = collection.nest(team._id, { shards: { members: 5, events: 25 } });
teamCounter.add(ctx, "events");
teamCounter.add(ctx, "members", 1);
const memberCount = teamCounter.for("members");
memberCount.inc(ctx); // without key it increments the current prefix
const userCounter = memberCount.nest(user._id); // default shards
userCounter.add(ctx, "logins", 1);
teamCounter.get(ctx); // returns all user-prefixed values: 
// { members: 6, events: 10 }
teamCounter.get(ctx, "members"); // 6
userCounter.get(ctx); //

Now this is just a sketch. In particular there's some odd behavior:

  • when you fetch a prefix of a team, it probably shouldn't return every nested user's stats
  • can you both increment key K and have sub-metrics on K.foo ?
  • "nest" vs. "for" vs. using firestore language like "collections" and "docs"

src/client/index.ts Outdated Show resolved Hide resolved
async add<Name extends string = ShardsKey>(
async add<Name extends ShardsKey>(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this means (I believe) that you can no longer pass in a custom string if you've defined shards in your config

Copy link
Collaborator Author

@ldanilek ldanilek Oct 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

correct, but i think that's actually good.

if you explicitly give the generic, you get the same behavior as before

const counter = new ShardedCounter<string>(components.shardedCounter, {
  shards: { users: 10 },
});

await counter.inc(ctx, "friends"); // this works

and if you have a fixed set of keys, it protects against typos

const counter = new ShardedCounter(components.shardedCounter, {
  shards: { users: 10, friends: 20 },
});

await counter.inc(ctx, "frends"); // errors

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah that's a good call. In fact what we probably want is for shards to either be keyed or fixed, so { shards: 10 } or { shards: { users: 10 } } vs. also having defaultShards

Co-authored-by: Ian Macartney <[email protected]>
@ldanilek
Copy link
Collaborator Author

I have some concerns here

  • It takes away the dynamic option (e.g. passing in an unknown key to get default behavior)
  • if it isn't a string you can't specify a number of shards
  • you can't fetch all the keys for a prefix / subset, so it's not any more powerful than using JSON.stringify(key) as the key

I'd rather make a dedicated NestedCounter that has some prefix and stores under the hood as an array of strings, able to fetch all data for a prefix. Then you could even call something like

const collection = new NestedCounter<Id<"teams">>(components.counter, { shards: 10 });
const teamCounter = collection.nest(team._id, { shards: { members: 5, events: 25 } });
teamCounter.add(ctx, "events");
teamCounter.add(ctx, "members", 1);
const memberCount = teamCounter.for("members");
memberCount.inc(ctx); // without key it increments the current prefix
const userCounter = memberCount.nest(user._id); // default shards
userCounter.add(ctx, "logins", 1);
teamCounter.get(ctx); // returns all user-prefixed values: 
// { members: 6, events: 10 }
teamCounter.get(ctx, "members"); // 6
userCounter.get(ctx); //

Now this is just a sketch. In particular there's some odd behavior:

  • when you fetch a prefix of a team, it probably shouldn't return every nested user's stats
  • can you both increment key K and have sub-metrics on K.foo ?
  • "nest" vs. "for" vs. using firestore language like "collections" and "docs"

interesting interface. can you give an example where you would want to fetch all the keys for a prefix? storing the tree sounds tricky, and in most cases i can think of, the sub-keys would be static or already known by the caller.

Copy link
Contributor

@ianmacartney ianmacartney left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another API that just came to mind would be an explicit namespace:

const userMetrics = new ShardedCounter(components.counter, {
  namespace: { teamId: team._id }
  shards: { "followers": 15, "followees": 16 },
});

or if you only want to define that one spot (where team doesn't exist yet), do a one-shot

const userCounter = new NamespacedCounter<{teamId: Id<"teams">; userId: Id<"users">}>(components.counter, {
  shards: { "followers": 15, "followees": 16 },
});
userCounter.for({teamId: team._id, userId: user._id }).add("followers", 15);

or just a string namespace

const userCounter = new NestedCounter<Id<"users">>(components.counter, {
  shards: { "followers": 15, "followees": 16 },
});
userCounter.for(user._id).add("followers", 15);

@@ -143,20 +143,27 @@ const counter = new ShardedCounter(components.shardedCounter, {
Or by setting a default that applies to all keys not specified in `shards`:

```ts
const counter = new ShardedCounter(components.shardedCounter, {
const counter = new ShardedCounter<string>(components.shardedCounter, {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
const counter = new ShardedCounter<string>(components.shardedCounter, {
const counter = new ShardedCounter(components.shardedCounter, {

Comment on lines +12 to 14
const counter = new ShardedCounter<"beans" | "users" | "accomplishments">(components.shardedCounter, {
shards: { beans: 10, users: 3 },
});
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
const counter = new ShardedCounter<"beans" | "users" | "accomplishments">(components.shardedCounter, {
shards: { beans: 10, users: 3 },
});
const counter = new ShardedCounter(components.shardedCounter, {
shards: { beans: 10, users: 3, accomplishments: 15 },
});

@ianmacartney
Copy link
Contributor

It sounded like they wanted, for example, to fetch all stats for an org at once: await statsForOrg.getAll(ctx) was their suggestion. but yeah I agree arbitrary tree is likely overkill. There seems like two useful layers: a prefix/namespace/etc, then a record of key -> number. The second layer is what we have today (and would probably be fine staying a string for the key). The former can today just get prepended via JSON.stringify, but a better API could allow more type safety & idiomatic usage.

@ianmacartney
Copy link
Contributor

another usecase might be to clear all stats for a user (if the user wants their data deleted, e.g.) without having to remember every stat that exists/existed for a user

@ldanilek
Copy link
Collaborator Author

It sounded like they wanted, for example, to fetch all stats for an org at once: await statsForOrg.getAll(ctx) was their suggestion. but yeah I agree arbitrary tree is likely overkill. There seems like two useful layers: a prefix/namespace/etc, then a record of key -> number. The second layer is what we have today (and would probably be fine staying a string for the key). The former can today just get prepended via JSON.stringify, but a better API could allow more type safety & idiomatic usage.

a nice way to implement await statsForOrg.getAll(ctx) would be to enumerate the keys of shards, client-side.

the user-deletion use-case does sound like a reasonable reason to save the tree in the component.

i don't see why having string keys with JSON.stringify is better than having arbitrary convex values as keys.

@ianmacartney
Copy link
Contributor

i don't see why having string keys with JSON.stringify is better than having arbitrary convex values as keys.

I agree. I'm just thinking through the solution space.

  1. The default would be to do nothing and have them build on top of it, so I was just pointing out that it's already possible.
  2. Changing the key to be an arbitrary value allows setting & getting specific nested keys, but doesn't afford listing keys or specifying per-key shards (I don't see a way to specify { shards: { followers: 15, followees: 10, userId: ..? } } - I think you'd only have a global shard count.
  3. Specifying a namespace separately from a key/value would allow listing keys & per-key shards, but requires an API that conditionally enforces specifying the namespace, if one is specified.
  4. Having a full tree format would allow flexible nesting (stored as a flat path / array of values), but has more design uncertainty around the ambiguity between concrete metrics and nested metrics, which can have very different cardinalities so don't make sense to have the same API to get all keys.

(3) seems the best to me - separate out the high & low cardinality layers by having the namespace be high cardinality (by default). then instead of relying on default shards like:

const counter = new ShardedCounter<Id<"users">>(components.counter, { defaultShards: 10 });
counter.add(ctx, userId);

Which only allows one counter per user, we would push to have explicit keys like
const counter = new ShardedCounter<Id<"users">>(components.counter, { shards: { followers: 10 } });
counter.for(userId).add(ctx, "followers");
// or
counter.add(ctx, userId, "followers"); // signature overload if there's a namespace?


This would change the behavior of `for` - or we'd need something new

@amp127
Copy link

amp127 commented Oct 29, 2024

It sounded like they wanted, for example, to fetch all stats for an org at once: await statsForOrg.getAll(ctx) was their suggestion. but yeah I agree arbitrary tree is likely overkill. There seems like two useful layers: a prefix/namespace/etc, then a record of key -> number. The second layer is what we have today (and would probably be fine staying a string for the key). The former can today just get prepended via JSON.stringify, but a better API could allow more type safety & idiomatic usage.

Thanks for all the thought on this!

To clarify our use case is that we are tracking stats for many objects, teams, users, agents, and each one has stats. Its very little importance to get a list of all stats for a team.

We would have very busy stats like teams which need to be sharded, and ones for users that don't get updated so often.

statsForOrg.getAll(ctx) was my idea of returning a list of all stats for that team/object rather then having to ask for each one. It would already likely be a summary of most of the child objects of that team.

I like to think of aiTown as being similar use case but if you had thousands of users/agents. If each user, agent or world had stats. users and agents could track requests to move north,west,east,south(crossing a bridge etc), chat messages, while world has number of active members, currently moving members, currently moving agents. total chat messages, etc, world stats needs to be sharded.

@amp127
Copy link

amp127 commented Oct 29, 2024

I reviewed the code some more and it seems like its what i was going for.
Not exactly sure how i could get these edits of the component into my deployment to test.

(3) seems the best to me - separate out the high & low cardinality layers by having the namespace be high cardinality (by default). then instead of relying on default shards like:

const counter = new ShardedCounter<Id<"users">>(components.counter, { defaultShards: 10 });
counter.add(ctx, userId);

Which only allows one counter per user, we would push to have explicit keys like const counter = new ShardedCounter<Id<"users">>(components.counter, { shards: { followers: 10 } }); counter.for(userId).add(ctx, "followers"); // or counter.add(ctx, userId, "followers"); // signature overload if there's a namespace?

I think having the ability to individually shard each nested value is a bit overkill.

@ianmacartney
Copy link
Contributor

Sorry for the slowness here - this got sidelined by other work but I think we have a few paths forward to consider.

A couple new ways I'm thinking about it:

  1. We should re-use the nomenclature & ergonomics of namspaces here that we added to aggregates, as much as possible. E.g. you make a namespaced counter that takes a namespace as a required argument.
  2. I'm realizing that beyond just having N metrics each with S shards for each user U (making NSU documents), it often makes sense to just have one sharded document per user (S * U). For instance, each mutation operating on a given user might touch a bunch of stats. If they can all just modify one sharded document, then that's much cheaper than each touching a separate shard.
  3. We may even benefit from journaling which shard is used, so multiple changes to the same stat within a mutation touches the same shard, e.g. if it's called from a deeply nested function.

That being said, the downside of (2) is that if you end up wanting to support concurrent changes to stats for a single user (e.g. user operations modify other users' stats), then it's still valuable for each user stat to be sharded separately.

So I'm wondering which of these need to exist:

A: A namespaced set of metrics, that all share shards.
B: A namespace added to each metric, that each has its own shards.
A+B: Some metrics for a user/team/namespace are sharded individually, others are coalesced into a single sharded document for efficiency

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants