Improve initial metadata read using bson instead of unmarshal #88

eparker-tulip · 2024-10-30T18:42:36Z

Before entries can be masked out or filtered by shard, an initial set of oplog entry metadata needs to be processed. This is currently done via the rawOplogEntry type, which is populated using bson.Unmarshal(), but that is much slower than reading the needed fields directly with bson lookup methods. Additionally, the namespace field can be read before anything else, so that filtering can happen as far upstream as possible.

Benchmarking shows that small entries are processed about 2x faster, while the largest entries can be up to 100x faster.

byronwasti

Nice performance find!

lib/oplog/tail.go

sannivasreddy

Nice! I left a couple of comments below. It's surprising that bson.Unmarshal, even into raw BSON, is still worse in both time and memory than just looking up every field manually.

sannivasreddy · 2024-11-08T19:39:49Z

lib/oplog/tail.go

-	err := bson.Unmarshal(rawData, &result)
-	if err != nil {
-		log.Log.Errorw("Error unmarshalling oplog entry", "error", err)
-		return


In the old code, if the unmarshal returned any error, we returned immediately. Now, even if one of the lookups or type checks fails, we log a message and continue. Should we just return if any step of this lookup process fails?

I'm also curious about exactly when the previous unmarshal returned an error. Did it error if a struct field's type didn't match the type in the document? Probably, and the change I proposed above handles that. But did it error if no key in the document matched a struct field? I don't think so, and my change would behave differently in that case. I think it's worth checking when bson unmarshalling errors and making sure the behavior matches the old code (splitting the lookup and the type check if necessary).

> Did it error if the struct field type didn't match the type in the document?

Yes, I tested this. If a field was missing or extra, there was no error; the original unmarshal only errored when a field existed but had a different type. This change mostly follows that pattern, but does error on namespace, which seemed like a required field. However, maybe none are: I don't know what namespace would be set to in the case of a transaction. In that case, splitting up the lookup and the type conversion to exactly match the old behavior is probably best. I had started that way, but wanted to avoid it since it makes for terribly messy code.

Yeah, it does get gnarly. Maybe put the conversion from bson to rawOplogEntry in a helper function? That might help with that, and also make it easier to make changes in the future if necessary.

That would work. I'd also have to pass in the denylist to allow for the early filtering, but that's probably fine.

sannivasreddy · 2024-11-08T19:44:58Z

lib/oplog/tail.go

-		log.Log.Errorw("Error unmarshalling oplog entry", "error", err)
-		return
+	if len(result.Namespace) > 0 { // try to filter early if possible
+		db, _ := parseNamespace(result.Namespace)


If we're parsing here, and since we now populate the rawOplogEntry struct manually instead of unmarshalling into it, it might be worth storing the database and collection names returned here in that struct, and then skipping the namespace parsing that happens later in parseRawOplogEntry by using the stored values.

True, but it's more complicated: while that would work for the top-level entry, parseRawOplogEntry is recursive for transactions, so the parse still needs to happen there anyway. And the parse is a pretty lightweight call, so this seems OK to me for the sake of simplicity.
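Why a namespace parsed once at the top level can't simply be reused: a transaction's sub-entries each carry their own namespace, so a recursive parse has to split the namespace per entry. This is a simplified sketch with hypothetical types (`entry`, `parsed`, `flatten`); the real parseRawOplogEntry has a different signature and handles many more fields.

```go
package main

import (
	"fmt"
	"strings"
)

// entry is a hypothetical, simplified oplog entry: a namespace plus any
// sub-entries carried by a transaction command.
type entry struct {
	Namespace string
	SubOps    []entry
}

type parsed struct {
	Database, Collection string
}

// parseNamespace splits "db.coll" at the first '.' (hypothetical helper).
func parseNamespace(ns string) (string, string) {
	if i := strings.IndexByte(ns, '.'); i >= 0 {
		return ns[:i], ns[i+1:]
	}
	return ns, ""
}

// flatten recurses into transaction sub-entries. Each sub-entry has its own
// namespace, so the namespace is parsed per entry; a value cached from the
// top-level entry (e.g. "admin.$cmd") would be wrong for the children.
func flatten(e entry) []parsed {
	if len(e.SubOps) > 0 {
		var out []parsed
		for _, sub := range e.SubOps {
			out = append(out, flatten(sub)...)
		}
		return out
	}
	db, coll := parseNamespace(e.Namespace)
	return []parsed{{Database: db, Collection: coll}}
}

func main() {
	tx := entry{
		Namespace: "admin.$cmd",
		SubOps:    []entry{{Namespace: "app.users"}, {Namespace: "app.orders"}},
	}
	fmt.Println(flatten(tx)) // [{app users} {app orders}]
}
```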

sannivasreddy · 2024-11-08T19:46:52Z

lib/oplog/tail.go

+	if len(result.Namespace) > 0 { // try to filter early if possible
+		db, _ := parseNamespace(result.Namespace)
+
+		if _, denied := denylist.Load(db); denied {


We check the denylist again on line 420, which is probably no longer necessary. That check only does anything if the namespace exists and is parseable, and we already cover that case here.

The difference is that the second check runs after the transaction is parsed, matching the original behavior (though it still only checks the first entry in the transaction). But maybe we don't care about transactions for filtering? It might be good to confirm this with @alex-goodisman, who originally added the denylist.

Actually, I think what you currently have is right. Rereading the parse code, if there's a command operation, the top-level entry's namespace is just admin.$cmd, so we probably do need to check the first sub-entry against the denylist. I'm also realising that the new denylist check now covers this case too (i.e. it checks the namespace admin.$cmd), which we never had to do before. That should be fine since admin shouldn't be in the denylist, but it's something to keep in mind.

lib/oplog/tail.go

sannivasreddy · 2024-11-08T20:14:40Z

lib/oplog/tail.go

@@ -505,19 +543,14 @@ func (tailer *Tailer) parseRawOplogEntry(entry rawOplogEntry, txIdx *uint) []opl

 		out.Database, out.Collection = parseNamespace(out.Namespace)

+		var errID error


Nit: what about calling this just err? errID sounds like it stores an error ID, rather than holding the error from parsing the ID.

Yeah, this is a pattern Yotam likes: making it clear what each error variable is for, rather than always using err. But I'm fine either way, and I agree this could be confusing.

sannivasreddy · 2024-11-08T20:31:43Z

lib/oplog/tail_test.go

@@ -142,7 +142,7 @@ func TestParseRawOplogEntry(t *testing.T) {
 				Operation: "u",
 				Namespace: "foo.Bar",
 				Doc:       mustRaw(t, map[string]interface{}{"new": "data"}),
-				Update:    rawOplogEntryID{ID: "updateid"},
+				Update:    rawBson(t, map[string]interface{}{"_id": "updateid"}),


I see you used rawBson above and mustRaw in the other test cases. Why? The pattern seems to be that mustRaw is used for the input and rawBson for the expected result. mustRaw seems odd, actually: beyond what rawBson does, it checks whether a byte slice can be unmarshalled into bson.Raw, which doesn't seem helpful because both have the same underlying type.

Yeah, mustRaw was already there, and I added rawBson for the other tests, so when I had to update this one I used rawBson. I see now they're basically the same, so I replaced all uses of mustRaw with rawBson to simplify.


sannivasreddy

Everything looks good, thanks for the changes! I've left a couple of minor thoughts below.

Other than those, it might be worth changing the order of the functions in tail.go. For instance, getStartTime could move up before processEntry, and unmarshalEntryMetadata could move down to right before parseRawOplogEntry. I'm also not sure why some of these functions are methods on *Tailer: parseRawOplogEntry does nothing with *Tailer except call itself, and processEntry doesn't use it at all. Some cleanup of this kind might be worthwhile.

sannivasreddy · 2024-11-12T17:07:09Z

lib/oplog/tail.go

@@ -366,16 +360,77 @@ func closeCursor(cursor *mongo.Cursor) {
 	}
 }

-// unmarshalEntry unmarshals a single entry from the oplog.
+// unmarshalEntryMetadata processes the top-level data from an entry and returns a rawOplogEntry object.


It might be nice to say a bit more about the performance issue with regular unmarshalling, explaining why we need all these manual lookups.

sannivasreddy · 2024-11-12T17:32:59Z

lib/oplog/tail.go

@@ -540,7 +603,7 @@ func (tailer *Tailer) parseRawOplogEntry(entry rawOplogEntry, txIdx *uint) []opl



Do you think it's worth replacing the bson unmarshalling in the transaction case as well? We'd probably have to add a new function, since there we're looking at an array of documents. I don't think it's necessary right now, but I thought I'd mention it.

You're right. I actually meant to do this when I split off the function, and then forgot after the long weekend.


initial commit

e60bdd7

eparker-tulip requested review from torywheelwright, aranair and alex-goodisman October 31, 2024 15:16

byronwasti approved these changes Nov 8, 2024


sannivasreddy reviewed Nov 8, 2024


eparker-tulip and others added 3 commits November 8, 2024 15:29

Update lib/oplog/tail.go

4a0adb1

Co-authored-by: sannivasreddy <[email protected]>

refactor of metadata processing function

4dae2ad

check for special case admin. namespace before filtering

7101ad4

eparker-tulip requested a review from sannivasreddy November 12, 2024 16:45

sannivasreddy approved these changes Nov 12, 2024


refactor denylist, use raw processor for transactions as well

47dc4fa

eparker-tulip requested a review from sannivasreddy November 12, 2024 19:36

populate db var for metrics

da48e3a

sannivasreddy approved these changes Nov 12, 2024


eparker-tulip merged commit 06e1798 into master Nov 12, 2024
8 checks passed

eparker-tulip deleted the eparker.bson-metadata branch November 12, 2024 21:45

eparker-tulip mentioned this pull request Nov 19, 2024

Version bump #90

Merged

eparker-tulip added a commit that referenced this pull request Nov 19, 2024

Version bump (#90)

e6eaeea

Version bump forgotten in the [last PR for the metadata performance improvement](#88).
