-
Notifications
You must be signed in to change notification settings - Fork 3.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(dataobj): cardinality estimation #16233
Conversation
feat: add cardinality count estimation to columns via hyperloglog
stores cardinality for metadata fields
…avor of grafana#16097 refactor This reverts commit c8c584f.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Just a few tiny comments
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is only used internally by the dataset.ColumnBuilder, so we don't need to export anything here as part of the API of dataset :)
(I like limiting APIs whenever possible because I find it makes it easier to navigate code completion and godoc pages)
if success { | ||
// only append stats if append was successful | ||
cb.statsBuilder.Append(value) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be simpler to do this inside ColumnBuilder.Append
instead? That already has a check to see if the append was successful
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is what's happening. It monitors the success of the page builder to see if it should append stats, primarily because the page builder returns false when full. In turn, this check ensures we only add the stats once, rather than twice when a new page is cut.
It makes sense to add this as a code comment -- I'll update it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I'm still not following. I was asking if this would be simpler:
// Append adds a new value into cb with the given zero-indexed row number. If
// the row number is higher than the current number of rows in cb, null values
// are added up to the new row.
//
// Append returns an error if the row number is out-of-order.
func (cb *ColumnBuilder) Append(row int, value Value) error {
if row < cb.rows {
return fmt.Errorf("row %d is older than current row %d", row, cb.rows)
}
// We give two attempts to append the data to the buffer; if the buffer is
// full, we cut a page and then append to the newly reset buffer.
//
// The second iteration should never fail, as the buffer will always be empty
// then.
for range 2 {
if cb.append(row, value) {
cb.rows = row + 1
cb.statsBuilder.Append(value)
return nil
}
cb.flushPage()
}
panic("ColumnBuilder.Append: failed to append value to fresh buffer")
}
Since there's already a condition for success there, it would avoid a second check for success on the append
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah yeah I see 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🚀
ColumnStatsBuilder