Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(dataobj): cardinality estimation #16233

Merged
merged 11 commits into from
Feb 13, 2025
Merged

Conversation

owen-d
Copy link
Member

@owen-d owen-d commented Feb 12, 2025

  • Introduces value cardinality estimation, initially for metadata fields
  • Backed by hyperloglog, initially using sparse representations for space conservation
  • Only writes a uint64 cardinality count estimation on flush
  • Refactors statistics writing to centralize logic, using a new ColumnStatsBuilder

@owen-d owen-d requested a review from a team as a code owner February 12, 2025 21:05
Copy link
Member

@rfratto rfratto left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Just a few tiny comments

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is only used internally by the dataset.ColumnBuilder, so we don't need to export anything here as part of the API of dataset :)

(I like limiting APIs whenever possible because I find it makes it easier to navigate code completion and godoc pages)

Comment on lines 168 to 171
if success {
// only append stats if append was successful
cb.statsBuilder.Append(value)
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be simpler to do this inside ColumnBuilder.Append instead? That already has a check to see if the append was successful

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is what's happening. It monitors the success of the page builder to see if it should append stats, primarily because the page builder returns false when full. In turn, this check ensures we only add the stats once, rather than twice when a new page is cut.

It makes sense to add this as a code comment -- I'll update it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I'm still not following. I was asking if this would be simpler:

// Append adds a new value into cb with the given zero-indexed row number. If
// the row number is higher than the current number of rows in cb, null values
// are added up to the new row.
//
// Append returns an error if the row number is out-of-order.
func (cb *ColumnBuilder) Append(row int, value Value) error {
	if row < cb.rows {
		return fmt.Errorf("row %d is older than current row %d", row, cb.rows)
	}

	// We give two attempts to append the data to the buffer; if the buffer is
	// full, we cut a page and then append to the newly reset buffer.
	//
	// The second iteration should never fail, as the buffer will always be empty
	// then.
	for range 2 {
		if cb.append(row, value) {
			cb.rows = row + 1
                        cb.statsBuilder.Append(value) 
			return nil
		}

		cb.flushPage()
	}

	panic("ColumnBuilder.Append: failed to append value to fresh buffer")
}

Since there's already a condition for success there, it would avoid a second check for success on the append

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yeah I see 👍

Copy link
Member

@rfratto rfratto left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🚀

@owen-d owen-d enabled auto-merge (squash) February 13, 2025 17:06
@owen-d owen-d merged commit 66889ec into grafana:main Feb 13, 2025
58 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants