Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement scanning iterator AllRowsScanned #2241

Open
wants to merge 4 commits into
base: master
Choose a base branch
from

Conversation

xobotyi
Copy link

@xobotyi xobotyi commented Jan 24, 2025

This pull request introduces an iterator that scans rows on the go.

The motivation for this implementation arose from the need to scan rows into a map. The current built-in functions are sub-optimal for this purpose, as they require scanning into a slice first and then iterating again to copy data into the result map.

4ex:

type Data struct {
	ID   int32
	Name string
}

type Result map[int32]Data

func preIter() (Result, error) {
	var rows pgx.Rows

	scannedRows, err := pgx.CollectRows[Data](rows, pgx.RowToStructByName)
	if err != nil {
		return nil, err
	}

	result := make(Result, len(scannedRows))

	for _, row := range scannedRows {
		result[row.ID] = row
	}

	return result, nil
}

func postIter() (Result, error) {
	var rows pgx.Rows
	result := Result{}

	for row, err := range pgx.AllRowsScanned[Data](rows, pgx.RowToStructByName) {
		if err != nil {
			return nil, err
		}

		result[row.ID] = row
	}

	return result, nil
}

This PR is a proposal and currently lacks examples and thorough testing of the function. Further documentation and tests will be provided if the proposal is considered for merging.

There are several points for discussion:

  1. Since iterators were introduced in Go 1.23, this pull request increases the Go version in go.mod to 1.23. This change breaks the convention of supporting "two most recent major releases."
  2. The iterator yields one more time in case of an error to ensure there are no errors in the rows after iteration. While this approach aligns with the overall error-check flow of the solution, as it can be considered an "error during scan," I am open to discussions regarding this approach.

@felix-roehrich
Copy link
Contributor

Yielding one more time will result in runtime panics, if an error occurs like here https://go.dev/play/p/S6VGunr5iZT.

@xobotyi
Copy link
Author

xobotyi commented Jan 24, 2025

@felix-roehrich, you are correct. There should be a return statement instead of break -- I've had a feeling that something is off regarding iteration stop, but, as said, not much tests yet.

We have two options here:

  1. Leave things as-is (with break replaced by return), and require the developer to perform a trailing rows.Err() check if the iteration stops prematurely. This will require explicit highlight within method documentation.
    Personally, I lean towards this option, as in my experience with the internal implementation, we have never stopped iteration prematurely, thus never faced the panic.
  2. Expect the developers to always manually check for errors after iteration.

Previous approach could cause panic in case loop broken by
developer and error occurred on rows.
@xobotyi
Copy link
Author

xobotyi commented Jan 24, 2025

Or, actually, not.. If i dont miss anything -- there is no reason for user of iterator to check rows.Err() in any case.

  • In case everything wen good -- there will be only n yields (n - number of rows in the result).
  • In case iteration was stopped because there are no more rows -- there will be n yields.
  • In case iteration was stopped because there is an error during read -- there will be n+1 yields (n - number of successful row reads).
  • In case iteration was stopped by developer -- there will be n yields, because there is basically no way for rows.Err() return an error as rows.Next() won't be called anyways.

So there was single error of mine putting break instead of return.

@felix-roehrich
Copy link
Contributor

Yes, lgtm now. I just feel the name is not appropriate, especially 'Scanned' being past tense, something like pgx.Iterate/pgx.ScanRows would be more intuitive.

I do not believe that breaking the support promise will or should happen, so imo this will have to wait until 1.24 is released. Anyway, thanks for looking at this and let's see what @ jackc says.

@xobotyi
Copy link
Author

xobotyi commented Jan 25, 2025

You didn't mention him BTW, there is space between "at" and nickname🤭

Regarding the name, GO proposes the name All for functions that returns iterator over all entries of collection.
It brings us to pgx.AllRows and the call signature will be for row, err := range pgx.AllRows(rows), but i would expect it to return rows itself, not scanned.
But in my case, since developer passing scanner function and getting already scanned rows i decided to call it with past tense, since whole process can be described as "iteration over all scanned rows"

@felix-roehrich
Copy link
Contributor

The docs also have a lot of examples without All(), e.g. Cities() and not AllCities() or Backward() and not AllBackward(). Thus, I am not so fixated on this. I understand your point about 'Scanned', though.

@xobotyi
Copy link
Author

xobotyi commented Jan 25, 2025

True, agree.

@jackc
Copy link
Owner

jackc commented Jan 25, 2025

Regarding the Go version, 1.24 is due any day now so that wouldn't be a problem. It would also would have been possible to use build tags to provide backwards compatibility.

As far as an iterator in general, perhaps it might be sense to support eventually, but I'm not sold on iterators right now.

In this particular implementation, error handling still seems awkward to me. Maybe I'm simply very familiar with the current style of process all the rows and check for errors at the end. But it seems weird to check for error inside of the row handler and then not check after the rows are processed.

Also, I'm not sure how this iterator approach is better than a variant of ForEachRow that used templates and RowToFunc. If anything, I would expect that to be a more ergonomic interface for your original use case.

Ultimately, before integrating iterator support into core pgx, I'd like to see a use case that has significantly better ergonomics than can be done without it.

@xobotyi
Copy link
Author

xobotyi commented Jan 25, 2025

I personally don't understand how to use ForEachRow to scan rows into structures of.. say 20 fields, without introducing templating engine and codegeneration (which sound odd to me by definition). For that matter - there are no examples of usage ForEachRow in situation of handing more that 1-2 fields throughout documentation whatsoever.

Regarding RowToFunc approach - it is in the description. pgx's methods allows to easily scan only into a slice, which is understandable and obvious, but not true for all the cases of userland, sometimes (in my experience almost half the times) we want to get maps and other weird structures as the result.
And to do so we have to indulge additional iteration over scanned rows and copy data one more time, which is unnecessary in case of iterators usage.
It is not only more code to support, but also less performant. I'll push some benchmarks in a moment showcasing more than 30% uplift in performance, in the situation when we need map as a result. I understand that it is only one of cases but it is there.

@jackc
Copy link
Owner

jackc commented Jan 25, 2025

I personally don't understand how to use ForEachRow to scan rows into structures of.. say 20 fields, without introducing templating engine and codegeneration (which sound odd to me by definition).

@jackc
Copy link
Owner

jackc commented Jan 25, 2025

I personally don't understand how to use ForEachRow to scan rows into structures of.. say 20 fields, without introducing templating engine and codegeneration (which sound odd to me by definition).

I was referring to making a new variant of ForEachRow that used generics and RowToFunc to solve this.

@xobotyi
Copy link
Author

xobotyi commented Jan 25, 2025

Aaah..
Actually i have implemented similar function for our internal lib at the moment pgx gained generic scanners🤭

func ForScannedRows[T any](rows pgx.Rows, scanner pgx.RowToFunc[T], fn ScannedRowIteratorFn[T]) error {
	defer rows.Close()

	for rows.Next() {
		row, err := scanner(rows)
		if err != nil {
			return err
		}

		cont, err := fn(&row)
		if err != nil {
			return err
		}

		if !cont {
			break
		}
	}

	return rows.Err() //nolint:wrapcheck
}

If to repeat functionality from the PR description with that function:

func functorIter() (Result, error) {
	var rows pgx.Rows
	result := Result{}

	err := ForScannedRows(rows, pgx.RowToStructByName, func(row *Data) (bool, error) {
		result[row.ID] = row
		return true, nil
	})
	if err != nil {
		return nil, err
	}

	return result, nil
}

It is "almost" the the same, except the fact that it is not possible to return from iteration handler, as it is different scope. We have to stop iteration and only then return from outer function.

Overall i like that approach less.

@jackc
Copy link
Owner

jackc commented Jan 25, 2025

Your ForScannedRows is the idea of what I was thinking of.

Returning from the outer scope does seem to be the major functional difference.

But the error handling inside the loop just feels so ugly...

I think I'd like something like this to get some real world use outside of pgx before committing to any particular interface. Especially, given that the implementation is a standalone function.

@xobotyi
Copy link
Author

xobotyi commented Jan 25, 2025

Yes, the upside of this approach is separation of concerns and ability to share logic.

@xobotyi
Copy link
Author

xobotyi commented Jan 25, 2025

@jackc if it somehow can ease your perception regarding errors handling within loop.
As it figures (I've been searching for same usages of iter.Seq2[T, error]), google itself has same approach for iterators that can return errors: https://cs.opensource.google/search?q=iter.Seq2
Yes, their iterator have exact same signature (it returns next item and error), but still.

Disregarding iterators -- i can prepare PR introducing generic version of foreach, maybe call it ForEachRowScanned?

@jackc
Copy link
Owner

jackc commented Feb 2, 2025

I've reviewed this again, and I'm starting to lean toward we will want iterators in pgx eventually, but I'm not ready to commit to any particular interface yet.

My reluctance is probably due to my lack of experience with the new custom iterator system. I just don't have enough experience to make a good decision. I'd like to defer this until either I have more confidence in my ability to make a good decision or we have additional feedback of this being used successfully. We can always add it later, but once it is in a pgx release, we can never take it out in the current major version. And since the implementation is small and can be easily copied into any application that needs it, it doesn't actually stop anyone from using it immediately.


As it figures (I've been searching for same usages of iter.Seq2[T, error]), google itself has same approach for iterators that can return errors: https://cs.opensource.google/search?q=iter.Seq2
Yes, their iterator have exact same signature (it returns next item and error), but still.

I looked at the first example:

// All returns an iterator. If an error is returned by the iterator, the
// iterator will stop after that iteration.
func (it *AttackPathIterator) All() iter.Seq2[*securitycenterpb.AttackPath, error] {
	return iterator.RangeAdapter(it.Next)
}

This is actually a little scary to me. I'm looking at Go code that is hard to understand. While I am in favor of Go having gained generics and iterators, the naysayers did have a point. Go code can be a lot more opaque than it used to be.


Disregarding iterators -- i can prepare PR introducing generic version of foreach, maybe call it ForEachRowScanned?

Let's hold off on this too. While I'm not ready to merge iterators just yet, they probably are the idiomatic solution going forward. So if we add something now it will probably be more or less deprecated in a year or less.

@xobotyi
Copy link
Author

xobotyi commented Feb 2, 2025

Understood, I feel that.

I guess PR would be better kept open, so others would not make duplicated PRs and join the discussion?

@jackc
Copy link
Owner

jackc commented Feb 3, 2025

Yes, let's leave this open and review in the future.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants