Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adds support for parsing binary argument groups. #910

Merged
merged 2 commits into from
Jul 16, 2024

Conversation

tgregg
Copy link
Contributor

@tgregg tgregg commented Jul 10, 2024

Description of changes:

Implements the low-level parsing of binary argument groups according to the following spec:

Argument group encoding for tagged encodings
FlexUInt 0: Delimited
1..N: Size in bytes = N
Empty tagged argument groups must use bitmap 00 (or delimited start/end, but that's wasteful.)

Argument group encoding for tagless encodings
FlexUInt 0: Final 'batch'
1..N: Batch that is N bytes long
Truly empty tagless argument groups should use bitmap 00, but it's legal to use the final batch zero too.

The public methods added to IonCursorBinary are expected to be called by higher-level readers with awareness of the macro signatures applicable at the current stream position. The basic sequence of method calls required to parse tagged and tagless argument groups is illustrated in the added tests.

At the bottom of the tests I've added some TODOs to provide more coverage of edge cases. More thorough coverage will also come from the conformance suite. In the source I also added some TODOs regarding the challenges of skipping values that may contain non-prefixed macro invocations. We'll need to handle this somehow (via consultation with a higher-level reader with knowledge of the signatures), but I've left that out of scope for this PR.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@tgregg tgregg requested a review from popematt July 10, 2024 22:05
Copy link
Contributor

@popematt popematt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks fine to me, but I have a few questions.

Comment on lines +225 to +226
ArgumentGroupMarker[] argumentGroupStack = new ArgumentGroupMarker[CONTAINER_STACK_INITIAL_CAPACITY];
int argumentGroupIndex = -1;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just curious—we have a _Private_RecyclingStack that looks to do about the same thing as what you are doing with this. Have you observed any performance difference between the two?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, unfortunately. I originally used the recycling stack for the containers in IonCursorBinary and observed a fairly significant speedup (~10% as I recall) by using an array directly. I decided to use the same technique here.

src/main/java/com/amazon/ion/impl/IonCursorBinary.java Outdated Show resolved Hide resolved
Comment on lines +3415 to +3417
// Push a dummy delimited container onto the stack, preparing the cursor to seek forward to the delimited end
// marker applicable at the current depth.
pushContainer();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is your reasoning for not actually treating expression groups like a container?

If you just treated expression groups like another container, it seems like you could probably eliminate some data structures, some code that is similar between containers and expression groups, and some public methods.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I considered implementing it that way, but decided not to (for now) for a few reasons:

  1. It complicates calculating of the reader's current depth because the argument groups need to be filtered out. This is pretty minor.
  2. The container stack uses the Marker class, which is reused in many places and has a startIndex, endIndex, and IonTypeID. Argument groups need a startIndex, endIndex, and PrimitiveType. I'd have to merge these classes if using a unified stack, and I didn't feel like bloating Marker with an attribute that is only needed by argument groups (and is never needed in Ion 1.0). This also isn't a huge deal.

Which public methods were you envisioning this would allow us to remove? I'm not sure we could reuse the stepIn/stepOut methods for this, for example, because the encoding for tagged and tagless groups differs and is determined by the signature.

Anyway, I'm happy to revisit this in the future, but if you're up for merging this as-is for now we can start using it and see whether it ends up being cumbersome.

@tgregg tgregg merged commit 4adafc1 into ion-11-encoding Jul 16, 2024
17 checks passed
@tgregg tgregg deleted the read-argument-groups-merge branch July 16, 2024 20:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants