Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FLINK-37245] Prevent masking null values in BinaryRowData #26100

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

mxm
Copy link
Contributor

@mxm mxm commented Feb 3, 2025

RowData#createFieldGetter is the go-to method for creating field getters for a given field type and position. The method FieldGetter#getFieldOrNull suggests that null is returned if the field has been nulled. But that is not always the case.

When using BinaryRowData with a non-null field, which has been set to null, a call to FieldGetter#getFieldOrNull will return a non-null value, interpreting whatever bytes are backing the field as an actual value instead of null.

Example:

  public static void main(String[] args) {
    IntType nullableIntType = new IntType(true);
    IntType nonNullableIntType = new IntType(false);
    RowDataSerializer rowDataSerializer = new RowDataSerializer(
            nullableIntType, nonNullableIntType
    );
    BinaryRowData binaryRow = rowDataSerializer.toBinaryRow(GenericRowData.of(null, null));
    RowData.FieldGetter fieldGetter1 = RowData.createFieldGetter(nullableIntType, 0);
    RowData.FieldGetter fieldGetter2 = RowData.createFieldGetter(nonNullableIntType, 1);
    System.out.println(fieldGetter1.getFieldOrNull(binaryRow));
    System.out.println(fieldGetter2.getFieldOrNull(binaryRow));
  }

Output is:

null
0

The expected output would be that the second non-null field also returns null, or raises a NullPointerException directly. That's not the case because RowData#createFieldGetter only checks for null values (via a call to Rowdata#isNullAt(pos)) when the type is nullable (see

).

It seems fair to always check for null fields, instead of deferring this easy to forget check to the caller.

We needed to work around this bug in apache/iceberg#12049.

@flinkbot
Copy link
Collaborator

flinkbot commented Feb 3, 2025

CI report:

Bot commands The @flinkbot bot supports the following commands:
  • @flinkbot run azure re-run the last Azure build

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants