Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: handle sparse matrix reading for big-endian byte order #163

Merged
merged 4 commits into from
Feb 21, 2025
Merged

Conversation

wcxve
Copy link
Owner

@wcxve wcxve commented Feb 21, 2025

Summary by Sourcery

This pull request addresses an issue with reading sparse matrices in big-endian byte order. It introduces a utility function to convert arrays to native byte order and applies it when reading response matrices. Additionally, it includes new tests to ensure correct handling of big-endian files and different endianness.

Bug Fixes:

  • Fixes an issue where sparse matrices were not being read correctly when using big-endian byte order by converting the matrix to native byte order before creating the sparse matrix.

Enhancements:

  • Adds a utility function to convert arrays to native byte order.

Tests:

  • Adds tests to ensure that the response matrix can be converted to a BCSR matrix in JAX for big-endian files.
  • Adds tests to ensure that ResponseData objects are created correctly for different endianness.

Copy link

sourcery-ai bot commented Feb 21, 2025

Reviewer's Guide by Sourcery

This pull request fixes an issue where sparse matrices with big-endian byte order were not being read correctly. The code now converts the byte order to be compatible with scipy.sparse. Additionally, tests were added to verify that response files with big-endian byte order are loaded correctly and that ResponseData handles different endianness.

Sequence diagram for reading response data with byte order conversion

sequenceDiagram
    participant OGIPReader
    participant ResponseData
    participant to_native_byteorder
    participant coo_array

    OGIPReader->>OGIPReader: _read_response(file, response_id)
    OGIPReader->>to_native_byteorder: reduced_matrix = to_native_byteorder(reduced_matrix)
    to_native_byteorder-->>OGIPReader: returns reduced_matrix
    OGIPReader->>coo_array: sparse_matrix = coo_array(reduced_matrix, ...)
    coo_array-->>OGIPReader: returns sparse_matrix
    OGIPReader-->>OGIPReader: returns response_data
Loading

Updated class diagram for ResponseData

classDiagram
    class ResponseData {
        photon_egrid: np.ndarray
        channel_emin: np.ndarray
        channel_emax: np.ndarray
        response_matrix: scipy.sparse.coo_array
        channel: np.ndarray
        __init__(
            photon_egrid: Sequence[float],
            channel_emin: Sequence[float],
            channel_emax: Sequence[float],
            response_matrix: NDArray | sparray,
            channel: Sequence[str]
        )
    }
    note for ResponseData "The constructor now converts the byte order of response_matrix to native byte order if it's not already in native byte order."
Loading

Class diagram for to_native_byteorder function

classDiagram
    class to_native_byteorder {
        +NDArray to_native_byteorder(arr: NDArray)
    }
    note for to_native_byteorder "Converts a NumPy array to native byte order."
Loading

File-Level Changes

Change Details Files
Addressed an issue where sparse matrices with big-endian byte order were not being read correctly by converting the byte order to be compatible with scipy.sparse.
  • Added a utility function to_native_byteorder to convert arrays to native byte order.
  • Applied to_native_byteorder to the response matrix in ResponseData to ensure compatibility with scipy.sparse.
  • Applied to_native_byteorder when reading the response matrix in _read_response in ogip.py.
src/elisa/util/misc.py
src/elisa/data/base.py
src/elisa/data/ogip.py
Added tests to verify that response files with big-endian byte order are loaded correctly and that ResponseData handles different endianness.
  • Added a parameterized test test_load_response to load and verify response files with different endianness.
  • Added a test test_response to verify that ResponseData handles different endianness.
tests/test_data.py

Tips and commands

Interacting with Sourcery

  • Trigger a new review: Comment @sourcery-ai review on the pull request.
  • Continue discussions: Reply directly to Sourcery's review comments.
  • Generate a GitHub issue from a review comment: Ask Sourcery to create an
    issue from a review comment by replying to it. You can also reply to a
    review comment with @sourcery-ai issue to create an issue from it.
  • Generate a pull request title: Write @sourcery-ai anywhere in the pull
    request title to generate a title at any time. You can also comment
    @sourcery-ai title on the pull request to (re-)generate the title at any time.
  • Generate a pull request summary: Write @sourcery-ai summary anywhere in
    the pull request body to generate a PR summary at any time exactly where you
    want it. You can also comment @sourcery-ai summary on the pull request to
    (re-)generate the summary at any time.
  • Generate reviewer's guide: Comment @sourcery-ai guide on the pull
    request to (re-)generate the reviewer's guide at any time.
  • Resolve all Sourcery comments: Comment @sourcery-ai resolve on the
    pull request to resolve all Sourcery comments. Useful if you've already
    addressed all the comments and don't want to see them anymore.
  • Dismiss all Sourcery reviews: Comment @sourcery-ai dismiss on the pull
    request to dismiss all existing Sourcery reviews. Especially useful if you
    want to start fresh with a new review - don't forget to comment
    @sourcery-ai review to trigger a new review!
  • Generate a plan of action for an issue: Comment @sourcery-ai plan on
    an issue to generate a plan of action for it.

Customizing Your Experience

Access your dashboard to:

  • Enable or disable review features such as the Sourcery-generated pull request
    summary, the reviewer's guide, and others.
  • Change the review language.
  • Add, remove or edit custom review instructions.
  • Adjust other review settings.

Getting Help

Copy link

@sourcery-ai sourcery-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @wcxve - I've reviewed your changes - here's some feedback:

Overall Comments:

  • Consider adding a helper function to handle byte order conversion to avoid repetition.
Here's what I looked at during the review
  • 🟡 General issues: 1 issue found
  • 🟢 Security: all looks good
  • 🟡 Testing: 1 issue found
  • 🟢 Complexity: all looks good
  • 🟢 Documentation: all looks good

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

Comment on lines 632 to 636
# if the byte order is not native,
# then convert it to be compatible with scipy.sparse
if reduced_matrix.dtype.byteorder != '=':
order = reduced_matrix.dtype.newbyteorder()
reduced_matrix = reduced_matrix.astype(order)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion: Consider centralizing duplicated byte order conversion logic.

The same pattern for converting non‐native byte order appears in both this file and in 'base.py'. Extracting this logic into a shared utility function could improve maintainability and reduce code duplication.

Comment on lines 145 to 146
ResponseData(photon_egrid, channel_emin, channel_emax, matrix1, channel)
ResponseData(photon_egrid, channel_emin, channel_emax, matrix2, channel)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

issue (testing): These tests are missing assertions.

Simply creating the ResponseData object doesn't verify its correctness. Add assertions to check that the data within the object is correctly handled for both little-endian and big-endian input matrices.

@wcxve
Copy link
Owner Author

wcxve commented Feb 21, 2025

@sourcery-ai review

Copy link

@sourcery-ai sourcery-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @wcxve - I've reviewed your changes - here's some feedback:

Overall Comments:

  • Consider adding a short explanation in to_native_byteorder about why this conversion is necessary.
  • The new tests are good, but it's not clear why the existing data files have different endianness.
Here's what I looked at during the review
  • 🟡 General issues: 1 issue found
  • 🟢 Security: all looks good
  • 🟡 Testing: 2 issues found
  • 🟢 Complexity: all looks good
  • 🟢 Documentation: all looks good

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

@@ -1138,8 +1139,11 @@ def __init__(
photon_egrid = np.array(photon_egrid, dtype=np.float64, order='C')
channel_emin = np.array(channel_emin, dtype=np.float64, order='C')
channel_emax = np.array(channel_emax, dtype=np.float64, order='C')
response_matrix = coo_array(response_matrix)
response_matrix.eliminate_zeros()
if not isinstance(response_matrix, sparray):
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question (bug_risk): Revisit the conditional conversion for response_matrix.

Currently, the conversion to native byte order is skipped if response_matrix is already an instance of sparray. Ensure that sparse matrices are always natively ordered or consider if they might also need conversion to avoid potential incompatibility issues with scipy.sparse.

Comment on lines 136 to 137
# test if the response matrix can be converted to a BCSR matrix in JAX
BCSR.from_scipy_sparse(rsp.matrix)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion (testing): Consider adding assertions on the content of the BCSR matrix.

Converting to BCSR is good, but it doesn't confirm the content is correct. Add assertions to check specific values within the BCSR.from_scipy_sparse(rsp.matrix) result to ensure the conversion and data are as expected for different endianness.

Suggested change
# test if the response matrix can be converted to a BCSR matrix in JAX
BCSR.from_scipy_sparse(rsp.matrix)
# test if the response matrix can be converted to a BCSR matrix in JAX
bcsr_matrix = BCSR.from_scipy_sparse(rsp.matrix)
# Verify that the converted matrix has the same shape as the original
assert bcsr_matrix.shape == rsp.matrix.shape
# Compare dense representations to confirm correct conversion of data
original_dense = rsp.matrix.todense()
bcsr_dense = bcsr_matrix.todense()
assert np.allclose(bcsr_dense, original_dense)

Comment on lines +147 to +148
mat1 = np.eye(100).astype('<f4')
mat2 = np.eye(100).astype('>f4')
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion (testing): Consider testing non-identity matrices.

Using identity matrices might not fully exercise the byte-swapping logic. Consider adding tests with more complex matrices to ensure all data is handled correctly.

@wcxve wcxve merged commit 472eee9 into main Feb 21, 2025
14 checks passed
@wcxve wcxve deleted the fix-sparse branch February 21, 2025 17:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant