
Add support in Commons CSV for tracking byte positions during parsing. #11

Merged


DarrenJAN

Summary of Modifications

  1. Test data files: Added new test data files and updated pom.xml to exclude them from Apache RAT checks, so they are not flagged as having unapproved licenses.
  2. CSVParser class:
    Constructor enhancements:
    a. Added support for an optional String encoding parameter, which specifies the encoding the reader uses.
  3. CSVRecord class:
  • private long characterByte: the starting byte position of this record.
    Added a new constructor that supports tracking byte positions in the record class.
  4. ExtendedBufferedReader class:
  • private long bytesRead: tracks the number of bytes read so far.
  • private long bytesReadMark: stores the byte position saved at the last mark.
  • CharsetEncoder encoder: the encoder used to calculate the byte size of characters.
  • getCharBytes(int current): calculates the number of bytes a character occupies in UTF-8. Note: only UTF-8 is supported because of the encoding algorithm used; supporting other encodings is possible but needs more work.
  • reset() and mark() methods: enhanced so that the byte position is saved and restored together with the character position, preventing characters and bytes from being consumed unintentionally.
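To illustrate the byte counting and the mark()/reset() change described above, here is a minimal, hypothetical sketch. It is not the PR's ExtendedBufferedReader: ByteCountingReader and utf8Length are illustrative names, and instead of a CharsetEncoder it derives each char's UTF-8 size directly from its value. It assumes UTF-8 input and well-formed surrogate pairs (a lone surrogate would be miscounted), and it only counts read() and read(char[], int, int); readLine() and skip() are not tracked.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;

/**
 * Hypothetical sketch, not the PR's ExtendedBufferedReader: counts UTF-8 bytes
 * as chars are read and saves/restores that count on mark()/reset().
 * Assumes UTF-8 input and well-formed surrogate pairs.
 */
public class ByteCountingReader extends BufferedReader {
    private long bytesRead;      // bytes consumed so far
    private long bytesReadMark;  // byte position saved by mark()

    public ByteCountingReader(Reader in) {
        super(in);
    }

    /** UTF-8 size of one UTF-16 char; each half of a surrogate pair counts 2, so a pair totals 4. */
    static int utf8Length(char ch) {
        if (ch < 0x80) return 1;
        if (ch < 0x800) return 2;
        if (Character.isSurrogate(ch)) return 2;
        return 3;
    }

    @Override
    public int read() throws IOException {
        int c = super.read();
        if (c != -1) {
            bytesRead += utf8Length((char) c);
        }
        return c;
    }

    @Override
    public int read(char[] buf, int off, int len) throws IOException {
        int n = super.read(buf, off, len);
        for (int i = 0; i < n; i++) {
            bytesRead += utf8Length(buf[off + i]);
        }
        return n;
    }

    @Override
    public void mark(int readAheadLimit) throws IOException {
        super.mark(readAheadLimit);
        bytesReadMark = bytesRead; // remember the byte position with the char position
    }

    @Override
    public void reset() throws IOException {
        super.reset();
        bytesRead = bytesReadMark; // re-reading after reset must not double-count bytes
    }

    public long getBytesRead() {
        return bytesRead;
    }

    public static void main(String[] args) throws IOException {
        // a , U+00E9 LF U+20AC , U+1F600 LF  ->  1+1+2+1+3+1+4+1 = 14 UTF-8 bytes
        String data = "a,\u00E9\n\u20AC,\uD83D\uDE00\n";
        try (ByteCountingReader r = new ByteCountingReader(new StringReader(data))) {
            r.read();       // 'a' -> 1 byte
            r.mark(16);
            r.read();       // ',' -> 1 byte
            r.read();       // U+00E9 -> 2 bytes
            r.reset();      // back to the mark
            System.out.println(r.getBytesRead()); // prints 1
            while (r.read() != -1) {
                // drain the rest of the input
            }
            // prints "14 14": our count matches the real UTF-8 encoding length
            System.out.println(r.getBytesRead() + " " + data.getBytes(StandardCharsets.UTF_8).length);
        }
    }
}
```

Saving bytesRead in mark() and restoring it in reset() is what lets lookahead (mark, read ahead, reset) leave the byte count unchanged. The fixed size table is only valid for UTF-8, which matches the limitation noted for getCharBytes; other encodings would need an encoder-based calculation.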

Test result (mvn):
All unit tests and the other build checks pass.

Review comment on pom.xml (resolved, outdated):

<!--

Modifications copyright © 2022, 2024 MarkLogic Corporation.


I believe Matt also modified this file in 2017.

@DarrenJAN
Author

  1. The jar needs to be replaced and the regression tests rerun before merging.
  2. Remember to squash and merge.

@DarrenJAN
Author

There is a functional bug in mlcp that needs to be addressed.

@DarrenJAN DarrenJAN merged commit 281bd89 into marklogic:1.12.1-marklogic-release Dec 2, 2024
4 of 7 checks passed
2 participants