Stream contains all rows in .xlsx sheet instead of only data rows. #58

Open
craigastill opened this issue May 23, 2023 · 0 comments

I'm playing around with: tap-spreadsheets-anywhere locally with an externally generated .xlsx file.

  • Using skip_initial to skip some human readable text and start on the table header.
  • Using field_names to list each of the expected header titles.
  • File contains ~100 lines of table data.
  • There is a Totals row at the bottom of the table data.
  • A few blank lines, then a human readable notes footer and then blank lines until the end of the sheet at row 1000.
  • meltano invoke tap-spreadsheets-anywhere has the line: ... INFO Wrote 995 records for stream "<table_name>" .....

Expected to only see my ~100 rows of data in the stream.

Doing a: meltano run tap-spreadsheets-anywhere target-postgres results in: 995 rows written to the table instead of ~100.

Mentioned on Meltano Slack.


Also noticed that field_names has not been implemented in the excel_handler.py file. From the description, is the expectation that, if it were added, it would stop at blank rows? Or would it just be an alternative to skip_initial=<integer_to_table_field_names_row>?
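
To make the "stop at blank rows" idea concrete, here is a minimal sketch of the behaviour I have in mind. It is not code from excel_handler.py: the names rows_until_blank and is_blank are hypothetical, and it assumes each sheet row arrives as a tuple of cell values with None for empty cells.

```python
from typing import Any, Iterable, Iterator, Sequence

Row = Sequence[Any]


def is_blank(row: Row) -> bool:
    """Return True when every cell is empty (None or whitespace only)."""
    return all(cell is None or str(cell).strip() == "" for cell in row)


def rows_until_blank(rows: Iterable[Row], skip_initial: int = 0) -> Iterator[Row]:
    """Skip the first `skip_initial` rows, then yield rows until the first
    fully blank row. Hypothetical helper, not part of the tap."""
    for index, row in enumerate(rows):
        if index < skip_initial:
            continue  # human readable preamble above the table header
        if is_blank(row):
            break  # first blank row marks the end of the table
        yield row


# Toy sheet: preamble, header, data, totals, blank rows, notes footer.
sheet = [
    ("Monthly report", None),
    (None, None),
    ("id", "amount"),
    (1, 10),
    (2, 20),
    ("Totals", 30),
    (None, None),
    ("Notes: figures are provisional", None),
    (None, None),
]

table_rows = list(rows_until_blank(sheet, skip_initial=2))
```

Under these assumptions the footer and the trailing blank rows are dropped, but the Totals row is still emitted because it sits directly under the data, so it would need separate handling.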
