Stream contains all rows in `.xlsx` sheet instead of only data rows. #58

craigastill · 2023-05-23T12:41:10Z

I'm experimenting with tap-spreadsheets-anywhere locally, using an externally generated .xlsx file.

Using skip_initial to skip some human-readable text and start at the table header.
Using field_names to list each of the expected header titles.
The file contains ~100 rows of table data.
There is a Totals row at the bottom of the table data.
After that come a few blank rows, then a human-readable notes footer, then blank rows until the end of the sheet at row 1000.
The output of meltano invoke tap-spreadsheets-anywhere includes the line: ... INFO Wrote 995 records for stream "<table_name>" .....

Expected to only see my ~100 rows of data in the stream.

Running meltano run tap-spreadsheets-anywhere target-postgres results in 995 rows written to the table instead of ~100.

Mentioned on Meltano Slack.

I also noticed that field_names is not implemented in the excel_handler.py file. Going by its description, if it were added, would it be expected to stop at blank rows? Or would it just be an alternative to skip_initial=<integer_to_table_field_names_row>?
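
To illustrate the behaviour I was expecting, here is a minimal sketch in plain Python. It is not the tap's actual code: `is_blank` and `iter_data_rows` are hypothetical helpers, and the sheet is modelled as a list of tuples rather than read from a real workbook. It skips the leading rows, treats the next row as the header, and stops at the first completely blank row.

```python
from typing import Any, Iterable, Iterator, Sequence


def is_blank(row: Sequence[Any]) -> bool:
    """Return True when every cell is None or whitespace-only text."""
    return all(
        cell is None or (isinstance(cell, str) and not cell.strip())
        for cell in row
    )


def iter_data_rows(
    rows: Iterable[Sequence[Any]], skip_initial: int = 0
) -> Iterator[dict]:
    """Yield one dict per row, stopping at the first blank row after the header.

    Hypothetical helper: `skip_initial` mirrors the tap option of the same
    name, but the stop-at-blank-row rule is an assumption, not tap behaviour.
    """
    it = iter(rows)
    for _ in range(skip_initial):
        next(it, None)  # discard human-readable preamble rows
    header = next(it, None)
    if header is None:
        return  # sheet ended before a header row was found
    for row in it:
        if is_blank(row):
            break  # first blank row marks the end of the table
        yield dict(zip(header, row))


# Made-up sheet with the same shape as the one described above.
sheet = [
    ("Report for humans", None),
    (None, None),
    ("id", "amount"),
    (1, 10.0),
    (2, 5.5),
    ("Totals", 15.5),
    (None, None),
    ("Notes footer", None),
    (None, None),
]

records = list(iter_data_rows(sheet, skip_initial=2))
```

With this rule the trailing blank rows and the notes footer are dropped, but the Totals row is still emitted because it directly follows the data, so it would need separate handling.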
