Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

validate column: Pandas exception in combiner for min/max of dates/timestamps #1396

Open
nj1973 opened this issue Jan 14, 2025 · 0 comments
Open
Labels
type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns.

Comments

@nj1973
Copy link
Contributor

nj1973 commented Jan 14, 2025

If a table contains dates/timestamps with a wide gap from the present day then min/max validations throw an exception:

    raise ValueError("Trying to coerce float values to integers")
ValueError: Trying to coerce float values to integers

Test tables
Oracle:

CREATE TABLE dvt_test.dvt_tricky_dates (
  id            NUMBER(5) NOT NULL PRIMARY KEY
, col_dt_low    DATE
, col_dt_epoch  DATE
, col_dt_high   DATE
, col_ts_low    TIMESTAMP(0)
, col_ts_epoch  TIMESTAMP(0)
, col_ts_high   TIMESTAMP(0));
INSERT INTO dvt_test.dvt_tricky_dates VALUES
(1,DATE'1000-01-01',DATE'1970-01-01',DATE'9999-12-31'
,TIMESTAMP'1000-01-01 00:00:00',TIMESTAMP'1970-01-01 00:00:00',TIMESTAMP'9999-12-31 23:59:59');
COMMIT;

PostgreSQL:

CREATE TABLE dvt_test.dvt_tricky_dates (
  id            integer NOT NULL PRIMARY KEY
, col_dt_low    date
, col_dt_epoch  date
, col_dt_high   date
, col_ts_low    timestamp(0)
, col_ts_epoch  timestamp(0)
, col_ts_high   timestamp(0));
INSERT INTO dvt_test.dvt_tricky_dates VALUES
(1,DATE'1000-01-01',DATE'1970-01-01',DATE'9999-12-31'
,TIMESTAMP'1000-01-01 00:00:00',TIMESTAMP'1970-01-01 00:00:00',TIMESTAMP'9999-12-31 23:59:59');

Test commands

col_dt_low:

data-validation -v validate column -sc ora_local -tc pg_local  --tables-list=dvt_test.dvt_tricky_dates \
 --min="col_dt_low" --max="col_dt_low" 
Traceback (most recent call last):
  File "/professional-services-data-validator/.venv/bin/data-validation", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/professional-services-data-validator/data_validation/__main__.py", line 694, in main
    validate(args)
  File "/professional-services-data-validator/data_validation/__main__.py", line 670, in validate
    run(args)
  File "/professional-services-data-validator/data_validation/__main__.py", line 636, in run
    run_validations(args, config_managers)
  File "/professional-services-data-validator/data_validation/__main__.py", line 575, in run_validations
    run_validation(config_manager, dry_run=args.dry_run, verbose=args.verbose)
  File "/professional-services-data-validator/data_validation/__main__.py", line 564, in run_validation
    validator.execute()
  File "/professional-services-data-validator/data_validation/data_validation.py", line 118, in execute
    result_df = self._execute_validation(
                ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/professional-services-data-validator/data_validation/data_validation.py", line 379, in _execute_validation
    raise e
  File "/professional-services-data-validator/data_validation/data_validation.py", line 360, in _execute_validation
    result_df = util.timed_call(
                ^^^^^^^^^^^^^^^^
  File "/professional-services-data-validator/data_validation/util.py", line 21, in timed_call
    result = fn(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^
  File "/professional-services-data-validator/data_validation/combiner.py", line 77, in generate_report
    differences_df = client.execute(differences_pivot)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
...
    subarr = maybe_cast_to_integer_array(arr, dtype)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/professional-services-data-validator/.venv/lib/python3.11/site-packages/pandas/core/dtypes/cast.py", line 1688, in maybe_cast_to_integer_array
    raise ValueError("Trying to coerce float values to integers")
ValueError: Trying to coerce float values to integers 

col_dt_epoch (near present time):

data-validation -v validate column -sc ora_local -tc pg_local  --tables-list=dvt_test.dvt_tricky_dates \
 --min="col_dt_epoch" --max="col_dt_epoch" 
╒═══════════════════╤═══════════════════╤═══════════════════════════╤══════════════════════╤════════════════════╤════════════════════╤══════════════════╤═════════════════════╤══════════════════════════════════════╕
│ validation_name   │ validation_type   │ source_table_name         │ source_column_name   │ source_agg_value   │ target_agg_value   │   pct_difference │ validation_status   │ run_id                               │
╞═══════════════════╪═══════════════════╪═══════════════════════════╪══════════════════════╪════════════════════╪════════════════════╪══════════════════╪═════════════════════╪══════════════════════════════════════╡
│ max__col_dt_epoch │ Column            │ dvt_test.dvt_tricky_dates │ col_dt_epoch         │ 1970-01-01         │ 1970-01-01         │                0 │ success             │ 8cd6a97a-bb4b-4794-818c-4a4e50f6c0da │
├───────────────────┼───────────────────┼───────────────────────────┼──────────────────────┼────────────────────┼────────────────────┼──────────────────┼─────────────────────┼──────────────────────────────────────┤
│ count             │ Column            │ dvt_test.dvt_tricky_dates │                      │ 1                  │ 1                  │                0 │ success             │ 8cd6a97a-bb4b-4794-818c-4a4e50f6c0da │
├───────────────────┼───────────────────┼───────────────────────────┼──────────────────────┼────────────────────┼────────────────────┼──────────────────┼─────────────────────┼──────────────────────────────────────┤
│ min__col_dt_epoch │ Column            │ dvt_test.dvt_tricky_dates │ col_dt_epoch         │ 1970-01-01         │ 1970-01-01         │                0 │ success             │ 8cd6a97a-bb4b-4794-818c-4a4e50f6c0da │
╘═══════════════════╧═══════════════════╧═══════════════════════════╧══════════════════════╧════════════════════╧════════════════════╧══════════════════╧═════════════════════╧══════════════════════════════════════╛

col_dt_high:

data-validation -v validate column -sc ora_local -tc pg_local  --tables-list=dvt_test.dvt_tricky_dates \
 --min="col_dt_high" --max="col_dt_high" 
...
ValueError: Trying to coerce float values to integers

col_ts_low:

data-validation -v validate column -sc ora_local -tc pg_local  --tables-list=dvt_test.dvt_tricky_dates \
 --min="col_ts_low" --max="col_ts_low" 
...
ValueError: Trying to coerce float values to integers

col_ts_epoch:

data-validation -v validate column -sc ora_local -tc pg_local  --tables-list=dvt_test.dvt_tricky_dates \
 --min="col_ts_epoch" --max="col_ts_epoch" 
...
╒═══════════════════╤═══════════════════╤═══════════════════════════╤══════════════════════╤════════════════════╤════════════════════╤══════════════════╤═════════════════════╤══════════════════════════════════════╕
│ validation_name   │ validation_type   │ source_table_name         │ source_column_name   │ source_agg_value   │ target_agg_value   │   pct_difference │ validation_status   │ run_id                               │
╞═══════════════════╪═══════════════════╪═══════════════════════════╪══════════════════════╪════════════════════╪════════════════════╪══════════════════╪═════════════════════╪══════════════════════════════════════╡
│ count             │ Column            │ dvt_test.dvt_tricky_dates │                      │ 1                  │ 1                  │                0 │ success             │ 04aa937c-63bc-4db0-815c-39b590cbd1f6 │
├───────────────────┼───────────────────┼───────────────────────────┼──────────────────────┼────────────────────┼────────────────────┼──────────────────┼─────────────────────┼──────────────────────────────────────┤
│ max__col_ts_epoch │ Column            │ dvt_test.dvt_tricky_dates │ col_ts_epoch         │ 1970-01-01         │ 1970-01-01         │                0 │ success             │ 04aa937c-63bc-4db0-815c-39b590cbd1f6 │
├───────────────────┼───────────────────┼───────────────────────────┼──────────────────────┼────────────────────┼────────────────────┼──────────────────┼─────────────────────┼──────────────────────────────────────┤
│ min__col_ts_epoch │ Column            │ dvt_test.dvt_tricky_dates │ col_ts_epoch         │ 1970-01-01         │ 1970-01-01         │                0 │ success             │ 04aa937c-63bc-4db0-815c-39b590cbd1f6 │
╘═══════════════════╧═══════════════════╧═══════════════════════════╧══════════════════════╧════════════════════╧════════════════════╧══════════════════╧═════════════════════╧══════════════════════════════════════╛

col_ts_high:

data-validation -v validate column -sc ora_local -tc pg_local  --tables-list=dvt_test.dvt_tricky_dates \
 --min="col_ts_high" --max="col_ts_high" 
...
ValueError: Trying to coerce float values to integers
@nj1973 nj1973 added the type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns. label Jan 14, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns.
Projects
None yet
Development

No branches or pull requests

1 participant