From redshift DB, read_sql_query distorts values, but unload does not #2216
Comments
Thanks @lampretl, looking into it.

If I run that jupyter cell several times, over different days, I always get the slightly negative result with
According to DBeaver, the min and max values in my table are

Could you share the output of

Of course, if I run

@kukushking Have you been able to reproduce the bug?

@lampretl Unfortunately no, I wasn't able to reproduce this. Yes please, if you could share a test case here, that would be much appreciated!

@kukushking I was not able to reproduce the wrong output by uploading a dataframe, but I have found the source of the problem: internal casting of redshift's
So my questions are:

Hi @lampretl, yes, looks like that's the issue. I still wasn't able to reproduce it with the same numpy, pandas, and arrow versions (all returned values are decimals wrapped in a pandas object), but there is something I think you can do:
You can force read_sql_query to return a decimal with the expected precision/scale. Please give it a try. If that matches your db schema, you should have no issues.
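For readers following along, here is a minimal sketch of what forcing the decimal type might look like. It assumes the dtype argument of wr.redshift.read_sql_query (a mapping from column name to pyarrow type) is the mechanism meant above. The helper name read_with_decimal and its parameters are hypothetical, and the precision/scale defaults simply mirror the numeric(38, 20) column from the issue description.

```python
from typing import Any


def read_with_decimal(sql: str, con: Any, column: str,
                      precision: int = 38, scale: int = 20):
    """Run `sql` on Redshift and request `column` as decimal128(precision, scale).

    Hypothetical helper; `con` is an already opened Redshift connection.
    """
    # Imported lazily so this module can be loaded without the AWS stack installed.
    import awswrangler as wr
    import pyarrow as pa

    # Assumption: `dtype` overrides the inferred type for the named result column.
    return wr.redshift.read_sql_query(
        sql=sql,
        con=con,
        dtype={column: pa.decimal128(precision, scale)},
    )
```

It would be called with the name of the column as it appears in the query result, for example read_with_decimal(sql, con, column="p") if the query aliases probability to p. Whether this removes the negative values depends on the actual cause, which was not reproduced in this thread.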
Describe the bug
In our private Redshift database on AWS, I have a table with a column probability of type numeric(38, 20); its values are between 0 and 1. In an AWS SageMaker Jupyter notebook, I query/download the content of that table. Some of the values obtained via read_sql_query are negative, but the ones obtained via unload are all positive.
How to Reproduce
When executing the following code
the output is unexpectedly:
Expected behavior
Outputs df1.p.min() and df2.p.min() should be equal. And certainly, both must be between 0 and 1.
Your project
No response
Screenshots
OS
Linux (AWS)
Python version
3.8.16
AWS SDK for pandas version
3.0.0
Additional context
No response