Create a Delta table doctor that analyzes the health and wellness of a Delta table #7
Comments
Hello @MrPowers : As per my understanding, we can write code that meets the expectations below:
@puneetsharma04 - thanks for the suggestion, but this repo does not depend on Spark, so we'll need a solution that does not use PySpark. We'll need to use these APIs.
@MrPowers : So you mean we have to use the delta-rs Python API to access the Delta Lake table and perform the necessary checks? Then maybe code like the below can work.
@puneetsharma04 - this looks great. Really good work. Can you submit a PR?
One note on the reference implementation above: would it be possible to return the information in a slightly more structured format? Even just a list of warnings would be better than a single string IMO. For example, I might be interested in periodically running
And if we made classes for these warnings
to filter out warnings that we may want to explicitly ignore as the list of health checks grows over time.
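To make the structured-results idea concrete, here is a minimal sketch of warnings modelled as classes, with a helper that filters out the types a caller wants to ignore. Every name in it (HealthWarning, SmallFilesWarning, LargeFilesWarning, filter_warnings) is hypothetical and not part of levi; it only illustrates the shape such an API could take.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HealthWarning:
    """Base class for one finding reported by a table health check."""
    message: str


@dataclass(frozen=True)
class SmallFilesWarning(HealthWarning):
    """Hypothetical warning type: the table holds many small data files."""


@dataclass(frozen=True)
class LargeFilesWarning(HealthWarning):
    """Hypothetical warning type: the table holds very large data files."""


def filter_warnings(warnings, ignore=()):
    """Return the warnings whose type is not listed in `ignore`.

    Subclasses of an ignored type are dropped as well, because the
    check relies on isinstance.
    """
    ignored = tuple(ignore)
    return [w for w in warnings if not isinstance(w, ignored)]


# Example: a periodic job that deliberately ignores small-file warnings.
found = [
    SmallFilesWarning("many files are below the size threshold"),
    LargeFilesWarning("some files are above the size threshold"),
]
remaining = filter_warnings(found, ignore=[SmallFilesWarning])
```

Because each warning is its own type, a caller can grow the ignore list as new health checks are added without parsing message strings.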
@jeremyjordan : Thanks for the suggestion.
@jeremyjordan - yes, I like the idea of returning results in a structured format. That's pretty much always my preference. Thanks for calling that out.
@puneetsharma04 - looks like you're using this path: You should be using this path: Let me know if that fixes your issue!
@MrPowers : You are right. However, I haven't made any changes to the code. Should I provide the full path and change the code, or is there another workaround? It looks like it creates a test folder (/Users/puneet_sharma1/Documents/GitHub/levi/tests/tests/reader_tests/generated/basic_append/delta/) on its own at this path. I am not sure why it behaves this way.
A levi.delta_doctor(delta_table) command could be a nice way to help users identify issues in their Delta table that could cause slow query performance. There are several known problems that can cause poor performance of Delta tables:
The levi.delta_doctor(delta_table) command could return a string with the following warnings:
We should make it really easy for users to see if there are any obvious problems in their Delta table. Ideally, we will also give them really easy solutions to fix these problems!
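As a rough illustration of a doctor that returns a string, the sketch below builds a report from a list of data file sizes. The function name doctor_report, the small-file check, and the 1 MiB threshold are all assumptions made for this example and are not taken from the issue. Reading the sizes from an actual Delta table (for instance through the file metadata that delta-rs exposes) is deliberately left out.

```python
from statistics import mean

# Hypothetical threshold in bytes; the issue does not specify one.
SMALL_FILE_BYTES = 1024 * 1024


def doctor_report(file_sizes_bytes, small_file_bytes=SMALL_FILE_BYTES):
    """Return a human-readable report string built from data file sizes."""
    sizes = list(file_sizes_bytes)
    if not sizes:
        return "No data files found."

    lines = []

    # Illustrative check: count files below the size threshold.
    small = [s for s in sizes if s < small_file_bytes]
    if small:
        lines.append(
            f"Warning: {len(small)} of {len(sizes)} files are smaller than "
            f"{small_file_bytes} bytes (mean file size {mean(sizes):.0f} bytes)."
        )

    return "\n".join(lines) if lines else "No problems detected."


# Example: two tiny files and one 5 MiB file.
report = doctor_report([100, 200, 5 * 1024 * 1024])
```

Keeping the check as a pure function over file sizes makes it easy to test without a real table; a wrapper could collect the sizes and pass them in.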