kedro-datasets: ExternalTableDataset for Databricks #817
Comments
Hey @astrojuanlu and the Kedro team,
Hi @MinuraPunchihewa, if you want to open a pull request please go ahead!
Thanks, @astrojuanlu. Will do.
Hi @MinuraPunchihewa, I'm a community user who is in a very similar situation to yours. Thanks for tackling this External Tables implementation, great initiative! I had some things in mind that I think would be good to consider if we are to refactor this dataset. I'll drop them here so that you and the Kedro maintainers can discuss the best way to tackle it. I'm happy to contribute as well.
If you think these points and discussion are meaningful, I can keep posting ideas that come to my mind and contribute to them. We can also keep the conversation about the details in the Kedro Slack to be more agile; you can find me here
On this point, I would say this is not ready. We did some research recently looking at Unity Catalog (from Databricks) and Polaris (from Snowflake); they are far from mature and we cannot expect interoperability.
Hey @MigQ2, |
Description
I use Kedro on Databricks heavily, but in most of my projects the datasets in use are External tables (for a myriad of reasons) and I am currently unable to perform my IO operations easily.
Context
As I have said above, my datasets are maintained as External tables, and I am certain there are many other users (and organizations) out there that follow a similar pattern, at least for some of their datasets.
Possible Implementation
It should be possible to implement an ExternalTableDataset for Kedro to allow the said IO operations.
Possible Alternatives
I have considered converting all of my datasets on Databricks to Managed tables, but at the moment, this does not make sense for me from an architectural point of view.