Retrieving Unity Catalog tables fails #3708
I would recommend at least documenting this somewhere, or updating to the new unitycatalog-client library? :)
Env is on Azure with ADLS Gen2 and Databricks.
Can you try manually providing credentials via an IOConfig?

```python
import daft
from daft.daft import S3Config, IOConfig

s3_config_from_env = S3Config.from_env()
io_config = IOConfig(s3=s3_config_from_env)
df = daft.read_deltalake(unity_table, io_config=io_config)
```
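Since the environment here is Azure rather than AWS, the analogous setup would use AzureConfig instead. A minimal sketch, assuming a SAS token is already at hand and reusing `unity_table` from the snippet above (the storage account name and token values are placeholders):

```python
import daft
from daft.io import IOConfig, AzureConfig

# Sketch: pass Azure credentials explicitly instead of S3 credentials.
azure_config = AzureConfig(
    storage_account="mystorageaccount",  # placeholder: the ADLS Gen2 account name
    sas_token="<sas-token>",             # placeholder: a valid SAS token
)
io_config = IOConfig(azure=azure_config)
df = daft.read_deltalake(unity_table, io_config=io_config)
```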
Hi @datanikkthegreek, I am on Daft v0.4.1 and the code block below is all I need to load an 'external' table.

```python
import daft
from daft.unity_catalog import UnityCatalog
import os

# Set up your 'adb-......databricks.net' workspace URL as an env var, plus a personal access token (PAT)
DATABRICKS_HOST_AZURE = os.environ.get('DATABRICKS_HOST_AZURE')
PAT_TOKEN_AZURE = os.environ.get('PAT_TOKEN_AZURE')

unity = UnityCatalog(endpoint=DATABRICKS_HOST_AZURE, token=PAT_TOKEN_AZURE)
unity_table_ext = unity.load_table("some_uc_catalog.some_schema.some_table")  # This is an external table
df_ext = daft.read_deltalake(unity_table_ext)
df_ext.show()
```

I noticed that your error seems to be a response from an AWS control plane, even though you appear to be accessing an Azure Databricks control plane, so something may be off in your env vars setup (see the quick check sketched below).

As for the above issue, this is unfortunately an issue from the Unity Catalog client. Hope this helps, and feel free to share more of the error text if you still have the issue.
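One quick way to sanity-check the env var setup (a sketch; the variable names come from the snippet above, and the AWS variables listed are just common ones an S3 code path might pick up):

```python
import os

# Confirm the endpoint points at the Azure workspace and the PAT is set.
print("DATABRICKS_HOST_AZURE:", os.environ.get("DATABRICKS_HOST_AZURE"))
print("PAT_TOKEN_AZURE set:", "PAT_TOKEN_AZURE" in os.environ)

# Check for stray AWS credentials that could explain an AWS-style response.
for var in ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY", "AWS_PROFILE", "AWS_REGION"):
    print(var, "set" if var in os.environ else "unset")
```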
@colin-ho yes, that's what I also thought, and it worked. It's definitely something with the Unity Catalog client. I created some simple functions around the REST API myself. I feel the Unity clients are a bit overengineered and hard to use, even though I am not a Python expert. I see you also use the client internally in Daft, but the old one.

```python
import requests

WORKSPACE = "WORKSPACE"
TOKEN = "YOUR TOKEN"
HOST = WORKSPACE.rstrip("/") + "/api/2.1/unity-catalog"  # no trailing slash, to avoid '//' in the URLs below
DEFAULT_HEADERS = {"Authorization": f"Bearer {TOKEN}"}

def get_table(tbl_name):
    url = f"{HOST}/tables/{tbl_name}"
    response = requests.get(url, headers=DEFAULT_HEADERS)
    return response.json()

get_table("pt_dh_sand.test.test_table")

def get_table_id(tbl_name):
    return get_table(tbl_name)["table_id"]

get_table_id("pt_dh_sand.test.test_table4")

def get_tbl_path(tbl_name):
    return get_table(tbl_name)["storage_location"]

get_tbl_path("pt_dh_sand.test.test_table4")

def get_table_credentials(tbl_name, operation="READ"):
    table_id = get_table_id(tbl_name)
    url = f"{HOST}/temporary-table-credentials"
    body = {
        "operation": operation,  # READ_WRITE to write from outside; only for external tables
        "table_id": table_id,
    }
    response = requests.post(url, json=body, headers=DEFAULT_HEADERS)
    return response.json()["azure_user_delegation_sas"]

get_table_credentials("pt_dh_sand.test.test_table")
```

```python
from daft.io import IOConfig, AzureConfig
import daft

tbl_name = "pt_dh_sand.test.test_table4"

# Read with Daft
azure = AzureConfig(sas_token=get_table_credentials(tbl_name)["sas_token"])
io_config = IOConfig(azure=azure)
df = daft.read_deltalake(get_tbl_path(tbl_name), io_config=io_config)
df.show()

# Write with Daft
azure = AzureConfig(sas_token=get_table_credentials(tbl_name, operation="READ_WRITE")["sas_token"])
io_config = IOConfig(azure=azure)
df.write_deltalake(get_tbl_path(tbl_name), mode="overwrite", io_config=io_config)
```
Thanks for the detailed reply. I also noticed the issue with managed tables. There are two options: set the operation to READ, or parameterize it so it also allows writing the tables. I think you can also easily check whether a table is managed via the tables API, as sketched below. The error is really strange; it's definitely Azure on my side, and since I am running this on Databricks I could not really change anything. As my previous response shows, everything works using the REST API, and I find the REST API more convenient and understandable than the Python clients, both the new and the old one. You were right, it works now: I had tested Daft with a managed table before, and external tables work. As you proposed, this can be easily fixed by making it READ or parameterizing it.
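To make that concrete, a minimal sketch building on the `get_table` helper from the code above; it assumes the tables API reports `table_type` values like `MANAGED` and `EXTERNAL`:

```python
def get_operation(tbl_name):
    # The tables API response includes "table_type", e.g. "MANAGED" or "EXTERNAL".
    table_type = get_table(tbl_name)["table_type"]
    # READ_WRITE credential vending only works for external tables,
    # so fall back to READ for managed ones.
    return "READ_WRITE" if table_type == "EXTERNAL" else "READ"

# Usage with the helpers defined earlier:
# sas = get_table_credentials(tbl_name, operation=get_operation(tbl_name))
```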
I ran into both of these issues using Databricks on AWS. I wanted to read managed tables and write external tables, but no version of Daft supports both. IMHO the code should make fewer assumptions and let the caller specify their intent.
Sorry folks, we understand the Databricks/Unity experience has been less than ideal so far. We'll work with the Databricks Unity team to try and iron out some more of these issues! @datanikkthegreek and @pmogren, am I right to understand that all the issues in this thread can be tracked down to Daft's support for managed tables in Unity Catalog, and that Daft currently works fine for external tables?
Hi @jaychia, I was just reviewing and responding and wanted to tag you and @kevinzwang for some thoughts on how we solve this. The reason this is an issue is that Unity does not support vending READ_WRITE temporary credentials for managed tables.

The piece of code responsible for this behavior is: https://github.com/anilmenon14/Daft/blob/main/daft/unity_catalog/unity_catalog.py#L140-L142

IMO we have 2 ways to solve this:

1. Change the hardcoded READ_WRITE operation to READ, or parameterize it so the caller can specify intent (a sketch follows this list).
2. Have Unity support credential vending for managed tables, which would need changes on the Databricks side.

If you think 1 is the better approach, happy to help contribute.
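For illustration only, a hypothetical sketch of what option 1 could look like from the caller's side. The `operation` parameter is the proposed change and does not exist in Daft's `load_table` today; the endpoint, token, and table name are placeholders:

```python
import daft
from daft.unity_catalog import UnityCatalog

unity = UnityCatalog(
    endpoint="https://adb-<workspace-id>.azuredatabricks.net",  # placeholder workspace URL
    token="<PAT>",                                              # placeholder PAT
)

# Hypothetical: let the caller state intent instead of Daft hardcoding
# READ_WRITE internally. READ works for managed tables; READ_WRITE is
# only supported for external tables.
table = unity.load_table("some_catalog.some_schema.some_table", operation="READ")
df = daft.read_deltalake(table)
df.show()
```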
@jaychia For me it's three issues:

1. The strange error itself, which looks like an AWS control-plane response even though my workspace is on Azure.
2. Managed tables failing because credentials are requested with READ_WRITE.
3. The Unity Python clients (old and new) being overengineered and hard to use.

Currently it's easier for me to use the Delta Lake API from Daft instead of the Unity Catalog integration, and I am calling the Unity REST API myself. I also realize you don't use the new Unity PyPI package.
Confirmed: by forking the project and implementing approach #1, I was able to read a managed table without error.
Yeah, we actually built both Unity PyPI packages: the Databricks folks didn't like our first one because we used a tool called Stainless, so we made a new one but haven't yet moved over 😀 @pmogren, any chance you'd like to open a PR for your approach? We'd love to take a contribution!
@jaychia Yes, I'll put together a PR. Sorry for the delayed response.
Describe the bug
When running the code described here: https://www.getdaft.io/projects/docs/en/stable/user_guide/integrations/unity-catalog.html, I am getting the following error (I deleted my Databricks link from the error).
To Reproduce
Run this code
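For reference, the snippet on the linked docs page is approximately the following (a reconstruction from the guide, not the reporter's exact paste; endpoint and token are placeholders):

```python
from daft.unity_catalog import UnityCatalog

unity = UnityCatalog(
    endpoint="https://<databricks_workspace_id>.cloud.databricks.com",  # placeholder workspace URL
    token="<databricks-personal-access-token>",                         # placeholder PAT
)

# List the catalogs available in the Unity Catalog metastore
print(unity.list_catalogs())
```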
Expected behavior
Listing the catalogs
Component(s)
Python Runner
Additional context
No response