Do we need to SELECT 1 before each query? #198
Comments
The whole auth chain adds quite a few request round-trips. One thing I'd like (#161) is to have a …
Also in #161, I'd like to cache the credentials after the first call in a global …
Auth takes a long time! (See googleapis/google-auth-library-python#285.) At the moment, the latency overhead basically means we can't use BQ for our shorter queries (5-10k rows), which we've traditionally run off MSSQL. How much work do you think caching the creds would be? Given that the non-query part is 75% of the query time, it would be impactful.
Not much work, honestly. I think we can follow the same model as `google.cloud.bigquery.magics`, with a similar `Context` class to cache the `project_id` and credentials.
I think we can close this - now it's a single call per session?
Trim pydata-google-auth package and add tests

This is the initial version of the proposed pydata-google-auth package (to be used by pandas-gbq and ibis). It includes two methods:

* `pydata_google_auth.default()` - does the same as pandas-gbq's current auth: tries `google.auth.default()` and then falls back to user credentials.
* `pydata_google_auth.get_user_credentials()` - a public function, as proposed in googleapis/python-bigquery-pandas#161.

Missing in this implementation is a more configurable way to adjust credentials caching; I currently use the `reauth` logic from pandas-gbq. I drop `try_credentials()`, as it makes less sense when this module might be used for other APIs besides BigQuery. Plus there were problems with `try_credentials()` even for pandas-gbq (googleapis/python-bigquery-pandas#202, googleapis/python-bigquery-pandas#198).
With the recent clustering release, we're looking at whether we can use BQ for queries which previously required an indexed DB.
For these, the overhead matters more than the throughput, and the overhead is high. For a 700-row query on a small cluster, in the case that the query is cached (if not, add 1s to all of these):
```
pandas-gbq
%timeit -n 1 -r 1 gbq.read_gbq(query)
```
The latter two are particularly far apart. One of the reasons for this is that we issue a `SELECT 1` query before each query the user submits. This happens as part of `_try_credentials`: https://github.com/pydata/pandas-gbq/blob/e753cc4e61b0089dcea6330eb8c77e93a62984a7/pandas_gbq/auth.py#L281

Is this required? Could we instead attempt the query, and only attempt different auth methods if there's a failure?
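A rough sketch of that fallback idea, under stated assumptions: `run_query`, `AuthError`, and the auth callables below are hypothetical stand-ins for the real BigQuery call and its failure mode, not pandas-gbq internals. The point is that no probe query is issued up front; secondary auth methods only run if the first attempt is rejected.

```python
class AuthError(Exception):
    """Stand-in for an authentication failure from the API."""


def run_with_fallback(query, auth_methods):
    """Attempt the user's query directly, trying auth methods lazily.

    Unlike an upfront `SELECT 1` probe, credentials are only exercised
    by the query the user actually wants to run.
    """
    last_error = None
    for get_credentials in auth_methods:
        try:
            return run_query(query, get_credentials())
        except AuthError as exc:
            last_error = exc  # this method failed; try the next one
    raise last_error


def run_query(query, credentials):
    # Stand-in for the BigQuery request; rejects bad credentials.
    if credentials == "bad":
        raise AuthError("invalid credentials")
    return f"rows for {query!r} via {credentials}"
```

In the happy path (valid default credentials, which is the common case), this costs zero extra round-trips; the fallback chain only adds latency when auth actually fails.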