Do we need to SELECT 1 before each query? #198

Closed
max-sixty opened this issue Aug 21, 2018 · 5 comments

@max-sixty
Contributor

With the recent clustering release, we're looking at whether we can use BQ for queries that previously required an indexed DB.

For these, per-query overhead matters more than throughput, and the overhead is high. With the query cached (if not, add ~1s to each figure), for a 700-row query on a small cluster:

  • 73ms: query time reported by BQ
  • 1160ms: query time logged by pandas-gbq
  • 3100ms: wall time reported by %timeit -n 1 -r 1 gbq.read_gbq(query)

The latter two are particularly far apart. One of the reasons for this is that we issue a `SELECT 1` query before each query the user submits. This happens as part of `_try_credentials`: https://github.com/pydata/pandas-gbq/blob/e753cc4e61b0089dcea6330eb8c77e93a62984a7/pandas_gbq/auth.py#L281

Is this required? Could we instead attempt the query, and only attempt different auth methods if there's a failure?
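For reference, a simplified sketch of the pattern in the linked `_try_credentials` (not a verbatim copy of `pandas_gbq/auth.py`; error handling is reduced to a generic except):

```python
# Simplified sketch of the _try_credentials pattern linked above; the real
# function in pandas_gbq/auth.py has more specific error handling.
from google.cloud import bigquery


def _try_credentials(project_id, credentials):
    if not credentials:
        return None
    try:
        client = bigquery.Client(project=project_id, credentials=credentials)
        # Validate the credentials by running a trivial query -- this is the
        # extra SELECT 1 round-trip issued before every user query.
        client.query("SELECT 1").result()
        return credentials
    except Exception:
        return None
```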

@tswast
Collaborator

tswast commented Aug 21, 2018

> Is this required?

The whole auth chain adds quite a few request round-trips. One thing I'd like (#161) is to have a `credentials` argument (or possibly a `client` argument) which would bypass all of those requests.
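For illustration, a hedged sketch of how such a `credentials` argument might be used (the parameter name follows the #161 proposal; the project ID and key path below are placeholders):

```python
# Hypothetical usage of the credentials argument proposed in #161; this was
# not yet the released pandas-gbq API at the time of this comment.
import pandas_gbq
from google.oauth2 import service_account

credentials = service_account.Credentials.from_service_account_file(
    "path/to/key.json",  # placeholder key file
    scopes=["https://www.googleapis.com/auth/bigquery"],
)

# With explicit credentials, pandas-gbq could skip the auth chain entirely,
# including the SELECT 1 validation query.
df = pandas_gbq.read_gbq(
    "SELECT COUNT(*) AS n FROM `bigquery-public-data.samples.shakespeare`",
    project_id="my-project",  # placeholder project
    credentials=credentials,
)
```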

@tswast
Collaborator

tswast commented Aug 21, 2018

Also in #161, I'd like to cache the credentials after the first call in a global Context object to avoid subsequent unnecessary auth calls.

@max-sixty
Contributor Author

Auth takes a long time! googleapis/google-auth-library-python#285

At the moment, the latency overhead basically means we can't use BQ for our shorter queries (5-10k rows) which we've traditionally run off MSSQL.

How much work do you think caching the creds would be? Given that the non-query part is 75% of the query time, it would be impactful.

@tswast
Collaborator

tswast commented Aug 25, 2018

Not much work, honestly. I think we can follow the same model as google.cloud.bigquery.magics with a similar Context class to cache the project_id and credentials.
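A minimal sketch of that Context idea, modeled loosely on `google.cloud.bigquery.magics`; the class and attribute names here are illustrative, not the final pandas-gbq API:

```python
# Illustrative Context cache, patterned after google.cloud.bigquery.magics;
# names are hypothetical and the auth call (google.auth.default) is assumed.
import google.auth


class Context(object):
    """Cache credentials and project so auth round-trips run once per session."""

    def __init__(self):
        self._credentials = None
        self._project = None

    @property
    def credentials(self):
        if self._credentials is None:
            # First access pays the auth cost; subsequent queries reuse it.
            self._credentials, self._project = google.auth.default(
                scopes=["https://www.googleapis.com/auth/bigquery"]
            )
        return self._credentials

    @property
    def project(self):
        if self._project is None:
            self.credentials  # populating credentials also sets the project
        return self._project


# Module-level singleton, as in bigquery.magics.context.
context = Context()
```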

@max-sixty
Contributor Author

I think we can close this - now it's a single call per session?

@tswast tswast closed this as completed Sep 5, 2018
tswast added a commit to pydata/pydata-google-auth that referenced this issue Sep 7, 2018
Trim pydata-google-auth package and add tests

This is the initial version of the proposed pydata-google-auth package (to be used by pandas-gbq and ibis). It includes two methods:

* `pydata_google_auth.default()`
  * A function that does auth the same way pandas-gbq currently does: it tries `google.auth.default()` and then falls back to user credentials.
* `pydata_google_auth.get_user_credentials()`
  * A public `get_user_credentials()` function, as proposed in googleapis/python-bigquery-pandas#161. Missing in this implementation is a more configurable way to adjust credentials caching. I currently use the `reauth` logic from pandas-gbq.

I drop `try_credentials()`, as it makes less sense when this module might be used for other APIs besides BigQuery. Plus there were problems with `try_credentials()` even for pandas-gbq (googleapis/python-bigquery-pandas#202, googleapis/python-bigquery-pandas#198).
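A hedged usage sketch of the two functions described above (signatures assumed from the commit description; the released package may differ from this initial version):

```python
# Hedged sketch of the proposed pydata-google-auth API; signatures assumed
# from the commit description and may differ in the released package.
import pydata_google_auth

SCOPES = ["https://www.googleapis.com/auth/bigquery"]

# Tries google.auth.default() first, then falls back to user credentials,
# mirroring pandas-gbq's existing auth flow (minus try_credentials).
credentials, project = pydata_google_auth.default(SCOPES)

# Runs the user-credentials (OAuth) flow directly.
user_credentials = pydata_google_auth.get_user_credentials(SCOPES)
```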