Queries sometimes return no rows #6667

Open
nadove-ucsc opened this issue Oct 29, 2024 · 1 comment
Labels
orange [process] Done by the Azul team

nadove-ucsc commented Oct 29, 2024

During the reindex of the nadove6 deployment on 10/28/2024, this query returned no rows, raising a RequirementError:

@requestId | cb0ccc8b-df5e-5d6a-a7db-9311068e629e
@timestamp | 1730205653743
job_id | a23a055e-eed3-4fd2-b567-7ad1878b3cb1
query | SELECT anatomical_site, apriori_cell_type, biosample_id, biosample_type, datarepo_row_id, disease, donor_age_at_collection_lower_bound, donor_age_at_collection_unit, donor_age_at_collection_upper_bound, donor_id, part_of_dataset_id, source_datarepo_row_ids                FROM `datarepo-7268c3a0.ANVIL_ccdg_broad_ai_ibd_daly_kupcinskas_gsa_20240311_ANV5_202403121627.anvil_biosample`                WHERE biosample_id IN ('2d4fd521-b5a2-315c-e269-d44ccd845faf')
stats.searchStatistics.indexUnusedReasons.0.code | NOT_SUPPORTED_IN_STANDARD_EDITION
stats.searchStatistics.indexUnusedReasons.0.message | Index can not be used for query with Standard edition reservation. See https://cloud.google.com/bigquery/docs/editions-intro for more information.
stats.searchStatistics.indexUsageMode | UNUSED
total_rows | 0

The subsequent retry succeeded. It was reported as a cache hit, even though its result differed from that of the first query:

@requestId | 0a40bf45-815b-5ef3-88f2-e0df5ba5f012
@timestamp | 1730205930681
job_id | a785ecdc-d637-4dd0-a552-22ca650d50e9
query | SELECT anatomical_site, apriori_cell_type, biosample_id, biosample_type, datarepo_row_id, disease, donor_age_at_collection_lower_bound, donor_age_at_collection_unit, donor_age_at_collection_upper_bound, donor_id, part_of_dataset_id, source_datarepo_row_ids                FROM `datarepo-7268c3a0.ANVIL_ccdg_broad_ai_ibd_daly_kupcinskas_gsa_20240311_ANV5_202403121627.anvil_biosample`                WHERE biosample_id IN ('2d4fd521-b5a2-315c-e269-d44ccd845faf')
stats.cacheHit | 1
stats.searchStatistics.indexUnusedReasons.0.code | QUERY_CACHE_HIT
stats.searchStatistics.indexUnusedReasons.0.message | Search indexes are not used because the query was cached.
stats.searchStatistics.indexUsageMode | UNUSED
stats.totalBytesBilled | 0
stats.totalBytesProcessed | 0
total_rows | 1

This error occurred 64 times during the reindex. Every time it was raised, the row count was zero; there were no occurrences of an incomplete but nonempty result.
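Since every failure produced zero rows and the retry succeeded, one possible stopgap is to retry only when the result is completely empty and to fail immediately on a partial result, which was never observed. The following is a minimal sketch under that assumption, not Azul's actual code: `fetch_biosamples`, `run_query`, `EmptyResultError`, `PartialResultError`, and the `attempts`/`delay` defaults are all hypothetical, and `run_query` stands in for whatever executes the `SELECT ... WHERE biosample_id IN (...)` query and returns its rows as dicts.

```python
import time
from typing import Callable, Iterable, Mapping, Sequence

Row = Mapping[str, object]


class EmptyResultError(Exception):
    """Hypothetical: a query that must return rows returned none."""


class PartialResultError(Exception):
    """Hypothetical: only some of the requested IDs were returned."""


def fetch_biosamples(run_query: Callable[[Sequence[str]], Iterable[Row]],
                     biosample_ids: Sequence[str],
                     attempts: int = 3,
                     delay: float = 1.0) -> list[Row]:
    """
    Fetch rows for the given IDs, retrying only when the result is empty.
    `run_query` is a hypothetical stand-in for the code that runs the query.
    """
    if attempts < 1:
        raise ValueError('attempts must be at least 1')
    expected = set(biosample_ids)
    for attempt in range(attempts):
        rows = list(run_query(biosample_ids))
        if rows:
            found = {row['biosample_id'] for row in rows}
            missing = expected - found
            if missing:
                # Never observed in the logs, so treat it as a real error
                raise PartialResultError(f'Missing IDs: {sorted(missing)}')
            return rows
        # Observed failure mode: zero rows, so wait and try again
        if attempt + 1 < attempts:
            time.sleep(delay)
    raise EmptyResultError(f'No rows after {attempts} attempt(s)')
```

Raising on a partial result instead of retrying keeps a genuinely different failure mode from being masked by the retry.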

nadove-ucsc added the orange [process] Done by the Azul team label Oct 29, 2024
@nadove-ucsc

Note that the first query (the one that returned no rows) lacks the stats.totalBytesBilled and stats.totalBytesProcessed fields. This might indicate that we're reading the rows too early, before the query has completed.
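If the rows really are being read too early, one defensive check is to block on the job and confirm that its byte statistics are populated before trusting the result. The sketch below assumes that missing statistics indicate an unfinished job, which is the hypothesis in this comment rather than documented BigQuery behavior. It is written against a small protocol mirroring attributes that `google.cloud.bigquery.QueryJob` provides (`result()`, `cache_hit`, `total_bytes_processed`, `total_bytes_billed`) so that it can be exercised without BigQuery; `rows_from_completed_job`, `QueryJobLike` and `IncompleteJobError` are hypothetical names.

```python
from typing import Any, Iterable, Optional, Protocol


class IncompleteJobError(Exception):
    """Hypothetical: the job's statistics suggest it had not finished."""


class QueryJobLike(Protocol):
    """Subset of the attributes exposed by google.cloud.bigquery.QueryJob."""
    cache_hit: Optional[bool]
    total_bytes_processed: Optional[int]
    total_bytes_billed: Optional[int]

    def result(self) -> Iterable[Any]: ...


def rows_from_completed_job(job: QueryJobLike) -> list[Any]:
    # For a real QueryJob, result() waits for the job to finish
    rows = list(job.result())
    # In the logs above, the empty result lacked both statistics entirely,
    # whereas the cache hit reported them as 0, so test for None, not falsiness
    missing = [name
               for name in ('total_bytes_processed', 'total_bytes_billed')
               if getattr(job, name) is None]
    if missing:
        raise IncompleteJobError(f'Statistics missing: {missing}, '
                                 f'{len(rows)} row(s) read')
    return rows


# Hypothetical usage with the real client (requires credentials):
# from google.cloud import bigquery
# rows = rows_from_completed_job(bigquery.Client().query(sql))
```

Raising instead of returning an empty list lets the caller's existing retry logic handle the case rather than silently indexing nothing.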
