-
Notifications
You must be signed in to change notification settings - Fork 89
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Optimize Shopify Metafields Sync Performance #200
base: master
Are you sure you want to change the base?
Conversation
c2905b1
to
faa9c2b
Compare
faa9c2b
to
3f26fb2
Compare
2238558
to
edd5b44
Compare
|
||
@shopify_error_handling | ||
def call_api(self, query_params, query, data_key): | ||
def call_api(self, query_params, query): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the base class implementation has more explicit error handling, it would be a good idea to carry it over to the modified implementation of the class.
ref: gql_base.py
@shopify_error_handling
def call_api(self, query_params):
"""
- Modifies the default call api implementation to support GraphQL
- Returns response Object dict
"""
try:
query = self.get_query()
LOGGER.info("Fetching %s %s", self.name, query_params)
response = shopify.GraphQL().execute(query=query, variables=query_params)
response = json.loads(response)
if "errors" in response.keys():
raise ShopifyAPIError(response['errors'])
data = response.get("data", {}).get(self.data_key, {})
return data
except ShopifyAPIError as gql_error:
LOGGER.error("GraphQL Error %s", gql_error)
raise ShopifyAPIError("An error occurred with the GraphQL API.") from gql_error
except Exception as exception:
LOGGER.error("Unexpected error occurred.",)
raise exception
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added the exception handling over here as well.
|
||
def get_next_page_metafields(self, metafields, query_params, gql_resource): | ||
owner_id = metafields.get("edges", [])[0]["node"]["owner"]["id"] | ||
query = self.get_resource_type_query(gql_resource)() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is not the right way to use a static method with a self reference
please change the method implementation or update the function call.
Metafields.get_resource_type_query(gql_resource)()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have updated the function to the instance method.
tap_shopify/streams/metafields.py
Outdated
replication_value = utils.strptime_to_utc(obj["updated_at"]) | ||
current_bookmark_value = self.get_bookmark_by_name(f"{self.name}_{resource_type}") | ||
if resource_type not in current_bookmarks: | ||
current_bookmarks[resource_type] = \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is correct but the function self.get_bookmark_by_name(f"{self.name}_{resource_type}")
seems to be called twice — once to set current_bookmark_value and once more when updating current_bookmarks.
Can't we write like this, instead?
current_bookmarks[resource_type] = current_bookmark_value
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
makes sense, made the change
tap_shopify/streams/metafields.py
Outdated
if resource_type not in current_bookmarks: | ||
current_bookmarks[resource_type] = \ | ||
self.get_bookmark_by_name(f"{self.name}_{resource_type}") | ||
|
||
if replication_value >= current_bookmarks[resource_type]: | ||
current_bookmarks[resource_type] = max( | ||
replication_value, | ||
current_bookmarks[resource_type] | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here,
- You're checking if resource_type exists in current_bookmarks. If not, you fetch the initial bookmark value from
self.get_bookmark_by_name(f"{self.name}_{resource_type}")
and store it incurrent_bookmarks[resource_type]
- You then check if
replication_value >= current_bookmarks[resource_type]
. If this condition is True, you update the bookmark for this resource_type by taking the maximum value between the current replication_value and the existingcurrent_bookmarks[resource_type]
.
Essentially, you're checking and possibly updatingcurrent_bookmarks[resource_type]
twice.
You can combine the check and update process in a single step by using the dict.get() method and reducing the redundancy:
if replication_value >= current_bookmarks.get(resource_type, current_bookmark_value):
current_bookmarks[resource_type] = max(replication_value, current_bookmarks.get(resource_type, replication_value))
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds logical.
I optimized it further.
if replication_value >= current_bookmarks.get(resource_type, current_bookmark_value):
current_bookmarks[resource_type] = replication_value
Let me know if you think something else.
|
||
LOGGER = get_logger() | ||
|
||
|
||
|
||
class Metafields(ShopifyGqlStream): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is not impacting anything at all, still these 2 empty lines should be there before class def.
@@ -1,5 +1,10 @@ | |||
# Changelog | |||
|
|||
## 2.1.0 | |||
* Optimize Shopify Metafields Sync Performance [#200](https://github.com/singer-io/tap-shopify/pull/200) | |||
* Include the new field (= inventoryItem) in the product variants schema. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
* Include the new field (= inventoryItem) in the product variants schema. | |
* Include new field inventoryItem in the product_variants stream. |
If this was not intended
Problem
The current implementation of metafields sync is inefficient because:
Solution
Implemented a hybrid approach that:
Implementation Details
Additional Changes
Benefits
Testing
Future Considerations
Related Issues
Closes #[TDL-26998]
AI generated code
https://internal.qlik.dev/general/ways-of-working/code-reviews/#guidelines-for-ai-generated-code