Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introduced QID search in translation. #545

Merged
merged 7 commits into from
Jan 11, 2025
Merged

Conversation

axif0
Copy link
Collaborator

@axif0 axif0 commented Jan 8, 2025

Contributor checklist


Description

Added QID search for total, translation and forms. Even a language is not present in language_metadata.json we can use QID to get total, translation and forms.

scribe-data get -l Q9217 -wdp scribe -od exported_json -a

image

For total

scribe-data total -l Q9217 -wdp scribe

image

@axif0 axif0 requested a review from andrewtavis January 8, 2025 11:52
Copy link

github-actions bot commented Jan 8, 2025

Thank you for the pull request!

The Scribe team will do our best to address your contribution as soon as we can. The following is a checklist for maintainers to make sure this process goes as well as possible. Feel free to address the points below yourself in further commits if you realize that actions are needed :)

If you're not already a member of our public Matrix community, please consider joining! We'd suggest using Element as your Matrix client, and definitely join the General and Data rooms once you're in. Also consider joining our bi-weekly Saturday dev syncs. It'd be great to have you!

Maintainer checklist

  • The linting and formatting workflow within the PR checks do not indicate new errors in the files changed

  • The CHANGELOG has been updated with a description of the changes for the upcoming release and the corresponding issue (if necessary)

@@ -124,39 +124,6 @@ def get_datatype_list(language):
return data_type_metadata


def check_qid_is_language(qid: str):
Copy link
Collaborator Author

@axif0 axif0 Jan 8, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Had to move the check_qid_is_language method in utils.py for ImportError.

image

Copy link
Member

@wkyoshida wkyoshida left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

😄

src/scribe_data/wiktionary/parse_dump.py Outdated Show resolved Hide resolved
Comment on lines 766 to 768
instance_of_values = request_result["statements"]["P31"]
for val in instance_of_values:
if val["value"]["content"] == "Q34770":
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤔 Coming across this..

was actually wondering if we do have it documented somewhere what P31 or Q34770 refer to in Wikidata (and also all other important IDs that we reference in code too).

If not, should we? Was thinking that a quick markdown table could suffice really.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or actually - scratch that

It might be more helpful perhaps if that information is closer to where it's used in code.
So in-source comments instead?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does another metadata file make sense for this? That way we reference an object and get the QID from a human readable object key?

Copy link
Member

@wkyoshida wkyoshida Jan 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does another metadata file make sense for this?

hmm could be 🤔

Were you thinking of something like this?

        # Instead of...
        # instance_of_values = request_result["statements"]["P31"]
        instance_of_property = wikidata["property"]["instance-of"]
        instance_of_values = request_result["statements"][instance_of_property]
        for val in instance_of_values:
            # Instead of...
            # if val["value"]["content"] == "Q34770":
            if val["value"]["content"] == wikidata["entity"]["language"]:

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Along the lines of this, yes :)

response = requests.get(api_endpoint)
data = response.json()
try:
return data["entities"][qid]["claims"]["P305"][0]["mainsnak"]["datavalue"][
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

... was actually wondering if we do have it documented somewhere what P31 or Q34770 refer to in Wikidata (and also all other important IDs that we reference in code too) ...

ditto

Copy link
Member

@andrewtavis andrewtavis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the changes here, @axif0! This will make it much easier to figure out when we have enough translations on Wikidata to then make the switch of the translation functionality viable 😊

@andrewtavis andrewtavis merged commit 9bbc0c6 into scribe-org:main Jan 11, 2025
6 checks passed
@andrewtavis
Copy link
Member

Quick note, @wkyoshida, that the above changes were made with @axif0 in the call and we discussed them while we prepared this for merging :) Just so you know that we did actively discuss the feedback and I didn't just do it myself 😇

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants