Moving from Old Language Metadata Structure to Support Sub-languages and Simplified JSON #402

Merged
37 commits
624760d
Simplified language metadata JSON by removing unnecessary nesting and…
OmarAI2003 Oct 12, 2024
05ba79d
Refactored _load_json function to handle simplified JSON structure.
OmarAI2003 Oct 12, 2024
7be7005
Refactor language metadata structure: Include all languages with Norw…
OmarAI2003 Oct 12, 2024
e1ce1d8
Refactor _find function to handle languages with sub-languages
OmarAI2003 Oct 12, 2024
046c78d
Update get_scribe_languages to handle sub-languages in JSON structure
OmarAI2003 Oct 12, 2024
7c00873
Merge remote-tracking branch 'upstream/main' into refactor-languages_…
OmarAI2003 Oct 12, 2024
2233e44
Merge remote-tracking branch 'upstream/main' into refactor-languages_…
OmarAI2003 Oct 13, 2024
8f737cd
Remove get_language_words_to_remove and get_language_words_to_ignore …
OmarAI2003 Oct 13, 2024
9f75f54
Refactor language_map and language_to_qid generation to handle new JS…
OmarAI2003 Oct 13, 2024
6186be9
Fix: Update language extraction to match new JSON structure by removi…
OmarAI2003 Oct 13, 2024
1c959ec
Refactor language extraction to use direct keys from language_metadata.
OmarAI2003 Oct 13, 2024
458328e
Added format_sublanguage_name function to format sub-language names a…
OmarAI2003 Oct 14, 2024
e017760
Refactor: Apply format_sublanguage_name to handle sub-language
OmarAI2003 Oct 14, 2024
4705414
Removed dependency on the 'languages' key based on the old json struc…
OmarAI2003 Oct 14, 2024
ab7b6cf
Add function to list all languages from language metadata loaded json
OmarAI2003 Oct 14, 2024
8d8f8f5
Refactor to use list_all_languages function for language extraction
OmarAI2003 Oct 14, 2024
d9a649b
Enhance language handling by importing utility functions
OmarAI2003 Oct 14, 2024
30f97e9
Update get_language_iso function:
OmarAI2003 Oct 14, 2024
ceec187
Handle sub-languages in language table generation
OmarAI2003 Oct 14, 2024
5345c08
Merge remote-tracking branch 'upstream/main' into refactor-languages_…
OmarAI2003 Oct 14, 2024
540e9d2
adding new languages and their dialects to the language_metadata.json…
OmarAI2003 Oct 14, 2024
f389ab5
Modified the loop that searches languages in the list_data_types func…
OmarAI2003 Oct 14, 2024
09944ed
Capitalize the languages returned by the function 'format_sublanguage…
OmarAI2003 Oct 14, 2024
f602f17
Implemented minor fixes by utilizing the format_sublanguage_name func…
OmarAI2003 Oct 14, 2024
ba0ed9a
Updated the instance variable self.languages in ScribeDataConfig to u…
OmarAI2003 Oct 15, 2024
c77cb1f
adding mandarin as a sub language under chinese and updating some qids
OmarAI2003 Oct 16, 2024
87ec3b0
Update test_list_languages to match updated output format
OmarAI2003 Oct 16, 2024
84f8a4b
Merge remote-tracking branch 'upstream/main' into refactor-languages_…
OmarAI2003 Oct 16, 2024
881c055
removing .capitalize method since it's already implemented inside lag…
OmarAI2003 Oct 16, 2024
15a13fb
Merge remote-tracking branch 'upstream/main' into refactor-languages_…
OmarAI2003 Oct 16, 2024
fed80b3
Updating test cases in test_list.py file to match newly added languages
OmarAI2003 Oct 16, 2024
e6140e5
Update test cases to include sub-languages
OmarAI2003 Oct 16, 2024
22791ce
Updated the get_language_from_iso function to depend on the JSON file…
OmarAI2003 Oct 16, 2024
1416134
Add unit tests for language formatting and listing:
OmarAI2003 Oct 16, 2024
ca9edb4
Merge remote-tracking branch 'upstream/main' into refactor-languages_…
OmarAI2003 Oct 16, 2024
f3426f1
Merge branch 'scribe-org:main' into refactor-languages_metadata.json-…
OmarAI2003 Oct 17, 2024
661b131
Edits to language metadata and supporting functions + pr checklist
andrewtavis Oct 18, 2024
Refactor language extraction to use direct keys from language_metadata.
Removed dependency on the 'languages' key in JSON structure.
OmarAI2003 committed Oct 13, 2024
commit 1c959ec5d89f4d24e1f9f33f70b9e9a3289e86a8
2 changes: 1 addition & 1 deletion src/scribe_data/wikidata/query_data.py
@@ -115,7 +115,7 @@ def query_data(
         SCRIBE_DATA_SRC_PATH / "language_data_extraction"
     )
     languages = [lang.capitalize() for lang in languages]
-    current_languages = list(language_metadata["languages"])
+    current_languages = list(language_metadata.keys())
     current_data_type = ["nouns", "verbs", "prepositions"]
 
     # Assign current_languages and current_data_type if no arguments have been passed.
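
The one-line change above follows from the flattened metadata layout described in the commit message. The sketch below illustrates why the lookup had to change. The sample dictionaries, their field names, and the two helper functions are assumptions made for illustration, not the actual contents of language_metadata.json or code from this PR.

```python
# Hypothetical "old" layout: entries nested under a top-level "languages" key.
# The exact fields are assumed for illustration only.
old_metadata = {
    "languages": [
        {"language": "english", "iso": "en", "qid": "Q1860"},
        {"language": "german", "iso": "de", "qid": "Q188"},
    ]
}

# Hypothetical "new" layout: language names are the top-level keys.
new_metadata = {
    "english": {"iso": "en", "qid": "Q1860"},
    "german": {"iso": "de", "qid": "Q188"},
}


def current_languages_old(metadata):
    """Old lookup from the diff: relies on the 'languages' key being present."""
    return list(metadata["languages"])


def current_languages_new(metadata):
    """New lookup from the diff: the top-level keys are the language names."""
    return list(metadata.keys())


if __name__ == "__main__":
    print(current_languages_new(new_metadata))
    try:
        current_languages_old(new_metadata)
    except KeyError as err:
        # The flattened layout has no 'languages' key, so the old lookup fails.
        print(f"old lookup fails on the new layout: KeyError {err}")
```

Under these assumptions, the old expression raises a `KeyError` once the wrapper key is gone, while `list(language_metadata.keys())` returns the language names directly.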