#269 First iteration of Scribe-Data data contracts #293

Merged
andrewtavis merged 1 commit into main from 269-data-contracts on Jan 9, 2025

andrewtavis
Member

@andrewtavis andrewtavis commented Jan 5, 2025

Contributor checklist


Description

As discussed in #269 and in recent syncs, we need to set up data contracts that will determine how the end applications present the data coming in from Scribe-Data. This PR sends along my first ideas on them based on the call we had yesterday.

A short synopsis of this:

  • For Scribe-iOS, all data values are hard-coded into the application
  • The data might change on Wikidata's end, meaning that the labels and other information would change in Scribe-Data
    • Ex: More data is added, which as of now would require a new app release to update the hard-coded fields, meaning that people would not have access to the new data until then
    • Ex: As of now, unlike other languages, Spanish has the feminine and masculine forms of nouns stored on the same lexeme
      • If this changes, we'd have to rework the entire way that data is accessed, and the Spanish keyboard would break on the next data update until the app itself is updated

Discussed solution:

  • We want to make sure that data updates are completely separate from app updates
  • We create "data contracts" that are delivered with new data packs
    • Ex: User downloads Scribe-iOS/Android and wants the German keyboard
      • They get the German data and the German contract
      • Updates of the data would deliver the corresponding contract
  • Note that we'd need to set up tests in Scribe-Data that detect when the data that should go into the contract has changed
    • Ex again for Spanish nouns
      • Say they're split and we don't have feminine and masculine on the same lexemes
      • We check the data outputs and see if the data that's required to fulfill the current contract is coming in
      • If not, we open an issue that says which parts of the contract aren't being fulfilled, and we then check to see how the data has changed
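
A minimal sketch of the contract check described above, assuming a contract is simply a mapping from data type to the column names the app relies on. The function name, the contract shape, and the column names are all hypothetical and are not taken from Scribe-Data:

```python
def missing_contract_columns(contract: dict, available_columns: dict) -> dict:
    """Return, per data type, the contract columns absent from the data output."""
    missing = {}
    for data_type, required in contract.items():
        present = set(available_columns.get(data_type, []))
        absent = [column for column in required if column not in present]
        if absent:
            missing[data_type] = absent
    return missing


# Hypothetical Spanish contract: gendered forms are expected on the same lexeme.
es_contract = {"nouns": ["feminineSingular", "masculineSingular", "plural"]}

# Hypothetical data output after the lexemes were split on Wikidata's end.
es_output = {"nouns": ["singular", "gender", "plural"]}

unfulfilled = missing_contract_columns(es_contract, es_output)
```

An empty result would mean the contract is still fulfilled; anything else could go into the body of the issue that gets opened.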

Feedback is very welcome on the above! 🙏

Related issue

github-actions bot commented Jan 5, 2025

Thank you for the pull request!

The Scribe team will do our best to address your contribution as soon as we can. The following is a checklist for maintainers to make sure this process goes as well as possible. Feel free to address the points below yourself in further commits if you realize that actions are needed :)

If you're not already a member of our public Matrix community, please consider joining! We'd suggest using Element as your Matrix client, and definitely join the General and Android rooms once you're in. Also consider joining our bi-weekly Saturday dev syncs. It'd be great to have you!

Maintainer checklist

  • The linting and formatting workflows within the PR checks do not indicate new errors in the files changed

  • The CHANGELOG has been updated with a description of the changes for the upcoming release and the corresponding issue (if necessary)

@github-actions github-actions bot left a comment

First PR Commit Check

  • The commit messages for the remote branch should be checked to make sure the contributor's email is set up correctly so that they receive credit for their contribution
    - The contributor's name and icon in remote commits should be the same as what appears in the PR
    - If there's a mismatch, the contributor needs to make sure that the email they use for GitHub matches what they have for git config user.email in their local Scribe-Android repo (can be set with git config --global user.email "GITHUB_EMAIL")

github-actions bot commented Jan 5, 2025

Code Coverage

Overall Project 0.59%

There is no coverage information present for the Files changed

@andrewtavis
Member Author

andrewtavis commented Jan 5, 2025

Again, the goal of this is to get rid of as much of the hard-coded "command variables" files in Scribe-iOS as possible.

Noun gender annotation:

  • Is there a canonical gender column (there is for all languages except Spanish as of now)?
  • Check the singular column for the word and return its gender
  • Also check if it's in the plural column
  • For Spanish we need to check to see if it's in the feminine or masculine singular columns given the contract
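
The lookup order above could look roughly like this. The contract shape, the column names, and the "PL" return value for plural matches are assumptions made for illustration, not the format added in this PR:

```python
def annotate_gender(word, rows, contract):
    """Return a gender annotation for word, or None if it is not in the data."""
    canonical = contract.get("canonical")
    for row in rows:
        if canonical:
            # A canonical gender column exists: match on singular, then plural.
            if row.get(contract["singular"]) == word:
                return row[canonical]
            if row.get(contract["plural"]) == word:
                return "PL"  # assumed annotation for plural forms
        else:
            # No canonical column (Spanish as of now): the column that
            # contains the word determines the gender.
            for column, gender in contract["gendered_singular"].items():
                if row.get(column) == word:
                    return gender
    return None


# Hypothetical contracts and rows.
de_contract = {
    "canonical": "gender",
    "singular": "nominativeSingular",
    "plural": "nominativePlural",
}
es_contract = {
    "canonical": None,
    "gendered_singular": {"feminineSingular": "F", "masculineSingular": "M"},
}
de_rows = [
    {"nominativeSingular": "Buch", "nominativePlural": "Bücher", "gender": "neuter"}
]
es_rows = [{"feminineSingular": "niña", "masculineSingular": "niño"}]
```

With this shape, the app code never names a column itself; it only follows whatever the contract delivered with the data pack says.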

Plural command:

  • Check if the word is in the columns that are the value(s) of numbers, in which case return "Already plural"
  • If not, check the key columns and return the respective value column
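
As a rough sketch of these two steps, assuming the contract's numbers entry maps each singular column (key) to its plural column (value). The function name and column names are hypothetical:

```python
def pluralize(word, rows, numbers):
    """Return the plural of word, "Already plural", or None if not found."""
    # Step 1: the word already appears in one of the value (plural) columns.
    for row in rows:
        if any(row.get(plural_col) == word for plural_col in numbers.values()):
            return "Already plural"
    # Step 2: find the word in a key column and return the paired value column.
    for row in rows:
        for singular_col, plural_col in numbers.items():
            if row.get(singular_col) == word:
                return row.get(plural_col)
    return None


# Hypothetical contract entry and data rows.
numbers = {"singular": "plural"}
rows = [{"singular": "book", "plural": "books"}]
```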

Conjugate command:

  • The conjugations are presented to the user in the order of the dictionary object
  • The title is what appears as the header in the UI
  • The keys for the conjugation sub objects are the labels of the conjugation field
  • The values are the columns whose value is returned when the user presses the conjugation key
    • Return the value from this column
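
To make the structure concrete, here is one way the conjugation part of a contract could be turned into what the UI shows. The contract layout, the "labels" key, and the column names are assumptions for illustration; the only things relied on from the description are the ordering, the title, and the label-to-column mapping:

```python
def conjugation_views(verb_row, conjugations):
    """Build (title, [(label, value), ...]) tuples in contract order."""
    views = []
    # Python dicts keep insertion order, so the contract order is preserved.
    for section in conjugations.values():
        keys = [
            (label, verb_row.get(column, ""))
            for label, column in section["labels"].items()
        ]
        views.append((section["title"], keys))
    return views


# Hypothetical German contract excerpt and a row for the verb "gehen".
de_conjugations = {
    "1": {
        "title": "Präsens",
        "labels": {"ich": "presentFirstSingular", "du": "presentSecondSingular"},
    },
    "2": {"title": "Präteritum", "labels": {"ich": "pastFirstSingular"}},
}
gehen_row = {
    "presentFirstSingular": "gehe",
    "presentSecondSingular": "gehst",
    "pastFirstSingular": "ging",
}

views = conjugation_views(gehen_row, de_conjugations)
```

Each label would be drawn on a conjugation key, and pressing that key would insert the paired value.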

Translate command:

  • Is ultimately the easiest as the plan is that we have a word and the data that we have for it is the contract to be fulfilled
  • Is the word just a noun, a verb, another type or more than one?
  • Then what are the options and their descriptions
    • Ex German translation: Book -> noun or verb? -> noun -> Buch (book) or Roman (novel)
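
The flow in the German example can be sketched as two lookups, assuming translation data keyed by word and then by word type. The data layout and function names are hypothetical, and the verb entry is added only to show the "more than one type" case:

```python
def word_types(word, translations):
    """Return the word types available for word (e.g. noun, verb)."""
    return list(translations.get(word, {}))


def translation_options(word, word_type, translations):
    """Return "translation (description)" strings for the chosen word type."""
    options = translations.get(word, {}).get(word_type, [])
    return [f'{o["translation"]} ({o["description"]})' for o in options]


# Hypothetical English -> German translation data.
en_de = {
    "book": {
        "noun": [
            {"translation": "Buch", "description": "book"},
            {"translation": "Roman", "description": "novel"},
        ],
        "verb": [{"translation": "buchen", "description": "to reserve"}],
    }
}
```

If `word_types` returns a single type, the type selection step could be skipped and the options shown directly.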

CC @axif0 as well, as this affects Scribe-Data work to come :)

@andrewtavis
Member Author

andrewtavis commented Jan 5, 2025

@henrikth93 and @Jag-Marcel, as you two have worked on iOS the most: this is basically mapping out files like Scribe-iOS/Keyboards/LanguageKeyboards/English/ENCommandVariables.swift. Not everything in those files will be put into the contract, but everything that can change with the data will be :)

@angrezichatterbox
Member

I feel the formatting is quite good for now. Based on my experimentation with finding the gender of a word, I think this contract makes it easier, and the proper column would be used for the word depending on the data available.

@angrezichatterbox
Member

Could we capitalize the language codes in the naming, since we use the same format for the SQLite database naming as well?

@andrewtavis
Member Author

Coming back to this just now, @angrezichatterbox. I guess for now can we leave them as lower case? There's something about the Scribe-i18n JSONs using the lower case language code and these ones using upper case that just doesn't look right to me 🤔 That, plus the fact that these files are an extension of the Scribe-Data metadata files, which are also lower case and will be generated in Scribe-Data, really pushes for keeping it as it is :)

Hope this makes sense! Thanks for the feedback and all the planning that went into this 😊

@andrewtavis
Member Author

andrewtavis commented Jan 9, 2025

Merging this in 😊

@andrewtavis andrewtavis merged commit 4a3cb97 into main Jan 9, 2025
5 checks passed
@andrewtavis andrewtavis deleted the 269-data-contracts branch January 9, 2025 21:36