Standardization of all the sound packs about String IDs and names #48
Yes, it would be useful, and I've already started on some of that ;) However, something that cannot and will not change at the moment is the filenames, and whether files are system or non-system files. Everything else is up for grabs. Ideally, there would be a single translation file with all the languages in it, so they must all have the same words/phrases/sounds. But to do this, I think English is the wrong starting place, as it doesn't have three variations of the same word depending on the gender of the speaker (which I think is what is happening with some of the other languages). Hm... unless, instead of it being 1000-1999 Czech, it were 1000 109-female or whatever... i.e. break the "variations" out to a separate number range... 🤔
A couple of thoughts:
Yes. This is particularly important for sounds that are referred to by EdgeTX code.
Are you suggesting to start each new language / locale from a template? I like that idea @Schnuppi12.
From personal experience working on a number of localized systems: using English as the reference language for this is fine. The strings used in the context of EdgeTX will be translatable into any combination of language, locale and voice without issue. (The speaker is often relevant when talking to or about people, but none of the EdgeTX strings is likely to be about people, not even the speaker themselves. In other words: it's common for languages to include rules or customs that modify speech when the speaker talks about themselves or to other people, e.g. if the other people are older or younger. But that's not going to be relevant for the subset of text used in EdgeTX, where the sentences are essentially impersonal.) Of course all voices will sound different, but that doesn't affect what they can or cannot say: they'll say the same thing, only differently. Does that make sense? An important part is recognizing that the voice is part of the identifier, but this is already implicit in how the releases are structured. Example: there isn't one
The pattern language ( The characteristics of the speaker (gender, age, whatever) are already encapsulated in the voice itself (Azure uses names, like
The first column is the unique ID number for an entry... the goal was to be able to go through all the translation files and ensure matching lines had the same ID, so they could then be automatically merged into a single file by ID. If memory serves me correctly, this was also a requirement for being able to use Crowdin.
That isn't the issue. Some of the languages have system files that are referenced by number only, and that number changes depending on the language, which then collides with the key/identifier requirement (or recommendation?) for Crowdin CSV import. Only the language and locale are unique at present, as no issue has been raised about voice/gender needing to be separated out. What is important to note here is that this started out with a single voice for each language, and more have been added since, with some kludge workarounds done as needed to avoid moving to an Azure-specific syntax.
I think I've seen some of this, around sounds for numbers in particular. (example) I agree, then, that it would make sense for the first column (ID) to be the same across all languages/locales. I'm making a mental note that if the example code I linked above is indeed what you had in mind, then modifying existing files would require some coordination to avoid breaking compatibility between sound pack and firmware. Keeping stability around the file paths and names seems desirable as well, as you mentioned above. I see no conflict between that and standardizing the IDs at some point. Thanks for taking the time to clarify @pfeerick! This makes sense to me.
Ideally, a merger would have no breakage, as there would simply be gaps for languages that don't need a phrase. i.e. CZ is probably the worst offender here... that language file has male, female and neutral phrases in it... so words like the suffix for
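A minimal sketch of that merge by ID, assuming each language's CSV has already been parsed into a mapping from ID to phrase. The `merge_by_id` helper and the sample data are hypothetical, not the sound packs' actual column layout: rows sharing an ID collapse into one row, and a language without a phrase for an ID simply leaves an empty cell.

```python
import csv
import io


def merge_by_id(tables: dict[str, dict[int, str]]) -> list[list[str]]:
    """Merge per-language {ID: phrase} tables into one table keyed by ID.

    The first row is a header; each following row is
    [ID, phrase in language 1, phrase in language 2, ...].
    A language with no entry for an ID gets an empty cell (a gap).
    """
    languages = sorted(tables)
    all_ids = sorted(set().union(*(t.keys() for t in tables.values())))
    rows = [["id", *languages]]
    for entry_id in all_ids:
        rows.append([str(entry_id), *(tables[lang].get(entry_id, "") for lang in languages)])
    return rows


if __name__ == "__main__":
    # Hypothetical sample data; the real files use their own columns.
    sample = {
        "en": {100: "hundred", 110: "and"},
        "de": {100: "hundert", 110: "und"},
        "cs": {100: "sto", 110: "a", 1000: "<Czech-only variant>"},
    }
    out = io.StringIO()
    csv.writer(out).writerows(merge_by_id(sample))
    print(out.getvalue())
```

Sorting the language columns keeps the merged file stable between runs, which matters if it is regenerated and diffed.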
Short version: Indeed. If the localization library in the firmware doesn't support variations, adding keys in the translation files is pointless (see the long version for more nuance and what that support may look like). The number of keys / strings to translate should be the same in all languages either way (see the example in the long version, and notice how the variants are handled within the string / message / single key). Long version: This is a common localization difficulty. Trying to compose sentences on the fly is a difficult trade-off as soon as multiple languages are involved. In the cases where it's possible, though, the localization library that the firmware uses needs to support those variations; there is no way around that. The most typical example is probably the handling of plurals. In a single-language English application it may be sufficient to write something like (pseudo-code):
    switch count:
        0: print("Updated no items.")
        1: print("Updated one item.")
        default: print("Updated {count} items.")
English has two forms for The translated strings would look like this (to take the ICU message format as an example):
The localization library is then used like this (pseudo-code):
    import some_localization_function as _
    translated_string = _(itemUpdated, { count: 3 })
This example would produce If the localization tooling in the firmware doesn't support that, then I'd argue that there is little point in trying to compensate in the translation files by adding extra keys. (Too little, too late.) Now, I said that composing sentences is a difficult trade-off. Even the ICU message format has limitations: realistically, only one variable can vary in any given string. In French, for example, you'd count items differently depending on their gender (because items do have a gender in French), so a string like: To make the French example clearer, this is what the strings could render to.
And to expand on it: in Japanese and Chinese, for example, the counting words themselves differ depending on the nature of what is being counted (you don't use the same number words to count 2 cats and 2 computers). The way to get translations exactly right is to only translate full sentences, so that the entire context is known to the translator (e.g. what is being counted and how many pieces there are). That's often impractical. Alternatively, some variations, like plurals, can be handled acceptably in a narrow set of cases, which is usually sufficient as long as the strings are composed carefully. If there is no support from the localization library, though, even that is unlikely to yield satisfactory results. I think it's a trade-off to consider carefully (complexity of the code vs. how much better the translation really gets). A proper localization library does remove most of the complexity and makes the trade-off much easier to make, but that's a conversation for the EdgeTX and Buddy repos, not for this one, I think? For the sake of mentioning it: translating sentence fragments and piecing them together is a no-no. (Fortunately, I don't think that's happening in the voice packs, because the sounds are very short phrases anyway.)
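To make the "variants inside a single key" idea concrete, here is a minimal hand-rolled sketch. It is not the ICU message format or any particular library: the `translate` function, the `MESSAGES` catalog and the `itemCount` key are hypothetical. The plural categories follow the CLDR names and rules for non-negative integers in English (one/other) and Czech (one/few/other), which is why Czech needs a third variant that English does not.

```python
from typing import Callable

# Plural-category rules for non-negative integer counts, using the CLDR
# category names ("one", "few", "other"). Only integer cases are covered.
PLURAL_RULES: dict[str, Callable[[int], str]] = {
    "en": lambda n: "one" if n == 1 else "other",
    "cs": lambda n: "one" if n == 1 else ("few" if 2 <= n <= 4 else "other"),
}

# Hypothetical catalog: ONE key per message in every language; the
# variants live inside that key instead of being split into extra keys.
MESSAGES: dict[str, dict[str, dict[str, str]]] = {
    "en": {"itemCount": {"=0": "no items", "one": "one item", "other": "{count} items"}},
    "cs": {"itemCount": {"one": "{count} položka", "few": "{count} položky", "other": "{count} položek"}},
}


def translate(locale: str, key: str, count: int) -> str:
    """Pick the variant for `count` and substitute the count into it."""
    variants = MESSAGES[locale][key]
    exact = f"={count}"  # an exact match (like ICU's "=0") wins over the category
    if exact in variants:
        template = variants[exact]
    else:
        template = variants[PLURAL_RULES[locale](count)]
    return template.format(count=count)


if __name__ == "__main__":
    for n in (0, 1, 3, 5):
        print(n, translate("en", "itemCount", n), "|", translate("cs", "itemCount", n))
```

Both languages expose the same single `itemCount` key, so the number of keys to translate stays identical; only the per-locale rule decides which variant is used. In Czech, 1, 3 and 5 items select "položka", "položky" and "položek" respectively.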
I think this is totally manageable. The worst case scenario (which I don't find that bad personally) would be to keep the old IDs, add the new ones, and set up a deprecation period for the code to evolve towards using the new IDs, with a fallback to the old IDs. Then remove the old IDs only once we're comfortable telling people that version X of EdgeTX requires at least version Y of the voice packs (where the new IDs were introduced) and clean up the corresponding fallback code. And if we expect people to update their voice packs when they update EdgeTX, then maybe all the fallback dance is not needed?
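The fallback during such a deprecation period could look roughly like this. It only models the logic in Python and is not EdgeTX firmware code; the `LEGACY_NAMES` table, the file names in it and `resolve_sound_file` are hypothetical. The new name is tried first, and the legacy name is used only if the installed pack predates the change.

```python
from __future__ import annotations

# Hypothetical table: new standardized file name -> legacy file name it
# replaces. Entries can be deleted once old voice packs are no longer supported.
LEGACY_NAMES: dict[str, str] = {
    "0100.wav": "0101.wav",
}


def resolve_sound_file(available: set[str], name: str) -> str | None:
    """Return the file to play for `name`, preferring the new name.

    `available` holds the file names present in the installed pack's
    language folder (e.g. the result of os.listdir on that folder).
    Returns None if neither the new nor the legacy file exists.
    """
    if name in available:
        return name
    legacy = LEGACY_NAMES.get(name)
    if legacy is not None and legacy in available:
        return legacy
    return None


if __name__ == "__main__":
    old_pack = {"0101.wav"}
    new_pack = {"0100.wav"}
    print(resolve_sound_file(old_pack, "0100.wav"))
    print(resolve_sound_file(new_pack, "0100.wav"))
```

Once voice pack version Y becomes the minimum, the table entry and the fallback branch can simply be deleted, which is the clean-up step described in the comment.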
After browsing the sound files while implementing sounds for certain switches, I realised that we don't have the same sound files in every language directory. I also noticed that the String IDs aren't the same for the sounds in every language. For a cleaner implementation, I think it would be useful to standardise each String ID so that it refers to exactly one sound / word / phrase.
Then we can start making the same set of sound files for each language, instead of some language packs having only half of them available.
The name of each WAV file should also be standardized. See the examples below.
The best starting point would be to use the english_GB CSV file as the master, check whether other languages have additional sounds, and add any that are missing from it.
We can discuss here how to name the files and then apply that to every other CSV file.
Two examples of what I mean by different IDs and different naming of the WAV files:
The number 100
English
100 = 0100.wav -> ID 101
German
hundert = 0101.wav -> ID 103
The word AND
English
and = 0110.wav -> ID 111
German
und = 0105.wav -> ID 106
French
et = 0120.wav -> ID 121
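Using only the entries listed above, a short check shows the problem being described: the same word maps to different file names and String IDs depending on the language. The concept labels ("hundred", "and") and `find_mismatches` are hypothetical, added for illustration; a shared key like that is exactly what a standardized String ID would provide.

```python
Entry = dict[str, tuple[str, int]]  # language -> (file name, String ID)

# Entries from the examples in this issue, grouped by the word they represent.
EXAMPLES: dict[str, Entry] = {
    "hundred": {
        "English": ("0100.wav", 101),
        "German": ("0101.wav", 103),
    },
    "and": {
        "English": ("0110.wav", 111),
        "German": ("0105.wav", 106),
        "French": ("0120.wav", 121),
    },
}


def find_mismatches(entries: dict[str, Entry]) -> dict[str, Entry]:
    """Return the concepts whose file name or String ID differs between languages."""
    mismatches = {}
    for concept, per_language in entries.items():
        file_names = {file_name for file_name, _ in per_language.values()}
        string_ids = {string_id for _, string_id in per_language.values()}
        if len(file_names) > 1 or len(string_ids) > 1:
            mismatches[concept] = per_language
    return mismatches


if __name__ == "__main__":
    for concept, per_language in find_mismatches(EXAMPLES).items():
        print(concept, per_language)
```

After standardization, the same check should return an empty result for every shared entry.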
I also understand that different languages have things which may not translate directly into each other, because of differences in grammar, usage, etc.
So, as a suggestion, we could use String IDs 0-999 for the general entries implemented in every language,
and reserve ranges for the special entries of each language, like this:
1000-1999 Czech
2000-2999 German
...
Every language implemented in the future can be added after that.
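The proposed numbering can be expressed as a small lookup, which makes it easy to verify that every ID in a file falls inside the range its language owns. This is a sketch assuming each language gets exactly one 1000-ID block in the order listed; `LANGUAGE_BLOCKS`, `id_owner` and `block_range` are hypothetical names.

```python
BLOCK_SIZE = 1000
# Language-specific blocks in order: 1000-1999, 2000-2999, ...
# New languages are appended to the end so existing ranges never move.
LANGUAGE_BLOCKS = ["Czech", "German"]


def id_owner(string_id: int) -> str:
    """Return "shared" for 0-999, otherwise the language that owns the ID."""
    if string_id < 0:
        raise ValueError("String IDs are non-negative")
    block = string_id // BLOCK_SIZE
    if block == 0:
        return "shared"
    if block <= len(LANGUAGE_BLOCKS):
        return LANGUAGE_BLOCKS[block - 1]
    raise ValueError(f"ID {string_id} is not in any assigned range")


def block_range(language: str) -> range:
    """Return the ID range reserved for a language's special entries."""
    start = (LANGUAGE_BLOCKS.index(language) + 1) * BLOCK_SIZE
    return range(start, start + BLOCK_SIZE)


if __name__ == "__main__":
    print(id_owner(101), id_owner(1500), id_owner(2999))
    print(block_range("German"))
```

Appending rather than inserting languages is what keeps existing IDs stable, which matters given the firmware/sound pack compatibility concerns raised in the discussion.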