Per-country translations in taxonomies #880
I checked the language tag documentation (RFC 5646): https://www.rfc-editor.org/rfc/rfc5646.html#page-5, and it has a more detailed structure than I expected. It can probably be hacked with extensions.
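For reference, here is a simplified sketch (Python, standard library only) of how a tag such as `de-AT` breaks down under RFC 5646. After the primary language subtag come an optional 4-letter script subtag and an optional region subtag (2 letters or 3 digits). The function and type names are hypothetical. The parser deliberately ignores extlang subtags, grandfathered tags, and the internal structure of variants, extensions and private-use subtags, which it returns unparsed. If something extra were to be encoded in the tag, the private-use part (introduced by `x`) needs no registration, whereas other extension singletons must be registered.

```python
import re
from typing import NamedTuple, Optional, Tuple


class LangTag(NamedTuple):
    language: str
    script: Optional[str]
    region: Optional[str]
    rest: Tuple[str, ...]  # variants, extensions, private use: left unparsed


def parse_language_tag(tag: str) -> LangTag:
    """Split a simple BCP 47 tag into language, script and region (simplified)."""
    parts = tag.split("-")
    language = parts[0].lower()
    script = region = None
    i = 1
    # Optional script subtag: exactly 4 letters, e.g. "Latn".
    if i < len(parts) and re.fullmatch(r"[A-Za-z]{4}", parts[i]):
        script = parts[i].title()
        i += 1
    # Optional region subtag: 2 letters ("AT") or 3 digits ("419").
    if i < len(parts) and re.fullmatch(r"[A-Za-z]{2}|[0-9]{3}", parts[i]):
        region = parts[i].upper()
        i += 1
    return LangTag(language, script, region, tuple(parts[i:]))


print(parse_language_tag("de-AT"))
print(parse_language_tag("de-AT-x-parl").rest)
```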
Interesting, and somewhat disappointing... However, it has now occurred to me that it may be wrong to try to extend the language in this way because, e.g., de-AT means "the kind of German used in Austria". But if Austria has, say, a different name for its parliament than Germany does, this does not mean that the two name the same parliament differently, but that they are talking about different things. So it is not a question of language at all but of what a term refers to. To put it another way, if a German talks about the Austrian parliament, they will use the term Austrians use, and not the term they would use for the German parliament. If this logic holds, then we don't need to do anything with the languages, but should rather refer from the description of a category to the appropriate corpus or country. We even have something similar already:
We would then, of course, also need to allow several descriptions in the same language, and we would still have to modify the scripts. @matyaskopp, what do you think?
Good point! Agree!
I very often have problems with … I'm not sure, but maybe we want some multivalued attributes, because some terms can be similar for two countries but different for a third one.
The second issue could be fixed by introducing a new taxonomy and using different IDs, something like:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<taxonomy xmlns="http://www.tei-c.org/ns/1.0" xml:id="ParlaMint-domains" xml:lang="mul">
  <desc xml:lang="en"><term>ParlaMint</term></desc>
  <category xml:id="ParlaMint-CZ-domain">
    <catDesc xml:lang="en"><term>Parliament of the Czech Republic</term></catDesc>
  </category>
  <!-- ... -->
</taxonomy>
```

I'm really not sure, it's just an idea... Alternatively, we can ignore multivalued attributes and duplicate the translations in the common taxonomy (probably the safest option), i.e. turn all translations into domain-specific ones, at least in the problematic legislature taxonomy.
Yes, several descriptions, and also a rule for how the proper description/term is chosen. Should it be: first the language-and-domain-specific term, then the language-only value as a fallback if that translation is missing?
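A minimal sketch of that fallback order follows. It assumes the translations of one category have already been read into a dictionary keyed by (language, domain), where a domain of `None` marks the generic translation for that language. The data layout and function name are hypothetical, and the terms are the placeholders used elsewhere in this thread.

```python
from typing import Dict, Optional, Tuple

# Hypothetical layout: (language, domain) -> term; domain None = language-only fallback.
Translations = Dict[Tuple[str, Optional[str]], str]


def pick_term(translations: Translations, lang: str, domain: str) -> Optional[str]:
    """Return the language-and-domain-specific term, else the language-only one."""
    specific = translations.get((lang, domain))
    if specific is not None:
        return specific
    # No domain-specific translation: fall back to the generic one (may be None).
    return translations.get((lang, None))


translations: Translations = {
    ("de", "AT"): "Legislative1",
    ("de", "DE"): "Legislative2",
    ("de", None): "Legislative",
}
print(pick_term(translations, "de", "AT"))  # Legislative1
print(pick_term(translations, "de", "CH"))  # Legislative
```

Keeping the generic entry under the same language key means existing single-translation taxonomies need no changes to keep working.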
I agree that
I wouldn't make the country/region element mandatory: it is not needed for most countries, there are also taxonomies where it is not relevant at all (like NER), and this way we keep backward compatibility. As for using such descriptions: I'd propose that if … How does this sound?
Well, that sounds better. Maybe we can skip the text form and just add:

```xml
<desc xml:lang="de"><country key="AT"/><term>Legislative1</term></desc>
<desc xml:lang="de"><country key="DE"/><term>Legislative2</term></desc>
```

This would simply allow a single translation for multiple countries:

```xml
<desc xml:lang="de"><country key="AT"/><country key="DE"/><term>Legislative</term></desc>
```

Not sure which is easier to maintain in terms of operations:
Yes, why not. It is then easier to insert in any case, as it is not necessary to figure out the name of the country/region in the local language. On the other hand, it doesn't hurt to have the name: what is output in "normal" contexts is just the term, so the name only makes the text content of the desc more informative.
My idea was to use country/region only for ambiguous cases, rather than having it everywhere (like in the NER taxonomy...).
Hard to say (esp. as you were doing the coding for this), but my guess is that whichever way we choose, the effort to implement it will be similar...
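To make the `<country key="..."/>` proposal concrete, here is a minimal Python sketch (standard library `xml.etree.ElementTree`) of how a script might pick a term for a given language and country. It assumes the `<desc>` elements are direct children of the element passed in, live in the TEI namespace, and carry their own `xml:lang` (inherited values are ignored). The selection rule is one possible reading of the discussion, not an agreed convention: a matching `<country>` wins, and a `<desc>` without any `<country>` serves as the fallback, which keeps existing taxonomies working. The function name is hypothetical, and the sample uses placeholder terms and country keys.

```python
import xml.etree.ElementTree as ET
from typing import Optional

TEI = "{http://www.tei-c.org/ns/1.0}"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def select_term(category: ET.Element, lang: str, country: Optional[str]) -> Optional[str]:
    """Pick a term by language and optional country key (hypothetical rule)."""
    fallback = None
    for desc in category.findall(TEI + "desc"):  # direct children only
        if desc.get(XML_LANG) != lang:
            continue
        keys = {c.get("key") for c in desc.findall(TEI + "country")}
        term = desc.findtext(TEI + "term")
        if country is not None and country in keys:
            return term  # country-specific translation (may list several countries)
        if not keys and fallback is None:
            fallback = term  # generic translation, used if nothing more specific
    return fallback


SAMPLE = """<category xmlns="http://www.tei-c.org/ns/1.0">
  <desc xml:lang="de"><country key="AT"/><term>Legislative1</term></desc>
  <desc xml:lang="de"><country key="DE"/><country key="CH"/><term>Legislative2</term></desc>
  <desc xml:lang="de"><term>Legislative</term></desc>
</category>"""

category = ET.fromstring(SAMPLE)
print(select_term(category, "de", "CH"))  # Legislative2
print(select_term(category, "de", "LU"))  # Legislative
```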
Currently we have and support multilingual taxonomies, but it can happen (esp. in the legislature taxonomy) that the translation of a term for one country differs from the translation for another, even though both use the same language (e.g. AT and DE).
For this reason, the language of a translation should, when needed, be extended by the country code, e.g. de-AT vs. de-DE. For this, all the code dealing with @xml:lang (at least in connection with taxonomies, but ideally everywhere) would have to be revised, as well as the taxonomies themselves. This issue becomes even more relevant if we were to add country-specific hyperlinks to Wikipedia as part of the description of particular categories.
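Extending `@xml:lang` this way has a concrete consequence: code that compares language values for equality will no longer find a plain `de` translation when asked for `de-AT`. One common remedy is the "lookup" scheme of RFC 4647 (Section 3.4), which retries with progressively shorter prefixes of the requested tag. The sketch below (Python, hypothetical function name) applies that idea to a set of available tags and matches case-insensitively, since BCP 47 tags are case-insensitive. For comparison, XPath's `lang()` function already treats `xml:lang="de-AT"` as matching `lang('de')`.

```python
from typing import Iterable, Optional


def lookup(available: Iterable[str], requested: str) -> Optional[str]:
    """Return the best available tag for `requested`, RFC 4647 lookup style."""
    # Language tags are case-insensitive; keep the original spelling for the result.
    by_lower = {tag.lower(): tag for tag in available}
    tag = requested.lower()
    while tag:
        if tag in by_lower:
            return by_lower[tag]
        subtags = tag.split("-")[:-1]  # drop the last subtag
        # Also drop a now-trailing singleton such as "x" (RFC 4647, Section 3.4).
        while subtags and len(subtags[-1]) == 1:
            subtags.pop()
        tag = "-".join(subtags)
    return None


print(lookup(["de", "de-DE"], "de-AT"))  # de
print(lookup(["de", "de-AT"], "de-AT"))  # de-AT
```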