
Simplifying translations

February 06, 2012 — BarryK
I have a question for non-English Puppians. I would like to simplify many of the gettext translations in Woof scripts by leaving out any XML, HTML, or other formatting, and as far as possible leaving out substitutions such as "${VAR1}".

I really do not like having formatting in a translation string; translation strings should be simple text. That is my ideal anyway.

I would also like to avoid use of eval_gettext as much as possible, again because I would like the strings to be simple easy-to-understand plain text.

The biggest problem for the latter though, is the semantics of some non-English sentence construction, or ordering of the words. An English string:

"The script $FILE1 has changed"

I would like to translate this in two pieces:

"$(gettext 'The script') $FILE1 $(gettext 'has changed')"

I don't know much about non-English languages, but I understand that simply translating the above might not be very good for some languages, due to order of words having to be re-arranged.
Perhaps there is a better example of this problem than the above.

Anyway, I was wondering if I can "get away with it", even if the translation is a bit strange. Or is it possible to work around the problem by changing the wording in those two gettext translations so that the translated sentence is quite sensible?


LupuQ and Puppy-desktop
Username: BarryK
I wish to acknowledge the incredible amount of work that shinobar has done, creating an internationalized version of Lucid Puppy. Some Forum posts: I am starting to study shinobar's work, so as not to reinvent the wheel, not too much anyway :happy:

Puppy-desktop eval_gettext
Username: rodin.s
"I use shinobar's Puppy-desktop and fixmenus_on_locale in my langpacks. Eval_gettext is useful to see the context of the translation. The above example is OK without eval_gettext but in more complex cases it's better to use eval_gettext I guess. Or for example line: [code]Note2: These files are at ${MSGz} (in ${xHOMEPART} partition)[/code] looks simple with eval_gettext but if you do [code]$(gettext 'Note2: These files are at')${MSGz} ($(gettext'in') ${xHOMEPART} $(gettext 'partition'))[/code] it looks not so simple. Also similar words are put into one line in po-file and are not repeated. In Russian we have grammatical cases: so it's better to have a sentence to translate, not a single word, to understand context.
Username: rodin.s
"I have now encountered another example: line 245 in [code]$(gettext 'Finished. The packages have been downloaded to') /root $(gettext 'directory.')[/code] I Russian I have to put /root at the end of the sentence: "Finished. The packages have been downloaded to directory /root." So it would be better like this: [code]$(gettext 'Finished. The packages have been downloaded to /root directory.')[/code] It gives more freedom in translations.

NLS strings
Username: zigbert
"[code]"$(gettext 'The script') $FILE1 $(gettext 'has changed')" [/code] In some occasions this probably must be the structure, but I nearly always manage to express myself with: File has been changed: $FILE1 Now, there is no need of splitting the expression.

Username: shinobar
"Barry> I would also like to avoid use of eval_gettext as much as possible Me too. It is troublesome. I made eval_gettext as follows, but i am not sure it is right. [code]eval_gettext () { local myMESSAGE=$(gettext "$1") eval echo \"$myMESSAGE\" }[/code] "$(gettext 'The script') $FILE1 $(gettext 'has changed')" is one solution, but may not be easy-to-understand for translators. I usually use: $(printf "$(gettext "The script %s has changed")" $FILE1) It may be easy-to-understand for translators, but may be complex for the programmers. As rodin.s says: $(gettext "File has been changed"): $FILE1 Would be simple.

Multilingual Wary
Username: shinobar
"$(gettext "File has been changed"): $FILE1 Would be simple, as [b]zigbert[/b] says. By the way, my recent work based on Lupq can be seen at Multilingual Wary-511-01q.

translation practical(?)
Username: mave1
"Hi Barry, Your work on internationalization Puppy is great - thanks! Doing this "Simplifying" much better :-), but in my opinion with strings like [code]"$(gettext 'The script') $FILE1 $(gettext 'has changed')" [/code] we're running in trouble. Trying several translations I think, it's neccessary getting completely phrases: [code]"$(gettext 'The script $FILE1 has changed')" [/code], only this way I can handle mostly language specific elements. Otherwise translations will be funny or people might think an alien is talking to us ;-)

Username: argolance
"Before reading this thread to the end, I was thinking of something quite similar to zigbert's and shinobar's proposal: [code]$FILE1: $(gettext "File has been changed")[/code] Regards.

simply not simple
Username: L18L
"I do not think that it is good to enable just [i]a bit strange translations[/i]. Where is the problem with eval_gettext? I donīt see it. A translator will see just msgid "The script $FILE1 has changed" msgstr "" Translators are not the dumbest persons, they all have learned English (more or less)! Any [b]translator[/b] can and has to and will [b]learn[b] that a word starting with a $ is a variable that has to stay [b]unchanged[/b]. [b]Developer[/b]s can use [b]meaningful names[/b] for the variables. Why not trust the experts and RTFM? [i]Entire sentences are also important because in many languages, the declination of some word in a sentence depends on the gender or the number (singular/plural) of another part of the sentence. There are usually more interdependencies between words than in English. The consequence is that asking a translator to translate two half-sentences and then combining these two half-sentences through dumb string concatenation will not work, for many languages, even though it would work for English. That's why translators need to handle entire sentences. [/i] Quoted from:

i18n crash course
Username: j
"Late posting, but with some additional thoughts. First of all, as you prolly realize by now, I can confirm definitely that you cannot get away with avoiding embedded vars. Non-English languages aren't simply different words, with the same syntax -- sometimes they are *very* different, in syntax and especially idioms, thus the only way to correctly translate some english-sentence-concept is to write a completely different sentence, with significant re-ordering. Not only that, but you have to have a bilingual human translator do the work; computerized machine translation is totally infantile, as you can see by putting some English2Whatever into the various online websites that offer this service, and then feeding the answer back into the same site's Whatever2English 'translator' ... you get back total rubbish. In the ideal world, international 'ization' would simply look like this sort of thing: "$(gettext 'The') $(gettext 'script') $FILE1 $(gettext 'has') $(gettext 'changed')" ... and you would simply have a word2word lookup table that gave you back the French or Chinese or Russian equivalent of 'changed' and jammed that into the sentence. But human languages were evolved not designed, and thus don't work that way. Makes it harder on software devs, but that's the price you pay for success -- if nobody wanted to use Puppy in the Australian English version, then there would not be much demand for international translations. So consider it a good thing, and prepare to use embedded vars as much as possible, to give the translation-team as much help as you can. (Sorry I cannot help there... English and a few tourist phrases are the extent of my bilingualism... if you need Woof translations into LISP or PHP or somesuch, then maybe I can lend a hand.) [to be cont'd]

i18n crash course
Username: j
"[cont'd] Which brings me to my second thought: putting a large application on an i18n basis is definitely hard work. Besides internationalizing the basic strings -- which means going through and adding all the gettext stuff, plus getting translators to help, and then testing (manually unfortunately) that everything you changed has worked, not broken something by accident. In the meantime, all your dev work is going towards i18n efforts, so you don't get many new features besides the i18n itself. However, once you get it done, your life is *much* better as a dev, because you don't have to worry about translating-by-completely-starting-over any more. As long as you are disciplined about always using gettext in new code, translations will be straightforward for the translation-team to do, mostly without dev help. For that to be true, though, you have to watch out for other i18n gotchas. Make sure there are no icons/pics/whatnot with hardcoded english in them, and also that the graphical ones can be grokked by other cultures. Make sure you aren't parsing raw output of smem in your scripts (mentioned in a previous blog post here -- definitely use /proc and the other guaranteed-to-be-programmatic-by-Linus stuff yourself... and make sure other people writing scripts or patches also know the rules). Unicode support is tricky -- on Linux that means using UTF8 for everything pretty much by necessity -- but *not* supporting it is even worse. [to be cont'd... one more time... 2kb limit?]

i18n crash course
Username: j
"[cont'd] I recommend this old 1999 book for a solid overview of the problems and solutions; while it is out of date on specifics, most of the things it covers are conceptual hurdles (like whether you can split sentences or word-by-word translate) that tripped *me* up too, before I read it. There is a 2nd edition from 2009 on sale now, which is prolly just as good (plus modernized) although I haven't seen it and thus cannot verify that personally. The review-blurb has "Linux" in the list of hot topics, if that helps. (No, not Puppy Linux specifically... grin... maybe in the 3rd edition.) Note that the book isn't just about Chinese et al; the reason he puts the focus there is because once you internationalize your app to handle CJK[V], adding 99% of all languages is easy. The exceptions are Hebrew and Arabic, since they order text right-to-left, and Thai which has very complex joining-rules (some of the India-languages are also tough like that). Hope this helps. :: j

Tags: woof