
Script internationalization by 't12s'

December 11, 2011 — BarryK
This is a brilliant technique developed by technosaurus and implemented for Puppy by L18L.

Forum discussion:
http://murga-linux.com/puppy/viewtopic.php?t=73440

Online documentation on how to do a translation:
http://extra-inter.net/puppy/t12s_DOC/t12s.html

I have put version 0.4.5.1 of 't12s' into /usr/sbin in Woof. I also created a help file /usr/share/doc/HOWTO-internationalization.htm that explains the three methods -- BaCon code, using 'gettext' in scripts, and using 't12s' in scripts.

The above two files are in Woof, which has been uploaded. Timeline:
http://bkhome.org/fossil/woof2.cgi/timeline

...do an anonymous login and you can view the source of 't12s' and also read the 'HOWTO-internationalization.htm' file rendered as a web page (not source code) -- a nice feature of Fossil.

I was going to add 'xdelta_gui', which L18L created and which uses the 't12s' method of translation. However, when I tried to run it, it output this error message:

# ./xdelta_gui
./xdelta_gui: line 23: : command not found


...the application works, but that error worries me, so I did not put it into Woof.

So far, then, there are no scripts using the 't12s' method in Woof.

Comments

: command not found
Username: L18L
He he, I could reproduce this!
http://extra-inter.net/puppy/xdelta_gui
First change to: View>Character encoding>[b]Unicode[/b]
Then copy it. Other characters are also affected by this.
-----
I am going to review all scripts I have touched with t12s:
askpass
alsawizard
xdelta_gui
ashmusic
and apply my own rules which have been changed meanwhile, ex:
${_M_17:-SUCCESS! ${SOURCEFILE_DELTA} generated}
change to
${_M_17:-SUCCESS! $SOURCEFILE_DELTA generated}

xdelta_gui
Username: L18L
2 translations on http://www.murga-linux.com/puppy/viewtopic.php?t=73440&start=42 and a typo in c ... and the pr[b]oga[/b]m will search ... +1 for t12s: no need to change translations!

typo
Username: L18L
Sorry, the typo is in the original xdelta_gui (not in c)

Variable substitution
Username: BarryK
L18L, regarding this example:
[i] xmessage -bg green -center "${_M_17:-SUCCESS! ${SOURCEFILE_DELTA} generated}"[/i]
Sometimes the ${...} is necessary, for example:
[i] xmessage -bg green -center "${_M_17:-SUCCESS! ${SOURCEFILE_DELTA}extra generated}"[/i]
as this would not work:
[i] xmessage -bg green -center "${_M_17:-SUCCESS! $SOURCEFILE_DELTAextra generated}"[/i]
So, are you imposing the restriction that $... variables must always have characters either side that Bash/Ash recognises as delimiters, such as space chars?

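The brace question above can be checked directly in a shell. This is only a minimal sketch, not taken from 't12s' itself: the value 'new.delta' and the second variable SOURCEFILE_DELTAextra are made up for illustration, and 'echo' stands in for xmessage.

```shell
#!/bin/sh
# Hypothetical values, chosen only to make the difference visible.
SOURCEFILE_DELTA="new.delta"
SOURCEFILE_DELTAextra="OTHER"   # a different variable that the unbraced form picks up

# No translation loaded, so the English default after ':-' is used.
unset _M_17

# Braces end the variable name before the literal text 'extra'.
braced="${_M_17:-SUCCESS! ${SOURCEFILE_DELTA}extra generated}"

# Without braces the shell reads the longest valid name: SOURCEFILE_DELTAextra.
unbraced="${_M_17:-SUCCESS! $SOURCEFILE_DELTAextra generated}"

echo "$braced"     # SUCCESS! new.deltaextra generated
echo "$unbraced"   # SUCCESS! OTHER generated
```

If SOURCEFILE_DELTAextra were unset, the unbraced form would expand to nothing at that position, which is the failure described in the comment above.
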
Embedded unicode character
Username: BarryK
That worries me. This:  is an invisible multi-byte utf8 character on line 23. The problem is, nothing I do in Geany shows it. That is one thing that bothers me about utf/unicode and unicode-aware text editors: an invisible character can be in the code, the text editor refuses to display it, and yet it can break the script. Also, Geany refuses to convert the file from utf-8 to iso8859-1, without identifying just where the problem is.

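When an editor will not show such a character, standard command-line tools can. The following is a hedged sketch: the function names count_non_ascii and show_line are hypothetical helpers, and the sample file is throwaway content, not the real xdelta_gui. It relies only on 'tr', 'wc' and the 'l' command of 'sed', run in the C locale so that bytes are not interpreted as multi-byte characters.

```shell
#!/bin/sh
# Count the bytes in file $1 that fall outside 7-bit ASCII.
count_non_ascii() {
    LC_ALL=C tr -d '\000-\177' < "$1" | wc -c | tr -d ' '
}

# Print line $2 of file $1 unambiguously: non-printable bytes
# are written as octal escapes by the sed 'l' command.
show_line() {
    LC_ALL=C sed -n "${2}l" "$1"
}

# Demonstration on a throwaway file (hypothetical content):
# line 2 holds only the three bytes EF BB BF.
sample=$(mktemp)
printf 'echo one\n\357\273\277\necho three\n' > "$sample"
count_non_ascii "$sample"   # 3
show_line "$sample" 2
rm -f "$sample"
```

On a real script this would be called as, for example, show_line xdelta_gui 23.
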
Translate multi-byte chars
Username: BarryK
Well, we have this problem in Puppy as many utilities (Busybox in particular) are not compiled to be multi-byte-character aware. Also, I run with a non-utf8 locale.
That 'xdelta_gui' is interesting, it actually has several multi-byte utf8 characters. Doing this replaces that line-23 character with nothing, and the other multi-byte chars with '?':
[i]# iconv --verbose -f UTF8 -t ASCII//TRANSLIT < xdelta_gui >out.txt[/i]
For example:
[i] xmessage -bg red -center "${_M_4:-Error, ?Old file? does not exist}"[/i]
Hmmm, I might have to check other scripts that L18L has internationalised, that are already in Woof!

t12s non-utf8
Username: L18L
Sorry, I had not been thinking about anybody not using utf8. No, I have been thinking about this and there is a test on utf8 at the beginning of t12s. Did it not work for you?
Anyway, I have repeated the download of http://extra-inter.net/puppy/xdelta_gui and this time I used File>Save Page as...
No problem for me, as I am using utf8. geany is showing:
File>Properties Encoding: UTF-8 (without BOM)
line 23 is empty.
Another multi-byte char is the apostrophe [b]´[/b].
?Old file? should be ´Old file´
That is coming from my idea to change single quotes to apostrophes [b]before[/b] I had the idea to just escape them. Now I have learned that apostrophes are not ASCII. And you can change them safely back to single quotes, existing translations will not be affected by this change.

xdelta_gui changed
Username: L18L
I have changed the apostrophes back to single quotes and deleted empty lines, but  stays! For me, it works in English and German. Please try http://extra-inter.net/puppy/xdelta_gui.changed
Change the character encoding to Unicode in SeaMonkey.

other scripts
Username: L18L
[i]Hmmm, I might have to check other scripts that L18L has internationalised, that are already in Woof! [/i]
Other scripts are simple, without variables. So please just name a script (containing variable messages) and I will do my very best to i18n it.
___________
Back to your example:
xmessage -bg green -center "${_M_17:-SUCCESS! ${SOURCEFILE_DELTA}extra generated}"
would have to become (Technosaurus has noted):
SFDELTAx="${SOURCEFILE_DELTA}extra"
[ -f $LOCALES ] && . $LOCALES
xmessage -bg green -center "${_M_:-SUCCESS! $SFDELTAx generated}"
(remember: replace xmessage by a utf8 capable tool)
________________

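The ordering in that workaround matters: the helper variable has to exist before the locale file is sourced, so that a translated message can refer to it. Below is a small sketch under stated assumptions. The locale file content, the German text and the message number _M_17 are invented for illustration, and the real 't12s' locale file format may differ. 'echo' stands in for xmessage.

```shell
#!/bin/sh
# Hypothetical value for the demonstration.
SOURCEFILE_DELTA="new.delta"

# Step 1: build the combined string first, under a name with clear delimiters.
SFDELTAx="${SOURCEFILE_DELTA}extra"

# Step 2: a stand-in locale file (assumed format: plain shell assignments).
# The quoted 'EOF' keeps $SFDELTAx literal in the file, so it is expanded
# only when the file is sourced below.
LOCALES=$(mktemp)
cat > "$LOCALES" <<'EOF'
_M_17="ERFOLG! $SFDELTAx erzeugt"
EOF

# Step 3: source the translations if present, then fall back to English.
[ -f "$LOCALES" ] && . "$LOCALES"
message="${_M_17:-SUCCESS! $SFDELTAx generated}"
echo "$message"   # ERFOLG! new.deltaextra erzeugt

rm -f "$LOCALES"
```

Without the locale file, _M_17 stays unset and the English default after ':-' is printed instead.
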
xdelta_gui uploaded
Username: BarryK
L18L, this operation:
[i]# iconv -f UTF-8 -t ASCII//TRANSLIT < xdelta_gui > xdelta_guiOUT[/i]
deletes that invalid multi-byte character on line 23. The apostrophes get converted to '?'. I restored the '↓' multi-byte character.
I have fixed the script and added it to Woof. It is uploaded, see timeline commit '54912365f0' 2011-12-12:
http://bkhome.org/fossil/woof2.cgi/timeline
I have not yet added any translations.
Note, the apostrophes that you put into xdelta_gui are actually [b]not[/b] a problem. The only problem was the invalid multi-byte character on line 23. Multi-byte characters in a text string that, for example, gtkdialog will display are okay as long as gtkdialog can handle them. However, 'echo' strings would probably have to stay as ascii only, as I have not yet configured Busybox as multi-byte aware (I did experiment with that awhile back but rolled back).
Generally, for English strings in scripts, it is probably best to keep them ascii only unless there is a particular need not to.
Note, there is one character in xdelta_gui that is still remaining as multi-byte -- '↓' -- see line 154. This string is handled by gtkdialog.

'↓
Username: L18L
[i]Note, there is one character in xdelta_gui that is still remaining as multi-byte -- '↓' -- see line 154. This string is handled by gtkdialog.[/i]
Yes, it is the original "arrow down".
#!/bin/ash
echo ↓
urxvt:
# ./arrow_test
↓
#
Is working in real console too. I am going to test it in initrd (busybox) now.
PS using iconv has helped me to convert output of automated translation to UTF-8.
No problem with xdelta-gui fixed. It works for me.

↓ again
Username: L18L
[i]I am going to test it in initrd (busybox) now. [/i] http://www.murga-linux.com/puppy/viewtopic.php?t=72321&start=27

↓ one more time
Username: L18L
Sorry, delete above link please and take this one: http://www.murga-linux.com/puppy/viewtopic.php?t=72321&start=27

Link
Username: BarryK
http://www.murga-linux.com/puppy/viewtopic.php?t=72321&start=27

echo multi-byte
Username: BarryK
L18L, Busybox in the initrd is not multi-byte aware:
[code]
# CONFIG_LOCALE_SUPPORT is not set
# CONFIG_UNICODE_SUPPORT is not set
# CONFIG_UNICODE_USING_LOCALE is not set
# CONFIG_FEATURE_CHECK_UNICODE_IN_ENV is not set
CONFIG_SUBST_WCHAR=0
CONFIG_LAST_SUPPORTED_WCHAR=0
# CONFIG_UNICODE_COMBINING_WCHARS is not set
# CONFIG_UNICODE_WIDE_WCHARS is not set
# CONFIG_UNICODE_BIDI_SUPPORT is not set
# CONFIG_UNICODE_NEUTRAL_TABLE is not set
# CONFIG_UNICODE_PRESERVE_BROKEN is not set
[/code]
I have just experimented with the echo in that busybox, and it does not display the down-arrow correctly. So, I don't know how you are getting it to display, unless you are using a different initrd busybox than the default one in Woof, one that is unicode-enabled.

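A quick way to tell which kind of Busybox build is in use is to look for the option in its saved configuration. This is a rough sketch: busybox_unicode_enabled is a hypothetical helper name, and the config file here is a throwaway copy of two lines from the listing above rather than a real build's .config.

```shell
#!/bin/sh
# Succeeds if the Busybox config file $1 has CONFIG_UNICODE_SUPPORT switched on.
busybox_unicode_enabled() {
    grep -q '^CONFIG_UNICODE_SUPPORT=y' "$1"
}

# Throwaway config mirroring the listing above (hypothetical file).
cfg=$(mktemp)
printf '%s\n' \
    '# CONFIG_LOCALE_SUPPORT is not set' \
    '# CONFIG_UNICODE_SUPPORT is not set' > "$cfg"

if busybox_unicode_enabled "$cfg"; then
    echo "unicode support: on"
else
    echo "unicode support: off"
fi
rm -f "$cfg"
```

For the configuration quoted above this reports that support is off, matching the behaviour seen with 'echo'.
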
re: echo multi-byte
Username: L18L
One of my changes in init to minimize the "lopside" of i18n
http://www.murga-linux.com/puppy/viewtopic.php?t=72321&start=25
# i18n
zcat /lib/consolefonts/LatGrkCyr-8x16.psfu.gz | loadfont # All European languages; new default ?!

t12s
Username: L18L
Posting this from a neighbour now because I am without internet connection at home. Making progress on testing automated translations without internet. Hope to get back online sooner or later.

 SIGN
Username: K Godt
[b][/b] Just came across this accidentally and want to say that I encountered that SIGN too. It seems to be a single 'space'. ffmpeg had been complaining about it in a script and I have replaced spaces with TABs in the script, like
ffmpeg\
[TAB]-option\
[TAB]-i infile\
[TAB]/outfile

Could it be a BOM?
Username: zygo
M$ strikes again http://en.wikipedia.org/wiki/UTF-8#Byte_order_mark

Re BOM
Username: BarryK
Zygo, that is very educational. Thus, the presence of a BOM in L18L's scripts means that he has been editing them in a MS Windows text editor! Aaaargh!

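If the stray bytes are indeed a UTF-8 byte order mark (EF BB BF), they can be filtered out without an editor. The following is a minimal sketch, assuming that every occurrence of that byte sequence is unwanted. strip_bom is a hypothetical helper name, and the input is made-up sample text.

```shell
#!/bin/sh
# Remove every UTF-8 byte order mark sequence (bytes EF BB BF) from stdin.
# The C locale makes sed treat the input as plain bytes.
strip_bom() {
    bom=$(printf '\357\273\277')
    LC_ALL=C sed "s/$bom//g"
}

# Made-up input with the mark in the middle of the file, not only at the start.
printf 'echo one\n\357\273\277echo two\n' | strip_bom
```

This is blunt: the same three bytes also encode U+FEFF when it is used deliberately as a zero-width no-break space, so it only suits files such as shell scripts where that character has no business being present.
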
corrupt
Username: zygo
Worse still, the file is corrupt! I test webpages on IE and amend them too, then in Puppy I edit with Scite, which would show this UTF-8 BOM. But it isn't in my files. Probably because the files have only 7-bit chars.


Tags: woof