Script internationalization by 't12s'

This is a brilliant technique developed by technosaurus and implemented for Puppy by L18L.

Forum discussion:
http://murga-linux.com/puppy/viewtopic.php?t=73440

Online documentation on how to do a translation:
http://extra-inter.net/puppy/t12s_DOC/t12s.html

I have put version 0.4.5.1 of 't12s' into /usr/sbin in Woof. I also created a help file /usr/share/doc/HOWTO-internationalization.htm that explains the three methods -- BaCon code, using 'gettext' in scripts, and using 't12s' in scripts.

The above two files are in Woof, which has been uploaded. Timeline:
http://bkhome.org/fossil/woof2.cgi/timeline

...do an anonymous login and you can view the source of 't12s' and also read the 'HOWTO-internationalization.htm' file rendered as a web page (not source code) -- a nice feature of Fossil.

I was going to add 'xdelta_gui', which L18L created and which uses the 't12s' method of translation. However, when I tried to run it, it output this error message:

# ./xdelta_gui
./xdelta_gui: line 23: : command not found


...the application works, but that error worries me, so I did not put it into Woof.

So far, then, no scripts in Woof use the 't12s' method.


Posted on 11 Dec 2011, 8:29


Comments:

Posted on 12 Dec 2011, 3:24 by L18L
: command not found
He he, I could reproduce this!

http://extra-inter.net/puppy/xdelta_gui

First change to:
View>Character encoding>Unicode

Then copy it.

Other characters are also affected by this.
-----
I am going to review all scripts I have touched with t12s:
askpass
alsawizard
xdelta_gui
ashmusic

and apply my own rules which have been changed meanwhile, ex:
${_M_17:-SUCCESS! ${SOURCEFILE_DELTA} generated}
change to
${_M_17:-SUCCESS! $SOURCEFILE_DELTA generated}




Posted on 12 Dec 2011, 4:36 by L18L
xdelta_gui
2 translations on
http://www.murga-linux.com/puppy/viewtopic.php?t=73440&start=42

and a typo in c
... and the progam will search ...

+1 for t12s: no need to change translations !


Posted on 12 Dec 2011, 4:38 by L18L
typo
Sorry,
the typo is in original xdelta_gui (not in c)


Posted on 12 Dec 2011, 6:28 by BarryK
Variable substitution
L18L,
Regarding this example:

xmessage -bg green -center "${_M_17:-SUCCESS! ${SOURCEFILE_DELTA} generated}"

Sometimes the ${...} is necessary, for example:

xmessage -bg green -center "${_M_17:-SUCCESS! ${SOURCEFILE_DELTA}extra generated}"

As this would not work:

xmessage -bg green -center "${_M_17:-SUCCESS! $SOURCEFILE_DELTAextra generated}"

So, are you imposing the restriction that $... variables must always have characters either side that Bash/Ash recognises as delimiters? -- such as space chars.
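
The difference between the two forms can be checked directly in the shell. This is only a minimal sketch: the file name is made up, and 'echo' via printf stands in for xmessage.

```shell
#!/bin/sh
# Hypothetical value, for illustration only.
SOURCEFILE_DELTA=file.delta
unset _M_17 SOURCEFILE_DELTAextra

# Braces end the variable name, so "extra" is appended as literal text.
with_braces="${_M_17:-SUCCESS! ${SOURCEFILE_DELTA}extra generated}"

# Without braces the shell looks up a variable named SOURCEFILE_DELTAextra,
# which is unset, so nothing is substituted.
without_braces="${_M_17:-SUCCESS! $SOURCEFILE_DELTAextra generated}"

printf '[%s]\n' "$with_braces"      # [SUCCESS! file.deltaextra generated]
printf '[%s]\n' "$without_braces"   # [SUCCESS!  generated]
```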



Posted on 12 Dec 2011, 7:01 by BarryK
Embedded unicode character
That worries me. This:


is an invisible multi-byte utf8 character on line 23.

The problem is, nothing I do in Geany shows it.

That is one thing that bothers me about utf and unicode, and utf/unicode-aware text editors, that an invisible character can be in the code, that the text editor refuses to display, yet it can break the script.

Also, Geany refuses to convert the file from utf-8 to iso8859-1, without identifying just where the problem is.
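
One way to locate such a character without relying on the editor is to scan the file for bytes outside 7-bit ASCII from the shell. The following is a minimal sketch, not something taken from xdelta_gui: the sample file and the find_non_ascii helper are made up for illustration, and the sample places a three-byte UTF-8 sequence (EF BB BF) alone on line 3.

```shell
#!/bin/sh
# Build a small sample script with the bytes EF BB BF alone on line 3.
sample=$(mktemp)
printf 'echo one\necho two\n\357\273\277\necho four\n' > "$sample"

# Print the number of every line that contains a byte outside 7-bit ASCII.
find_non_ascii() {
    n=0
    while IFS= read -r line || [ -n "$line" ]; do
        n=$((n + 1))
        # In the C locale tr works on single bytes, so deleting
        # bytes 0-127 leaves only the high (non-ASCII) bytes.
        high=$(printf '%s' "$line" | LC_ALL=C tr -d '\000-\177')
        [ -n "$high" ] && echo "$n"
    done < "$1"
    return 0
}

found=$(find_non_ascii "$sample")
echo "non-ASCII bytes on line(s): $found"

# Dump the offending line in hex so the invisible bytes become visible.
sed -n "${found}p" "$sample" | od -An -tx1
rm -f "$sample"
```

The loop is slow on large files, but it only needs 'tr', 'sed' and 'od', and it does not depend on the locale being utf8.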



Posted on 12 Dec 2011, 8:05 by BarryK
Translate multi-byte chars
Well, we have this problem in Puppy as many utilities (Busybox in particular) are not compiled to be multi-byte-character aware. Also, I run with non-utf8 locale.

That 'xdelta_gui' is interesting, it actually has several multi-byte utf8 characters. Doing this replaces that line-23 character with nothing, and the other multi-byte chars with '?':

# iconv --verbose -f UTF8 -t ASCII//TRANSLIT < xdelta_gui >out.txt

For example:

xmessage -bg red -center "${_M_4:-Error, ?Old file? does not exist}"

Hmmm, I might have to check other scripts that L18L has internationalised, that are already in Woof!



Posted on 12 Dec 2011, 12:29 by technosaurus
translating enclosed vars
You can still use curly brackets if you just do a separate VAR for before and after text... The script that breaks out the strings for Google translate used "$" as line breaks (though I now realize it may have been better to use "_" ) ... But then Google will try and translate the enclosed VAR. ... Still may not be perfect using before and after text b/c of rtl languages. Has anyone tested any of these sinister languages?


Posted on 12 Dec 2011, 20:36 by L18L
t12s non-utf8
Sorry, I had not been thinking about anybody not using utf8.
No, I have been thinking about this and there is a test on utf8 at the beginning of t12s. Did it not work for you?

Anyway, I have repeated download of
http://extra-inter.net/puppy/xdelta_gui
and this time I used
File>Save Page as...
No problem for me, as I am using utf8.
geany is showing:
File>Properties Encoding: UTF-8 (without BOM)
line 23 is empty.

Another multi-byte char is the apostrophe .
?Old file? should be Old file

That is coming from my idea to change single quotes to apostrophes before I had the idea to just escape them.

Now I have learned that apostrophes are not ASCII.
And you can change them safely back to single quotes, existing translations will not be affected by this change.







Posted on 12 Dec 2011, 20:58 by L18L
xdelta_gui changed
I have changed the apostrophes back to single quotes
deleted empty lines but

stays!

For me, it works English and German.
Please try
http://extra-inter.net/puppy/xdelta_gui.changed
Change the character encoding to Unicode in SeaMonkey


Posted on 12 Dec 2011, 21:29 by L18L
other scripts
Hmmm, I might have to check other scripts that L18L has internationalised, that are already in Woof!

Other scripts are simple, without variables.
So please just name a script (containing variable messages) and I will do my very best to i18n it.

___________
Back to your example:
xmessage -bg green -center "${_M_17:-SUCCESS! ${SOURCEFILE_DELTA}extra generated}"
would have to become (Technosaurus has noted):
SFDELTAx="${SOURCEFILE_DELTA}extra"
[ -f "$LOCALES" ] && . "$LOCALES"
xmessage -bg green -center "${_M_:-SUCCESS! $SFDELTAx generated}"

(remember: replace xmessage by a utf8 capable tool)
________________
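
The order of the three lines above matters, and a small self-contained sketch may make that clearer. It rests on assumptions: the locale file is taken to be plain shell assignments that are sourced with '.', the German string is a made-up translation, the message index 17 is borrowed from the earlier example, and 'echo' stands in for xmessage.

```shell
#!/bin/sh
# Hypothetical locale file, standing in for whatever t12s would provide.
LOCALES=$(mktemp)
cat > "$LOCALES" <<'EOF'
_M_17="ERFOLG! $SFDELTAx erzeugt"
EOF

SOURCEFILE_DELTA=new.iso.delta
# The helper variable must exist before the locale file is sourced,
# because the translated string is expanded at that moment.
SFDELTAx="${SOURCEFILE_DELTA}extra"

# No translation loaded yet: the English default after ':-' is used.
unset _M_17
english="${_M_17:-SUCCESS! $SFDELTAx generated}"

# After sourcing, _M_17 is set and replaces the default.
[ -f "$LOCALES" ] && . "$LOCALES"
translated="${_M_17:-SUCCESS! $SFDELTAx generated}"

echo "$english"      # SUCCESS! new.iso.deltaextra generated
echo "$translated"   # ERFOLG! new.iso.deltaextra erzeugt
rm -f "$LOCALES"
```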






Posted on 13 Dec 2011, 8:26 by BarryK
xdelta_gui uploaded
L18L,
This operation:

# iconv -f UTF-8 -t ASCII//TRANSLIT < xdelta_gui > xdelta_guiOUT

deletes that invalid multi-byte character on line 23. The apostrophes get converted to '?'. I restored the '↓' multi-byte character.

I have fixed the script and added it to Woof. It is uploaded, see timeline commit '54912365f0' 2011-12-12:

http://bkhome.org/fossil/woof2.cgi/timeline

I have not yet added any translations.

Note, the apostrophes that you put into xdelta_gui are actually not a problem. The only problem was the invalid multi-byte character on line 23.

Multi-byte characters in a text string are okay as long as the tool that displays the string, gtkdialog for example, can handle them.

However, 'echo' strings would probably have to stay as ascii only, as I have not yet configured Busybox as multi-byte aware (I did experiment with that awhile back but rolled back).

Generally, for English strings in scripts, it is probably best to keep them ascii only unless there is a particular need not to.

Note, there is one character in xdelta_gui that is still remaining as multi-byte -- '↓' -- see line 154. This string is handled by gtkdialog.



Posted on 13 Dec 2011, 8:11 by zygo
SciTE shows the 3 chars
SciTE 2.01 (in q130) shows the 3 chars on line 23.


Posted on 13 Dec 2011, 20:54 by L18L
'↓'
Note, there is one character in xdelta_gui that is still remaining as multi-byte -- '↓' -- see line 154. This string is handled by gtkdialog.

Yes, it is the original "arrow down".
#!/bin/ash
echo ↓

urxvt:
# ./arrow_test
↓
#

It works in a real console too.
I am going to test it in initrd (busybox) now.

PS
Using iconv has helped me to convert the output of automated translation to UTF-8.

No problem with the fixed xdelta_gui. It works for me.



Posted on 13 Dec 2011, 22:03 by L18L
↓ again
I am going to test it in initrd (busybox) now.
http://www.murga-linux.com/puppy/viewtopic.php?t=72321&start=27



Posted on 13 Dec 2011, 22:06 by L18L
↓ one more time
Sorry, delete above link please and take this one:
http://www.murga-linux.com/puppy/viewtopic.php?t=72321&start=27


Posted on 14 Dec 2011, 6:14 by BarryK
Link
http://www.murga-linux.com/puppy/viewtopic.php?t=72321&start=27



Posted on 14 Dec 2011, 6:30 by BarryK
echo multi-byte
L18L,
Busybox in the initrd is not multi-byte aware:

# CONFIG_LOCALE_SUPPORT is not set

# CONFIG_UNICODE_SUPPORT is not set
# CONFIG_UNICODE_USING_LOCALE is not set
# CONFIG_FEATURE_CHECK_UNICODE_IN_ENV is not set
CONFIG_SUBST_WCHAR=0
CONFIG_LAST_SUPPORTED_WCHAR=0
# CONFIG_UNICODE_COMBINING_WCHARS is not set
# CONFIG_UNICODE_WIDE_WCHARS is not set
# CONFIG_UNICODE_BIDI_SUPPORT is not set
# CONFIG_UNICODE_NEUTRAL_TABLE is not set
# CONFIG_UNICODE_PRESERVE_BROKEN is not set


I have just experimented with the echo in that busybox, and it does not display the down-arrow correctly.

So, I don't know how you are getting it to display, unless you are using a different initrd busybox than the default one in Woof, that is unicode-enabled.
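
For anyone wanting to check their own build, the relevant option can be read straight out of the Busybox configuration file. This is just a sketch: it writes a shortened copy of the lines quoted above into a temporary file rather than reading a real .config, whose location will differ from setup to setup.

```shell
#!/bin/sh
# Shortened stand-in for a Busybox .config, using lines quoted above.
config=$(mktemp)
cat > "$config" <<'EOF'
# CONFIG_LOCALE_SUPPORT is not set
# CONFIG_UNICODE_SUPPORT is not set
CONFIG_SUBST_WCHAR=0
CONFIG_LAST_SUPPORTED_WCHAR=0
EOF

# An enabled option appears as NAME=y; a disabled one as "# NAME is not set".
if grep -q '^CONFIG_UNICODE_SUPPORT=y' "$config"; then
    unicode=enabled
else
    unicode=disabled
fi

echo "Unicode support: $unicode"   # Unicode support: disabled
rm -f "$config"
```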



Posted on 14 Dec 2011, 18:41 by L18L
re: echo multi-byte
One of my changes in init to minimize the "lopside" of i18n
http://www.murga-linux.com/puppy/viewtopic.php?t=72321&start=25

# i18n
zcat /lib/consolefonts/LatGrkCyr-8x16.psfu.gz | loadfont # All European languages; new default ?!



Posted on 16 Dec 2011, 18:21 by L18L
t12s
Posting this from a neighbour now because
I am without internet connection at home.

Making progress on testing automated translations without internet.

Hope to get back online sooner or later.


Posted on 20 Jan 2012, 8:53 by K Godt
 SIGN

I just came across this by accident and want to say that I encountered that sign too.

It seems to be a single 'space'.

ffmpeg had been complaining about it in a script, and I replaced the spaces with TABs in the script.

like

ffmpeg\
[TAB]-option\
[TAB]-i infile\
[TAB]/outfile



Posted on 4 Feb 2012, 8:07 by zygo
Could it be a BOM?
M$ strikes again
http://en.wikipedia.org/wiki/UTF-8#Byte_order_mark



Posted on 4 Feb 2012, 8:02 by BarryK
Re BOM
Zygo,
That is very educational. Thus, the presence of a BOM in L18L's scripts means that he has been editing them in a MS Windows text editor! Aaaargh!
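
If the stray character really is a UTF-8 BOM, meaning the three bytes EF BB BF, it can be removed without going through iconv. What follows is a rough sketch under that assumption, using a made-up three-line sample script rather than the real xdelta_gui.

```shell
#!/bin/sh
# The UTF-8 encoding of U+FEFF, written as octal escapes.
bom=$(printf '\357\273\277')

# Sample script with a BOM alone on line 2 (hypothetical content).
sample=$(mktemp)
cleaned=$(mktemp)
printf '#!/bin/sh\n%s\necho hello\n' "$bom" > "$sample"

# In the C locale sed treats the BOM as three ordinary bytes,
# so it can be deleted wherever it occurs.
LC_ALL=C sed "s/$bom//g" "$sample" > "$cleaned"

# Count the non-ASCII bytes before and after.
before=$(LC_ALL=C tr -d '\000-\177' < "$sample" | wc -c | tr -d ' ')
after=$(LC_ALL=C tr -d '\000-\177' < "$cleaned" | wc -c | tr -d ' ')
echo "non-ASCII bytes: before=$before after=$after"

# The cleaned script runs without the "command not found" complaint.
out=$(sh "$cleaned" 2>&1)
echo "$out"
rm -f "$sample" "$cleaned"
```

Unlike the ASCII//TRANSLIT conversion, this leaves any other multi-byte characters in the file untouched.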



Posted on 4 Feb 2012, 15:31 by zygo
corrupt
Worse still the file is corrupt!

I test webpages in IE and amend them there too; then in Puppy I edit with SciTE, which would show this UTF-8 BOM. But it isn't in my files, probably because the files have only 7-bit chars.