Encoding: a proposal

George Pauliuc pauliuc@gmx.net
25 Nov 2002 01:23:10 +0200

On Sun, 2002-11-24 at 21:12, Lorenzo Cappelletti wrote:
> Can you tell me how I can know for sure what encoding one is using on
> their *nix system?

AFAIK you can't.  So there probably have to be different wml files for
each translation.  First of all, we don't know whether all translators
will be using Unix, so a script that creates the tree for a translator
cannot handle every possible configuration.

> > Hmm... what do you have in mind, Lorenzo?
> Maybe you... who knows?!?

In that case I'll ask questions along the way; I've never done anything
like that.


> There they use one subdirectory for each language.  A translator has to
> open two files, one with the original version and one with the
> translated one.  What's worse, the translator (or site maintainer) has
> to keep the HTML layout up to date for each language, not just the
> content.

That's too much work, and it's prone to mistakes on both sides, admin
and translator.  IMHO not a very good idea after all.

> Here the translator has their translation just below the original
> sentence being translated, just like in .po files.  Common parts are
> really common to all languages and are not repeated in each language
> subdir.  Unfortunately, I didn't know that encoding might be such a
> hassle.

Keeping everything in one file... Hmm, only UTF-8 might solve that
problem.  And remember, a simple mistake like saving the file in a
different encoding might ruin the work.
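If everything is kept in one UTF-8 file, one cheap safeguard against the "saved in the wrong encoding" mistake is to reject any file that does not decode as strict UTF-8 before building.  A minimal sketch, assuming the shared .wml files are meant to be UTF-8; the function name `check_utf8` is hypothetical:

```python
from __future__ import annotations


def check_utf8(data: bytes) -> tuple[bool, str]:
    """Hypothetical pre-build check: is `data` valid UTF-8?

    Returns (True, "") on success, or (False, reason) pointing at the
    first offending byte.
    """
    try:
        data.decode("utf-8")  # strict mode: any invalid sequence raises
    except UnicodeDecodeError as err:
        return False, f"invalid UTF-8 at byte {err.start}: {err.reason}"
    return True, ""


if __name__ == "__main__":
    word = "Rom\u00e2n\u0103"  # "Română"
    print(check_utf8(word.encode("utf-8")))       # saved correctly
    print(check_utf8(word.encode("iso-8859-2")))  # saved as Latin-2 by mistake
```

This only proves the bytes are well-formed UTF-8; it cannot tell whether a Latin-2 file happens to be valid UTF-8 by accident, though for real text that is rare.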

> Anyway, going back to .wml files... Think of something like this:
>
>  # file.wml
>  <ul>
>    <li>
>      (en)this is in english
>      (ru)text in romanian encoding: #@^#@...

One note: ru is for Russian.  I believe Germans have Rumänien; it's the
only language I know that spells it like that.  Everybody else has
Romania.  Also, the international codes for Romania and Romanian are
either 'ro' or 'rom'.

> In other words, common parts are just copied from the .wml file to the
> .html.xx files, while translations are copied only into the file they
> belong to.  No processing is applied to the translations.
>
> That's why I believe a translator can choose the encoding they like
> best and see their multi-byte text simply copied from .wml files to
> .html.xx files.
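The copying scheme quoted above (shared lines go to every output, "(xx)" lines only to language xx) can be sketched in a few lines of Python.  This is not WML's own mechanism: the "(xx)" marker syntax is taken from the example, and `split_by_language` is a hypothetical helper.  It works on raw bytes so translations pass through without re-encoding, which assumes an ASCII-compatible encoding (the markers and indentation must be plain ASCII bytes); it would not be safe for, say, UTF-16 files.

```python
from __future__ import annotations

import re

# A line whose first non-blank token is a tag like "(en)".
# Assumption: tags are two lowercase ASCII letters, as in the example.
TAG = re.compile(rb"^(\s*)\(([a-z]{2})\)(.*)$", re.DOTALL)


def split_by_language(source: bytes, languages: list[str]) -> dict[str, bytes]:
    """Hypothetical splitter: one multi-language source -> one output per language.

    Untagged lines are shared and copied to every output; tagged lines go
    only to their language's output, with the tag removed.  Bytes are never
    decoded, so each translation keeps whatever encoding it was written in.
    """
    outputs: dict[str, list[bytes]] = {lang: [] for lang in languages}
    for line in source.splitlines(keepends=True):
        match = TAG.match(line)
        if match is None:
            for parts in outputs.values():  # common part: copy everywhere
                parts.append(line)
            continue
        indent, lang, rest = match.groups()
        lang_name = lang.decode("ascii")
        if lang_name in outputs:  # tags for unlisted languages are dropped
            outputs[lang_name].append(indent + rest)
    return {lang: b"".join(parts) for lang, parts in outputs.items()}


if __name__ == "__main__":
    src = (b"<ul>\n  <li>\n    (en)this is in english\n"
           b"    (ro)Rom\xe2n\xe3 (Latin-2 bytes)\n  </li>\n</ul>\n")
    for lang, body in split_by_language(src, ["en", "ro"]).items():
        print(lang, body)
```

How each result is written to disk (index.html.ro, index.ro.html, ...) is left to the caller.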

Shouldn't they be .LL.html instead of .html.LL?  On Windows-like systems
that will cause confusion, as the system will consider the language code
to be the actual extension.
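Any tool that treats the text after the last dot as the extension shows the same problem.  Python's standard `os.path.splitext` works that way, so it makes a quick illustration (the helper name `extension_of` is just for this example):

```python
import os.path


def extension_of(name: str) -> str:
    """Return what a last-dot rule considers the file's extension."""
    return os.path.splitext(name)[1]


if __name__ == "__main__":
    for name in ("index.html.ro", "index.ro.html"):
        # With .html.LL the language code becomes the extension;
        # with .LL.html the extension stays .html.
        print(f"{name}: extension {extension_of(name)}")
```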