Encoding: a proposal

Lorenzo Cappelletti L.Cappelletti@mail.com
Sun, 24 Nov 2002 20:12:05 +0100


On Fri Nov 22 at 18:05 +0200, George Pauliuc wrote:

> really easy to screw in case somebody makes a mistake and tries to
> interpret it as single-byte encoding.  I know, it happened to me ;-)

Can you tell me how I can know for sure what encoding one is using on
their *nix system?


> Shouldn't it be easier to make something like first step 'cp xx.wml
> xx.LL.wml'?  And to make sure - block the update of xx.wml from anybody

Unfortunately, not.  (Read below).


> > For those of you who need images for proper language symbol rendering
> > (Romanian), I can provide some WML custom tags to make life easier.
> 
> Hmm... what do you have in mind Lorenzo?

Maybe you... who knows?!?


> Could you describe in more detail the mechanism you plan to use?  I'm
> not sure I understand what will happen with the text.
> 
> I know it is silly, but, to begin with, what is this .wml file?  I
> understand it is some formatted text.  How?  Where can I get more
> details?

For designing this site I decided to use WML (http://www.thewml.org/), 
the Website META Language by Ralf Engelschall.  From his page:

 WML is a free and extensible Webdesigner's off-line HTML generation
 toolkit for Unix.  WML consists of a control frontend driving up to
 nine backends in a sequential pass-oriented filtering scheme. Each
 backend provides one particular core language. For maximum power WML
 additionally ships with a well-suited set of include files which
 provide higher-level features build on top of the backends core
 languages. While not trivial and idiot proof WML provides most of the
 core features real hackers always wanted for HTML generation.

It is used to build the Debian site in many languages...  But I decided
to use a different scheme, inspired by .po files.

There they use one subdirectory for each language.  A translator has to
open two files, one with the original version and one with the
translation.  What's worse, the translator (or site maintainer) also
has to keep the HTML look up to date for each language, not just the
contents.

Here a translator has their own translation just below the original
sentence being translated, just like in .po files.  Common parts are
really common to all languages and are not repeated throughout each
language's subdirectory.  Unfortunately, I didn't know that encoding
might be such a hassle.


Anyway, going back to .wml files...  You should picture something like
this:

 # file.wml
 <ul>
   <li>
     (en)this is in english
     (ru)text in romanian encoding: #@^#@...
   </li>
 </ul>

After a `wml file.wml', you'll get:

 # file.html.en
 <ul>
   <li>
     this is in english
   </li>
 </ul>

 # file.html.ru
 <ul>
   <li>
     text in romanian encoding: #@^#@...
   </li>
 </ul>

In other words, common parts are just copied from the .wml file to
every .html.xx file, while each translation is copied only into the
file it belongs to.  Nothing else is done to the translations.
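
To make the copying rule concrete, here is a rough Python sketch of it.
This is not how WML itself is implemented: the function name
`slice_languages' and the rule "one (xx) tag at the start of a line"
are assumptions made only for illustration.  It works on raw bytes, so
it never needs to know which encoding a translation uses.

```python
import re

# Hypothetical helper, not part of WML: it only mimics the copying rule
# for lines that begin with a two-letter "(xx)" language tag.
TAG = re.compile(rb"^(\s*)\(([a-z]{2})\)(.*)$", re.DOTALL)


def slice_languages(source: bytes, langs):
    """Return {lang: bytes}; common lines go to every language,
    tagged lines only to the language named in the tag."""
    out = {lang: [] for lang in langs}
    for line in source.splitlines(keepends=True):
        m = TAG.match(line)
        if m is None:
            # Common part: copied to every output unchanged.
            for lang in langs:
                out[lang].append(line)
            continue
        lang = m.group(2).decode("ascii")
        if lang in out:
            # Translation: drop the tag, keep indentation and raw bytes.
            out[lang].append(m.group(1) + m.group(3))
    return {lang: b"".join(parts) for lang, parts in out.items()}
```

Calling `slice_languages(data, ["en", "ru"])' on the bytes of a file
shaped like file.wml above would give one byte string per language,
each ready to be written to its own .html.xx file.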

That's why I believe a translator can choose the encoding they like
best and see their multi-byte text simply copied from the .wml file to
the .html.xx file.
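
A small Python sketch of that reasoning follows.  It assumes the text
really is copied byte for byte (the `copy_verbatim' function is a
made-up stand-in, not anything WML provides), and it also shows the
mistake George mentions: reading multi-byte text as if it were a
single-byte encoding.

```python
word = "m\u0103r"  # Romanian "mar" with a-breve (U+0103)

utf8_bytes = word.encode("utf-8")         # a-breve takes two bytes
latin2_bytes = word.encode("iso-8859-2")  # a-breve takes one byte


def copy_verbatim(data: bytes) -> bytes:
    """Hypothetical stand-in for the .wml -> .html.xx copy:
    no decoding and no re-encoding, just the same bytes."""
    return bytes(data)


# Whatever the translator chose, the bytes come out untouched.
assert copy_verbatim(utf8_bytes) == utf8_bytes
assert copy_verbatim(latin2_bytes) == latin2_bytes

# The trouble starts only when someone interprets the bytes with the
# wrong encoding, e.g. UTF-8 data read as single-byte ISO-8859-1.
misread = utf8_bytes.decode("iso-8859-1")
assert misread != word
```

So the copy itself should be safe; what still has to match is the
encoding the reader's browser or editor assumes for each .html.xx file.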


> > In the future we can think to adopt po4a mechanism
> > (http://savannah.nongnu.org/projects/po4a/).
> 
> The project looks quite in alpha stage.

Abso-bloody-lutely true.


-- 
email: L.Cappelletti@mail.com
Jabber: lolo@linux.it
Fingerprint: 8CDD 3408 53B2 6122 99DA EE37 1523 68FC D906 4C08

Vuoi aiutarci ad avere le descrizioni dei pacchetti Debian in italiano?
http://ddtp.debian.org/