[mICQ] wrong codepage on www.micq.org for russian translation
Peter Stuge
stuge-micq at cdy.org
Wed Apr 18 03:36:10 CEST 2007
On Tue, Apr 17, 2007 at 11:37:36PM +0200, Rüdiger Kuhlmann wrote:
> >--[Peter Stuge]--<stuge-micq at cdy.org>
> >> [HTML man pages]
> > A better solution would perhaps be to convert the files to use UTF-8.
>
> The problem is that these files are US-ASCII.
The input files or the output files?
> The groff->HTML conversion is broken. Having a LC_ALL of ru_RU
> still causes "groff -man -Thtml" to still assume the input to be
> iso-8859-1.
groff itself does not care about encodings and uses fonts for all
translation. The default groff HTML device fonts have iso-8859-1
code points. Other groff fonts are needed for other encodings.
> One could as well say that groff is broken w.r.t.
> internationalization.
I don't know if it claims to support any i18n. If it doesn't, it
can't really be broken. :)
> Not to speak about -Tdvi at all...
>
> So if you have a practical solution to this problem, I'm ready to
> listen.
I don't know about dvi, but after reading the groff source I see that
the HTML postprocessor generates output as follows.
One character is one byte. Named characters are probably similar to
SGML entities. (I don't know groff.)
* groff will always use the special coding (&) for the character
  if there is a special coding for it in the font.
* For named and unnamed characters that exist in the font but without
  a special coding, just use the byte value.
* For unnamed characters not found in the device font, use the byte
  value if below 0x80 or &#xx; otherwise.
* For named characters not found in the device font, use the byte
  value (0x26).
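The rules above can be sketched roughly like this. It is only a model of the list as written, not groff's actual code. The font is a made-up dict mapping a byte value to its special coding (or to None when the character exists without one), `&#xx;` is assumed to be a decimal numeric reference, and the last rule is taken literally as emitting the single byte 0x26.

```python
def html_output(byte_value, font, named=False):
    """Return the bytes written for one input character.

    `font` is a hypothetical stand-in for a groff device font:
    {byte_value: special_coding or None}.
    """
    if byte_value in font:
        special = font[byte_value]
        if special is not None:
            # Rule 1: a special coding in the font always wins.
            return special.encode("ascii")
        # Rule 2: in the font, no special coding: pass the byte through.
        return bytes([byte_value])
    if named:
        # Rule 4, read literally: the byte value 0x26.
        return bytes([0x26])
    if byte_value < 0x80:
        # Rule 3a: plain ASCII passes through.
        return bytes([byte_value])
    # Rule 3b: numeric reference (decimal form is an assumption).
    return f"&#{byte_value};".encode("ascii")
```

With a font like `{0xE9: "&eacute;", 0x41: None}`, byte 0xE9 comes out as the special coding, 0x41 passes through, and a byte such as 0xF1 that is missing from the font becomes a numeric reference.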
So, set GROFF_FONT_PATH to a directory with groff device fonts using
the correct code points for each input file's encoding and the output
will not be mangled anymore.
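A minimal sketch of doing that per file, assuming groff is on the PATH and that a directory with suitable device fonts already exists. The helper names and the directory layout are made up for illustration; only GROFF_FONT_PATH and the "groff -man -Thtml" invocation come from this thread.

```python
import os
import subprocess


def groff_env(font_dir, base=None):
    """Copy the environment and point GROFF_FONT_PATH at font_dir."""
    env = dict(os.environ if base is None else base)
    env["GROFF_FONT_PATH"] = font_dir
    return env


def man_to_html(man_file, font_dir):
    """Run groff on one man page with the given font directory.

    Returns the raw HTML bytes; raises CalledProcessError if groff fails.
    """
    result = subprocess.run(
        ["groff", "-man", "-Thtml", man_file],
        env=groff_env(font_dir),
        stdout=subprocess.PIPE,
        check=True,
    )
    return result.stdout
```

Each translation would then get its own call with the font directory matching that file's encoding.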
One way is to make simple device fonts that have no characters, which
will produce HTML files full of &#xx; references.
Another is to make fonts that have all characters but no special
codings. Then groff will just pass everything through untouched.
These files could then be converted to utf-8.
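The conversion step could look something like this, assuming the groff output was passed through untouched and the encoding of the original input is known. The function name is made up, and koi8-r in the usage note is just an example for a Russian file, not necessarily what the mICQ man pages use.

```python
def convert_to_utf8(data, source_encoding):
    """Re-encode raw bytes from source_encoding to UTF-8.

    Raises UnicodeDecodeError if the data is not valid in source_encoding,
    which is a useful hint that the wrong encoding was assumed.
    """
    return data.decode(source_encoding).encode("utf-8")
```

For example, `convert_to_utf8(html_bytes, "koi8_r")` would turn passed-through KOI8-R output into UTF-8.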
Finally, put the character set in the HTML file, and voila. :)
(This is either the character set as used in the original groff
input, or utf-8 if there's been a conversion.)
<meta http-equiv="content-type" content="text/html; charset=per_lang">
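Filling in the per-language charset could be done with a small post-processing step. This is only a sketch: it assumes the generated HTML has an opening head tag, and the function names are invented here.

```python
import re


def charset_meta(charset):
    """Build the content-type meta tag for the given charset."""
    return ('<meta http-equiv="content-type" '
            f'content="text/html; charset={charset}">')


def add_charset(html, charset):
    """Insert the meta tag right after the first opening head tag.

    The document is returned unchanged if no head tag is found.
    """
    tag = charset_meta(charset)
    return re.sub(
        r"(<head(?:\s[^>]*)?>)",
        lambda m: m.group(1) + "\n" + tag,
        html,
        count=1,
        flags=re.IGNORECASE,
    )
```

The charset passed in should be the original input encoding, or utf-8 if the file was converted.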
//Peter