Discussion: UTF-8 docs?

UTF-8 docs?

From
Tatsuo Ishii
Date:
Just out of curiosity, I wonder why we can't make the encoding of the
SGML docs UTF-8 rather than the current ISO-8859-1.

As long as everything is written in ASCII, the size of the docs will be
almost the same even if UTF-8 is used. Plus, if the encoding is changed
to UTF-8, it becomes very easy to translate the docs into local
languages. As far as I know, all of the local-language docs under
https://www.postgresql.org/docs/ are using UTF-8.
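
The size claim can be checked with a minimal sketch, assuming Python 3. The helper name encoded_sizes and the two sample strings are made up for illustration and do not come from the PostgreSQL sources. It compares the byte length of the same text in ISO-8859-1 and in UTF-8:

```python
# Compare the encoded size of the same text in ISO-8859-1 and UTF-8.
# The helper name and the sample strings are illustrative assumptions.

def encoded_sizes(text):
    """Return (ISO-8859-1 byte count, UTF-8 byte count) for text."""
    return len(text.encode("iso-8859-1")), len(text.encode("utf-8"))


ascii_sample = "<para>PostgreSQL is an object-relational database.</para>"
latin1_sample = "<para>Ren\u00e9 wrote this.</para>"  # contains U+00E9

# Pure ASCII text is byte-for-byte identical in both encodings.
assert ascii_sample.encode("iso-8859-1") == ascii_sample.encode("utf-8")

latin1_bytes, utf8_bytes = encoded_sizes(latin1_sample)
# U+00E9 takes one byte in ISO-8859-1 and two bytes in UTF-8.
assert utf8_bytes == latin1_bytes + 1

print(encoded_sizes(ascii_sample))
print(encoded_sizes(latin1_sample))
```

Only characters outside ASCII grow, so a source tree that is effectively ASCII would not change size at all.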

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp



Re: UTF-8 docs?

From
Victor Wagner
Date:
On Mon, 22 Aug 2016 14:16:45 +0900 (JST)
Tatsuo Ishii <ishii@sraoss.co.jp> wrote:

> Just out of curiosity, I wonder why we can't make the encoding of the
> SGML docs UTF-8 rather than the current ISO-8859-1.


What is the reason to "make the encoding of the SGML docs" anything in
particular? What actual change should be made, and what problems would
it solve?

There are various translations of the PostgreSQL docs, and they use
various encodings. The translated versions of the docs on
http://postgresql.org/docs are just links to external sites where the
translations are maintained.

The English documentation uses ISO-8859-1 (actually ASCII); the
Russian one uses UTF-8 (you can download our source tarball from
http://repo.postgrespro.ru/pgpro-9.5/src and see the postgres source
distribution with UTF-8 SGML files inside).

The Japanese documentation in HTML form is served from
http://www.postgresql.jp/document/9.5/html/
in UTF-8 too.

I.e. everybody who needs UTF-8 to represent a translation of the
documentation already uses it.

What exactly do you propose to be done?

Really, the change we need is conversion from SGML to XML format.
It would solve some real problems, such as the ability to include
diagrams in the docs, and would also let everyone explicitly specify
the encoding in the XML declaration (and would probably cause a switch
to UTF-8 as a side effect, because most XML-based tools use UTF-8 as
the default).
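
To illustrate what specifying the encoding in the XML declaration means in practice, here is a minimal sketch using Python's standard xml.etree.ElementTree parser. The <para> fragments and the helper name para_text are made up for illustration and are not taken from the PostgreSQL docs. The parser reads the declared encoding, and falls back to UTF-8 when there is no declaration:

```python
import xml.etree.ElementTree as ET

# Three encodings of the same made-up fragment. The parser picks the
# decoder from the XML declaration; with no declaration it assumes UTF-8.
latin1_doc = b'<?xml version="1.0" encoding="ISO-8859-1"?><para>caf\xe9</para>'
utf8_doc = '<?xml version="1.0" encoding="UTF-8"?><para>caf\u00e9</para>'.encode("utf-8")
no_decl_doc = "<para>caf\u00e9</para>".encode("utf-8")


def para_text(raw):
    """Parse raw XML bytes and return the text of the root element."""
    return ET.fromstring(raw).text


# All three byte strings decode to the same Unicode text.
for raw in (latin1_doc, utf8_doc, no_decl_doc):
    assert para_text(raw) == "caf\u00e9"
```

The bytes on disk differ, but the declaration makes the encoding a property of each file rather than of the build system.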



Re: UTF-8 docs?

From
Tatsuo Ishii
Date:
> On Mon, 22 Aug 2016 14:16:45 +0900 (JST)
> Tatsuo Ishii <ishii@sraoss.co.jp> wrote:
> 
>> Just out of curiosity, I wonder why we can't make the encoding of the
>> SGML docs UTF-8 rather than the current ISO-8859-1.
> 
> 
> What is the reason to "make the encoding of the SGML docs" anything in
> particular? What actual change should be made, and what problems would
> it solve?

The problem is that the PostgreSQL docs are fixed to ISO-8859-1, and if
I want to use another encoding, I need to change the build system, which
is annoying. Ideally, someone who wants to use an encoding other than
ISO-8859-1 would just change the contents of the SGML files. Just
changing ISO-8859-1 to UTF-8 will solve most problems.

(Allowing an arbitrary encoding to be specified would probably be
better, but it needs some work.)

> There are various translations of the PostgreSQL docs, and they use
> various encodings. The translated versions of the docs on
> http://postgresql.org/docs are just links to external sites where the
> translations are maintained.
>
> The English documentation uses ISO-8859-1 (actually ASCII); the
> Russian one uses UTF-8 (you can download our source tarball from
> http://repo.postgrespro.ru/pgpro-9.5/src and see the postgres source
> distribution with UTF-8 SGML files inside).
>
> The Japanese documentation in HTML form is served from
> http://www.postgresql.jp/document/9.5/html/
> in UTF-8 too.
>
> I.e. everybody who needs UTF-8 to represent a translation of the
> documentation already uses it.
>
> What exactly do you propose to be done?

See above.

> Really, the change we need is conversion from SGML to XML format.
> It would solve some real problems, such as the ability to include
> diagrams in the docs, and would also let everyone explicitly specify
> the encoding in the XML declaration (and would probably cause a switch
> to UTF-8 as a side effect, because most XML-based tools use UTF-8 as
> the default).

That's another story.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp



Re: UTF-8 docs?

From
Peter Eisentraut
Date:
On 8/22/16 1:16 AM, Tatsuo Ishii wrote:
> Just out of curiosity, I wonder why we can't make the encoding of the
> SGML docs UTF-8 rather than the current ISO-8859-1.

Encoding handling in DocBook SGML is weird, and making it work robustly
will either fail or might be more work than just completing the
conversion to XML.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: UTF-8 docs?

From
Tatsuo Ishii
Date:
> On 8/22/16 1:16 AM, Tatsuo Ishii wrote:
>> Just out of curiosity, I wonder why we can't make the encoding of the
>> SGML docs UTF-8 rather than the current ISO-8859-1.
> 
> Encoding handling in DocBook SGML is weird, and making it work robustly
> will either fail or might be more work than just completing the
> conversion to XML.

I don't know what kind of problem you are seeing with encoding
handling, but at least UTF-8 is working for Japanese, French and
Russian.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp



Re: UTF-8 docs?

From
Peter Eisentraut
Date:
On 8/22/16 9:32 AM, Tatsuo Ishii wrote:
> I don't know what kind of problem you are seeing with encoding
> handling, but at least UTF-8 is working for Japanese, French and
> Russian.

Those translations are using DocBook XML.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: UTF-8 docs?

From
Tatsuo Ishii
Date:
> On 8/22/16 9:32 AM, Tatsuo Ishii wrote:
>> I don't know what kind of problem you are seeing with encoding
>> handling, but at least UTF-8 is working for Japanese, French and
>> Russian.
> 
> Those translations are using DocBook XML.

But in the meantime I can create UTF-8 HTML files like this:

make html
[snip]
/bin/mkdir -p html
SP_CHARSET_FIXED=1 SP_ENCODING=UTF-8 openjade -wall -wno-unused-param -wno-empty -wfully-tagged -D . -D . \
  -c /usr/share/sgml/docbook/stylesheet/dsssl/modular/catalog -d stylesheet.dsl -t sgml -i output-html -i include-index \
  postgres.sgml

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp



Re: UTF-8 docs?

From
Victor Wagner
Date:
On Mon, 22 Aug 2016 10:53:25 -0400
Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote:

> On 8/22/16 9:32 AM, Tatsuo Ishii wrote:
> > I don't know what kind of problem you are seeing with encoding
> > handling, but at least UTF-8 is working for Japanese, French and
> > Russian.  
> 
> Those translations are using DocBook XML.
> 

The Russian translation by Postgres Professional does use DocBook SGML,
although it uses XML as an intermediate representation when applying
gettext to the documentation. I've already posted the URL where the
PostgreSQL sources, with the Russian documentation in SGML format
included, can be downloaded.