Discussion: non-ASCII characters in SGML documentation (and elsewhere)
There are a few literal non-ASCII characters in the SGML documentation, namely in

    isn.sgml
    release-7.4.sgml
    release-8.4.sgml

Also, there are some encoded (&foo;) non-ASCII characters in

    release-8.0.sgml
    release-8.1.sgml
    release-8.2.sgml
    unaccent.sgml

These all work fine, because they are all LATIN1, and DocBook SGML uses LATIN1.

But I notice that the contributor names in the 9.1 release notes have been carefully ASCII-fied, presumably from the Git UTF-8 commit messages.

For additional amusement, when creating the HISTORY file, lynx recodes the HTML into the encoding specified by your LC_CTYPE environment setting.

Also, the following source files contain non-ASCII characters in comments:

    src/backend/port/dynloader/darwin.c (LATIN1)
    src/backend/storage/lmgr/predicate.c (UTF8)
    src/backend/storage/lmgr/README-SSI (UTF8)

The last two are new in 9.1.

So, some questions:

* Should we consistently use entities for encoding non-ASCII characters in SGML? Or use LATIN1 freely?
* Should we allow/use non-ASCII characters in the release notes?
* What encoding should the HISTORY file have?
* Should we allow non-ASCII characters in general source files?
* If so, what should the encoding be?
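The survey above (which files contain non-ASCII bytes, and in which encoding) could be approximated with a small script. This is only a sketch under stated assumptions: the function names `classify_encoding` and `scan` are hypothetical, and the LATIN1 verdict is a guess, since any byte sequence decodes as LATIN1.

```python
def classify_encoding(data: bytes) -> str:
    """Return 'ascii', 'utf-8', or 'latin1?' for the given file content.

    'latin1?' is a guess: every byte string is valid LATIN1, so the
    result only means "non-ASCII and not valid UTF-8".
    """
    if all(b < 0x80 for b in data):
        return "ascii"
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        return "latin1?"


def scan(paths):
    """Map each path to its classification (hypothetical helper)."""
    result = {}
    for path in paths:
        with open(path, "rb") as f:
            result[path] = classify_encoding(f.read())
    return result
```

Calling `scan()` with the paths listed above would report which files are pure ASCII and which need a decision.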
Hello Peter,

On 19.05.2011 23:49, Peter Eisentraut wrote:
> So, some questions:
>
> * Should we consistently use entities for encoding non-ASCII
> characters in SGML? Or use LATIN1 freely?
> * Should we allow/use non-ASCII characters in the release notes?
> * What encoding should the HISTORY file have?
> * Should we allow non-ASCII characters in general source files?
> * If so, what should the encoding be?

One more argument for switching to XML? :)

I guess we will get some more non-ASCII characters in the documentation. How do you want to document the collation stuff? Collations are for everything that isn't ASCII. Our docs usually have small examples. I can imagine that you want to place German or Russian letters, or whatever else, as examples in the docs.

Do you have another idea than using UTF-8? What do you expect would not fit into UTF-8?

I would expect words like déjà vu, meaning words that English just copied from French and that still use the French accents. Or even personal names with, e.g., umlauts, accents, and other special characters from particular languages.

Also consider that editors (vi, emacs) usually use UTF-8 today.

By the way, for the German docs I use UTF-8. The HTML output works well with both 'ö' and '&ouml;'. I have not yet tested other outputs. I just changed to UTF-8 in the stylesheets and use export SP_ENCODING=XML before compiling.

Unfortunately, index sorting works with neither 'ö' nor '&ouml;' yet. We are still fighting with it and trying to figure out how to force it to sort correctly. Just changing the makefile didn't help.

But - in the English docs - I doubt that you have to deal with index entries for special words using non-ASCII characters.

That means very small, low-effort changes might already help.

Susanne

--
Susanne Ebrecht - 2ndQuadrant
PostgreSQL Development, 24x7 Support, Training and Services
www.2ndQuadrant.com
Peter Eisentraut <peter_e@gmx.net> writes:
> * Should we consistently use entities for encoding non-ASCII
> characters in SGML? Or use LATIN1 freely?

I think we previously discussed this and agreed that all non-ASCII in the SGML docs should be written as entities. The existence of violations of that rule is just, well, a violation that ought to be fixed.

> * Should we allow/use non-ASCII characters in the release notes?
> * What encoding should the HISTORY file have?

Ideally "sure, if entity-ified", but I don't know what to do about HISTORY.

> * Should we allow non-ASCII characters in general source files?

Prefer "no" here.

regards, tom lane
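The "write all non-ASCII as entities" rule can be illustrated with a short sketch. It assumes that the HTML entity names in Python's `html.entities` table match the names available to the DocBook SGML toolchain for the LATIN1 range; that is not confirmed in this thread. The function name `entityify` is hypothetical, and characters without a named entity fall back to a numeric character reference.

```python
from html.entities import codepoint2name


def entityify(text: str) -> str:
    """Replace every non-ASCII character with an entity reference."""
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 0x80:
            # Plain ASCII passes through unchanged.
            out.append(ch)
        elif cp in codepoint2name:
            # Named entity, e.g. U+00C1 -> &Aacute;
            out.append(f"&{codepoint2name[cp]};")
        else:
            # Assumption: a numeric reference is acceptable as a fallback.
            out.append(f"&#{cp};")
    return "".join(out)


print(entityify("Álvaro Herrera"))
```

A checker for violations of the rule would only need to flag lines where `entityify(line) != line`.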
Excerpts from Tom Lane's message of Fri May 20 07:56:58 -0400 2011:
> Peter Eisentraut <peter_e@gmx.net> writes:
> > * Should we consistently use entities for encoding non-ASCII
> > characters in SGML? Or use LATIN1 freely?
>
> I think we previously discussed this and agreed that all non-ASCII in
> the SGML docs should be written as entities. The existence of
> violations of that rule is just, well, a violation that ought to be
> fixed.

+1

> > * Should we allow/use non-ASCII characters in the release notes?
> > * What encoding should the HISTORY file have?
>
> Ideally "sure, if entity-ified", but I don't know what to do about
> HISTORY.

Can we recode that to plain ASCII? I think iconv has a //TRANSLIT flag or something like that.

--
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
On 20.05.2011 13:56, Tom Lane wrote:
>> * Should we allow non-ASCII characters in general source files?
> Prefer "no" here.

I only see two reasons for non-ASCII characters in English. Either it is a foreign name, e.g. of a person, or it is a word that English took from French, as in déjà vu.

For the second, I am sure you will find synonyms that are ASCII only.

The only other reason that I can see for non-ASCII characters in our docs is for demonstrating collations.

Susanne

--
Susanne Ebrecht - 2ndQuadrant
Excerpts from Susanne Ebrecht's message of Fri May 20 09:04:26 -0400 2011:
> On 20.05.2011 13:56, Tom Lane wrote:
> >> * Should we allow non-ASCII characters in general source files?
> > Prefer "no" here.
>
> I only see two reasons for non-ASCII signs in English.
> Either it is a foreign name of e.g. a person
> or it is a word that English took from French like in déjà vu.

I'd like my name accented in the release notes, thanks.

--
Álvaro Herrera <alvherre@commandprompt.com>
On Fri, 2011-05-20 at 07:56 -0400, Tom Lane wrote:
> > * Should we allow non-ASCII characters in general source files?
>
> Prefer "no" here.

Going through this, I felt a little bad butchering people's names that hadn't bothered anyone before now. So as a compromise, I made contributor names consistently UTF-8, but removed other uses of non-ASCII characters.
On Fri, 2011-05-20 at 08:16 -0400, Alvaro Herrera wrote:
> > > * Should we allow/use non-ASCII characters in the release notes?
> > > * What encoding should the HISTORY file have?
> >
> > Ideally "sure, if entity-ified", but I don't know what to do about
> > HISTORY.
>
> Can we recode that to plain ascii? I think iconv has a //TRANSLIT
> flag or something like that.

To make this work on FreeBSD, where we build the releases, we need to use the following command:

    "/usr/bin/perl" -p -e 's/<H(1|2)$/<H\1 align=center/g' HISTORY.html | LC_ALL=en_US.ISO8859-1 lynx -force_html -dump -nolist -stdin | iconv -f latin1 -t us-ascii//TRANSLIT > HISTORY

This also works on Linux/glibc, but FreeBSD is a bit stricter/more limited. I'm not sure about other platforms, but I'd guess that if they don't have the required locales, they'd be no worse off than now anyway.

The results are reasonable. What //TRANSLIT does actually depends on the platform: e.g. on FreeBSD ö -> "o, on Linux ö -> o.
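The effect of the transliteration step can be sketched in a platform-independent way. This is not what iconv does internally; it is a swapped-in technique (Unicode NFKD decomposition followed by dropping combining marks) that happens to give the Linux-style result ö -> o. The function name `to_ascii` is hypothetical, and characters with no decomposition become "?".

```python
import unicodedata


def to_ascii(text: str) -> str:
    """Approximate //TRANSLIT by stripping accents after NFKD decomposition."""
    # Split accented letters into base letter plus combining mark.
    decomposed = unicodedata.normalize("NFKD", text)
    # Drop the combining marks, keeping the base letters.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Anything still outside ASCII (e.g. ß) is replaced with '?'.
    return stripped.encode("ascii", "replace").decode("ascii")


print(to_ascii("Álvaro Herrera"))
```

Unlike iconv, this gives the same output on every platform, at the cost of cruder handling of characters such as ß.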
Alvaro Herrera wrote:
> Excerpts from Susanne Ebrecht's message of Fri May 20 09:04:26 -0400 2011:
> > On 20.05.2011 13:56, Tom Lane wrote:
> > >> * Should we allow non-ASCII characters in general source files?
> > > Prefer "no" here.
> >
> > I only see two reasons for non-ASCII signs in English.
> > Either it is a foreign name of e.g. a person
> > or it is a word that English took from French like in déjà vu.
>
> I'd like my name accented in the release notes, thanks.

Sure, you want the first "A" in Alvaro with an accent. I would love to backpatch that, but it would be a royal pain. I am afraid it can only easily be done in future release notes.

I have added the proper markup to our release note checklist; patch attached. Does anyone else want special handling for their name?

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +

diff --git a/doc/src/sgml/release.sgml b/doc/src/sgml/release.sgml
new file mode 100644
index 15f273c..c860b90
*** a/doc/src/sgml/release.sgml
--- b/doc/src/sgml/release.sgml
*************** non-ASCII characters convert
*** 27,32 ****
--- 27,34 ----
  does not support it
  http://www.pemberley.com/janeinfo/latin1.html#latexta
  
+ Alvaro Herrera is &Aacute;lvaro Herrera
+ 
  wrap long lines
  
  For new features, add links to the documentation sections. Use </link>
Excerpts from Bruce Momjian's message of Wed Oct 12 18:21:19 -0300 2011:
> Alvaro Herrera wrote:
> > Excerpts from Susanne Ebrecht's message of Fri May 20 09:04:26 -0400 2011:
> > > On 20.05.2011 13:56, Tom Lane wrote:
> > > >> * Should we allow non-ASCII characters in general source files?
> > > > Prefer "no" here.
> > >
> > > I only see two reasons for non-ASCII signs in English.
> > > Either it is a foreign name of e.g. a person
> > > or it is a word that English took from French like in déjà vu.
> >
> > I'd like my name accented in the release notes, thanks.
>
> Sure, you want the first "A" in Alvaro with an accent. I would love to
> backpatch that but it would be royal pain. I am afraid it can only
> easily be done in future release notes.

Many thanks, Bruce.

--
Álvaro Herrera <alvherre@commandprompt.com>