Re: sgml cleanup: unescaped '>' characters

From: Josh Kupershmidt
Subject: Re: sgml cleanup: unescaped '>' characters
Date:
Message-ID: CAK3UJRF2saBe6EDZ90eKVZ316DrkW_USMs2Kp9jzaQuRygdUeg@mail.gmail.com
In reply to: Re: sgml cleanup: unescaped '>' characters (Peter Eisentraut <peter_e@gmx.net>)
Replies: Re: sgml cleanup: unescaped '>' characters (Peter Eisentraut <peter_e@gmx.net>)
List: pgsql-docs
On Sat, Aug 27, 2011 at 3:48 PM, Peter Eisentraut <peter_e@gmx.net> wrote:
> On ons, 2011-08-24 at 23:28 -0400, Josh Kupershmidt wrote:
>> I found myself rewriting the ./src/tools/find_gt_lt script in Perl
>> this evening, since the existing script was quite broken (the main
>> problem is it's not capable of understanding CDATA or sgml comment
>> sections, and hence produces a bunch of noise).
>>
>> The rewritten version picked up a few stylistic inconsistencies in the
>> SGML, such as:
>>  * breaking the trailing '>' of an SGML marker across lines. AFAIK
>> this is legal, but is a bit inconsistent and just confuses simplistic
>> tools like find_gt_lt
>
> The cases you show don't appear to be terribly useful, but I think on
> occasion this can be necessary to work around some arcane whitespace
> rules in SGML or XML.  (Just look at the generated HTML; it uses this
> technique throughout.)

Hrm, well, if the spurious whitespace isn't serving any purpose in
these cases, why not just fix it to match the style of the rest of the SGML?
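For illustration, here is a minimal Python sketch of what a comment- and CDATA-aware check might look like. It is not the Perl rewrite mentioned in this thread; the function names scan_gt_lt and find_split_tags and both regexes are hypothetical. It blanks out SGML comments and CDATA marked sections (keeping newlines so line numbers stay accurate), then blanks out tags, including DocBook's "</>" short end tag, and reports any '<' or '>' left over. A second helper flags tags whose trailing '>' was moved onto a new line.

```python
import re

# Hypothetical sketch, not the actual find_gt_lt Perl rewrite.
# SGML comments and CDATA marked sections, matched left to right so that
# whichever one opens first wins.
SKIP_RE = re.compile(r"<!--.*?-->|<!\[\s*CDATA\s*\[.*?\]\]>", re.DOTALL)
# A start or end tag (possibly spanning lines), or DocBook's short end tag "</>".
TAG_RE = re.compile(r"</?[A-Za-z][^<>]*>|</>")


def _blank(match):
    """Replace a match with spaces, keeping newlines so line numbers survive."""
    return re.sub(r"[^\n]", " ", match.group(0))


def scan_gt_lt(text):
    """Return (line_number, char) for every '<' or '>' left outside markup."""
    text = SKIP_RE.sub(_blank, text)
    text = TAG_RE.sub(_blank, text)
    hits = []
    for lineno, line in enumerate(text.split("\n"), start=1):
        hits.extend((lineno, ch) for ch in line if ch in "<>")
    return hits


def find_split_tags(text):
    """Return the starting line of each tag whose closing '>' begins a new line."""
    text = SKIP_RE.sub(_blank, text)
    return [
        text.count("\n", 0, m.start()) + 1
        for m in TAG_RE.finditer(text)
        if re.search(r"\n[ \t]*>\Z", m.group(0))
    ]


if __name__ == "__main__":
    sample = (
        "<para>\n"
        "a < b and <command>x</> done\n"
        "<!-- a comment with > inside -->\n"
        "<![CDATA[ if (x > 1) ]]>\n"
        "<para\n"
        ">text</para>\n"
    )
    print(scan_gt_lt(sample))       # [(2, '<')]
    print(find_split_tags(sample))  # [5]
```

Being regex-based, this is still a heuristic: prose such as "x <b and c> y" would be mistaken for a tag, so it narrows the noise rather than fully parsing SGML.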

>>  * using single quotes instead of double quotes to surround a node
>> attribute, as in <orderedlist numeration='loweralpha'>
>
> It would be better if the tool could handle that, because sometimes you
> want to use single quotes if the value contains double quotes.

It's trivial to adjust the regex I was using to ignore such cases. I'm
only concerned with stylistic consistency here. If there's a reason to use
single quotes, such as when the value contains double quotes, then
that's fine, but I don't think any of the cases I pointed out fall
under that category.
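A sketch of the kind of regex adjustment described above, in Python. The names START_TAG_RE, SINGLE_QUOTED_ATTR_RE and find_single_quoted_attrs are hypothetical, not taken from the actual script. It looks only inside start tags and reports single-quoted attribute values, skipping any value that contains a double quote, since that is the legitimate reason for using single quotes.

```python
import re

# Hypothetical sketch: start tags only, since attributes appear only there.
START_TAG_RE = re.compile(r"<[A-Za-z][^<>]*>")
# name='value' where the value contains no double quote; a value that does
# contain '"' is a legitimate reason to use single quotes, so it is skipped.
SINGLE_QUOTED_ATTR_RE = re.compile(r"""([A-Za-z][\w.-]*)\s*=\s*'([^'"]*)'""")


def find_single_quoted_attrs(text):
    """Return (tag_start_line, attribute, value) for each avoidable single-quoted value."""
    hits = []
    for tag in START_TAG_RE.finditer(text):
        lineno = text.count("\n", 0, tag.start()) + 1
        for attr in SINGLE_QUOTED_ATTR_RE.finditer(tag.group(0)):
            hits.append((lineno, attr.group(1), attr.group(2)))
    return hits


if __name__ == "__main__":
    sample = (
        "<orderedlist numeration='loweralpha'>\n"
        "<phrase role=\"x\">a='b'</phrase>\n"
        "<foo title='say \"hi\"'>\n"
    )
    print(find_single_quoted_attrs(sample))  # [(1, 'numeration', 'loweralpha')]
```

Text outside tags (such as a='b' in element content) is ignored because only start tags are searched.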

>> as well as seemingly-invalid SGML, such as using '>' unescaped inside
>> normal SGML entries.
>
> Unescaped > is valid, AFAIK.

Oh, that's interesting. I took a quick look at "The SGML FAQ book",
page 73 [1], which supports this claim.

But I notice we've been fixing such issues in the recent past (e.g.
commit d420ba2a2d4ea4831f89a3fd7ce86b05eff932ff). Don't we want to
continue doing so? Not to mention the fact that we have
./src/tools/find_gt_lt, which, while somewhat broken, has the
ostensible goal of finding such problems in the SGML. Or do we want to
stop worrying about '>' entirely and rename find_gt_lt to find_lt
instead?

Josh

[1] http://books.google.com/books?id=OyJHFJsnh10C&lpg=PA229&ots=DGkYDdvbhE&pg=PA73#v=onepage&q&f=false
