Re: Dead encoding conversion functions

Поиск
Список
Период
Сортировка
От Noah Misch
Тема Re: Dead encoding conversion functions
Дата
Msg-id 20190615180732.GA239428@rfd.leadboat.com
обсуждение исходный текст
Ответ на Dead encoding conversion functions  (Tom Lane <tgl@sss.pgh.pa.us>)
Список pgsql-hackers
On Wed, May 29, 2019 at 03:03:13PM -0400, Tom Lane wrote:
> Pursuant to today's discussion at PGCon about code coverage, I went
> nosing into some of the particularly under-covered subdirectories
> in our tree, and immediately tripped over an interesting factoid:
> the ASCII<->MIC and ASCII<->UTF8 encoding conversion functions are
> untested ... not because the regression tests don't try, but because
> those conversions are unreachable.  pg_do_encoding_conversion() and
> its sister functions have hard-wired fast paths for any conversion
> in which the source or target encoding is SQL_ASCII, so that an
> encoding conversion function declared for such a case will never
> be used.

> This situation seems kinda silly.  My inclination is to delete
> these functions as useless, but I suppose another approach is
> to suppress the fast paths if there's a declared conversion function.
> (Doing so would likely require added catalog lookups in places we
> might not want them...)

Removing the fast paths to make ascii_to_utf8() reachable would cause ERROR
when server_encoding=SQL_ASCII, client_encoding=UTF8, and a query would
otherwise send the client any character outside 7-bit ASCII.  That's fairly
defensible, but doing it for only UTF8 and MULE_INTERNAL is not.  So if we
like the ascii_to_utf8() behavior, I think the action would be to replace the
fast path with an encoding-independent verification that all bytes are 7-bit
ASCII.  (The check would not apply when both server_encoding and
client_encoding are SQL_ASCII, of course.)  Alternately, one might prefer to
replace the fast path with an encoding verification; in the SQL_ASCII-to-UTF8
case, we'd allow byte sequences that are valid UTF8, even though the validity
may be a coincidence and mojibake may ensue.  SQL_ASCII is for being casual
about encoding, so it's not clear to me whether or not either prospective
behavior change would be an improvement.  However, I do find it clear to
delete ascii_to_utf8() and ascii_to_mic().

> If we do delete them as useless, it might also be advisable to change
> CreateConversionCommand() to refuse creation of conversions to/from
> SQL_ASCII, to prevent future confusion.

Sounds good.



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Tom Lane
Дата:
Сообщение: Re: Extracting only the columns needed for a query
Следующее
От: Noah Misch
Дата:
Сообщение: Re: pg_upgrade can result in early wraparound on databases with hightransaction load