Re: Improving on MAX_CONVERSION_GROWTH

Поиск
Список
Период
Сортировка
От Tom Lane
Тема Re: Improving on MAX_CONVERSION_GROWTH
Дата
Msg-id 20159.1569612302@sss.pgh.pa.us
обсуждение исходный текст
Ответ на Re: Improving on MAX_CONVERSION_GROWTH  (Robert Haas <robertmhaas@gmail.com>)
Ответы Re: Improving on MAX_CONVERSION_GROWTH  (Tom Lane <tgl@sss.pgh.pa.us>)
Список pgsql-hackers
Robert Haas <robertmhaas@gmail.com> writes:
> On Fri, Sep 27, 2019 at 11:40 AM Andres Freund <andres@anarazel.de> wrote:
>> Note that one of the additional reasons for the 1GB limit is that it
>> protects against int overflows. I'm somewhat unconvinced that that's a
>> sensible approach, but ...

> It's not crazy. People using 'int' rather casually just as they use
> 'palloc' rather casually, without necessarily thinking about what
> could go wrong at the edges. I don't have any beef with that as a
> general strategy; I just think we should be trying to do better in the
> cases where it negatively affects the user experience.

A small problem with doing anything very interesting here is that the
int-is-enough-for-a-string-length approach is baked into the wire
protocol (read the DataRow message format spec and weep).

We could probably bend the COPY protocol enough to support multi-gig row
values --- dropping the rule that the backend doesn't split rows across
CopyData messages wouldn't break too many clients, hopefully.  That would
at least dodge some problems in dump/restore scenarios.

In the meantime, I still think we should commit what I proposed in the
other thread (<974.1569356381@sss.pgh.pa.us>), or something close to it.
Andres' proposal would perhaps be an improvement on that, but I don't
think it'll be ready anytime soon; and for sure we wouldn't risk
back-patching it, while I think we could back-patch what I suggested.
In any case, that patch is small enough that dropping it would be no big
loss if a better solution comes along.

Also, as far as the immediate subject of this thread is concerned,
I'm inclined to get rid of MAX_CONVERSION_GROWTH in favor of using
the target encoding's max char length.  The one use (in printtup.c)
where we don't know the target encoding could use MAX_MULTIBYTE_CHAR_LEN
instead.  Being smarter than that could help in some cases (mostly,
conversion of ISO encodings to UTF8), but it's not that big a win.
(I did some checks and found that some ISO encodings could provide a
max growth of 2x, but many are max 3x, so 4x isn't that far out of
line.)  If Andres' ideas don't pan out we could come back and work
harder on this, but for now something simple and back-patchable
seems like a useful stopgap improvement.

            regards, tom lane



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Alvaro Herrera
Дата:
Сообщение: Re: Attempt to consolidate reading of XLOG page
Следующее
От: legrand legrand
Дата:
Сообщение: Re: Hooks for session start and end, take two