Re: sortsupport for text

Поиск

Список

Период

Сортировка

От	Peter Geoghegan
Тема	Re: sortsupport for text
Дата	14 июня 2012 г. 19:24:36
Msg-id	CAEYLb_WLGHT7yJLaRE9PPeRt5RKd5ZJbb15tE+kpgejgQKORyA@mail.gmail.com обсуждение исходный текст
Ответ на	Re: sortsupport for text (Robert Haas <robertmhaas@gmail.com>)
Ответы	Re: sortsupport for text (Robert Haas <robertmhaas@gmail.com>) Re: sortsupport for text (Tom Lane <tgl@sss.pgh.pa.us>)
Список	pgsql-hackers

Дерево обсуждения

On 14 June 2012 19:28, Robert Haas <robertmhaas@gmail.com> wrote:
> I thought that doubling repeatedly would be overly aggressive in terms
> of memory usage.  Blowing the buffers out to 8kB because we hit a
> string that's a bit over 4kB isn't so bad, but blowing them out to
> 256MB because we hit a string that's a bit over 128MB seems a bit
> excessive.

That's pretty much what all popular dynamic array implementations do,
from C++'s std::vector to Python's list (it's a misnomer). Having to
allocate 256MB for a buffer to contain a string a bit over 128MB may
seem excessive, until you later get a 250MB string. Even if doubling
is generally excessive, which I doubt, that's beside the point, which
is that expanding the array by some constant proportion results in
each insertion taking amortized constant time.

> Suppose we expand the buffer a
> kilobyte at a time from an initial size of 1kB all the way out to
> 256MB.  That's 256,000 palloc calls, so we must be sorting at least
> 256,000 datums, at least 128,000 of which are longer than 128MB.  I
> think the cost of calling memcpy() and strcoll() repeatedly on all
> those long datums - not to mention repeatedly detoasting them - is
> going to bludgeon the palloc overhead into complete insignificance.

I fail to understand how this sortsupport buffer fundamentally differs
from a generic dynamic array abstraction built to contain chars. That
being the case, I see no reason not to just do what everyone else does
when expanding dynamic arrays, and no reason why we shouldn't make
essentially the same time-space trade-off here as others do elsewhere.

Another concern is that it seems fairly pointless to have two buffers.
Wouldn't it be more sensible to have a single buffer that was
partitioned to make two logical, equally-sized buffers, given that in
general each buffer is expected to grow at exactly the same rate?

--
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services

В списке pgsql-hackers по дате отправления:

Предыдущее

От: Robert Haas
Дата: 14 июня 2012 г., 19:21:04
Сообщение: Re: Patch pg_is_in_backup()

Следующее

От: Robert Haas
Дата: 14 июня 2012 г., 19:32:43
Сообщение: Re: sortsupport for text

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Re: sortsupport for text

Предыдущее

Следующее