Re: Using quicksort for every external sort run

Поиск
Список
Период
Сортировка
От Robert Haas
Тема Re: Using quicksort for every external sort run
Дата
Msg-id CA+Tgmob9ssG0XB4-GzPARCe0C8gXzHBCrAzNyout1oWj+9Nc9Q@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Using quicksort for every external sort run  (Peter Geoghegan <pg@heroku.com>)
Ответы Re: Using quicksort for every external sort run
Список pgsql-hackers
On Thu, Nov 19, 2015 at 3:43 PM, Peter Geoghegan <pg@heroku.com> wrote:
>> I'd be interested in seeing this analysis in some detail.
>
> Sure. Jeff mentioned 8MB as a work_mem setting, so let's examine a
> case where that's the work_mem setting, and see experimentally where
> the crossover point for a multi-pass sort ends up.
>
> If this table is created:
>
> postgres=# create unlogged table bar as select (random() * 1e9)::int4
> idx, 'payload xyz'::text payload from generate_series(1, 10100000) i;
> SELECT 10100000
>
> Then, on my system, a work_mem setting of 8MB *just about* avoids
> seeing the multipass_warning message with this query:
>
> postgres=# select count(distinct idx) from bar ;
>
>    count
> ------------
>  10,047,433
> (1 row)
>
> A work_mem setting of 235MB is just enough to make the query's sort
> fully internal.
>
> Let's see how things change with a higher work_mem setting of 16MB. I
> mentioned quadratic growth: Having doubled work_mem, let's *quadruple*
> the number of tuples, to see where this leaves a 16MB setting WRT a
> multi-pass merge:
>
> postgres=# drop table bar ;
> DROP TABLE
> postgres=# create unlogged table bar as select (random() * 1e9)::int4
> idx, 'payload xyz'::text payload from generate_series(1, 10100000 * 4)
> i;
> SELECT 40400000
>
> Further experiments show that this is the exact point at which the
> 16MB work_mem setting similarly narrowly avoids a multi-pass warning.
> This should be the dominant consideration, because now a fully
> internal sort requires 4X the work_mem of my original 16MB work_mem
> example table/query.
>
> The quadratic growth in a simple hybrid sort-merge strategy's ability
> to avoid a multi-pass merge phase (growth relative to linear increases
> in work_mem) can be demonstrated with simple experiments.

OK, so reversing this analysis, with the default work_mem of 4MB, we'd
need a multi-pass merge for more than 235MB/4 = 58MB of data.  That is
very, very far from being a can't-happen scenario, and I would not at
all think it would be acceptable to ignore such a case.  Even ignoring
the possibility that someone with work_mem = 8MB will try to sort
235MB of data strikes me as out of the question.  Those seem like
entirely reasonable things for users to do.  Greg's example of someone
with work_mem = 1GB trying to sort 4TB does not seem like a crazy
thing to me.  Yeah, in all of those cases you might think that users
should set work_mem higher, but that doesn't mean that they actually
do.  Most systems have to set work_mem very conservatively to make
sure they don't start swapping under heavily load.

I think you need to revisit your assumptions here.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Robert Haas
Дата:
Сообщение: Re: Error with index on unlogged table
Следующее
От: Peter Geoghegan
Дата:
Сообщение: Re: Using quicksort for every external sort run