Re: Using quicksort for every external sort run

From: Robert Haas
Subject: Re: Using quicksort for every external sort run
Date:
Msg-id: CA+TgmoZGFt6BAxW9fYOn82VAf1u=V0ZZx3bXMs79phjg_9NYjQ@mail.gmail.com
In response to: Re: Using quicksort for every external sort run  (Peter Geoghegan <pg@heroku.com>)
Responses: Re: Using quicksort for every external sort run  (Peter Geoghegan <pg@heroku.com>)
Re: Using quicksort for every external sort run  (Jeff Janes <jeff.janes@gmail.com>)
List: pgsql-hackers
On Sat, Nov 28, 2015 at 7:05 PM, Peter Geoghegan <pg@heroku.com> wrote:
> On Sat, Nov 28, 2015 at 2:04 PM, Jeff Janes <jeff.janes@gmail.com> wrote:
>> For me very large sorts (100,000,000 ints) with work_mem below 4MB do
>> better with unpatched than with your patch series, by about 5%.  Not a
>> big deal, but also if it is easy to keep the old behavior then I think
>> we should.  Yes, it is dumb to do large sorts with work_mem below 4MB,
>> but if you have canned apps which do a mixture of workloads it is not
>> so easy to micromanage their work_mem.  Especially as there are no
>> easy tools that let me as the DBA say "if you connect from this IP
>> address, you get this work_mem".
>
> I'm not very concerned about a regression that is only seen when
> work_mem is set below the (very conservative) postgresql.conf default
> value of 4MB when sorting 100 million integers.

Perhaps surprisingly, I tend to agree.  I'm cautious of regressions
here, but large sorts in queries are relatively uncommon.  You're
certainly not going to want to return 100 million tuples to the
client.  If you're trying to do a merge join with 100 million tuples,
well, 100 million integers @ 32 bytes per tuple is 3.2GB, and that's
the size of a tuple with a 4 byte integer and at most 4 bytes of other
data being carried along with it.  So in practice you'd probably need
to have at least 5-10GB of data, which means you are trying to sort
data over a thousand times larger than the amount of memory you allowed
for the sort.  With or without that patch, you should really consider
raising work_mem.  And maybe create some indexes so that the planner
doesn't choose a merge join any more.  The aggregate case is perhaps
worth a little more thought: maybe you are sorting 100 million tuples
so that you can GroupAggregate them.  But, there again, the benefits
of raising work_mem are quite large with or without this patch.  Heck,
if you're lucky, a little more work_mem might switch you to a
HashAggregate.  I'm not sure it's worth complicating the code to cater
to those cases.
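
To put numbers on that, a quick sketch with illustrative values (the
role name is made up): 5GB of input against 4MB of work_mem is a
ratio of roughly 1250:1, and raising work_mem doesn't even require
touching postgresql.conf:

    -- per session, just around the one big query:
    SET work_mem = '256MB';

    -- or pinned to every connection made by a given role, which
    -- covers much of the "per-application work_mem" use case:
    ALTER ROLE reporting_app SET work_mem = '64MB';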

While large sorts are uncommon in queries, they are much more common
in index builds.  Therefore, I think we ought to be worrying more
about regressions at 64MB than at 4MB, because we ship with
maintenance_work_mem = 64MB and a lot of people probably don't change
it before trying to build an index.  If we make those index builds go
faster, users will be happy.  If we make them go slower, users will be
sad.  So I think it's worth asking the question: are there any CREATE
INDEX commands that someone might type, on a system where they've done
no other configuration, that will be slower with this patch?
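
For a one-off build there's the same escape hatch, sketched here with
an illustrative value and a hypothetical table:

    -- raise the limit only for this session's index build:
    SET maintenance_work_mem = '1GB';
    CREATE INDEX test_idx ON test (col);
    RESET maintenance_work_mem;

Of course, the question above is precisely about the users who never
type that first line.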

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


