Re: A better way than tweaking NTUP_PER_BUCKET

Поиск

Список

Период

Сортировка

От	Jeff Janes
Тема	Re: A better way than tweaking NTUP_PER_BUCKET
Дата	28 января 2014 г. 16:53:46
Msg-id	CAMkU=1y_qp+QUPGk=JBJSTtcYQpW2k=v2LMyTZkO_8ftuuy_fw@mail.gmail.com обсуждение исходный текст
Ответ на	Re: A better way than tweaking NTUP_PER_BUCKET (Simon Riggs <simon@2ndQuadrant.com>)
Список	pgsql-hackers

Дерево обсуждения

On Mon, Jan 27, 2014 at 10:00 AM, Simon Riggs <simon@2ndquadrant.com> wrote:

On 27 January 2014 17:44, Pavel Stehule <pavel.stehule@gmail.com> wrote:

> This topic is interesting - we found very bad performance with hashing large
> tables with high work_mem. MergeJoin with quicksort was significantly
> faster.

I've seen this also.

> I didn't deeper research - there is a possibility of virtualization
> overhead.

I took measurements and the effect was repeatable and happened for all
sizes of work_mem, but nothing more to add.

I get similar results if I join on integers. But joining on text, the hash wins by a mile.

I use this as a simple test bed:

alter table pgbench_accounts drop CONSTRAINT pgbench_accounts_pkey;

update pgbench_accounts set filler = md5(aid::text);

set work_mem to whatever keeps the join off of disk for the given scale;

set enable_hashjoin to whatever;

select sum(a1.abalance*a2.abalance) from pgbench_accounts a1 join pgbench_accounts a2 using (aid);

select sum(a1.abalance*a2.abalance) from pgbench_accounts a1 join pgbench_accounts a2 using (filler);

hash integer: 1832.695 ms

merge integer: 1462.913 ms

hash text: 2353.115 ms

merge text: 11,218.628 ms

The cost estimates do not depend on the column used in the join despite a 6 fold difference in run time, so the planner is perhaps missing a trick there.

Cheers,

Jeff

В списке pgsql-hackers по дате отправления:

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Re: A better way than tweaking NTUP_PER_BUCKET