Re: queries with lots of UNIONed relations

Поиск
Список
Период
Сортировка
От Jon Nelson
Тема Re: queries with lots of UNIONed relations
Дата
Msg-id AANLkTikM9=JXPwYpAbfQ5KZgenWBCn5_J2=W6MRhq0U0@mail.gmail.com
обсуждение исходный текст
Ответ на Re: queries with lots of UNIONed relations  (Tom Lane <tgl@sss.pgh.pa.us>)
Ответы Re: queries with lots of UNIONed relations  (Tom Lane <tgl@sss.pgh.pa.us>)
Список pgsql-performance
On Thu, Jan 13, 2011 at 11:13 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Jon Nelson <jnelson+pgsql@jamponi.net> writes:
>> In the former case, the query plan was a bitmap heap scan for each
>> table. Then those results were Appended, Sorted, Uniqued, Sorted
>> again, and then returned.
>
>> In the latter, before Appending, each table's results were run through
>> HashAggregate.
>
> Probably the reason it did that is that each individual de-duplication
> looked like it would fit in work_mem, but a single de-duplication
> didn't.  Consider raising work_mem, at least for this one query.

I raised work_mem to as high as 512MB (SET LOCAL work_mem = '512MB',
within the transaction).  Nice. Instead of 7-10 minutes the result is
now about a minute (the same as with individual de-duplication).

Your comment regarding "each individual de-duplication looked like it
would fit in work_mem" doesn't really make sense, exactly. Maybe I'm
misunderstanding you.

What I'm asking is this: can postgresql apply a de-duplication to each
member of a UNION (as I did with SELECT DISTINCT) in order to reduce
the total number of rows that need to be de-duplicated when all of the
rows have been Appended?

The results of the various plans/tweaks are:

Initial state: (work_mem = 16MB, no DISTINCT, run time of 7-10 minutes):
Unique (Sort (Append ( Lots of Bitmap Heap Scans Here ) ) )

and (work_mem = 16MB, with DISTINCT, run time of ~ 1 minute):
HashAggregate ( Append ( Lots Of HashAggregate( Bitmap Heap Scan ) ) )

and (work_mem = 64kB, DISTINCT, run time of *15+ minutes*):
Unique (Sort ( Append ( Lots Of HashAggregate( Bitmap Heap Scan ) ) ) )

So I take from this the following:

1. if the result set fits in work_mem, hash aggregate is wicked fast.
About 1 jillion times faster than Unique+Sort.

2. it would be nifty if postgresql could be taught that, in a UNION,
to de-duplicate each contributory relation so as to reduce the total
set of rows that need to be re-de-duplicated. It's extra work, true,
and maybe there are some tricks here, but it seems to make a big
difference. This is useful so that the total result set is small
enough that hash aggregate might apply.

NOTE:

I have to have work_mem really low as a global on this machine because
other queries involving the same tables (such as those that involve
UNION ALL for SUM() or GROUP BY operations) cause the machine to run
out of memory. Indeed, even with work_mem at 1MB I run the machine out
of memory if I don't explicitly disable hashagg for some queries. Can
anything be done about that?


--
Jon

В списке pgsql-performance по дате отправления:

Предыдущее
От: Tom Lane
Дата:
Сообщение: Re: queries with lots of UNIONed relations
Следующее
От: Tom Lane
Дата:
Сообщение: Re: queries with lots of UNIONed relations