Re: Slow hash join performance with many batches

Поиск
Список
Период
Сортировка
От Tom Lane
Тема Re: Slow hash join performance with many batches
Дата
Msg-id 28009.1433181492@sss.pgh.pa.us
обсуждение исходный текст
Ответ на Slow hash join performance with many batches  (Alex Adriaanse <alex@oseberg.io>)
Список pgsql-performance
Alex Adriaanse <alex@oseberg.io> writes:
> I have several databases that have the same schema but different amounts of data in it (let's categorize these as
Small,Medium, and Large). We have a mammoth query with 13 CTEs that are LEFT JOINed against a main table. This query
takes<30 mins on the Small database, <2 hours to run on Large, but on the Medium database it takes in the vicinity of
14hours. 
> Running truss/strace on the backend process running this query on the Medium database reveals that for a big chunk of
thistime Postgres creates/reads/unlinks a very large quantity (millions?) of tiny files inside pgsql_tmp. I also ran an
EXPLAINANALYZE and am attaching the most time-consuming parts of the plan (with names redacted). Although I'm not too
familiarwith the internals of Postgres' Hash implementation, it seems that having over 4 million hash batches could be
what'scausing the problem. 

> I'm running PostgreSQL 9.3.5, and have work_mem set to 32MB.

I'd try using a significantly larger work_mem setting for this query,
so as to have fewer hash batches and more buckets per batch.

It might be unwise to raise your global work_mem setting, but perhaps
you could just do a "SET work_mem" within the session running the query.

Also, it looks like the planner is drastically overestimating the sizes
of the CTE outputs, which is contributing to selecting unreasonably large
numbers of batches.  If you could get those numbers closer to reality it'd
likely help.  Hard to opine further since no details about the CTEs were
provided.

            regards, tom lane


В списке pgsql-performance по дате отправления:

Предыдущее
От: Alex Adriaanse
Дата:
Сообщение: Slow hash join performance with many batches
Следующее
От: Kevin Grittner
Дата:
Сообщение: Re: Fastest way / best practice to calculate "next birthdays"