Query memory usage greatly in excess of work_mem * query plan steps

From: Timothy Garnett
Subject: Query memory usage greatly in excess of work_mem * query plan steps
Date:
Msg-id: CAPcyiQ1ZMjMCqCBrbbNRpNW9nbCsneAPJsyWwr+bxi25cGA0tg@mail.gmail.com
List: pgsql-performance
Hi,

I have a query that's pulling data for another system using COPY (query) TO STDOUT CSV on a 9.2.4 db (we're in the process of upgrading to 9.3). The final csv file is large (~75GB, 86 million rows). The query is also large: one table (86 million rows) left joined to a total of 30 other tables (of mixed size), 3 of which are CTEs supplied by a WITH clause and consist of 3 joins each, for a total of 39 joins in the plan. work_mem on the system is set to 256MB.
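
For reference, the statement is shaped roughly like this (table, CTE, and column names here are placeholders, not the real ones):

    COPY (
        WITH cte1 AS (
            SELECT a.big_id, max(d.updated_at) AS last_update  -- grouped CTE, 3 joins inside
            FROM a
            JOIN b ON b.a_id = a.id
            JOIN c ON c.b_id = b.id
            JOIN d ON d.c_id = c.id
            GROUP BY a.big_id
        )
        -- ... two more CTEs of similar shape ...
        SELECT big.*, cte1.last_update              -- plus columns from the other joined tables
        FROM big_table big                          -- ~86 million rows
        LEFT JOIN cte1    ON cte1.big_id = big.id
        LEFT JOIN lookup1 ON lookup1.id  = big.lookup1_id
        -- ... ~30 left joins in the outer query, 39 joins total counting those inside the CTEs ...
    ) TO STDOUT WITH CSV;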

We're running into problems with the machine running out of memory, with this single query process consuming over 100GB of resident memory before the machine exhausts swap and the Linux OOM handling eventually kills it. The query plan from EXPLAIN comes to 186 rows, which, even assuming that each row requires a full work_mem (which should be a significant overestimate of both the number of operations and their size), is < 50GB, and we're observing substantially more than that. Is it reasonable to expect that a query will take roughly < work_mem * # of operations, or are there other factors in play?
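
Spelling the arithmetic out, counting every line of the EXPLAIN output as an operation that could use a full work_mem (which should be a large overcount):

    186 plan lines * 256 MB work_mem = 47,616 MB, i.e. roughly 46.5 GB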

The plan looks reasonable (though there are some odd right join uses, see below) and the row estimates look pretty accurate, with the exception that one of the CTE queries has its row count underestimated by a little over 2 orders of magnitude (260k vs. 86 million rows). That query does a GROUP BY (planned as a sort then group aggregate; there are no hash aggregates in the plan, which is something that might increase memory) and the group step misestimates the final number of rows for that CTE. Unlike the other CTEs, when it's merge joined into the main query there's no materialize line in the plan (no idea if that's relevant).
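
For what it's worth, the top of that CTE is shaped roughly like this in the plan (costs trimmed, row counts approximate):

    ->  GroupAggregate  (rows=~260k estimated, ~86 million actually produced)
          ->  Sort
                ->  ... the 3 joins inside the CTE ...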

As to the right joins (used for a few of the joins; most are left or merge joins):
                     ->  Hash Right Join (cost=225541299.19..237399743.38 rows=86681834 width=1108)
                           Hash Cond: (xxx.xxx = yyy.yyy)
                           ->  Seq Scan on xxx (cost=0.00..6188.18 rows=9941 width=20)
                                 Filter: (mode = 'live'::text)
                           ->  Hash  (cost=212606744.27..212606744.27 rows=86681834 width=1096)
                                  ....
I'm not sure if I'm reading it right, but it looks like it's hashing the 86 million row set and scanning over the 10k row set, which seems to me like the opposite of what you'd want to do. I haven't seen a lot of hash right joins in plans, though, so I'm not sure if that's how it works.
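
If I'm reading the node layout right (this is just my assumption about how to interpret it), the relation shown under the Hash node is the one loaded into the hash table, and with a right join it's also the side whose rows are all preserved. So that fragment would be executing something shaped like

    SELECT ...
    FROM yyy
    LEFT JOIN xxx ON xxx.xxx = yyy.yyy AND xxx.mode = 'live'

with the inputs swapped, i.e. hashing the 86 million row preserved side (yyy) and probing it with the ~10k filtered rows from xxx.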

Tim
