Re: From Simple to Complex

Поиск

Список

Период

Сортировка

От	Alessandro Gagliardi
Тема	Re: From Simple to Complex
Дата	1 февраля 2012 г. 17:19:44
Msg-id	CAAB3BBJuFOEX77bzxNTpxTzVmyJ5cUssVKbo9o=diK1wz11FRQ@mail.gmail.com обсуждение исходный текст
Ответ на	Re: From Simple to Complex (Alessandro Gagliardi <alessandro@path.com>)
Ответы	Re: From Simple to Complex Re: From Simple to Complex
Список	pgsql-performance

Дерево обсуждения

Final update on this thread: since it is only necessary for me to get a rough ratio of the distribution (and not the absolute count), I refactored the query to include a subquery that samples from the moments table thus: SELECT moment_id, block_id FROM moments WHERE inserted BETWEEN 'yesterday' AND 'today' ORDER BY RANDOM() LIMIT 10000; I also took advantage of another table called blocks that happens to contain the moment_type as well (thus making it so I don't need to reference pg_class). The final query looks like:

SELECT moment_type, emotion, COUNT(feedback_id)

FROM (SELECT moment_id, block_id

FROM moments

WHERE inserted BETWEEN 'yesterday' AND 'today'

ORDER BY RANDOM() LIMIT 10000) AS sample_moments

JOIN blocks USING (block_id)

JOIN emotions USING (moment_id)

GROUP BY moment_type, emotion

ORDER BY moment_type, emotion

The explain is at http://explain.depesz.com/s/lYh

Interestingly, increasing the limit does not seem to increase the runtime in a linear fashion. When I run it with a limit of 60000 I get a runtime of 14991 ms. But if I run it with a limit of 70000 I get a runtime of 77744 ms. I assume that that's because I'm hitting a memory limit and paging out. Is that right?

On Tue, Jan 31, 2012 at 3:43 PM, Alessandro Gagliardi <alessandro@path.com> wrote:

I just got a pointer on presenting EXPLAIN ANALYZE in a more human friendly fashion (thanks, Agent M!): http://explain.depesz.com/s/A9S

From this it looks like the bottleneck happens when Postgres does an Index Scan using emotions_moment_id_idx on emotions before filtering on moments.inserted so I thought I'd try filtering on emotions.inserted instead but that only made it worse. At the same time, I noticed that "FROM pg_class, moments WHERE moments.tableoid = pg_class.oid" tends to run a bit faster than "FROM pg_class JOIN moments ON moments.tableoid = pg_class.oid". So I tried:

SELECT relname, emotion, COUNT(feedback_id)
FROM pg_class, moments, emotions
WHERE moments.tableoid = pg_class.oid
AND emotions.inserted > 'yesterday'
AND moments.inserted BETWEEN 'yesterday' AND 'today'
AND emotions.moment_id = moments.moment_id
GROUP BY relname, emotion
ORDER BY relname, emotion;

That was a bit faster, but still very slow. Here's the EXPLAIN: http://explain.depesz.com/s/ZdF

On Tue, Jan 31, 2012 at 2:53 PM, Alessandro Gagliardi <alessandro@path.com> wrote:
I changed the query a bit so the results would not change over the
course of the day to:

SELECT relname, emotion, COUNT(feedback_id) FROM pg_class, moments
JOIN emotions USING (moment_id)
WHERE moments.inserted BETWEEN 'yesterday' AND 'today' AND
moments.tableoid = pg_class.oid
GROUP BY relname, emotion ORDER BY relname, emotion;

В списке pgsql-performance по дате отправления:

Предыдущее

От: Merlin Moncure
Дата: 01 февраля 2012 г., 16:52:29
Сообщение: Re: Index with all necessary columns - Postgres vs MSSQL

Следующее

От: Scott Marlowe
Дата: 01 февраля 2012 г., 17:32:46
Сообщение: Re: Index with all necessary columns - Postgres vs MSSQL

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Re: From Simple to Complex

Предыдущее

Следующее