Performance issue with NestedLoop query

Поиск

Список

Период

Сортировка

От	Ram N
Тема	Performance issue with NestedLoop query
Дата	30 июля 2015 г. 10:52:07
Msg-id	CACGZU34kJ5-e_vZCrcnf4BDY=XTmoyfkxWHwkLG=CgfC8m4LnA@mail.gmail.com обсуждение исходный текст
Ответы	Re: Performance issue with NestedLoop query Re: Performance issue with NestedLoop query
Список	pgsql-performance

Дерево обсуждения

Hi,

I am trying to see if I can do anything to optimize the following plan.

I have two tables and I am doing a join between them. After joining it calculates aggregates (Sum and Count)

Table 1 : timestamp (one per day) for 2 years (730 records)

Table 2 : Window based validity records. Window here means start and end timestamp indicating a period of validity for a record.

Hash some 10 odd columns including start_time and end_time. (1 million records)

Machine has 244 GB RAM. Queries are taking more than a min and in some case 2-3 mins.

Below is the plan I am getting. The Nested loop blows up the number of records and we expect that. I have tried playing around work_mem and cache configs which hasn't helped.

Query

select sum(a), count(id), a.ts, st from table1 a, table2 b where a.ts > b.start_date and a.ts < b.end_date and a.ts > '2015-01-01 20:50:44.000000 +00:00:00' and a.ts < '2015-07-01 19:50:44.000000 +00:00:00' group by a.ts, st order by a.ts

Plan (EXPLAIN ANALYZE)

"Sort (cost=10005447874.54..10005447879.07 rows=1810 width=44) (actual time=178883.936..178884.159 rows=1355 loops=1)"

" Output: (sum(b.a)), (count(b.id)), a.ts, b.st"

" Sort Key: a.ts"

" Sort Method: quicksort Memory: 154kB"

" Buffers: shared hit=47068722 read=102781"

" I/O Timings: read=579.946"

" -> HashAggregate (cost=10005447758.51..10005447776.61 rows=1810 width=44) (actual time=178882.874..178883.320 rows=1355 loops=1)"

" Output: sum(b.a), count(b.id), a.ts, b.st"

" Group Key: a.ts, b.st"

" Buffers: shared hit=47068719 read=102781"

" I/O Timings: read=579.946"

" -> Nested Loop (cost=10000000000.43..10004821800.38 rows=62595813 width=44) (actual time=0.167..139484.854 rows=73112419 loops=1)"

" Output: a.ts, b.st, b.a, b.id"

" Buffers: shared hit=47068719 read=102781"

" I/O Timings: read=579.946"

" -> Seq Scan on public.table1 a (cost=0.00..14.81 rows=181 width=8) (actual time=0.058..0.563 rows=181 loops=1)"

" Output: a.ts"

" Filter: ((a.ts > '2015-01-01 20:50:44+00'::timestamp with time zone) AND (a.ts < '2015-07-01 19:50:44+00'::timestamp with time zone))"

" Rows Removed by Filter: 540"

" Buffers: shared read=4"

" I/O Timings: read=0.061"

" -> Index Scan using end_date_idx on public.table2 b (cost=0.43..23181.37 rows=345833 width=52) (actual time=0.063..622.274 rows=403936 loops=181)"

" Output: b.serial_no, b.name, b.st, b.end_date, b.a, b.start_date"

" Index Cond: (a.ts < b.end_date)"

" Filter: (a.ts > b.start_date)"

" Rows Removed by Filter: 392642"

" Buffers: shared hit=47068719 read=102777"

" I/O Timings: read=579.885"

"Planning time: 0.198 ms"

"Execution time: 178884.467 ms"

Any pointers on how to go about optimizing this?

--yr

В списке pgsql-performance по дате отправления:

Предыдущее

От: Alex Hunsaker
Дата: 30 июля 2015 г., 09:28:51
Сообщение: Re: Are many idle connections bad?

Следующее

От: Qingqing Zhou
Дата: 30 июля 2015 г., 23:24:59
Сообщение: Re: Performance issue with NestedLoop query

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Performance issue with NestedLoop query

Предыдущее

Следующее