Discussion: Any advice tuning this query?
Hi,
I have a SELECT that moves a lot of data around and takes a long time.
Any advice on tuning this query?
EXPLAIN (ANALYZE ON, BUFFERS ON)
select
    d.books,
    d.date publish_date,
    extract(dow from d.date) publish_dow,
    week_num_fixed,
    coalesce(sum(case when i.invno is not null then 1 else 0 end), 0) as daily_cnt,
    coalesce(sum(i.activation_amount_sek), 0) as daily_amt_sek
from dates_per_books d
left join publishing_data i on (d.books = i.books and d.date = i.publish_date)
group by 1, 2, 3, 4;
(explain: https://explain.depesz.com/s/aDOi)
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 GroupAggregate  (cost=44606264.52..48172260.66 rows=4318263 width=68) (actual time=839980.887..1029679.771 rows=43182733 loops=1)
   Group Key: d.books, d.date, (date_part('dow'::text, (d.date)::timestamp without time zone)), d.week_num_fixed
   Buffers: shared hit=3, local hit=10153260 read=165591641, temp read=2097960 written=2097960
   I/O Timings: read=399828.103
   ->  Sort  (cost=44606264.52..45104896.89 rows=199452945 width=48) (actual time=839980.840..933883.311 rows=283894005 loops=1)
         Sort Key: d.books, d.date, (date_part('dow'::text, (d.date)::timestamp without time zone)), d.week_num_fixed
         Sort Method: external merge  Disk: 16782928kB
         Buffers: shared hit=3, local hit=10153260 read=165591641, temp read=2097960 written=2097960
         I/O Timings: read=399828.103
         ->  Merge Left Join  (cost=191.15..13428896.40 rows=199452945 width=48) (actual time=0.031..734937.112 rows=283894005 loops=1)
               Merge Cond: ((d.books = i.books) AND (d.date = i.publish_date))
               Buffers: local hit=10153260 read=165591641
               I/O Timings: read=399828.103
               ->  Index Scan using books_date on dates_per_books d  (cost=0.56..1177329.91 rows=43182628 width=20) (actual time=0.005..33789.216 rows=43182733 loops=1)
                     Buffers: local hit=10 read=475818
                     I/O Timings: read=27761.376
               ->  Index Scan using activations_books_date on publishing_data i  (cost=0.57..7797117.25 rows=249348384 width=32) (actual time=0.004..579806.706 rows=249348443 loops=1)
                     Buffers: local hit=10153250 read=165115823
                     I/O Timings: read=372066.727
 Planning time: 2.864 ms
 Execution time: 1034284.193 ms
(21 rows)
Hi,

On Fri, 2016-11-11 at 16:19 +0100, Henrik Ekenberg wrote:
> Sort Method: external merge  Disk: 16782928kB

This query is generating a 16GB temp file on disk. Is this the amount of data you want to sort?

Regards,
--
Devrim GÜNDÜZ
EnterpriseDB: http://www.enterprisedb.com
PostgreSQL Danışmanı/Consultant, Red Hat Certified Engineer
Twitter: @DevrimGunduz , @DevrimGunduzTR
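A sort that spills ~16 GB to disk will never fit entirely in memory, but raising work_mem for the session can still reduce the number of external merge passes. A hedged illustration only; the 1GB value is arbitrary and must be sized to the machine's available RAM and concurrency, not copied as-is:

```sql
-- Session-level setting; illustrative value, not a recommendation.
SET work_mem = '1GB';

-- Then re-run the same EXPLAIN (ANALYZE ON, BUFFERS ON) query and compare
-- the "Sort Method" line: fewer/larger merge runs mean less temp-file I/O.
```

Note that work_mem is allocated per sort/hash node per backend, so a large session-level value is safer than raising it globally in postgresql.conf.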
I have a couple of suggestions which should lead to some minor improvements, but in general I am surprised by the huge size of the result set. Is your goal really to get a 43 million row result? When a query returns that many rows, usually all possible query plans are more or less bad.

1) You can remove "3" from the group by clause to avoid having to sort that data when we already sort by d.date.

2) If (books, date) is the primary key of dates_per_books, we can also safely remove "4" from the group by clause, further reducing the length of the keys that we need to sort.

3) For a minor speed up, change "coalesce(sum(case when i.invno is not null then 1 else 0 end),0)" to "count(i.invno)".

Andreas
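Pulling the three suggestions together, the query could look like the sketch below. This assumes, as suggestion 2 does, that (books, date) really is the primary key of dates_per_books, so PostgreSQL accepts the ungrouped week_num_fixed via functional dependency on the grouping columns:

```sql
EXPLAIN (ANALYZE ON, BUFFERS ON)
select
    d.books,
    d.date publish_date,
    extract(dow from d.date) publish_dow,  -- expression over a grouped column, fine
    week_num_fixed,                        -- functionally dependent on the PK (books, date)
    count(i.invno) as daily_cnt,           -- count() already skips NULLs from the left join
    coalesce(sum(i.activation_amount_sek), 0) as daily_amt_sek
from dates_per_books d
left join publishing_data i on (d.books = i.books and d.date = i.publish_date)
group by 1, 2;
```

Shorter sort keys mean less data to move through the 16GB external sort, though the row count itself is unchanged.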
Hi,

> I have a select moving around a lot of data and takes times
> Any advice tuning this query ?
> EXPLAIN (ANALYZE ON, BUFFERS ON)
Try running the same query like:
EXPLAIN (ANALYZE ON, BUFFERS ON, timing off)
If the execution times are very similar either way, then you don't have this problem. But if they differ, then you can't depend on the timings reported when timing is turned on. Large sorts are particularly subject to this measurement overhead.
More than half the time (if the times are believable) goes to scanning the index activations_books_date. You might be better off with a sort rather than an index scan. You can test this by doing:
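The archived message breaks off here. A plausible continuation (an assumption about the missing text, not the author's actual words): the usual way to test whether an explicit sort beats the index scan is to disable index scans for the session and re-run the plan:

```sql
-- Hypothetical reconstruction of the truncated advice: steer the planner
-- away from index scans for this session only, then compare plans.
set enable_indexscan = off;

-- Re-run the EXPLAIN (ANALYZE ON, BUFFERS ON, timing off) query and compare
-- total execution time against the merge-join-over-index-scans plan above.
```

Planner enable_* settings are a diagnostic tool, not a production fix; reset with `reset enable_indexscan;` afterwards.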