Thread: Any advice on tuning this query?


Any advice on tuning this query?

From: Henrik Ekenberg
Date:

Hi,

I have a select that moves around a lot of data and takes a long time.
Any advice on tuning this query?

EXPLAIN (ANALYZE ON, BUFFERS ON)
    select
    d.books,
    d.date publish_date,
    extract(dow from d.date) publish_dow,
    week_num_fixed,
    coalesce(sum(case when i.invno is not null then 1 else 0 end),0) as daily_cnt,
    coalesce(sum(i.activation_amount_sek),0) as daily_amt_sek
    from dates_per_books d
    left join publishing_data i on (d.books=i.books and d.date=i.publish_date)
    group by 1,2,3,4;

(explain plan: https://explain.depesz.com/s/aDOi)
    
                                                                                            QUERY PLAN                                                                                        
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 GroupAggregate  (cost=44606264.52..48172260.66 rows=4318263 width=68) (actual time=839980.887..1029679.771 rows=43182733 loops=1)
   Group Key: d.books, d.date, (date_part('dow'::text, (d.date)::timestamp without time zone)), d.week_num_fixed
   Buffers: shared hit=3, local hit=10153260 read=165591641, temp read=2097960 written=2097960
   I/O Timings: read=399828.103
   ->  Sort  (cost=44606264.52..45104896.89 rows=199452945 width=48) (actual time=839980.840..933883.311 rows=283894005 loops=1)
         Sort Key: d.books, d.date, (date_part('dow'::text, (d.date)::timestamp without time zone)), d.week_num_fixed
         Sort Method: external merge  Disk: 16782928kB
         Buffers: shared hit=3, local hit=10153260 read=165591641, temp read=2097960 written=2097960
         I/O Timings: read=399828.103
         ->  Merge Left Join  (cost=191.15..13428896.40 rows=199452945 width=48) (actual time=0.031..734937.112 rows=283894005 loops=1)
               Merge Cond: ((d.books = i.books) AND (d.date = i.publish_date))
               Buffers: local hit=10153260 read=165591641
               I/O Timings: read=399828.103
               ->  Index Scan using books_date on dates_per_books d  (cost=0.56..1177329.91 rows=43182628 width=20) (actual time=0.005..33789.216 rows=43182733 loops=1)
                     Buffers: local hit=10 read=475818
                     I/O Timings: read=27761.376
               ->  Index Scan using activations_books_date on publishing_data i  (cost=0.57..7797117.25 rows=249348384 width=32) (actual time=0.004..579806.706 rows=249348443 loops=1)
                     Buffers: local hit=10153250 read=165115823
                     I/O Timings: read=372066.727
 Planning time: 2.864 ms
 Execution time: 1034284.193 ms
(21 rows)


Re: Any advice on tuning this query?

From: Devrim Gündüz
Date:
Hi,

On Fri, 2016-11-11 at 16:19 +0100, Henrik Ekenberg wrote:
>          Sort Method: external merge  Disk: 16782928kB

This query generates a 16 GB temp file on disk. Is that really the amount of
data you want to sort?

Regards,
--
Devrim GÜNDÜZ
EnterpriseDB: http://www.enterprisedb.com
PostgreSQL Danışmanı/Consultant, Red Hat Certified Engineer
Twitter: @DevrimGunduz , @DevrimGunduzTR


Re: Any advice on tuning this query?

From: Andreas Karlsson
Date:
I have a couple of suggestions which should lead to some minor
improvements, but in general I am surprised by the huge size of the
result set. Is your goal really to get a 43 million row result? When a
query returns that many rows, usually all possible query plans are more
or less bad.

1) You can remove "3" from the group by clause to avoid having to sort
that data, since we already sort by d.date.

2) If (books, date) is the primary key of dates_per_books, we can also
safely remove "4" from the group by clause, further reducing the length
of the keys that we need to sort.

3) For a minor speed-up, change "coalesce(sum(case when i.invno is not
null then 1 else 0 end),0)" to "count(i.invno)". A sketch combining all
three changes follows.
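
For illustration only, and assuming (books, date) really is declared as the
primary key of dates_per_books (so week_num_fixed is functionally dependent
on the grouping columns), the rewritten query might look roughly like this:

    select
        d.books,
        d.date publish_date,
        extract(dow from d.date) publish_dow,       -- expression over the grouped d.date
        d.week_num_fixed,                           -- functionally dependent on the PK (books, date)
        count(i.invno) as daily_cnt,                -- counts rows where i.invno is not null
        coalesce(sum(i.activation_amount_sek), 0) as daily_amt_sek
    from dates_per_books d
    left join publishing_data i
        on (d.books = i.books and d.date = i.publish_date)
    group by d.books, d.date;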

Andreas


Re: Any advice on tuning this query?

From: Jeff Janes
Date:
On Fri, Nov 11, 2016 at 7:19 AM, Henrik Ekenberg <henrik@ekenberg.pw> wrote:

Hi,

I have a select that moves around a lot of data and takes a long time.
Any advice on tuning this query?

EXPLAIN (ANALYZE ON, BUFFERS ON)


When accessing lots of data, sometimes the act of collecting timing on all of the actions makes the query take twice as long or more, and distorts the timings it collects.

Try running the same query like:

EXPLAIN (ANALYZE ON, BUFFERS ON, timing off)

If the Execution times are very similar either way, then you don't have this problem.  But if they differ, then you can't depend on the results of the timings reported when timing is turned on.  Large sorts are particularly subject to this problem.

More than half the time (if the times are believable) goes to scanning the index activations_books_date.  You might be better off with a sort rather than an index scan.  You can test this by doing:

begin;
drop index activations_books_date;
<explain your query here>;
rollback;

Don't do that on a production server, as it will block other access to the table for the duration.


You might also benefit from hash joins/aggregates, but you would have to set work_mem to a very large value to get them.  I'd start by setting work_mem in your session to 1TB and seeing if that changes the explain plan (just explain, not explain analyze!).  If that gives you the hash joins/aggregates, then keep lowering work_mem until you find the minimum that still produces the hash plans.  Then ponder whether it is safe to use that much work_mem "for real" given your RAM and level of concurrent access.
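
A minimal sketch of that experiment in a psql session (the 1TB figure is only
there to see whether the planner would ever pick a hash plan; use plain
EXPLAIN, not EXPLAIN ANALYZE, at that setting):

-- session-local experiment; do not set this globally
SET work_mem = '1TB';

EXPLAIN <your query here>;   -- plain EXPLAIN: show the plan without executing it

-- If the plan now shows Hash Join / HashAggregate nodes, lower work_mem
-- step by step (e.g. '64GB', '16GB', '4GB', ...) and re-run EXPLAIN until
-- you find the smallest value that still produces the hash plan.
RESET work_mem;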

Cheers,

Jeff