Re: Fast HashJoin only after a cluster/recreate table

Поиск
Список
Период
Сортировка
От Alexandre de Arruda Paes
Тема Re: Fast HashJoin only after a cluster/recreate table
Дата
Msg-id CAGewt-u4JVuibCPuRvPwXz8bcgi=SOagS1d=WKUxKdX1k_H5Cg@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Fast HashJoin only after a cluster/recreate table  (David Rowley <david.rowley@2ndquadrant.com>)
Список pgsql-performance
Hi David,

Thanks for the explanations. But I don't undestand why when I recreate the table, the planner choose a best mode for sometime...

>I wonder what the planner would do if you pulled out the join to ES08T. If that generates a better plan, then providing that es08tipdoc is the primary key of that table, then you could just put a 
> subquery in the SELECT clause to lookup the es08desdoc.

We have a problem with this approach. Actually, this querys are generated by a framework and can't be 'rewrite'. 
Can you think in another solution directly in DB (perhaps a partial index, table partitioning, etc) ???

Best regards,

Alexandre



2016-04-01 10:17 GMT-03:00 David Rowley <david.rowley@2ndquadrant.com>:
On 1 April 2016 at 15:44, Alexandre de Arruda Paes <adaldeia@gmail.com> wrote:
In the query below, the planner choose an extreme slow mergejoin(380 seconds). 'Vacuum analyze' can't help.

Yeah, it appears that planner is estimating the WHERE clause on es09t quite badly, expecting just 1 row, but there's actually 1212 rows.  This seems to throw the whole plan decision out quite a bit, as, if you notice in the merge left join for t1.es09tipdoc = t2.es08tipdoc, it expect just 2 rows to be present, therefore most likely thinks that it's not worth sorting those results on t1.es09tipdoc, t1.es09numdoc in order for it to match the known output order of Materialize node on the inner side of that join. Instead it assumes the Merge join will be good enough on just the es09tipdoc order, and just adds the other two columns as join filters....

... and this seems to be what's killing the performance. This Merge Join constantly has to perform a mark/restore on the Materialize node. This why you're getting the insanely high "Rows Removed by Join Filter: 992875295", in other words the join filter throw away that many row combinations because they didn't match.

This mark/restore is basically the rewind process that the merge join algorithm needs to do to match many to many rows. In actual fact, this rewind is pretty useless in this case as the GROUP BY subquery ensures that no duplicate values will make it into the inner side of that merge join. The planner is not currently smart enough to detect this.

There are some patches currently pending for 9.6 which might help fix this problem in the future;

1. multivariate statistics; this might help the poor estimates on es09t. If this was available you could add statistics on es09codemp, es09datreq, which may well improve the estimate and cause the plan to change. https://commitfest.postgresql.org/9/450/
2. Unique joins. This tackles the problem a different way and allows the Merge Join algorithm to skip the restore with some new smarts that are added to the planner to detect when the inner side of the join can only produce, at most, a single row for each outer row. https://commitfest.postgresql.org/9/129/

If you feel like compiling 9.6 devel from source and applying each of these patches independently and seeing if it helps...  Of course that does not solve your 9.4 production dilemma, but it may help evaluation of each of these two patches for 9.6 or beyond.

I wonder what the planner would do if you pulled out the join to ES08T. If that generates a better plan, then providing that es08tipdoc is the primary key of that table, then you could just put a subquery in the SELECT clause to lookup the es08desdoc.
 
                                                                              QUERY PLAN                                                                              
----------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=289546.93..289546.94 rows=2 width=78) (actual time=380405.796..380405.929 rows=2408 loops=1)
   Sort Key: t1.es09numdoc, t1.es09tipdoc
   Sort Method: quicksort  Memory: 435kB
   Buffers: shared hit=82163
   ->  Merge Left Join  (cost=47.09..289546.92 rows=2 width=78) (actual time=1133.077..380398.160 rows=2408 loops=1)
         Merge Cond: (t1.es09tipdoc = es09t1.es09tipdoc)
         Join Filter: ((es09t1.es09codemp = t1.es09codemp) AND (es09t1.es09numdoc = t1.es09numdoc))
         Rows Removed by Join Filter: 992875295
         Buffers: shared hit=82163
         ->  Merge Left Join  (cost=46.53..49.29 rows=2 width=70) (actual time=12.206..18.155 rows=2408 loops=1)
               Merge Cond: (t1.es09tipdoc = t2.es08tipdoc)
               Buffers: shared hit=6821
               ->  Sort  (cost=9.19..9.19 rows=2 width=44) (actual time=11.611..12.248 rows=2408 loops=1)
                     Sort Key: t1.es09tipdoc
                     Sort Method: quicksort  Memory: 285kB
                     Buffers: shared hit=6814
                     ->  Nested Loop  (cost=1.11..9.18 rows=2 width=44) (actual time=0.040..10.398 rows=2408 loops=1)
                           Buffers: shared hit=6814
                           ->  Index Scan using ad_es09t_1 on es09t t3  (cost=0.56..4.58 rows=1 width=42) (actual time=0.020..0.687 rows=1212 loops=1)
                                 Index Cond: ((es09codemp = 1) AND (es09datreq >= '2016-02-02'::date) AND (es09datreq <= '2016-02-02'::date))
                                 Filter: (es09usuari ~~ '%%%%%%%%%%%%%%%%%%%%'::text)
                                 Buffers: shared hit=108
                           ->  Index Scan using es09t1_pkey on es09t1 t1  (cost=0.56..4.59 rows=1 width=19) (actual time=0.006..0.007 rows=2 loops=1212)
                                 Index Cond: ((es09codemp = 1) AND (es09tipdoc = t3.es09tipdoc) AND (es09numdoc = t3.es09numdoc))
                                 Filter: (es09tipdoc ~~ '%%%%%'::text)
                                 Buffers: shared hit=6706
               ->  Sort  (cost=37.35..38.71 rows=547 width=32) (actual time=0.592..2.206 rows=2919 loops=1)
                     Sort Key: t2.es08tipdoc
                     Sort Method: quicksort  Memory: 67kB
                     Buffers: shared hit=7
                     ->  Seq Scan on es08t t2  (cost=0.00..12.47 rows=547 width=32) (actual time=0.003..0.126 rows=547 loops=1)
                           Buffers: shared hit=7
         ->  Materialize  (cost=0.56..287644.85 rows=716126 width=23) (actual time=0.027..68577.800 rows=993087854 loops=1)
               Buffers: shared hit=75342
               ->  GroupAggregate  (cost=0.56..278693.28 rows=716126 width=15) (actual time=0.025..4242.453 rows=3607573 loops=1)
                     Group Key: es09t1.es09codemp, es09t1.es09tipdoc, es09t1.es09numdoc
                     Buffers: shared hit=75342
                     ->  Index Only Scan using es09t1_pkey on es09t1  (cost=0.56..199919.49 rows=7161253 width=15) (actual time=0.016..1625.031 rows=7160921 loops=1)
                           Index Cond: (es09codemp = 1)
                           Heap Fetches: 51499
                           Buffers: shared hit=75342
 Planning time: 50.129 ms
 Execution time: 380419.435 ms


--
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

В списке pgsql-performance по дате отправления:

Предыдущее
От: David Rowley
Дата:
Сообщение: Re: Fast HashJoin only after a cluster/recreate table
Следующее
От: Jeff Janes
Дата:
Сообщение: Re: Searching GIN-index (FTS) and sort by timestamp-column