Re: Slow join over three tables

Поиск
Список
Период
Сортировка
От Tim van der Linden
Тема Re: Slow join over three tables
Дата
Msg-id 20160428153600.26ddc2e5a2fcaf7520a302e8@shisaa.jp
обсуждение исходный текст
Ответ на Re: Slow join over three tables  (Alban Hertroys <haramrae@gmail.com>)
Ответы Re: Slow join over three tables  (Alban Hertroys <haramrae@gmail.com>)
Список pgsql-general
On Wed, 27 Apr 2016 13:48:06 +0200
Alban Hertroys <haramrae@gmail.com> wrote:

> In this case, you're using the values in adverse to filter relevant rid's for the FK join, so you might be better off
withthe inverse of above index: 
>     create index on report_adverses (adverse, rid);
>     create index on report_drugs (drug, rid);

Hmmmm, like I reported yesterday this achieved a huge performance boost.

However, two (new) things I like to touch on while further experimenting with this query:

1. Altering Drug IDs or Adverse names effects the executing time negatively.

In this example altering the drug IDs I search for makes the query 6 times slower again:

SELECT r.id, r.age, r.gender, r.created, a.adverse, d.drug
FROM reports r
JOIN report_drugs d ON d.rid = r.id
JOIN report_adverses a ON a.rid = r.id
WHERE a.adverse = ANY (ARRAY['back pain - nonspecific', 'nonspecific back pain', 'back pain'])
AND d.drug = ANY (ARRAY[9557, 17848, 17880, 18223]) ORDER BY r.created;

Different drug ID ([9557, 17848, 17880, 18223]), but the rest is the same. Query plan:

 Sort  (cost=31409.71..31417.48 rows=3107 width=41) (actual time=662.707..662.819 rows=4076 loops=1)
   Sort Key: r.created
   Sort Method: quicksort  Memory: 415kB
   ->  Nested Loop  (cost=25693.17..31229.48 rows=3107 width=41) (actual time=71.748..661.743 rows=4076 loops=1)
         ->  Merge Join  (cost=25692.61..26795.53 rows=3107 width=29) (actual time=70.841..97.094 rows=4076 loops=1)
               Merge Cond: (d.rid = a.rid)
               ->  Sort  (cost=16332.08..16671.61 rows=135810 width=8) (actual time=48.946..58.623 rows=135370 loops=1)
                     Sort Key: d.rid
                     Sort Method: quicksort  Memory: 12498kB
                     ->  Index Scan using report_drugs_drug_idx on report_drugs d  (cost=0.44..4753.44 rows=135810
width=8)(actual time=0.681..28.441 rows=135532 loops=1) 
                           Index Cond: (drug = ANY ('{9557,17848,17880,18223}'::integer[]))
               ->  Sort  (cost=9360.53..9556.94 rows=78565 width=21) (actual time=21.880..25.969 rows=77163 loops=1)
                     Sort Key: a.rid
                     Sort Method: quicksort  Memory: 6682kB
                     ->  Index Only Scan using report_adverses_adverse_rid_idx on report_adverses a
(cost=0.56..2972.57rows=78565 width=21) (actual time=0.983..10.744 rows=76972 loops=1) 
                           Index Cond: (adverse = ANY ('{"back pain - nonspecific","nonspecific back pain","back
pain"}'::text[]))
                           Heap Fetches: 0
         ->  Index Only Scan using reports_id_age_gender_created_idx on reports r  (cost=0.56..1.42 rows=1 width=20)
(actualtime=0.137..0.138 rows=1 loops=4076) 
               Index Cond: (id = d.rid)
               Heap Fetches: 0
 Planning time: 16.438 ms
 Execution time: 663.583 ms

A total of 660 ms instead of the previous 120 ms. The amount of rows returned are 4076, instead of the 448 rows of the
previousquery. Could this be the cause of the slow down? Or is it the fact that the integers in the ANY clause are
furtherapart from one another and thus make for longer index searches? 

The same happens with changing the adverses, some adverses come back in ~120 ms, others can take up to one second.

2. Adding extra columns ignores indexes

This one was kind of expected. Adding extra columns from both "report_drugs" and "report_adverses" slows the lot down
again.Extra columns added: 

- a.recovery
- a.severity
- d.reason
- d.effectiveness
- d.duration

As I would expect, adding these columns would make the previous multi-column indexes useless. And they indeed were not
usedanymore. 

So my initial thought was to create new indexes on the columns I now query:

- CREATE INDEX ON report_adverses(adverse, rid, severity, recovery);
- CREATE INDEX ON report_drugs(drug, rid, reason, effectiveness, duration);

After running the query again, the new index on "report_adverses" got picked up, but the index on "report_drugs" did
not:

 Sort  (cost=12365.79..12366.61 rows=329 width=76) (actual time=129.106..129.120 rows=448 loops=1)
   Sort Key: r.created
   Sort Method: quicksort  Memory: 66kB
   ->  Nested Loop  (cost=11212.93..12352.04 rows=329 width=76) (actual time=31.558..128.951 rows=448 loops=1)
         Join Filter: (d.rid = r.id)
         ->  Merge Join  (cost=11212.38..11680.44 rows=329 width=64) (actual time=30.705..39.527 rows=448 loops=1)
               Merge Cond: (d.rid = a.rid)
               ->  Sort  (cost=1503.85..1539.82 rows=14390 width=35) (actual time=6.977..7.993 rows=14074 loops=1)
                     Sort Key: d.rid
                     Sort Method: quicksort  Memory: 1202kB
                     ->  Index Scan using report_drugs_drug_idx on report_drugs d  (cost=0.44..510.02 rows=14390
width=35)(actual time=0.567..4.638 rows=14200 loops=1) 
                           Index Cond: (drug = ANY ('{359,360,361,362,363}'::integer[]))
               ->  Sort  (cost=9708.53..9904.94 rows=78565 width=29) (actual time=23.717..26.540 rows=76974 loops=1)
                     Sort Key: a.rid
                     Sort Method: quicksort  Memory: 6702kB
                     ->  Index Only Scan using report_adverses_adverse_rid_severity_recovery_idx on report_adverses a
(cost=0.56..3320.57rows=78565 width=29) (actual time=0.878..12.297 rows=76972 loops=1) 
                           Index Cond: (adverse = ANY ('{"back pain - nonspecific","nonspecific back pain","back
pain"}'::text[]))
                           Heap Fetches: 0
         ->  Index Only Scan using reports_id_age_gender_created_idx on reports r  (cost=0.56..2.03 rows=1 width=20)
(actualtime=0.198..0.199 rows=1 loops=448) 
               Index Cond: (id = a.rid)
               Heap Fetches: 0
 Planning time: 18.310 ms
 Execution time: 129.483 ms

Still fast enough ... but I was wondering why the planner would not use the new index and instead fall back on the
"report_drugs_drug_idx"single column index. 


> Alban Hertroys
> --
> If you can't see the forest for the trees,
> cut the trees and you'll find there is no forest.

Cheers,
Tim


В списке pgsql-general по дате отправления:

Предыдущее
От: Craig Ringer
Дата:
Сообщение: Re: BDR Alter table failing
Следующее
От: Yogesh Sharma
Дата:
Сообщение: Issue during postgresql startup