Re: Slow join over three tables

Поиск

Список

Период

Сортировка

От	Tim van der Linden
Тема	Re: Slow join over three tables
Дата	29 апреля 2016 г. 00:19:22
Msg-id	20160429091911.178f306237be227d2350dad9@shisaa.jp обсуждение исходный текст
Ответ на	Re: Slow join over three tables (Alban Hertroys <haramrae@gmail.com>)
Список	pgsql-general

Дерево обсуждения

On Thu, 28 Apr 2016 16:38:53 +0200
Alban Hertroys <haramrae@gmail.com> wrote:

Hi Alban

First off ... thanks you for your time on this so far and giving me the educational smacks on the head :) I appreciate
thisa lot.  

> You're doing ~9 times as many index lookups. A slowdown of 6x of this
> part of the query seems rather reasonable.

Again, very true. But why 9 times as much?

> >  Planning time: 16.438 ms
> >  Execution time: 663.583 ms
> >
> > A total of 660 ms instead of the previous 120 ms. The amount of rows returned are 4076, instead of the 448 rows of
theprevious query. Could this be the cause of the slow down? Or is it the fact that the integers in the ANY clause are
furtherapart from one another and thus make for longer index searches? 
>
> Technically probably both, but the index node distance (assuming a
> btree index) should be insignificant.

It is indeed a b-tree index.

> The sorting of found candidate rows on rid probably contributes the
> most. I forgot how quicksort scales to the number of items to sort,
> probably something like 2log(n) or something like that.

Looked this up and you are right, a quicksort (in C) is of nlog(n) complexity ... and if I'm right this is "rather"
linear.Less sorting is thus the message :) 

> That said, you didn't happen to perform these tests (assuming >1)
> during a busier period of the database server? I ask because later on
> you seem to be getting fast results again after some more index
> changes and I don't see cause for the difference.

No, these test are on a local development machine with nothing else going on but the database cluster.

> > 2. Adding extra columns ignores indexes
> > ...
> > - a.recovery
> > - a.severity
> > - d.reason
> > - d.effectiveness
> > - d.duration
>
> I have to admit that I'm a bit behind with the current state of the
> art of PostgreSQL, but last time I checked, the database needed to
> look at each row in the result-set for transaction visibility
> information. I recall that there was (at some point) much discussion
> whether that visibility information could be added to indexes and that
> there were strong arguments against doing so. Hence, I doubt that
> those new index-only scans skip that step.

I see.

> Unless I'm wrong there, adding non-queried fields to the index is only
> going to affect your performance adversely. Relevant fields for
> indices are those used in joins, those regularly used in conditions
> (where-clauses) and those that are regularly sorted over.
>
> Other fields are available from the candidate rows of the result set.
> Putting them in the index just results in storing redundant
> information.

Storing redundant information and making for bloated indexes which the planner might choose to skip if I understand
correctly.Good to know. 

> > ...
> > Still fast enough ... but I was wondering why the planner would not use the new index and instead fall back on the
"report_drugs_drug_idx"single column index. 
>
> I'm actually a bit suspicious of those numbers; those are different
> drug id's than those from the first explain too.

You are completely right, this was the wrong plan ... the correct plan is:

 Sort  (cost=31757.71..31765.48 rows=3107 width=76) (actual time=722.348..722.461 rows=4076 loops=1)
   Sort Key: r.created
   Sort Method: quicksort  Memory: 495kB
   ->  Nested Loop  (cost=26041.17..31577.48 rows=3107 width=76) (actual time=139.407..721.090 rows=4076 loops=1)
         ->  Merge Join  (cost=26040.61..27143.53 rows=3107 width=64) (actual time=139.396..170.446 rows=4076 loops=1)
               Merge Cond: (d.rid = a.rid)
               ->  Sort  (cost=16332.08..16671.61 rows=135810 width=35) (actual time=108.866..119.143 rows=135370
loops=1)
                     Sort Key: d.rid
                     Sort Method: quicksort  Memory: 13992kB
                     ->  Index Scan using report_drugs_drug_idx on report_drugs d  (cost=0.44..4753.44 rows=135810
width=35)(actual time=0.038..53.758 rows=135532 loops=1) 
                           Index Cond: (drug = ANY ('{9557,17848,17880,18223}'::integer[]))
               ->  Sort  (cost=9708.53..9904.94 rows=78565 width=29) (actual time=30.517..34.876 rows=77163 loops=1)
                     Sort Key: a.rid
                     Sort Method: quicksort  Memory: 6702kB
                     ->  Index Only Scan using report_adverses_adverse_rid_severity_recovery_idx on report_adverses a
(cost=0.56..3320.57rows=78565 width=29) (actual time=1.005..16.135 rows=76972 loops=1) 
                           Index Cond: (adverse = ANY ('{"back pain - nonspecific","nonspecific back pain","back
pain"}'::text[]))
                           Heap Fetches: 0
         ->  Index Only Scan using reports_id_age_gender_created_idx on reports r  (cost=0.56..1.42 rows=1 width=20)
(actualtime=0.134..0.135 rows=1 loops=4076) 
               Index Cond: (id = d.rid)
               Heap Fetches: 0
 Planning time: 29.415 ms
 Execution time: 723.545 ms

And this is now indeed much closer to the ~660 ms from before, it doesn't make much of a difference after all.

Cheers,
Tim

В списке pgsql-general по дате отправления:

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Re: Slow join over three tables