Обсуждение: Re: v7.2.3 versus v7.3 -> huge performance penalty for JOIN with UNION

Поиск
Список
Период
Сортировка

Re: v7.2.3 versus v7.3 -> huge performance penalty for JOIN with UNION

От
"ir. F.T.M. van Vugt bc."
Дата:
(Should probably be in [SQL] by now....)

I've changed my table declarations to agree on the datatypes and only one
simular problem with an update-query doesn't seem to be solved.

(see plan below)

* the concatenation in the lbar select can't be avoided, it's just the way the
data is => this does result in a resulting type 'text', AFAIK

* the aux_address.old_id is also of type 'text'


Still, the planner does a nested loop here against large costs... ;(


Any hints on this (last) one....?



TIA,





Frank.



trial=# explain update address set region_id = lbar.region_id from
    (select debtor_id || '-' || address_seqnr as id, region_id from
        list_base_regions) as lbar, aux_address aa
    where lbar.id = aa.old_id and address.id = aa.id;
                                                      QUERY PLAN
----------------------------------------------------------------------------------------------------------------------
 Merge Join  (cost=1.07..65.50 rows=3 width=253)
   Merge Cond: ("outer".id = "inner".id)
   ->  Nested Loop  (cost=0.00..643707.03 rows=3980 width=28)
         Join Filter: (((("inner".debtor_id)::text || '-'::text) ||
("inner".address_seqnr)::text) = "outer".old_id)
         ->  Index Scan using aux_address_idx2 on aux_address aa
(cost=0.00..81.88 rows=3989 width=16)
         ->  Seq Scan on list_base_regions  (cost=0.00..71.80 rows=3980
width=12)
   ->  Sort  (cost=1.07..1.08 rows=3 width=225)
         Sort Key: address.id
         ->  Seq Scan on address  (cost=0.00..1.05 rows=3 width=225)
               Filter: ((id = 1) IS NOT TRUE)
(10 rows)


Re: v7.2.3 versus v7.3 -> huge performance penalty for JOIN with UNION

От
Tom Lane
Дата:
"ir. F.T.M. van Vugt bc." <ftm.van.vugt@foxi.nl> writes:
> Any hints on this (last) one....?

>    ->  Nested Loop  (cost=0.00..643707.03 rows=3980 width=28)
>          Join Filter: (((("inner".debtor_id)::text || '-'::text) ||
> ("inner".address_seqnr)::text) = "outer".old_id)

Looks to me like debtor_id and address_seqnr are not text type, but are
being compared to things that are text.  Hard to tell exactly what's
going on though --- I suppose this query is getting rewritten by a rule?

            regards, tom lane

Re: v7.2.3 versus v7.3 -> huge performance penalty for JOIN with UNION

От
"Frank van Vugt"
Дата:
> > Any hints on this (last) one....?
> >    ->  Nested Loop  (cost=0.00..643707.03 rows=3980 width=28)
> >          Join Filter: (((("inner".debtor_id)::text || '-'::text) ||
> > ("inner".address_seqnr)::text) = "outer".old_id)
>
> Looks to me like debtor_id and address_seqnr are not text type, but are
> being compared to things that are text.

They were coerced, yes, but changing those original types helps only so much:

* lbar.debtor_id is of type text
* lbar.address_seqnr is of type text
* aa.old_id is of type text

trial=# explain update address set region_id = lbar.region_id from
    (select debtor_id || '-' || address_seqnr as f_id, region_id from
        list_base_regions) as lbar, aux_address aa
        where lbar.f_id = aa.old_id and address.id = aa.id;


Since the left side of the join clause is composed out of three concatenated
text-parts resulting in one single piece of type text, I'd expect the planner
to avoid the nested loop. Still:

                                               QUERY PLAN
--------------------------------------------------------------------------------------------------------
 Merge Join  (cost=1.07..16.07 rows=1 width=309)
   Merge Cond: ("outer".id = "inner".id)
   ->  Nested Loop  (cost=0.00..149669.38 rows=1000 width=84)
         Join Filter: ((("inner".debitor_id || '-'::text) ||
"inner".address_seqnr) = "outer".old_id)
         ->  Index Scan using aux_address_idx2 on aux_address aa
(cost=0.00..81.88 rows=3989 width=16)
         ->  Seq Scan on list_base_regions  (cost=0.00..20.00 rows=1000
width=68)
   ->  Sort  (cost=1.07..1.08 rows=3 width=225)
         Sort Key: address.id
         ->  Seq Scan on address  (cost=0.00..1.05 rows=3 width=225)
               Filter: ((id = 1) IS NOT TRUE)
(10 rows)



> Hard to tell exactly what's going on though

Does this help?




NB: it seems the data types part of the manual doesn't enlighten me on this
subject, any suggestions where to find more input?





Regards,




Frank.

Re: v7.2.3 versus v7.3 -> huge performance penalty for JOIN with UNION

От
Tom Lane
Дата:
"Frank van Vugt" <ftm.van.vugt@foxi.nl> writes:
> Since the left side of the join clause is composed out of three concatenated
> text-parts resulting in one single piece of type text, I'd expect the
> planner to avoid the nested loop.

Probably not, since the first thing it does is to flatten the
sub-select, leaving it with a concatenation expression in the
WHERE-clause.  (I was too sleepy last night to realize that you
were comparing a concatenation to old_id, rather than making two
separate comparisons :-()

We really need to fix the planner to be able to do merge/hash on
"arbitrary expression = arbitrary expression", not only "Var = Var".
IIRC, this is doable in principle, but there are a few routines that
would need to be improved.

            regards, tom lane

Re: v7.2.3 versus v7.3 -> huge performance penalty for JOIN with UNION

От
Frank van Vugt
Дата:
> Probably not, since the first thing it does is to flatten the
> sub-select, leaving it with a concatenation expression in the
> WHERE-clause.

Ah, I see.

So, I'll just split this thingy into two seperate queries, starting with
creating a temp table containing the straight subselect results.

> We really need to fix the planner to be able to do merge/hash on
> "arbitrary expression = arbitrary expression", not only "Var = Var".

I can get around it, so I'm not complaining ;-)


Tom, thanks a *lot* for the prompt responses !!



Best,





Frank.