Discussion: performance on update table from a join


performance on update table from a join

From:
Jean-Luc Lachance
Date:
I am updating a large (1M+) table from a join.  There is no question the
table has to be sequentially scanned, but I am surprised that Postgres
would perform the join in order to do the update.

My take is that it should look up a match in the "rs" table and then,
from this reduced set, look up the "routes" table.

Since it is performing an update, it is fair to assume that there will
be only one record from "routes" that will match the where clause. So,
why waste resources performing the join first?

Or maybe I am not reading the query plan correctly...
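
[A rough sketch of the lookup order being described, for a single row of c;
$1, $2 and $3 stand in for c.stname, c.municipality and c.civic, and the
table and column names are taken from the UPDATE below. Illustrative only.]

SELECT r.route, r.routeno, r.seqno, r.exchangeno
FROM rs s
-- first narrow rs by the c row's street/municipality,
-- then probe routes from that reduced set
JOIN routes r
  ON (r.exchangeno, r.street, r.municipality)
   = (s.exchangeno, s.stname, s.municipality)
WHERE (s.oldstname, s.oldmuni) = ($1, $2)
  AND r.fromno <= $3 AND $3 <= r.tono
  AND (r.parity = $3 % 2 OR r.parity = -1);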

THX


nsd=# explain update c set newroute = r.route, route = r.routeno, street
= trim( lpad( r.seqno, 4, '0')), exchangeno = r.exchangeno
nsd-# from routes r, rs s where ( c.stname, c.municipality) =
(s.oldstname, s.oldmuni) and
nsd-# (s.exchangeno, s.stname, s.municipality) = ( r.exchangeno,
r.street, r.municipality) and
nsd-# r.fromno <= c.civic and c.civic <= r.tono and (r.parity =
c.civic%2 or r.parity = -1);
NOTICE:  QUERY PLAN:
Hash Join  (cost=109.44..118460.53 rows=1 width=857)
  ->  Seq Scan on c  (cost=0.00..112409.04 rows=1156604 width=519)
  ->  Hash  (cost=109.44..109.44 rows=1 width=338)
        ->  Merge Join  (cost=0.00..109.44 rows=1 width=338)
              ->  Index Scan using routes_str_mun on routes r  (cost=0.00..52.00 rows=1000 width=152)
              ->  Index Scan using rs_stname on rs s  (cost=0.00..52.00 rows=1000 width=186)
EXPLAIN


Re: performance on update table from a join

From:
Tom Lane
Date:
Jean-Luc Lachance <jllachan@nsd.ca> writes:
> Hash Join  (cost=109.44..118460.53 rows=1 width=857)
>   ->  Seq Scan on c  (cost=0.00..112409.04 rows=1156604 width=519)
>   ->  Hash  (cost=109.44..109.44 rows=1 width=338)
>         ->  Merge Join  (cost=0.00..109.44 rows=1 width=338)
>               ->  Index Scan using routes_str_mun on routes r 
> (cost=0.00..52.00 rows=1000 width=152)
>               ->  Index Scan using rs_stname on rs s  (cost=0.00..52.00
> rows=1000 width=186)

Have you vacuum analyzed routes and rs?  I always disbelieve any plan
with 'rows=1000' in it, because I know that's the default assumption
when no stats are available...
        regards, tom lane
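
[For reference, refreshing those statistics is just a matter of running
the following on the tables named in the thread:]

VACUUM ANALYZE routes;
VACUUM ANALYZE rs;

[Re-running the EXPLAIN afterwards should replace the default 1000-row
estimates with figures based on the actual table contents, as the next
message shows.]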


Re: performance on update table from a join

From:
Jean-Luc Lachance
Date:
Fair enough.

nsd=# explain update c set newroute = r.route, route = r.routeno, street
= trim( lpad( r.seqno, 4, '0')), exchangeno = r.exchangeno
nsd-# from routes r, rs s where ( c.stname, c.municipality) =
(s.oldstname, s.oldmuni) and
nsd-# (s.exchangeno, s.stname, s.municipality) = ( r.exchangeno,
r.street, r.municipality) and
nsd-# r.fromno <= c.civic and c.civic <= r.tono and (r.parity =
c.civic%2 or r.parity = -1);
NOTICE:  QUERY PLAN:

Hash Join  (cost=1943.04..136718.39 rows=100 width=660)
  ->  Seq Scan on c  (cost=0.00..112409.04 rows=1156604 width=519)
  ->  Hash  (cost=1939.63..1939.63 rows=1365 width=141)
        ->  Merge Join  (cost=0.00..1939.63 rows=1365 width=141)
              ->  Index Scan using routes_str_mun on routes r  (cost=0.00..1053.46 rows=13928 width=77)
              ->  Index Scan using rs_stname on rs s  (cost=0.00..704.11 rows=10942 width=64)
EXPLAIN

I also have the following indices that are not being used:
create index routes_ex_str_mun on routes( exchangeno, street,
municipality);
create index rs_ostr_omun on rs( oldstname, oldmuni);




Tom Lane wrote:
> 
> Jean-Luc Lachance <jllachan@nsd.ca> writes:
> > Hash Join  (cost=109.44..118460.53 rows=1 width=857)
> >   ->  Seq Scan on c  (cost=0.00..112409.04 rows=1156604 width=519)
> >   ->  Hash  (cost=109.44..109.44 rows=1 width=338)
> >         ->  Merge Join  (cost=0.00..109.44 rows=1 width=338)
> >               ->  Index Scan using routes_str_mun on routes r  (cost=0.00..52.00 rows=1000 width=152)
> >               ->  Index Scan using rs_stname on rs s  (cost=0.00..52.00 rows=1000 width=186)
> 
> Have you vacuum analyzed routes and rs?  I always disbelieve any plan
> with 'rows=1000' in it, because I know that's the default assumption
> when no stats are available...
> 
>                         regards, tom lane


Re: performance on update table from a join

From:
Tom Lane
Date:
Jean-Luc Lachance <jllachan@nsd.ca> writes:
> Hash Join  (cost=1943.04..136718.39 rows=100 width=660)
>   ->  Seq Scan on c  (cost=0.00..112409.04 rows=1156604 width=519)
>   ->  Hash  (cost=1939.63..1939.63 rows=1365 width=141)
>         ->  Merge Join  (cost=0.00..1939.63 rows=1365 width=141)
>               ->  Index Scan using routes_str_mun on routes r 
> (cost=0.00..1053.46 rows=13928 width=77)
>               ->  Index Scan using rs_stname on rs s  (cost=0.00..704.11
> rows=10942 width=64)

Okay, those numbers look more believable.

Offhand this seems like a perfectly fine plan to me; computing the
r/s join once and forming it into an in-memory hashtable seems better
than probing the r and s tables again for each of 1M+ rows of c.
If the planner is way off about the size of that join (ie, it's not
1365 rows but many more) then maybe this isn't a good plan --- but you
haven't given us any info about that.
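
[One quick way to check whether the 1365-row estimate for the r/s join is
in the right ballpark is to count it directly, reusing the join condition
from the UPDATE:]

SELECT count(*)
FROM routes r, rs s
WHERE (s.exchangeno, s.stname, s.municipality)
    = (r.exchangeno, r.street, r.municipality);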

> I also have the following indices that are not being used:
> create index routes_ex_str_mun on routes( exchangeno, street,
> municipality);
> create index rs_ostr_omun on rs( oldstname, oldmuni);

That routes index could be used for the mergejoin if you had a
corresponding index on rs (ie, one on exchangeno,stname,municipality).
Not sure that this would make any significant improvement though.
Merging on street name is probably plenty good enough.
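
[A sketch of the corresponding rs index being suggested here; the index
name rs_ex_str_mun is only illustrative:]

CREATE INDEX rs_ex_str_mun ON rs (exchangeno, stname, municipality);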

My thought is that the plan is fine, and if you are having a performance
problem with the update, it's more likely due to operations triggered by
the update rather than anything the planner can alter.  Do you have
rules, triggers, foreign keys on the c table, foreign keys pointing to
that table, a large number of indexes for that table?
        regards, tom lane
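
[In psql, describing the table is a quick way to check for most of these;
depending on the psql version the output should list any indexes, rules
and triggers on c (foreign-key constraints appear as triggers in older
releases). A direct catalog query works as well:]

\d c

SELECT tgname
FROM pg_trigger t, pg_class r
WHERE t.tgrelid = r.oid AND r.relname = 'c';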


Re: performance on update table from a join

From:
Jean-Luc Lachance
Date:
Tom,

There is no index on c right now, and there aren't any triggers, rules or
foreign keys on any of the tables.

"rs" has 10941 rows and "routes" has 13928.

JLL

Tom Lane wrote:
> 
> Jean-Luc Lachance <jllachan@nsd.ca> writes:
> > Hash Join  (cost=1943.04..136718.39 rows=100 width=660)
> >   ->  Seq Scan on c  (cost=0.00..112409.04 rows=1156604 width=519)
> >   ->  Hash  (cost=1939.63..1939.63 rows=1365 width=141)
> >         ->  Merge Join  (cost=0.00..1939.63 rows=1365 width=141)
> >               ->  Index Scan using routes_str_mun on routes r
> > (cost=0.00..1053.46 rows=13928 width=77)
> >               ->  Index Scan using rs_stname on rs s  (cost=0.00..704.11
> > rows=10942 width=64)
> 
> Okay, those numbers look more believable.
> 
> Offhand this seems like a perfectly fine plan to me; computing the
> r/s join once and forming it into an in-memory hashtable seems better
> than probing the r and s tables again for each of 1M+ rows of c.
> If the planner is way off about the size of that join (ie, it's not
> 1365 rows but many more) then maybe this isn't a good plan --- but you
> haven't given us any info about that.
> 
> > I also have the following indices that are not being used:
> > create index routes_ex_str_mun on routes( exchangeno, street,
> > municipality);
> > create index rs_ostr_omun on rs( oldstname, oldmuni);
> 
> That routes index could be used for the mergejoin if you had a
> corresponding index on rs (ie, one on exchangeno,stname,municipality).
> Not sure that this would make any significant improvement though.
> Merging on street name is probably plenty good enough.
> 
> My thought is that the plan is fine, and if you are having a performance
> problem with the update, it's more likely due to operations triggered by
> the update rather than anything the planner can alter.  Do you have
> rules, triggers, foreign keys on the c table, foreign keys pointing to
> that table, a large number of indexes for that table?
> 
>                         regards, tom lane