Обсуждение: seq scan over 3.3 million rows instead of single key index access

От:
"Andrus"
Дата:

There are  indexes on   rid(dokumnr) and dok(dokumnr) and dokumnr is int.
Instead of using single key index, 8.1.4 scans over whole rid table.
Sometimes idtelluued can contain more than single row so replacing join with
equality is not possible.

How to fix ?

Andrus.

CREATE TEMP TABLE idtellUued(dokumnr INT) ON COMMIT DROP;
 INSERT INTO idtellUued VALUES(1249228);
explain analyze  select 1
   from dok JOIN rid USING(dokumnr)
 JOIN idtellUued USING(dokumnr)

"Hash Join  (cost=7483.22..222259.77 rows=5706 width=0) (actual
time=14905.981..27065.903 rows=8 loops=1)"
"  Hash Cond: ("outer".dokumnr = "inner".dokumnr)"
"  ->  Seq Scan on rid  (cost=0.00..198240.33 rows=3295833 width=4) (actual
time=0.036..15021.641 rows=3280576 loops=1)"
"  ->  Hash  (cost=7477.87..7477.87 rows=2140 width=8) (actual
time=0.114..0.114 rows=1 loops=1)"
"        ->  Nested Loop  (cost=0.00..7477.87 rows=2140 width=8) (actual
time=0.076..0.099 rows=1 loops=1)"
"              ->  Seq Scan on idtelluued  (cost=0.00..31.40 rows=2140
width=4) (actual time=0.006..0.011 rows=1 loops=1)"
"              ->  Index Scan using dok_dokumnr_idx on dok  (cost=0.00..3.47
rows=1 width=4) (actual time=0.051..0.058 rows=1 loops=1)"
"                    Index Cond: (dok.dokumnr = "outer".dokumnr)"
"Total runtime: 27066.080 ms"


От:
Gregory Stark
Дата:

"Andrus" <> writes:

> There are  indexes on   rid(dokumnr) and dok(dokumnr) and dokumnr is int.
> Instead of using single key index, 8.1.4 scans over whole rid table.
> Sometimes idtelluued can contain more than single row so replacing join with
> equality is not possible.
>
> How to fix ?

Firstly the current 8.1 release is 8.1.15. Any of the bugs fixed in those 11
releases might be related to this.

Secondly:

> CREATE TEMP TABLE idtellUued(dokumnr INT) ON COMMIT DROP;
> INSERT INTO idtellUued VALUES(1249228);
> explain analyze  select 1
>   from dok JOIN rid USING(dokumnr)
> JOIN idtellUued USING(dokumnr)
>
> "              ->  Seq Scan on idtelluued  (cost=0.00..31.40 rows=2140 width=4)
> (actual time=0.006..0.011 rows=1 loops=1)"

The planner thinks there are 2,140 rows in that temporary table so I don't
believe this is from the example posted. I would suggest running ANALYZE
idtellUued at some point before the problematic query.


--
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com
  Ask me about EnterpriseDB's Slony Replication support!

От:
Tom Lane
Дата:

Gregory Stark <> writes:
> "Andrus" <> writes:
>> There are  indexes on   rid(dokumnr) and dok(dokumnr) and dokumnr is int.
>> Instead of using single key index, 8.1.4 scans over whole rid table.
>> Sometimes idtelluued can contain more than single row so replacing join with
>> equality is not possible.
>>
>> How to fix ?

> Firstly the current 8.1 release is 8.1.15. Any of the bugs fixed in those 11
> releases might be related to this.

If this can still be reproduced in 8.1.15 it would be worth looking into.
My first guess is that there are multiple relevant indexes on the big
table and the old bugs in choose_bitmap_and() are making it mess up.


> The planner thinks there are 2,140 rows in that temporary table so I don't
> believe this is from the example posted. I would suggest running ANALYZE
> idtellUued at some point before the problematic query.

No, that's a pretty likely default assumption for a never-vacuumed,
never-analyzed table.   Your advice is correct though.

            regards, tom lane

От:
"Andrus"
Дата:

Gregory,

>  I would suggest running ANALYZE
> idtellUued at some point before the problematic query.

Thank you.
After adding analyze all is OK.
Is analyze command required in  8.3 also ?
Or is it better better to specify some hint at create temp table time since
I know the number of rows before running query ?

Andrus.

set search_path to firma2,public;
 CREATE TEMP TABLE idtellUued(dokumnr INT) ON COMMIT DROP;
 INSERT INTO idtellUued VALUES(1249228);
analyze idtelluued;
 explain analyze  select 1
   from dok JOIN rid USING(dokumnr)
 JOIN idtellUued USING(dokumnr)

"Nested Loop  (cost=0.00..275.18 rows=3 width=0) (actual time=87.266..87.388
rows=8 loops=1)"
"  ->  Nested Loop  (cost=0.00..6.95 rows=1 width=8) (actual
time=36.613..36.636 rows=1 loops=1)"
"        ->  Seq Scan on idtelluued  (cost=0.00..1.01 rows=1 width=4)
(actual time=0.009..0.015 rows=1 loops=1)"
"        ->  Index Scan using dok_dokumnr_idx on dok  (cost=0.00..5.93
rows=1 width=4) (actual time=36.585..36.590 rows=1 loops=1)"
"              Index Cond: (dok.dokumnr = "outer".dokumnr)"
"  ->  Index Scan using rid_dokumnr_idx on rid  (cost=0.00..267.23 rows=80
width=4) (actual time=50.635..50.672 rows=8 loops=1)"
"        Index Cond: ("outer".dokumnr = rid.dokumnr)"
"Total runtime: 87.586 ms"


От:
Andreas Kretschmer
Дата:

Andrus <> schrieb:

> There are  indexes on   rid(dokumnr) and dok(dokumnr) and dokumnr is int.
> Instead of using single key index, 8.1.4 scans over whole rid table.
> Sometimes idtelluued can contain more than single row so replacing join
> with equality is not possible.
>
> How to fix ?
>
> Andrus.
>
> CREATE TEMP TABLE idtellUued(dokumnr INT) ON COMMIT DROP;
> INSERT INTO idtellUued VALUES(1249228);
> explain analyze  select 1
>   from dok JOIN rid USING(dokumnr)
> JOIN idtellUued USING(dokumnr)
>
> "Hash Join  (cost=7483.22..222259.77 rows=5706 width=0) (actual
> time=14905.981..27065.903 rows=8 loops=1)"
> "  Hash Cond: ("outer".dokumnr = "inner".dokumnr)"
> "  ->  Seq Scan on rid  (cost=0.00..198240.33 rows=3295833 width=4)
> (actual time=0.036..15021.641 rows=3280576 loops=1)"

How many rows contains rid? The estimation are okay, rows=3295833 and
actual rows=3280576 are nearly identical. An index-scan makes only sense
if rid contains considerable more than 3000000 rows.


Andreas
--
Really, I'm not out to destroy Microsoft. That will just be a completely
unintentional side effect.                              (Linus Torvalds)
"If I was god, I would recompile penguin with --enable-fly."   (unknown)
Kaufbach, Saxony, Germany, Europe.              N 51.05082°, E 13.56889°

От:
Andreas Kretschmer
Дата:

Andrus <> schrieb:

> There are  indexes on   rid(dokumnr) and dok(dokumnr) and dokumnr is int.
> Instead of using single key index, 8.1.4 scans over whole rid table.
> Sometimes idtelluued can contain more than single row so replacing join
> with equality is not possible.
>
> How to fix ?
>
> Andrus.
>
> CREATE TEMP TABLE idtellUued(dokumnr INT) ON COMMIT DROP;
> INSERT INTO idtellUued VALUES(1249228);
> explain analyze  select 1
>   from dok JOIN rid USING(dokumnr)
> JOIN idtellUued USING(dokumnr)

Try to analyse the idtellUued-table after the insert. The planner has no
knowledge that this table contains only one or e few rows, the planner
assume 1000 (iirc) in this table.


Andreas
--
Really, I'm not out to destroy Microsoft. That will just be a completely
unintentional side effect.                              (Linus Torvalds)
"If I was god, I would recompile penguin with --enable-fly."   (unknown)
Kaufbach, Saxony, Germany, Europe.              N 51.05082°, E 13.56889°

От:
"A. Kretschmer"
Дата:

am  Sun, dem 23.11.2008, um  6:20:08 +0200 mailte Andrus folgendes:
> Gregory,
>
> > I would suggest running ANALYZE
> >idtellUued at some point before the problematic query.
>
> Thank you.
> After adding analyze all is OK.
> Is analyze command required in  8.3 also ?

Yes.


Andreas
--
Andreas Kretschmer
Kontakt:  Heynitz: 035242/47150,   D1: 0160/7141639 (mehr: -> Header)
GnuPG-ID:   0x3FFF606C, privat 0x7F4584DA   http://wwwkeys.de.pgp.net

От:
"Andrus"
Дата:

> An index-scan makes only sense if rid contains considerable more than
> 3000000 rows.

I'm sorry, I meant using index to get the row.

Andrus.