Обсуждение: Query in 9.0.2 not using index in 9.0.0 works fine

Поиск
Список
Период
Сортировка

Query in 9.0.2 not using index in 9.0.0 works fine

От
"Matthias Howell"
Дата:

I've just copied a database from one linux machine to another.

"Fast" machine is CentOS 5.5, running postgres 9.0.0 64 bit

 

"Slow" machine is Red Hat 5.5 running postgres 9.0.2 64 bit.

 

Here's the query:

explain analyze select sentenceid from sentences where sentenceid = any ( array(select sentenceid from sentences where docid = any(array[696374,696377])))

 

on the fast machine this is the explain:

"Bitmap Heap Scan on sentences  (cost=924.41..964.47 rows=10 width=8) (actual time=0.748..0.800 rows=41 loops=1)"

"  Recheck Cond: (sentenceid = ANY ($0))"

"  InitPlan 1 (returns $0)"

"    ->  Bitmap Heap Scan on sentences  (cost=12.93..879.27 rows=220 width=8) (actual time=0.199..0.446 rows=41 loops=1)"

"          Recheck Cond: (docid = ANY ('{696374,696377}'::bigint[]))"

"          ->  Bitmap Index Scan on sentdocs  (cost=0.00..12.87 rows=220 width=0) (actual time=0.134..0.134 rows=41 loops=1)"

"                Index Cond: (docid = ANY ('{696374,696377}'::bigint[]))"

"  ->  Bitmap Index Scan on pk_sentences  (cost=0.00..45.14 rows=10 width=0) (actual time=0.741..0.741 rows=41 loops=1)"

"        Index Cond: (sentenceid = ANY ($0))"

"Total runtime: 0.925 ms"

 

And on the slow machine:

"Seq Scan on sentences  (cost=10000000608.90..10000445893.60 rows=10 width=8) (actual time=2679.412..6372.393 rows=41 loops=1)"

"  Filter: (sentenceid = ANY ($0))"

"  InitPlan 1 (returns $0)"

"    ->  Bitmap Heap Scan on sentences  (cost=10.73..608.90 rows=152 width=8) (actual time=0.044..0.076 rows=41 loops=1)"

"          Recheck Cond: (docid = ANY ('{696374,696377}'::integer[]))"

"          ->  Bitmap Index Scan on sentdocs  (cost=0.00..10.69 rows=152 width=0) (actual time=0.037..0.037 rows=41 loops=1)"

"                Index Cond: (docid = ANY ('{696374,696377}'::integer[]))"

"Total runtime: 6372.468 ms"

 

The configurations were identical initially, I've changed those on the slow machine but to no avail.

 

there is an index on sentences on the docid in both systems.

 

I'm at quite a loss as to how/why this is occurring and what to do about it.

 

I tried disabling seqscan on the slow machine but that also made no difference.

 

Any help/ideas much appreciated.

 

Matthias

Re: Query in 9.0.2 not using index in 9.0.0 works fine

От
Samuel Gendler
Дата:


On Tue, Jul 5, 2011 at 1:50 PM, Matthias Howell <Matthias.Howell@voxco.com> wrote:

I've just copied a database from one linux machine to another.

"Fast" machine is CentOS 5.5, running postgres 9.0.0 64 bit

 

"Slow" machine is Red Hat 5.5 running postgres 9.0.2 64 bit.

 

Here's the query:

explain analyze select sentenceid from sentences where sentenceid = any ( array(select sentenceid from sentences where docid = any(array[696374,696377])))

 

on the fast machine this is the explain:

"Bitmap Heap Scan on sentences  (cost=924.41..964.47 rows=10 width=8) (actual time=0.748..0.800 rows=41 loops=1)"

"  Recheck Cond: (sentenceid = ANY ($0))"

"  InitPlan 1 (returns $0)"

"    ->  Bitmap Heap Scan on sentences  (cost=12.93..879.27 rows=220 width=8) (actual time=0.199..0.446 rows=41 loops=1)"

"          Recheck Cond: (docid = ANY ('{696374,696377}'::bigint[]))"

"          ->  Bitmap Index Scan on sentdocs  (cost=0.00..12.87 rows=220 width=0) (actual time=0.134..0.134 rows=41 loops=1)"

"                Index Cond: (docid = ANY ('{696374,696377}'::bigint[]))"

"  ->  Bitmap Index Scan on pk_sentences  (cost=0.00..45.14 rows=10 width=0) (actual time=0.741..0.741 rows=41 loops=1)"

"        Index Cond: (sentenceid = ANY ($0))"

"Total runtime: 0.925 ms"

 

And on the slow machine:

"Seq Scan on sentences  (cost=10000000608.90..10000445893.60 rows=10 width=8) (actual time=2679.412..6372.393 rows=41 loops=1)"

"  Filter: (sentenceid = ANY ($0))"

"  InitPlan 1 (returns $0)"

"    ->  Bitmap Heap Scan on sentences  (cost=10.73..608.90 rows=152 width=8) (actual time=0.044..0.076 rows=41 loops=1)"

"          Recheck Cond: (docid = ANY ('{696374,696377}'::integer[]))"

"          ->  Bitmap Index Scan on sentdocs  (cost=0.00..10.69 rows=152 width=0) (actual time=0.037..0.037 rows=41 loops=1)"

"                Index Cond: (docid = ANY ('{696374,696377}'::integer[]))"

"Total runtime: 6372.468 ms"

 

The configurations were identical initially, I've changed those on the slow machine but to no avail.

 

there is an index on sentences on the docid in both systems.

 

I'm at quite a loss as to how/why this is occurring and what to do about it.

 

I tried disabling seqscan on the slow machine but that also made no difference.

 

Any help/ideas much appreciated.


Have you done a vacuum analyze since loading the data on the slow db?  Are statistics settings the same between the two hosts?  It's interesting that one version coerces the docid values to bigint and the other coerces to integer, but that shouldn't impact the sentenceid comparison, which have to be a consistent type since it is comparing sentenceid to sentenceid. Any reason why this isn't collapsed down to 'select distinct sentenceid from sentences where docid = any(array[696374,696377])' - is there a benefit to the more complex structure?  For that matter, why not 'where docid in (696374,696377)'

I didn't see anything in the docs about distinct or any(array) that would indicate that that form should be preferred over IN ()

Re: Query in 9.0.2 not using index in 9.0.0 works fine

От
"Matthias Howell"
Дата:

 

 

From: Samuel Gendler [mailto:sgendler@ideasculptor.com]
Sent: Wednesday, July 06, 2011 3:43 AM
To: Matthias Howell
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Query in 9.0.2 not using index in 9.0.0 works fine

 

 

On Tue, Jul 5, 2011 at 1:50 PM, Matthias Howell <Matthias.Howell@voxco.com> wrote:

I've just copied a database from one linux machine to another.

"Fast" machine is CentOS 5.5, running postgres 9.0.0 64 bit

 

"Slow" machine is Red Hat 5.5 running postgres 9.0.2 64 bit.

 

Here's the query:

explain analyze select sentenceid from sentences where sentenceid = any ( array(select sentenceid from sentences where docid = any(array[696374,696377])))

 

on the fast machine this is the explain:

"Bitmap Heap Scan on sentences  (cost=924.41..964.47 rows=10 width=8) (actual time=0.748..0.800 rows=41 loops=1)"

"  Recheck Cond: (sentenceid = ANY ($0))"

"  InitPlan 1 (returns $0)"

"    ->  Bitmap Heap Scan on sentences  (cost=12.93..879.27 rows=220 width=8) (actual time=0.199..0.446 rows=41 loops=1)"

"          Recheck Cond: (docid = ANY ('{696374,696377}'::bigint[]))"

"          ->  Bitmap Index Scan on sentdocs  (cost=0.00..12.87 rows=220 width=0) (actual time=0.134..0.134 rows=41 loops=1)"

"                Index Cond: (docid = ANY ('{696374,696377}'::bigint[]))"

"  ->  Bitmap Index Scan on pk_sentences  (cost=0.00..45.14 rows=10 width=0) (actual time=0.741..0.741 rows=41 loops=1)"

"        Index Cond: (sentenceid = ANY ($0))"

"Total runtime: 0.925 ms"

 

And on the slow machine:

"Seq Scan on sentences  (cost=10000000608.90..10000445893.60 rows=10 width=8) (actual time=2679.412..6372.393 rows=41 loops=1)"

"  Filter: (sentenceid = ANY ($0))"

"  InitPlan 1 (returns $0)"

"    ->  Bitmap Heap Scan on sentences  (cost=10.73..608.90 rows=152 width=8) (actual time=0.044..0.076 rows=41 loops=1)"

"          Recheck Cond: (docid = ANY ('{696374,696377}'::integer[]))"

"          ->  Bitmap Index Scan on sentdocs  (cost=0.00..10.69 rows=152 width=0) (actual time=0.037..0.037 rows=41 loops=1)"

"                Index Cond: (docid = ANY ('{696374,696377}'::integer[]))"

"Total runtime: 6372.468 ms"

 

The configurations were identical initially, I've changed those on the slow machine but to no avail.

 

there is an index on sentences on the docid in both systems.

 

I'm at quite a loss as to how/why this is occurring and what to do about it.

 

I tried disabling seqscan on the slow machine but that also made no difference.

 

Any help/ideas much appreciated.

 

Have you done a vacuum analyze since loading the data on the slow db?  Are statistics settings the same between the two hosts?  It's interesting that one version coerces the docid values to bigint and the other coerces to integer, but that shouldn't impact the sentenceid comparison, which have to be a consistent type since it is comparing sentenceid to sentenceid. Any reason why this isn't collapsed down to 'select distinct sentenceid from sentences where docid = any(array[696374,696377])' - is there a benefit to the more complex structure?  For that matter, why not 'where docid in (696374,696377)'

 

I didn't see anything in the docs about distinct or any(array) that would indicate that that form should be preferred over IN ()

 

______

 

I ran vacuum analyze, and I even dropped and recreated the index.

 

The differences between postgresql.conf are:

FAST:

shared_buffers = 2GB

 

SLOW:

shared_buffers = 4GB

 

The reason for using the any=array is that  the array of docids is passed in as a parameter.  This query is a subquery of a larger query.  I am trying to solve the problem for the smaller query.  The difference in the explain in the big query is essentially the seq scan on sentences.  This query is a sub query that is performed - with variations - 3 times in the larger query.  The the fast instance the larger query takes 950 milliseconds, on the slow instance, over 30 seconds.

 

However, in the end, it was user brain damage.

 

It does use the doc id index for the subquery, but for some reason, the primary key on sentences - the sentenceid - was not set.  So in fact, there is no index.

 

Machines vindicated once again.

Re: Query in 9.0.2 not using index in 9.0.0 works fine

От
Samuel Gendler
Дата:


On Wed, Jul 6, 2011 at 5:50 AM, Matthias Howell <Matthias.Howell@voxco.com> wrote:

 

However, in the end, it was user brain damage.

 

It does use the doc id index for the subquery, but for some reason, the primary key on sentences - the sentenceid - was not set.  So in fact, there is no index.

 

Machines vindicated once again.


For the record, if you follow the instructions for submitting slow query questions, we'd likely have spotted it very quickly if you hadn't spotted it yourself while doing the cut and paste.  The instructions ask for table definitions, so you'd likely have noticed the missing index when you copied those into your email.  The link (http://wiki.postgresql.org/wiki/SlowQueryQuestions ) is right there on the mailing list page at postgresql.org.  It's always a toss-up whether to attempt to answer a question like yours or just respond with a link to that page when the relevant info is missing ;-)

--sam