Discussion: Slow query + why bitmap index scan??

From:
Laszlo Nagy
Date:

This should be a simple question to answer. :-) There is a single table:

select count(*) from product_price_history  -- 12982555 rows

This table has exactly one index and one primary key constraint:

CREATE INDEX idx_product_price_history_id_hdate
   ON product_price_history
   USING btree
   (id, hdate);

ALTER TABLE product_price_history
   ADD CONSTRAINT pk_product_price_history PRIMARY KEY(hid);

No other constraints or indexes are defined on this table. Rows are never
updated or deleted in this table; they are only inserted. It was
VACUUMed and REINDEXed today.

Stats on the table:

seq scans=13, index scans=108, table size=3770MB, toast table size=8192
bytes, indexes size=666MB

This query:

select hid from product_price_history where id=35547581

Returns 759 rows in 8837 msec! How can this be that slow???

The query plan is:

"Bitmap Heap Scan on product_price_history  (cost=13.90..1863.51
rows=472 width=8)"
"  Recheck Cond: (id = 35547581)"
"  ->  Bitmap Index Scan on idx_product_price_history_id_hdate
(cost=0.00..13.78 rows=472 width=0)"
"        Index Cond: (id = 35547581)"

I don't understand why PostgreSQL uses a bitmap heap scan + bitmap index
scan. Why not just use a regular index scan? Data in a btree index is
already sorted, so a normal index scan should take no more than a few page
reads. This should never take 8 seconds.
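For completeness: the plan above is from a plain EXPLAIN. An EXPLAIN with the ANALYZE and BUFFERS options (a sketch, assuming BUFFERS is available on this PostgreSQL version) would show the actual timings and how many buffers came from cache versus disk:

```sql
-- ANALYZE reports actual row counts and timings;
-- BUFFERS reports shared-buffer hits vs. reads (i.e. cache misses).
EXPLAIN (ANALYZE, BUFFERS)
SELECT hid FROM product_price_history WHERE id = 35547581;
```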

Thanks,

    Laszlo


From:
Florian Weimer
Date:

* Laszlo Nagy:

> This query:
>
> select hid from product_price_history where id=35547581
>
> Returns 759 rows in 8837 msec! How can this be that slow???

If most records are on different heap pages, processing this query
requires many seeks.  11ms per seek is not too bad if most of them are
cache misses.
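A back-of-the-envelope check of that figure (hypothetical arithmetic, assuming roughly one heap-page seek per returned row):

```sql
-- 8837 ms spread over 759 returned rows:
SELECT 8837.0 / 759;  -- ≈ 11.6 ms per row, i.e. about one random seek each
```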

--
Florian Weimer                <>
BFK edv-consulting GmbH       http://www.bfk.de/
Kriegsstraße 100              tel: +49-721-96201-1
D-76133 Karlsruhe             fax: +49-721-96201-99

From:
Laszlo Nagy
Date:

On 2011-01-12 14:42, Florian Weimer wrote:
> * Laszlo Nagy:
>
>> This query:
>>
>> select hid from product_price_history where id=35547581
>>
>> Returns 759 rows in 8837 msec! How can this be that slow???
> If most records are on different heap pages, processing this query
> requires many seeks.  11ms per seek is not too bad if most of them are
> cache misses.
How about this:

select id,hdate from product_price_history where id=35547581  -- 759 rows, 8837 ms
Query time average: 3 sec.
Query plan:

"Bitmap Heap Scan on product_price_history  (cost=13.91..1871.34
rows=474 width=16)"
"  Recheck Cond: (id = 35547582)"
"  ->  Bitmap Index Scan on idx_product_price_history_id_hdate
(cost=0.00..13.79 rows=474 width=0)"
"        Index Cond: (id = 35547582)"

Why still the heap scan here? All the fields in the query are in the
index... Wouldn't a simple index scan be faster? (This is only a
theoretical question; I'm just curious.)

My first idea for speeding things up is to cluster this table regularly.
That would pack most of the rows for a given id into a few pages. Fewer
page reads -> faster query. Is that a good idea?

Another question. Do you think that increasing shared_mem would make it
faster?

Currently we have:

shared_mem = 6GB
work_mem = 512MB
total system memory=24GB

Total database size about 30GB, but there are other programs running on
the system, and many other tables.

Thanks,

    Laszlo


From:
Kenneth Marshall
Date:

On Wed, Jan 12, 2011 at 03:21:45PM +0100, Laszlo Nagy wrote:
> On 2011-01-12 14:42, Florian Weimer wrote:
>> * Laszlo Nagy:
>>
>>> This query:
>>>
>>> select hid from product_price_history where id=35547581
>>>
>>> Returns 759 rows in 8837 msec! How can this be that slow???
>> If most records are on different heap pages, processing this query
>> requires many seeks.  11ms per seek is not too bad if most of them are
>> cache misses.
> How about this:
>
> select id,hdate from product_price_history where id=35547581 -- 759 rows,
> 8837 ms
> Query time average: 3 sec.
> Query plan:
>
> "Bitmap Heap Scan on product_price_history  (cost=13.91..1871.34 rows=474
> width=16)"
> "  Recheck Cond: (id = 35547582)"
> "  ->  Bitmap Index Scan on idx_product_price_history_id_hdate
> (cost=0.00..13.79 rows=474 width=0)"
> "        Index Cond: (id = 35547582)"
>
> Why still the heap scan here? All fields in the query are in the index...
> Wouldn't a simple index scan be faster? (This is only a theoretical
> question, just I'm curious.)
>

Because of PostgreSQL's MVCC design, it must visit each heap tuple
to check its visibility as well as look it up in the index.

> My first idea to speed things up is to cluster this table regularly. That
> would convert (most of the) rows into a few pages. Few page reads -> faster
> query. Is it a good idea?
>

Yes, clustering this table would greatly speed up this type of query.
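A minimal sketch of what that would look like, using the index name from the original post:

```sql
-- Rewrite the table in index order, then refresh planner statistics.
CLUSTER product_price_history USING idx_product_price_history_id_hdate;
ANALYZE product_price_history;
```

Note that CLUSTER takes an exclusive lock on the table while it rewrites it.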

> Another question. Do you think that increasing shared_mem would make it
> faster?

I doubt it.

>
> Currently we have:
>
> shared_mem = 6GB
> work_mem = 512MB
> total system memory=24GB
>
> Total database size about 30GB, but there are other programs running on the
> system, and many other tables.
>
> Thanks,
>
>    Laszlo
>

Clustering is your best option until we get indexes with visibility
information.
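(For readers on later versions: PostgreSQL 9.2 added index-only scans, which consult the visibility map instead of the heap for all-visible pages. A sketch of what the covering query can then produce:)

```sql
-- On PostgreSQL 9.2+, after VACUUM has set visibility-map bits,
-- a query covered by the index can skip the heap entirely:
VACUUM product_price_history;
EXPLAIN SELECT id, hdate FROM product_price_history WHERE id = 35547581;
-- plan node: Index Only Scan using idx_product_price_history_id_hdate
```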

Cheers,
Ken

From:
"Kevin Grittner"
Date:

Laszlo Nagy <> wrote:

> shared_mem = 6GB
> work_mem = 512MB
> total system memory=24GB

In addition to the good advice from Ken, I suggest that you set
effective_cache_size (if you haven't already).  Add whatever the OS
shows as RAM used for cache to the shared_mem setting.
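For example (a postgresql.conf sketch; the 15GB figure is illustrative, taken as the 6GB of shared_mem plus the roughly 9GB the OS reports as cache/inactive memory later in this thread):

```
# postgresql.conf -- illustrative value, not a recommendation
effective_cache_size = 15GB   # shared_buffers + estimated OS cache
```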

But yeah, for your immediate problem, if you can cluster the table
on the index involved, it will be much faster.  Of course, if the
table is already in a useful order for some other query, that might
get slower, and unlike some other products, CLUSTER in PostgreSQL
doesn't *maintain* that order for the data as new rows are added --
so this should probably become a weekly (or monthly or some such)
maintenance operation.

-Kevin

From:
Laszlo Nagy
Date:

On 2011-01-12 15:36, Kevin Grittner wrote:
> Laszlo Nagy<>  wrote:
>
>> shared_mem = 6GB
>> work_mem = 512MB
>> total system memory=24GB
>
> In addition to the good advice from Ken, I suggest that you set
> effective_cache_size (if you haven't already).  Add whatever the OS
> shows as RAM used for cache to the shared_mem setting.
It was 1GB. I have now changed it to 2GB. Although the OS shows 9GB of
inactive memory, we have many concurrent connections to the database
server, so I hope it is okay to use 2GB.
>
> But yeah, for your immediate problem, if you can cluster the table
> on the index involved, it will be much faster.  Of course, if the
> table is already in a useful order for some other query, that might
> get slower, and unlike some other products, CLUSTER in PostgreSQL
> doesn't *maintain* that order for the data as new rows are added --
> so this should probably become a weekly (or monthly or some such)
> maintenance operation.
Thank you! After clustering, queries are really fast. I'm not worried
about other queries; this is the only way we use this table - getting
details for a given id value. I put the CLUSTER command into a cron
script that runs daily. The second time it ran, it took 2 minutes, so
I guess it will be fine.
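For reference, the cron entry is essentially this (a sketch; the database name, schedule, and connection details are illustrative):

```
# m h dom mon dow  command -- run nightly at 03:00
0 3 * * * psql -d mydb -c 'CLUSTER product_price_history USING idx_product_price_history_id_hdate;'
```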

Thank you for your help.

    Laszlo


From:
"Kevin Grittner"
Date:

Laszlo Nagy <> wrote:

>>  In addition to the good advice from Ken, I suggest that you set
>> effective_cache_size (if you haven't already).  Add whatever the
>> OS shows as RAM used for cache to the shared_mem setting.
> It was 1GB. Now I changed to 2GB. Although the OS shows 9GB
> inactive memory, we have many concurrent connections to the
> database server. I hope it is okay to use 2GB.

effective_cache_size doesn't cause any RAM to be allocated, it's
just a hint to the costing routines.  Higher values tend to favor
index use, while lower values tend to favor sequential scans.  I
suppose that if you regularly have many large queries running at the
same moment, you might not want to set it to the full amount of cache
space available, but I've usually had good luck setting it to the sum
of shared_buffers space and OS cache.

Since it only affects plan choice, not memory allocations, changing
it won't help if good plans are already being chosen.

-Kevin

From:
Cédric Villemain
Date:

2011/1/12 Kevin Grittner <>:
> Laszlo Nagy <> wrote:
>
>>>  In addition to the good advice from Ken, I suggest that you set
>>> effective_cache_size (if you haven't already).  Add whatever the
>>> OS shows as RAM used for cache to the shared_mem setting.
>> It was 1GB. Now I changed to 2GB. Although the OS shows 9GB
>> inactive memory, we have many concurrent connections to the
>> database server. I hope it is okay to use 2GB.
>
> effective_cache_size doesn't cause any RAM to be allocated, it's
> just a hint to the costing routines.  Higher values tend to favor
> index use, while lower values tend to favor sequential scans.  I
> suppose that if you regularly have many large queries running at the
> same moment you might not want to set it to the full amount of cache
> space available,
> but I've usually had good luck setting to the sum
> of shared_buffers space and OS cache.

Which OS are you using? Is it Windows? If so, the current parameters
are not good; and Linux should not show 9GB of 'inactive' (?) memory.

>
> Since it only affects plan choice, not memory allocations, changing
> it won't help if good plans are already being chosen.
>
> -Kevin
>



--
Cédric Villemain               2ndQuadrant
http://2ndQuadrant.fr/     PostgreSQL : Expertise, Formation et Support