Обсуждение: Slow query using the Cube contrib module.

От:
Liviu Mirea-Ghiban
Дата:

  Hello,

I have a simple table which has a cube column and a cube GiST index. The
column contains 3-dimensional points (not cubes or intervals). The
problem is that I'm getting very slow queries when I'm using the index.
The table has about 130,000 rows and is full-vacuumed after any
updates/inserts (it is updated only once every 24 hours).


Table definition:
CREATE TABLE picof.photo_colors
(
   photo_id integer NOT NULL,
   color_percent real NOT NULL,
   lab_color picof.cube NOT NULL
)
WITH (
   OIDS=FALSE
);

CREATE INDEX photo_colors_index
   ON picof.photo_colors
   USING gist
   (lab_color);


My query:
SELECT photo_id FROM photo_colors
WHERE lab_color <@ cube_enlarge('0, 0, 0', 10, 3)


Explain analyze:
"Bitmap Heap Scan on photo_colors  (cost=13.40..421.55 rows=135 width=4)
(actual time=7.958..15.493 rows=14313 loops=1)"
"  Recheck Cond: (lab_color <@ '(-10, -10, -10),(10, 10, 10)'::cube)"
"  ->  Bitmap Index Scan on photo_colors_index  (cost=0.00..13.36
rows=135 width=0) (actual time=7.556..7.556 rows=14313 loops=1)"
"        Index Cond: (lab_color <@ '(-10, -10, -10),(10, 10, 10)'::cube)"
"Total runtime: 16.849 ms"
(Executed in PostgreSQL 8.4.4 on Windows and CentOS - same query plan)


Now, while it might not seem much, this is part of a bigger query in
which several such subqueries are being joined. The cost really adds up.

My question is: Why is it doing a Bitmap Heap Scan / Recheck Cond? I've
executed dozens of such queries and not once did the rechecking remove
any rows. Is there any way to disable it, or do you have any other
suggestions for optimizations (because I'm all out of ideas)?

Thank you in advance!

---
Liviu Mirea

От:
Yeb Havinga
Дата:

Liviu Mirea-Ghiban wrote:
>
> My question is: Why is it doing a Bitmap Heap Scan / Recheck Cond?
> I've executed dozens of such queries and not once did the rechecking
> remove any rows. Is there any way to disable it, or do you have any
> other suggestions for optimizations (because I'm all out of ideas)?
It's probably because the index nodes store data values with a lossy
compression, which means that the index scan returns more rows than
wanted, and that in turn is filtered out by the rescanning. See the
comments for the 'RECHECK' parameter of CREATE OPERATOR CLASS
(http://www.postgresql.org/docs/8.4/static/sql-createopclass.html). Its
unwise to alter this behaviour without taking a look/modifying the
underlying index implementation. The gist index scann part could perhaps
be made a bit faster by using a smaller blocksize, but I'm not sure if
or how the recheck part can be improved. Maybe rewriting the top query
to not do bitmap heap scans in subqueries or inner loops?

regards,
Yeb Havinga

От:
Tom Lane
Дата:

Yeb Havinga <> writes:
> Liviu Mirea-Ghiban wrote:
>> My question is: Why is it doing a Bitmap Heap Scan / Recheck Cond?
>> I've executed dozens of such queries and not once did the rechecking
>> remove any rows. Is there any way to disable it, or do you have any
>> other suggestions for optimizations (because I'm all out of ideas)?

> It's probably because the index nodes store data values with a lossy
> compression, which means that the index scan returns more rows than
> wanted, and that in turn is filtered out by the rescanning.

The recheck expression is only executed if the index reports that it's
not executed the search exactly.  If you don't see any difference
between the indexscan and bitmapscan output counts, it's probably
because the index can do the case exactly, so the recheck expression
isn't really getting used.  The planner has to include the expression
in the plan anyway, because the decision about lossiness is not known
until runtime.  But it's not costing any runtime.

The OP is mistaken to think there's anything wrong with this plan choice
---- more than likely, it's the best available plan.  The reason there's
a significant gap between the indexscan runtime and the bitmapscan
runtime is that that's the cost of going and actually fetching all those
rows from the table.  The only way to fix that is to buy a faster disk
or get more RAM so that more of the table can be held in memory.

            regards, tom lane