Обсуждение: PostgreSQL planner

Поиск
Список
Период
Сортировка

PostgreSQL planner

От
Misa Simic
Дата:
HI,

I have a wierd problem with PostgreSQL planner...

Problem showed up in Production on PG9.1 (Ubuntu)

But I have succeeded to get the same behavior on my PG 9.2 on Windows...

it is about 3 tables & onad one view - but view have volatile function:


CREATE TABLE t1
(
  calc_id serial NOT NULL,
  thing_id integer,
  CONSTRAINT t1_pk PRIMARY KEY (calc_id)
)
WITH (
  OIDS=FALSE
);
ALTER TABLE t1
  OWNER TO postgres;

-- Index: t1_thing_id_idx

-- DROP INDEX t1_thing_id_idx;

CREATE INDEX t1_thing_id_idx
  ON t1
  USING btree
  (thing_id);


other columns from this real table are discarted - and not important, what is important is that in the moment I want to run the query... I know calc_id (pk of this table - but don't know thing_id)...

to simplify test I filled t1 with 100 rows with same values in calc_id and thing_id...

Second table are transactions about things:

CREATE TABLE t2
(
  trans_id serial NOT NULL,
  thing_id integer,
  no_index integer,
  CONSTRAINT t2_pk PRIMARY KEY (trans_id)
)
WITH (
  OIDS=FALSE
);
ALTER TABLE t2
  OWNER TO postgres;

-- Index: t5_c2_idx

-- DROP INDEX t5_c2_idx;

CREATE INDEX t5_c2_idx
  ON t2
  USING btree
  (thing_id);


this table I have filled with 1m rows with rundom number in thing_id between 1 and 100

when we enter transaction about thing to t2, in some moment we could have additional info about the thing, in some moment not... so if we have additional info in the same time row is inserted in t2 and t3 with the same trans_id...

CREATE TABLE t3
(
  trans_id integer NOT NULL,
  c2_text text,
  CONSTRAINT t3_pk PRIMARY KEY (trans_id)
)
WITH (
  OIDS=FALSE
);

no additional indexes on t3...

now we have made a view:

CREATE OR REPLACE VIEW t2_left_t3_volatile AS 
 SELECT t2.trans_id, t2.thing_id, t2.no_index, t3.c2_text, random() AS random
   FROM t2
   LEFT JOIN t3 USING (trans_id);

And here we go:

we want see all transactions about the thing_id

EXPLAIN ANALYZE
SELECT * FROM t2_left_t3_volatile
WHERE thing_id = 20

everything is fine:

"Hash Left Join  (cost=452.46..13067.16 rows=12474 width=45) (actual time=6.537..62.633 rows=12038 loops=1)"
"  Hash Cond: (t2.trans_id = t3.trans_id)"
"  ->  Bitmap Heap Scan on t2  (cost=448.30..12985.03 rows=12474 width=12) (actual time=6.418..57.498 rows=12038 loops=1)"
"        Recheck Cond: (thing_id = 20)"
"        ->  Bitmap Index Scan on t5_c2_idx  (cost=0.00..445.18 rows=12474 width=0) (actual time=4.429..4.429 rows=12038 loops=1)"
"              Index Cond: (thing_id = 20)"
"  ->  Hash  (cost=2.96..2.96 rows=96 width=37) (actual time=0.086..0.086 rows=96 loops=1)"
"        Buckets: 1024  Batches: 1  Memory Usage: 7kB"
"        ->  Seq Scan on t3  (cost=0.00..2.96 rows=96 width=37) (actual time=0.016..0.045 rows=96 loops=1)"
"Total runtime: 63.217 ms"

but problem is - we don't know the thing id - we know calc_id:

EXPLAIN ANALYZE
SELECT * FROM t2_left_t3_volatile v INNER JOIN t1 USING (thing_id)
WHERE calc_id = 20

and planner picks:

"Hash Join  (cost=6.42..48367.52 rows=12111 width=4) (actual time=0.261..471.042 rows=12038 loops=1)"
"  Hash Cond: (t2.thing_id = t1.thing_id)"
"  ->  Hash Left Join  (cost=4.16..31591.51 rows=1211101 width=45) (actual time=0.161..394.076 rows=1211101 loops=1)"
"        Hash Cond: (t2.trans_id = t3.trans_id)"
"        ->  Seq Scan on t2  (cost=0.00..24017.01 rows=1211101 width=12) (actual time=0.075..140.937 rows=1211101 loops=1)"
"        ->  Hash  (cost=2.96..2.96 rows=96 width=37) (actual time=0.069..0.069 rows=96 loops=1)"
"              Buckets: 1024  Batches: 1  Memory Usage: 7kB"
"              ->  Seq Scan on t3  (cost=0.00..2.96 rows=96 width=37) (actual time=0.008..0.035 rows=96 loops=1)"
"  ->  Hash  (cost=2.25..2.25 rows=1 width=4) (actual time=0.035..0.035 rows=1 loops=1)"
"        Buckets: 1024  Batches: 1  Memory Usage: 1kB"
"        ->  Seq Scan on t1  (cost=0.00..2.25 rows=1 width=4) (actual time=0.017..0.030 rows=1 loops=1)"
"              Filter: (calc_id = 20)"
"              Rows Removed by Filter: 99"
"Total runtime: 471.505 ms"

Seq scan on all tables...

First thought was - maybe because of volatile function...

but on:
SELECT v.no_index FROM t2_left_t3_volatile v INNER JOIN t1 USING (thing_id)
WHERE calc_id = 20

planner picks the same scenario... even function column is not in the query...

however, situation is fine, if we have a view without the volatile function:

CREATE OR REPLACE VIEW t2_left_t3 AS 
 SELECT t2.trans_id, t2.thing_id, t2.no_index, t3.c2_text
   FROM t2
   LEFT JOIN t3 USING (trans_id);

EXPLAIN ANALYZE
SELECT v.no_index FROM t2_left_t3 v INNER JOIN t1 USING (thing_id)
WHERE calc_id = 20


"Nested Loop  (cost=437.49..13047.74 rows=12111 width=4) (actual time=6.360..71.818 rows=12038 loops=1)"
"  ->  Seq Scan on t1  (cost=0.00..2.25 rows=1 width=4) (actual time=0.016..0.024 rows=1 loops=1)"
"        Filter: (calc_id = 20)"
"        Rows Removed by Filter: 99"
"  ->  Bitmap Heap Scan on t2  (cost=437.49..12924.38 rows=12111 width=12) (actual time=6.330..69.063 rows=12038 loops=1)"
"        Recheck Cond: (thing_id = t1.thing_id)"
"        ->  Bitmap Index Scan on t5_c2_idx  (cost=0.00..434.46 rows=12111 width=0) (actual time=4.372..4.372 rows=12038 loops=1)"
"              Index Cond: (thing_id = t1.thing_id)"
"Total runtime: 72.461 ms"

Any idea why planner picks bad plan if there is VOLATILE function?

there are no difference in result between:

SELECT v.no_index, random FROM t2_left_t3_volatile v INNER JOIN t1 USING (thing_id)
WHERE calc_id = 20

And

SELECT v.no_index, random() FROM t2_left_t3 v INNER JOIN t1 USING (thing_id)
WHERE calc_id = 20

but huge difference in plan...

And logically there is no diff to (our solution)


EXPLAIN ANALYZE
SELECT * FROM t2_left_t3_volatile
WHERE thing_id = (SELECT thing_id FROM t1 WHERE calc_id = 20)


though real scenario is a lot more complex... i.e. t1 has start_date and end_date...
t3 has date colummn as well

so on simple question:

SELECT * FROM t2_left_t3_volatile v INNER JOIN t1 USING (thing_id)
WHERE calc_id = 20 AND v.date BETWEEN t1.start_date AND t2.end_date

We would need to write 3 subqueries on the same table to dont use join...

but to dont use 3 times subquery... we use CTE

WITH calc AS
(
SELECT thing_id FROM t1 WHERE calc_id = 20
)
SELECT * FROM t2_left_t3_volatile v
WHERE v.thing_id=calc.thing_id AND v.date BETWEEN calc.start_date AND calc.end_date

And result is acceptable...

But solution is not good enough - it means, whenever we meet problem with perfomance (in production - unfortunatelly) - we will need to spend time to redefine - simple queries! :(

Now I am not sure - is this for perform or hackers list...

Any suggestion what we can do to improve things?

Or Any insights that things with planner inside Postgres will be improved in "reasonable time" - whatever it means :) :)

Thanks in advance,

Misa


Re: PostgreSQL planner

От
Robert Haas
Дата:
On Sat, Mar 23, 2013 at 8:12 PM, Misa Simic <misa.simic@gmail.com> wrote:
> but problem is - we don't know the thing id - we know calc_id:
>
> EXPLAIN ANALYZE
> SELECT * FROM t2_left_t3_volatile v INNER JOIN t1 USING (thing_id)
> WHERE calc_id = 20

With this query you've got to scan all three tables.  The calc_id qual
can only be pushed down into the scan on t1, so you need the whole
t2/t3 join product.

> EXPLAIN ANALYZE
> SELECT v.no_index FROM t2_left_t3 v INNER JOIN t1 USING (thing_id)
> WHERE calc_id = 20

With this query you only need to scan 2 tables.  The join between t2
and t3 is eliminated by the join removal code in favor of scanning
only t2, as shown in the plan you included:

> "Nested Loop  (cost=437.49..13047.74 rows=12111 width=4) (actual
> time=6.360..71.818 rows=12038 loops=1)"
> "  ->  Seq Scan on t1  (cost=0.00..2.25 rows=1 width=4) (actual
> time=0.016..0.024 rows=1 loops=1)"
> "        Filter: (calc_id = 20)"
> "        Rows Removed by Filter: 99"
> "  ->  Bitmap Heap Scan on t2  (cost=437.49..12924.38 rows=12111 width=12)
> (actual time=6.330..69.063 rows=12038 loops=1)"
> "        Recheck Cond: (thing_id = t1.thing_id)"
> "        ->  Bitmap Index Scan on t5_c2_idx  (cost=0.00..434.46 rows=12111
> width=0) (actual time=4.372..4.372 rows=12038 loops=1)"
> "              Index Cond: (thing_id = t1.thing_id)"
> "Total runtime: 72.461 ms"

The difference is that this query has only one column in its target list, not *.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: PostgreSQL planner

От
Misa Simic
Дата:


On Friday, May 10, 2013, Robert Haas wrote:
On Sat, Mar 23, 2013 at 8:12 PM, Misa Simic <misa.simic@gmail.com> wrote:
> but problem is - we don't know the thing id - we know calc_id:
>
> EXPLAIN ANALYZE
> SELECT * FROM t2_left_t3_volatile v INNER JOIN t1 USING (thing_id)
> WHERE calc_id = 20

With this query you've got to scan all three tables.  The calc_id qual
can only be pushed down into the scan on t1, so you need the whole
t2/t3 join product.

> EXPLAIN ANALYZE
> SELECT v.no_index FROM t2_left_t3 v INNER JOIN t1 USING (thing_id)
> WHERE calc_id = 20

With this query you only need to scan 2 tables.  The join between t2
and t3 is eliminated by the join removal code in favor of scanning
only t2, as shown in the plan you included:

> "Nested Loop  (cost=437.49..13047.74 rows=12111 width=4) (actual
> time=6.360..71.818 rows=12038 loops=1)"
> "  ->  Seq Scan on t1  (cost=0.00..2.25 rows=1 width=4) (actual
> time=0.016..0.024 rows=1 loops=1)"
> "        Filter: (calc_id = 20)"
> "        Rows Removed by Filter: 99"
> "  ->  Bitmap Heap Scan on t2  (cost=437.49..12924.38 rows=12111 width=12)
> (actual time=6.330..69.063 rows=12038 loops=1)"
> "        Recheck Cond: (thing_id = t1.thing_id)"
> "        ->  Bitmap Index Scan on t5_c2_idx  (cost=0.00..434.46 rows=12111
> width=0) (actual time=4.372..4.372 rows=12038 loops=1)"
> "              Index Cond: (thing_id = t1.thing_id)"
> "Total runtime: 72.461 ms"

The difference is that this query has only one column in its target list, not *.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Thanks Robert,

That is a bit "old" problem to us...

Solution for that kind of problems is: "rephrase" the question. So we have added one more layer to transform input query to "better" query for postgres - very wierd...

However, there are no differences... Planer use the same bad plan for:

 SELECT v.no_index FROM t2_left_t3_volatile v INNER JOIN t1 USING (thing_id)
WHERE calc_id = 20

In that is one column as well...

We basicaly above query transform to:

SELECT v.no_index FROM t2_left_t3_volatile v 
WHERE v.thing_id = ( 
SELECT thing_id FROM t1 
WHERE calc_id = 20
)

What give us good result... Very wierd....

Thanks,
Misa