Planner issue on sorting joining of two tables with limit

Поиск
Список
Период
Сортировка
От Коротков Александр
Тема Planner issue on sorting joining of two tables with limit
Дата
Msg-id k2j2e2e4b141004251122u7adab639ve203afe7a486673f@mail.gmail.com
обсуждение исходный текст
Ответы Re: Planner issue on sorting joining of two tables with limit
Список pgsql-performance
Hello, everybody!

I'm using PostgreSQL 8.4.3, compiled by Visual C++ build 1400, 32-bit on Windows XP SP3.
I use following data model for issue reproducing.

CREATE TABLE test1
(
  id integer NOT NULL,
  "value" double precision,
  CONSTRAINT test1_pkey PRIMARY KEY (id)
);

CREATE INDEX test1_value_idx ON test1(value);

CREATE TABLE test2
(
  id integer NOT NULL,
  id1 integer NOT NULL REFERENCES test2 (id),
  "value" double precision,
  CONSTRAINT test2_pkey PRIMARY KEY (id)
);

CREATE INDEX test2_id1_value_idx ON test2(id1, value);

Following statements generate 200 rows of test data into test1 table and 1000000 rows of test data into test2 table.

INSERT INTO test1 (id, value) (SELECT x, random() from generate_series(1,200) x);

INSERT INTO test2 (id, id1, value) (SELECT x, (random()*200)::int + 1, random() from generate_series(1,1000000) x);

The following statements return top 10 rows from joining of two tables ordered by table1.value and table2.value. The

SELECT t1.value AS value1, t2.value AS value2 FROM test1 t1 JOIN test2 t2 ON t2.id1 = t1.id ORDER BY t1.value, t2.value
LIMIT 10

       value1        |        value2
---------------------+---------------------
 0.00562104489654303 |  0.00039379671216011
 0.00562104489654303 | 0.000658359378576279
 0.00562104489654303 | 0.000668979249894619
 0.00562104489654303 | 0.000768951140344143
 0.00562104489654303 |  0.00121330376714468
 0.00562104489654303 |  0.00122168939560652
 0.00562104489654303 |  0.00124016962945461
 0.00562104489654303 |  0.00134057039394975
 0.00562104489654303 |  0.00169069319963455
 0.00562104489654303 |  0.00171623658388853
(10 rows)

The statement plan doesn't use indexes. So the statement it is slow.

Limit  (cost=50614.88..50614.91 rows=10 width=16) (actual time=8388.331..8388.382 rows=10 loops=1)
  ->  Sort  (cost=50614.88..53102.45 rows=995025 width=16) (actual time=8388.324..8388.340 rows=10 loops=1)
        Sort Key: t1.value, t2.value
        Sort Method:  top-N heapsort  Memory: 17kB
        ->  Hash Join  (cost=6.50..29112.75 rows=995025 width=16) (actual time=0.982..6290.516 rows=997461 loops=1)
              Hash Cond: (t2.id1 = t1.id)
              ->  Seq Scan on test2 t2  (cost=0.00..15406.00 rows=1000000 width=12) (actualtime=0.088..2047.910 rows=1000000 loops=1)
              ->  Hash  (cost=4.00..4.00 rows=200 width=12) (actual time=0.870..0.870 rows=200 loops=1)
                    ->  Seq Scan on test1 t1  (cost=0.00..4.00 rows=200 width=12) (actual time=0.010..0.428 rows=200 loops=1)
Total runtime: 8388.473 ms

I can remove ordering by test2.value.

SELECT t1.value AS value1, t2.value AS value2 FROM test1 t1 JOIN test2 t2 ON t2.id1 = t1.id ORDER BY t1.value LIMIT 10

Then the result is the same.

       value1        |        value2
---------------------+---------------------
 0.00562104489654303 |  0.00039379671216011
 0.00562104489654303 | 0.000658359378576279
 0.00562104489654303 | 0.000668979249894619
 0.00562104489654303 | 0.000768951140344143
 0.00562104489654303 |  0.00121330376714468
 0.00562104489654303 |  0.00122168939560652
 0.00562104489654303 |  0.00124016962945461
 0.00562104489654303 |  0.00134057039394975
 0.00562104489654303 |  0.00169069319963455
 0.00562104489654303 |  0.00171623658388853
(10 rows)

The statement plan uses indexes and statement runs fast. This plan is exactly what I need.

Limit  (cost=0.00..0.62 rows=10 width=16) (actual time=0.049..0.148 rows=10 loops=1)
  ->  Nested Loop  (cost=0.00..62109.86 rows=995025 width=16) (actual time=0.044..0.107 rows=10 loops=1)
        ->  Index Scan using test1_value_idx on test1 t1  (cost=0.00..19.19 rows=200 width=12) (actual time=0.017..0.017 rows=1 loops=1)
        ->  Index Scan using test2_id1_value_idx on test2 t2  (cost=0.00..248.27 rows=4975 width=12) (actual time=0.013..0.042 rows=10 loops=1)
              Index Cond: (t2.id1 = t1.id)
Total runtime: 0.224 ms

So PostgreSQL planner can produce the plan I need but it doesn't produce this plan when I specify particular second ordering column. So is there any way to make planner produce desired plan when particular second ordering column is specified?

With best regards,
Korotkov Alexander.

В списке pgsql-performance по дате отправления:

Предыдущее
От: Rick
Дата:
Сообщение: Re: autovacuum strategy / parameters
Следующее
От: Tom Lane
Дата:
Сообщение: Re: Planner issue on sorting joining of two tables with limit