Re: Planner issue on sorting joining of two tables with limit
От | Alexander Korotkov |
---|---|
Тема | Re: Planner issue on sorting joining of two tables with limit |
Дата | |
Msg-id | g2r2e2e4b141005030657vd6a0945fj9b9650bf515a189f@mail.gmail.com обсуждение исходный текст |
Ответ на | Re: Planner issue on sorting joining of two tables with limit (Tom Lane <tgl@sss.pgh.pa.us>) |
Ответы |
Re: Planner issue on sorting joining of two tables with
limit
(Alexander Korotkov <aekorotkov@gmail.com>)
|
Список | pgsql-performance |
Well, no, because that plan wouldn't produce the specified ordering;
or at least it would be a lucky coincidence if it did. It's only
sorting on t1.value.
I just don't find why it is coincidence. I think that such plan will always produce result ordered by two columns, because such nested index scan always produce this result.
Let's consider some simple example in order to illustrate how this plan works.
t1
id | value
---+------
1 | 0.1
2 | 0.3
3 | 0.2
t2
id | id1 | value
---+-----+------
1 | 2 | 0.2
2 | 1 | 0.9
3 | 2 | 0.6
4 | 1 | 0.7
5 | 1 | 0.4
6 | 3 | 0.2
1) The outer index scan will find the row of t1 with least value using test1_value_idx. It will be row (1, 0.1)
2) The inner index scan will find all the rows in t2 where id1 = 1 using test2_id1_value_idx. But index test2_id1_value_idx have second order by value column and the index scan result will be ordered by value. That's why inner index scan will find rows (5, 1, 0.4), (4, 1, 0.7) and (2, 1, 0.9). And following query output will be produced:
value1 | value2
--------+-------
0.1 | 0.4
0.1 | 0.7
0.1 | 0.9
3) The outer index scan will find the row of t1 with the second value using test1_value_idx. It will be row (3, 0.2)
4) The inner index scan will find all the rows in t2 where id1 = 3 using test2_id1_value_idx. This row is (6, 3, 0.2). The following query output will be produced:
value1 | value2
--------+-------
0.2 | 0.2
5) The outer index scan will find the row of t1 with the third value using test1_value_idx. It will be row (2, 0.3)
6) The inner index scan will find all the rows in t2 where id1 = 2 using test2_id1_value_idx. These rows are (1, 2, 0.2) and (3, 2, 0.6). The following query output will be produced:
value1 | value2
--------+-------
0.3 | 0.2
0.3 | 0.6
And the whole query result is:
value1 | value2
--------+-------
0.1 | 0.4
0.1 | 0.7
0.1 | 0.9
0.2 | 0.2
0.3 | 0.2
0.3 | 0.6
And this result is really ordered by t1.value, t2.value.
I can't find error in my reasoning :)
The query without limit produce similar plan.
Let's consider some simple example in order to illustrate how this plan works.
t1
id | value
---+------
1 | 0.1
2 | 0.3
3 | 0.2
t2
id | id1 | value
---+-----+------
1 | 2 | 0.2
2 | 1 | 0.9
3 | 2 | 0.6
4 | 1 | 0.7
5 | 1 | 0.4
6 | 3 | 0.2
1) The outer index scan will find the row of t1 with least value using test1_value_idx. It will be row (1, 0.1)
2) The inner index scan will find all the rows in t2 where id1 = 1 using test2_id1_value_idx. But index test2_id1_value_idx have second order by value column and the index scan result will be ordered by value. That's why inner index scan will find rows (5, 1, 0.4), (4, 1, 0.7) and (2, 1, 0.9). And following query output will be produced:
value1 | value2
--------+-------
0.1 | 0.4
0.1 | 0.7
0.1 | 0.9
3) The outer index scan will find the row of t1 with the second value using test1_value_idx. It will be row (3, 0.2)
4) The inner index scan will find all the rows in t2 where id1 = 3 using test2_id1_value_idx. This row is (6, 3, 0.2). The following query output will be produced:
value1 | value2
--------+-------
0.2 | 0.2
5) The outer index scan will find the row of t1 with the third value using test1_value_idx. It will be row (2, 0.3)
6) The inner index scan will find all the rows in t2 where id1 = 2 using test2_id1_value_idx. These rows are (1, 2, 0.2) and (3, 2, 0.6). The following query output will be produced:
value1 | value2
--------+-------
0.3 | 0.2
0.3 | 0.6
And the whole query result is:
value1 | value2
--------+-------
0.1 | 0.4
0.1 | 0.7
0.1 | 0.9
0.2 | 0.2
0.3 | 0.2
0.3 | 0.6
And this result is really ordered by t1.value, t2.value.
I can't find error in my reasoning :)
The query without limit produce similar plan.
Nested Loop (cost=0.00..62109.86 rows=995025 width=16)
-> Index Scan using test1_value_idx on test1 t1 (cost=0.00..19.19 rows=200 width=12)
-> Index Scan using test2_id1_value_idx on test2 t2 (cost=0.00..248.27 rows=4975 width=12)
And I checked that the result is ordered by t1.value and t2.value.
2010/4/26 Tom Lane <tgl@sss.pgh.pa.us>
Коротков Александр <aekorotkov@gmail.com> writes:Well, no, because that plan wouldn't produce the specified ordering;
> So PostgreSQL planner can produce the plan I need but it doesn't produce
> this plan when I specify particular second ordering column.
or at least it would be a lucky coincidence if it did. It's only
sorting on t1.value.Not when the ordering columns come from two different tables. (If they
> So is there any
> way to make planner produce desired plan when particular second ordering
> column is specified?
were in the same table then scanning a two-column index could produce
the demanded sort order.) I don't see any way to satisfy this query
without an explicit sort step, which means it has to look at the whole
join output.
If you're willing to make assumptions like "the required 10 rows will be
within the first 100 t1.value rows" then you could nest an ORDER BY
t1.value LIMIT 100 query inside something that did an ORDER BY with both
columns. But this fails if you have too many duplicate t1.value values,
and your test case suggests that you might have a lot of them. In any
case it would stop being fast if you make the inner LIMIT very large.
regards, tom lane
В списке pgsql-performance по дате отправления:
Предыдущее
От: Mark StosbergДата:
Сообщение: Re: debugging handle exhaustion and 15 min/ 5mil row delete
Следующее
От: Alexander KorotkovДата:
Сообщение: Re: Planner issue on sorting joining of two tables with limit