Re: unnesesary sorting after Merge Full Join
От | Decibel! |
---|---|
Тема | Re: unnesesary sorting after Merge Full Join |
Дата | |
Msg-id | D317B25B-7995-43A2-8863-55A8505F827A@decibel.org обсуждение исходный текст |
Ответ на | unnesesary sorting after Merge Full Join (Alexey Nalbat <nalbat@price.ru>) |
Ответы |
Re: unnesesary sorting after Merge Full Join
|
Список | pgsql-general |
On Feb 21, 2008, at 4:08 AM, Alexey Nalbat wrote: > I'd like to use ORDER BY in any specified order and LIMIT, OFFSET > for paging query results. > The query is FULL OUTER JOIN of two tables by field id. I think the > results of Merge Full Join > to be ordered by some "combined id". And there is no need in extra > Sort if I specify ORDER BY > that "combined id". But unfortunately it is not so. > > Here is a simple example: > -- BEGIN > create table t1 as select generate_series(1,1000000,2) as id; > create table t2 as select generate_series(1,1000000,3) as id; > > create index i1 on t1 ( id ); > create index i2 on t2 ( id ); > > analyze t1; > analyze t2; > > explain analyze > select id, t1.*, t2.* > from t1 natural full join t2 > order by 1 limit 10 offset 10; > > drop table t1; > drop table t2; > -- END > > Postgresql chooses such plan: > Limit (cost=44080.12..44080.15 rows=10 width=8) (actual > time=6724.850..6724.906 rows=10 loops=1) > -> Sort (cost=44080.10..45330.10 rows=500000 width=8) (actual > time=6724.806..6724.845 rows=20 loops=1) > Sort Key: (COALESCE(t1.id, t2.id)) > Sort Method: top-N heapsort Memory: 25kB > -> Merge Full Join (cost=0.00..30775.28 rows=500000 > width=8) (actual time=0.142..5237.289 rows=666667 loops=1) > Merge Cond: (t1.id = t2.id) > -> Index Scan using i1 on t1 (cost=0.00..15212.30 > rows=500000 width=4) (actual time=0.079..1188.601 rows=500000 loops=1) > -> Index Scan using i2 on t2 (cost=0.00..10146.30 > rows=333334 width=4) (actual time=0.051..793.635 rows=333334 loops=1) > > The desired plan is much faster: > Limit (cost=0.62..1.23 rows=10 width=8) (actual time=0.262..0.366 > rows=10 loops=1) > -> Merge Full Join (cost=0.00..30775.28 rows=500000 width=8) > (actual time=0.156..0.303 rows=20 loops=1) > Merge Cond: (t1.id = t2.id) > -> Index Scan using i1 on t1 (cost=0.00..15212.30 > rows=500000 width=4) (actual time=0.088..0.120 rows=15 loops=1) > -> Index Scan using i2 on t2 (cost=0.00..10146.30 > rows=333334 width=4) (actual time=0.056..0.078 rows=11 loops=1) > > I found comment in src/backend/optimizer/path/pathkeys.c: > * EXCEPTION: in a FULL or RIGHT join, we cannot treat the result as > * having the outer path's path keys, because null lefthand rows may be > * inserted at random points. It must be treated as unsorted. > > How can I get rid of this sorting? Or could this behavior of Merge > Full Join be improved? Theoretically, this can be improved, but I suspect it would be non- trivial. I suspect that the problem is the planner doesn't realize that the join key could never be null, which is often not the case. Consider this example: decibel=# create table t1 as select generate_series(1,1000000,2) as id1; SELECT decibel=# create table t2 as select generate_series(1,1000000,3) as id2; SELECT Create index, etc. explain analyze select id1, id2 from t1 full join t2 on (t1.id1=t2.id2) order by 1 limit 10 offset 10; Note that in this case you have to re-sort, because NULLs sort differently. As a workaround, I suggest creating a table that contains all the IDs from t1 and t2. You could maintain this table via a trigger if you wanted. You could then quickly determine the exact IDs you wanted, and then join against the two real tables. -- Decibel!, aka Jim C. Nasby, Database Architect decibel@decibel.org Give your computer some brain candy! www.distributed.net Team #1828
Вложения
В списке pgsql-general по дате отправления:
Следующее
От: Decibel!Дата:
Сообщение: Re: Approaches for Lookup values (codes) in OLTP application