Обсуждение: union vs. sort
I'm surprise with query plan that PostgreSQL planner prepare forselects with ORDER BY if all data are from
sub-selectthat is alreadysorted. # explain select data from (select distinct data from
addr) as x order by x.data; ------------------------------------------------- Subquery
Scanx -> Unique -> Sort Sort Key: data -> Seq Scan
onaddr
This is right -- the main of query doesn't use "Sort" for ORDER BY,because subselect is sorted by "Unique".
And almost same query, but in the subselect is union: # explain select data from (select
datafrom addr union select data from addr2) as x
orderby x.data; ----------------------------------------- Sort Sort Key: data ->
SubqueryScan x -> Unique -> Sort Sort Key: data
-> Append -> Subquery Scan "*SELECT* 1"
-> Seq Scan on addr -> Subquery Scan "*SELECT* 2"
-> Seq Scan on addr2
I think it's bad, because there is used extra sort for ORDER BY foralready by "Unique" sorted data.
If I add ORDER BY to subselect:
# explain select data from (select data from addr union
select data from addr2 order by data) as x order by x.data;
--------------------------------------------------- Sort Sort Key: data -> Subquery Scan x
-> Sort Sort Key: data -> Unique ->
Sort Sort Key: data -> Append
-> Subquery Scan "*SELECT* 1" -> Seq Scan on addr
-> Subquery Scan "*SELECT* 2" -> Seq Scan
onaddr2
I see two unnecessary sorts for unique and already sorted data.
The core of problem is probbaly UNION, because if I use simple query without subselect it still sort already sorderd
data:
# explain select data from addr union select data from addr2
order by data; ----------------------------------- Sort Sort Key: data -> Unique
-> Sort Sort Key: data -> Append ->
SubqueryScan "*SELECT* 1" -> Seq Scan on addr -> Subquery
Scan"*SELECT* 2" -> Seq Scan on addr2
Or order of data which returns "unique" is for UNION diffrent that datafrom DISTINCT? (see first example).
Karel
-- Karel Zak <zakkr@zf.jcu.cz>http://home.zf.jcu.cz/~zakkr/
Karel Zak <zakkr@zf.jcu.cz> writes:
> I'm surprise with query plan that PostgreSQL planner prepare for
> selects with ORDER BY if all data are from sub-select that is already
> sorted.
This isn't simply a matter of "omitting the sort". Even if the inputs
are sorted, their concatenation (Append result) isn't sorted: "1 2 3 4"
and "1 3 7 9" are sorted, but "1 2 3 4 1 3 7 9" isn't.
To do what you're thinking about, we'd have to build a variant
implementation of Unique that merges two presorted inputs --- and it
wouldn't work for more than two inputs (at least not without a lot of
pain ... Append is a messy special case in many ways, and we'd have to
duplicate most of that cruft to make an N-input version of Unique).
This is possible, without doubt, but I'm not excited about expending
that much time on it. You haven't shown any evidence that this would be
an important optimization in practice.
regards, tom lane
On Tue, Apr 06, 2004 at 10:33:25AM -0400, Tom Lane wrote:
> Karel Zak <zakkr@zf.jcu.cz> writes:
> > I'm surprise with query plan that PostgreSQL planner prepare for
> > selects with ORDER BY if all data are from sub-select that is already
> > sorted.
>
> This isn't simply a matter of "omitting the sort". Even if the inputs
> are sorted, their concatenation (Append result) isn't sorted: "1 2 3 4"
> and "1 3 7 9" are sorted, but "1 2 3 4 1 3 7 9" isn't.
I didn't talk about "Append" result, but about "Unique" result. TheORDER BY in UNION query works with final
concanateddata -- that'sright. My question is why a result from this ORDER BY is again sorted:
# explain select data from (select data from addr union
select data from addr2 order by data) as x order by x.data;
-----------------------------------------------
(1) Sort Sort Key: data -> Subquery Scan x
(2) -> Sort Sort Key: data -> Unique
(3) -> Sort Sort Key: data
-> Append -> Subquery Scan "*SELECT* 1"
-> Seq Scan on addr -> Subquery Scan "*SELECT* 2"
-> Seq Scan on addr2
I see three sorts with same data.
> To do what you're thinking about, we'd have to build a variant
> implementation of Unique that merges two presorted inputs --- and it
> wouldn't work for more than two inputs (at least not without a lot of
> pain ... Append is a messy special case in many ways, and we'd have to
> duplicate most of that cruft to make an N-input version of Unique).
I think it is not needful touch Append, but it should detect redundant sorts. Why "select data from (select data from
addrorder by data) as x order by x.data"
use only one sort?
> This is possible, without doubt, but I'm not excited about expending
> that much time on it. You haven't shown any evidence that this would be
> an important optimization in practice.
It's nothing important for me. It's from Czech databases mailing listwhere some PostgreSQL users was surprised with
EXPLAINresult of UNIONand speed of these queries.
Karel
-- Karel Zak <zakkr@zf.jcu.cz>http://home.zf.jcu.cz/~zakkr/
Karel Zak <zakkr@zf.jcu.cz> writes:
> On Tue, Apr 06, 2004 at 10:33:25AM -0400, Tom Lane wrote:
>> This isn't simply a matter of "omitting the sort".
> I didn't talk about "Append" result, but about "Unique" result. The
> ORDER BY in UNION query works with final concanated data -- that's
> right. My question is why a result from this ORDER BY is again sorted:
Oh, okay, that's just something that never got done, per this old
comment:
/* * We set current_pathkeys NIL indicating we do not know sort * order. This is correct when the
topset operation is UNION * ALL, since the appended-together results are unsorted even if * the subplans
weresorted. For other set operations we could be * smarter --- room for future improvement! */
I've committed changes to do the right thing in CVS tip.
regards, tom lane
On Wed, Apr 07, 2004 at 02:20:55PM -0400, Tom Lane wrote: > > I've committed changes to do the right thing in CVS tip. Thanks man! Karel -- Karel Zak <zakkr@zf.jcu.cz>http://home.zf.jcu.cz/~zakkr/