Re: Clarify the ordering guarantees in combining queries (or lack thereof)
| От | Tom Lane |
|---|---|
| Тема | Re: Clarify the ordering guarantees in combining queries (or lack thereof) |
| Дата | |
| Msg-id | 2742511.1657800108@sss.pgh.pa.us обсуждение исходный текст |
| Ответ на | Re: Clarify the ordering guarantees in combining queries (or lack thereof) (Pantelis Theodosiou <ypercube@gmail.com>) |
| Ответы |
Re: Clarify the ordering guarantees in combining queries (or lack thereof)
|
| Список | pgsql-docs |
Pantelis Theodosiou <ypercube@gmail.com> writes:
> On Thu, Jul 14, 2022 at 9:16 AM Shay Rojansky <roji@roji.org> wrote:
>> I was trying to understand what - if any - are the guarantees with
>> regards to ordering for combining queries (UNION/UNION ALL/...).
> No, there is no guarantee. It's just that UNION ALL works this way today
> (preserving the order of the subselects) - and I'm not even sure about
> that, it may not preserve the order in all cases, with different indexes or
> partitioning or a parallel plan, etc.
Yeah, that. You can get a parallelized plan today for UNION ALL:
=# explain analyze select * from foo union all select * from foo;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------
Gather (cost=0.00..208552.05 rows=5120008 width=244) (actual time=0.652..390.135 rows=5120000 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Parallel Append (cost=0.00..208552.05 rows=2133336 width=244) (actual time=0.021..228.848 rows=1706667 loops=3)
-> Parallel Seq Scan on foo (cost=0.00..98942.68 rows=1066668 width=244) (actual time=0.453..78.084
rows=853333loops=3)
-> Parallel Seq Scan on foo foo_1 (cost=0.00..98942.68 rows=1066668 width=244) (actual time=0.024..125.299
rows=1280000loops=2)
Planning Time: 0.094 ms
Execution Time: 488.352 ms
It's true that in simple non-parallelized cases we'll do the first query
then the second, but SQL doesn't promise that to be true and neither does
Postgres.
>>> If you want ordered output use ORDER BY.
>> I don't see how that could be done. Consider the following:
>> (SELECT id FROM data ORDER BY id)
>> UNION ALL
>> (SELECT id FROM data ORDER BY id DESC);
You do it like this:
=# explain analyze (select * from foo union all select * from foo) order by unique1;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------
Gather Merge (cost=839258.04..889070.63 rows=4266672 width=244) (actual time=931.054..1707.780 rows=5120000 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Sort (cost=839258.01..844591.35 rows=2133336 width=244) (actual time=924.821..1132.096 rows=1706667 loops=3)
Sort Key: foo.unique1
Sort Method: external merge Disk: 433552kB
Worker 0: Sort Method: external merge Disk: 423400kB
Worker 1: Sort Method: external merge Disk: 415592kB
-> Parallel Append (cost=0.00..208552.05 rows=2133336 width=244) (actual time=0.051..218.476 rows=1706667
loops=3)
-> Parallel Seq Scan on foo (cost=0.00..98942.68 rows=1066668 width=244) (actual time=0.030..79.118
rows=853333loops=3)
-> Parallel Seq Scan on foo foo_1 (cost=0.00..98942.68 rows=1066668 width=244) (actual
time=0.073..118.566rows=1280000 loops=2)
Planning Time: 0.109 ms
Execution Time: 1830.390 ms
(13 rows)
The parentheses are actually optional here, if memory serves --- to get
the ORDER BY to be applied inside the second sub-select, you'd have to
use parens as Shay had it.
regards, tom lane
В списке pgsql-docs по дате отправления: