Re: Path to unreverting "Allow planner to use Merge Append to efficiently implement UNION"
От | Heikki Linnakangas |
---|---|
Тема | Re: Path to unreverting "Allow planner to use Merge Append to efficiently implement UNION" |
Дата | |
Msg-id | 5717be79-8f7d-4039-815a-b6df22d3dc11@iki.fi обсуждение исходный текст |
Ответ на | Path to unreverting "Allow planner to use Merge Append to efficiently implement UNION" (David Rowley <dgrowleyml@gmail.com>) |
Ответы |
Re: Path to unreverting "Allow planner to use Merge Append to efficiently implement UNION"
|
Список | pgsql-hackers |
On 21/05/2024 05:58, David Rowley wrote: > Let this thread be for at least the coding portion of this or be my > thread for this patch for the v18 cycle if the RMT rules in favour of > keeping that code reverted for v17. > > I've attached 2 patches. > > 0001 is a simple revert of Tom's revert (7204f3591). > 0002 fixes the issue reported by Hubert. > > If anyone wants to have a look, I'd be grateful for that. Tom did > call for further review after this being the 4th issue reported for > 66c0185a3. My planner experience is a bit rusty, but I took a quick look. Looks generally OK to me. Some comments below: > + /* For for UNIONs (not UNION ALL), try sorting, if sorting is possible */ Duplicated word: "For for" > /* > * build_setop_child_paths > * Build paths for the set op child relation denoted by 'rel'. > * > * interesting_pathkeys: if not NIL, also include paths that suit these > * pathkeys, sorting any unsorted paths as required. > * *pNumGroups: if not NULL, we estimate the number of distinct groups > * in the result, and store it there The indentation on 'interesting_pathkeys' and '*pNumGroups' is inconsistent. I have a vague feeling that this comment deserves to be longer. The function does a lot of things. How is 'child_tlist' different from rel->reltarget for example? 'interesting_pathkeys' is modified by the call to add_setop_child_rel_equivalences(): it adds members to the EquivalenceClasses of the pathkeys. Is that worth mentioning here, or is that obvious to someone who know more about the planner? > /* > * Create paths to suit final sort order required for setop_pathkeys. > * Here we'll sort the cheapest input path (if not sorted already) and > * incremental sort any paths which are partially sorted. > */ > is_sorted = pathkeys_count_contained_in(setop_pathkeys, > subpath->pathkeys, > &presorted_keys); > > if (!is_sorted) > { Maybe also mention that if it's already sorted, it's used as is. BTW, could the same machinery be used for INTERSECT as well? There was a brief mention of that in the original thread, but I didn't understand the details. Not for v17, but I'm curious. I was wondering if build_setop_child_paths() should be named build_union_child_paths(), since it's only used with UNIONs, but I guess it could be used as is for INTERSECT too. # Testing postgres=# begin; create table foo as select i from generate_series(1, 1000000) i; create index on foo (i); commit; BEGIN SELECT 1000000 CREATE INDEX COMMIT postgres=# set enable_seqscan=off; SET postgres=# explain (select 1 as i union select i from foo) order by i; QUERY PLAN ------------------------------------------------------------------------------------------------------ Unique (cost=144370.89..149370.89 rows=1000001 width=4) -> Sort (cost=144370.89..146870.89 rows=1000001 width=4) Sort Key: (1) -> Append (cost=0.00..31038.44 rows=1000001 width=4) -> Result (cost=0.00..0.01 rows=1 width=4) -> Index Only Scan using foo_i_idx on foo (cost=0.42..26038.42 rows=1000000 width=4) (6 rows) I'm disappointed it couldn't produce a MergeAppend plan. If you replace the "union" with "union all" you do get a MergeAppend. Some more cases where I hoped for a MergeAppend: postgres=# explain (select i, 'foo' from foo union select i, 'foo' from foo) order by 1; QUERY PLAN ------------------------------------------------------------------------------------------------------------- Unique (cost=380767.54..395767.54 rows=2000000 width=36) -> Sort (cost=380767.54..385767.54 rows=2000000 width=36) Sort Key: foo.i, ('foo'::text) -> Append (cost=0.42..62076.85 rows=2000000 width=36) -> Index Only Scan using foo_i_idx on foo (cost=0.42..26038.42 rows=1000000 width=36) -> Index Only Scan using foo_i_idx on foo foo_1 (cost=0.42..26038.42 rows=1000000 width=36) (6 rows) postgres=# explain (select 'foo', i from foo union select 'bar', i from foo) order by 1; QUERY PLAN ------------------------------------------------------------------------------------------------------------- Unique (cost=380767.54..395767.54 rows=2000000 width=36) -> Sort (cost=380767.54..385767.54 rows=2000000 width=36) Sort Key: ('foo'::text), foo.i -> Append (cost=0.42..62076.85 rows=2000000 width=36) -> Index Only Scan using foo_i_idx on foo (cost=0.42..26038.42 rows=1000000 width=36) -> Index Only Scan using foo_i_idx on foo foo_1 (cost=0.42..26038.42 rows=1000000 width=36) (6 rows) The following two queries are the same from the user's point of view, but one is written using WITH: postgres=# explain (select i from foo union (select 1::int order by 1) union select i from foo) order by 1; QUERY PLAN ------------------------------------------------------------------------------------------------------------ Unique (cost=326083.66..336083.67 rows=2000001 width=4) -> Sort (cost=326083.66..331083.67 rows=2000001 width=4) Sort Key: foo.i -> Append (cost=0.42..62076.87 rows=2000001 width=4) -> Index Only Scan using foo_i_idx on foo (cost=0.42..26038.42 rows=1000000 width=4) -> Result (cost=0.00..0.01 rows=1 width=4) -> Index Only Scan using foo_i_idx on foo foo_1 (cost=0.42..26038.42 rows=1000000 width=4) (7 rows) postgres=# explain with x (i) as (select 1::int order by 1) (select i from foo union select i from x union select i from foo) order by 1; QUERY PLAN ------------------------------------------------------------------------------------------------------ Unique (cost=0.89..82926.54 rows=2000001 width=4) -> Merge Append (cost=0.89..77926.54 rows=2000001 width=4) Sort Key: foo.i -> Index Only Scan using foo_i_idx on foo (cost=0.42..26038.42 rows=1000000 width=4) -> Sort (cost=0.02..0.03 rows=1 width=4) Sort Key: (1) -> Result (cost=0.00..0.01 rows=1 width=4) -> Index Only Scan using foo_i_idx on foo foo_1 (cost=0.42..26038.42 rows=1000000 width=4) (8 rows) I would've expected a MergeAppend in both cases. None of these test cases are broken as such, you just don't get the benefit of the optimization. I suspect they might all have the same root cause, as they all involve constants in the target list. I think that's a pretty common use case of UNION though. -- Heikki Linnakangas Neon (https://neon.tech)
В списке pgsql-hackers по дате отправления:
Предыдущее
От: "Andrey M. Borodin"Дата:
Сообщение: Re: Injection points: preloading and runtime arguments