Обсуждение: Text-any concatenation volatility acting as optimization barrier
Hi list, Andrew Dunstan reported an awkward-seeming case on IRC where shifting around a concatenation expression in a view made the planner choose a good or a bad execution plan. Simplified, it boils down to this: db=# create table foo(i int); db=# explain verbose select i from (select i, i::text || 'x' as asd from foo) as subq; Seq Scan on public.foo (cost=0.00..34.00 rows=2400 width=4) Output: foo.i db=# explain verbose select i from (select i, i || 'x'::text as asd from foo) as subq; Subquery Scan on subq (cost=0.00..76.00 rows=2400 width=4) Output: subq.i -> Seq Scan on public.foo (cost=0.00..52.00rows=2400 width=4) Output: foo.i, ((foo.i)::text || 'x'::text) Case #1 uses the normal textcat(text, text) operator by automatically coercing 'x' as text. However, case #2 uses the anytextcat(anynonarray, text), which is marked as volatile thus acts as an optimization barrier. Later, the anytextcat SQL function is inlined and the EXPLAIN VERBOSE output has no trace of what happened. Is this something we can, or want, to fix? One way would be doing preprocess_expression() before pull_up_subqueries() so function inlining happens earlier, but I can't imagine what unintended consequences that might have. Another option would be creating explicit immutable text || foo operators for common types, but that sounds pretty hacky. Regards, Marti
On 02/07/2012 03:18 PM, Marti Raudsepp wrote: > Hi list, > > Andrew Dunstan reported an awkward-seeming case on IRC where shifting > around a concatenation expression in a view made the planner choose a > good or a bad execution plan. > > Simplified, it boils down to this: > > db=# create table foo(i int); > db=# explain verbose select i from (select i, i::text || 'x' as asd > from foo) as subq; > Seq Scan on public.foo (cost=0.00..34.00 rows=2400 width=4) > Output: foo.i > > db=# explain verbose select i from (select i, i || 'x'::text as asd > from foo) as subq; > Subquery Scan on subq (cost=0.00..76.00 rows=2400 width=4) > Output: subq.i > -> Seq Scan on public.foo (cost=0.00..52.00 rows=2400 width=4) > Output: foo.i, ((foo.i)::text || 'x'::text) > > Case #1 uses the normal textcat(text, text) operator by automatically > coercing 'x' as text. > However, case #2 uses the anytextcat(anynonarray, text), which is > marked as volatile thus acts as an optimization barrier. Later, the > anytextcat SQL function is inlined and the EXPLAIN VERBOSE output has > no trace of what happened. > > Is this something we can, or want, to fix? > > One way would be doing preprocess_expression() before > pull_up_subqueries() so function inlining happens earlier, but I can't > imagine what unintended consequences that might have. > > Another option would be creating explicit immutable text || foo > operators for common types, but that sounds pretty hacky. > It gets worse if you replace the expression with a call to a (non-sql) function returning text, which was in fact the original use case. Then you're pretty much hosed. cheers andrew
On Tue, Feb 7, 2012 at 22:31, Andrew Dunstan <andrew@dunslane.net> wrote: > It gets worse if you replace the expression with a call to a (non-sql) > function returning text, which was in fact the original use case. Then > you're pretty much hosed. Oh, if it's a non-SQL function then marking it as STABLE/IMMUTABLE should solve it -- did you try that? Regards, Marti
On 02/07/2012 03:36 PM, Marti Raudsepp wrote: > On Tue, Feb 7, 2012 at 22:31, Andrew Dunstan<andrew@dunslane.net> wrote: >> It gets worse if you replace the expression with a call to a (non-sql) >> function returning text, which was in fact the original use case. Then >> you're pretty much hosed. > Oh, if it's a non-SQL function then marking it as STABLE/IMMUTABLE > should solve it -- did you try that? > Hmm, your're right. I could have sworn I tried that. That still leaves the oddity you reported, though. cheers andrew
Marti Raudsepp <marti@juffo.org> writes: > Case #1 uses the normal textcat(text, text) operator by automatically > coercing 'x' as text. > However, case #2 uses the anytextcat(anynonarray, text), which is > marked as volatile thus acts as an optimization barrier. Hmm ... since those operators were invented (in 8.3), we have adopted a policy that I/O functions are presumed to be no worse than stable: http://archives.postgresql.org/pgsql-committers/2010-07/msg00307.php ISTM that would justify relabeling anytextcat/textanycat as stable, which should fix this. > One way would be doing preprocess_expression() before > pull_up_subqueries() so function inlining happens earlier, but I can't > imagine what unintended consequences that might have. I'm pretty sure that would be a bad idea. regards, tom lane
On Wed, Feb 8, 2012 at 06:21, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Marti Raudsepp <marti@juffo.org> writes: >> Case #1 uses the normal textcat(text, text) operator by automatically >> coercing 'x' as text. >> However, case #2 uses the anytextcat(anynonarray, text), which is >> marked as volatile thus acts as an optimization barrier. > > Hmm ... since those operators were invented (in 8.3), we have adopted a > policy that I/O functions are presumed to be no worse than stable: > http://archives.postgresql.org/pgsql-committers/2010-07/msg00307.php > ISTM that would justify relabeling anytextcat/textanycat as stable, > which should fix this. Yes, we should definitely take advantage of that. I scanned through all of pg_proc, there are 4 functions like this that can be changed: textanycat, anytextcat, quote_literal and quote_nullable. All of these have SQL wrappers to cast their argument to ::text. quote_literal | select pg_catalog.quote_literal($1::pg_catalog.text) quote_nullable | select pg_catalog.quote_nullable($1::pg_catalog.text) textanycat | select $1 || $2::pg_catalog.text anytextcat | select $1::pg_catalog.text || $2 Patch attached (in git am format). Passes all regression tests (except 'json' which fails on my machine even on git master). No documentation changes necessary AFAICT. Regards, Marti
Вложения
On Wed, Feb 8, 2012 at 4:53 AM, Marti Raudsepp <marti@juffo.org> wrote: > Patch attached (in git am format). Passes all regression tests (except > 'json' which fails on my machine even on git master). Can you provide the diffs from the json test on your machine? I don't see any build-farm failures off-hand... -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes: > On Wed, Feb 8, 2012 at 4:53 AM, Marti Raudsepp <marti@juffo.org> wrote: >> Patch attached (in git am format). Passes all regression tests (except >> 'json' which fails on my machine even on git master). > Can you provide the diffs from the json test on your machine? I don't > see any build-farm failures off-hand... I'm seeing diffs too after applying Marti's patch: instead of "z", "b", etc, the field labels in the json values look like "f1", "f2", etc in the output of queries such as SELECT row_to_json(q) FROM (SELECT $$a$$ || x AS b, y AS c, ARRAY[ROW(x.*,ARRAY[1,2,3]), ROW(y.*,ARRAY[4,5,6])] ASz FROM generate_series(1,2) x, generate_series(4,5) y) q; I believe what is happening is that now that the planner can flatten the sub-select, the RowExprs are getting expanded differently, and that ties into the "when do we lose column names" business that Andrew has been on about. However, I was not seeing that before applying the patch, so maybe Marti has another issue too. I am going to go ahead and commit the patch with the altered json results, because IMO it is mere accident that these regression cases were coming out with "nice" field labels anyway. When and if Andrew gets the RowExpr cases fixed properly, that will show up as these cases going back to nicer-looking output. regards, tom lane
On Wed, Feb 8, 2012 at 18:20, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> On Wed, Feb 8, 2012 at 4:53 AM, Marti Raudsepp <marti@juffo.org> wrote: >>> Patch attached (in git am format). Passes all regression tests (except >>> 'json' which fails on my machine even on git master). > >> Can you provide the diffs from the json test on your machine? I don't >> see any build-farm failures off-hand... > > I'm seeing diffs too after applying Marti's patch: instead of "z", "b", > etc, the field labels in the json values look like "f1", "f2", etc in > the output of queries such as Sorry, my bad. I guess I got the two versions (patched and unpatched) mixed up. Indeed this difference disappears when I remove my patch. I'll have to be more careful when checking regression diffs in the future. :) Regards, Marti
On 02/08/2012 11:20 AM, Tom Lane wrote: > I am going to go ahead and commit the patch with the altered json > results, because IMO it is mere accident that these regression cases > were coming out with "nice" field labels anyway. When and if Andrew > gets the RowExpr cases fixed properly, that will show up as these > cases going back to nicer-looking output. I take it this is your way of making me hurry up with that work :-) cheers andrew
Andrew Dunstan <andrew@dunslane.net> writes: > On 02/08/2012 11:20 AM, Tom Lane wrote: >> I am going to go ahead and commit the patch with the altered json >> results, because IMO it is mere accident that these regression cases >> were coming out with "nice" field labels anyway. When and if Andrew >> gets the RowExpr cases fixed properly, that will show up as these >> cases going back to nicer-looking output. > I take it this is your way of making me hurry up with that work :-) Well, right now the behavior is more consistent than it was before; we would surely have gotten questions about it as it was. Whether you find time to improve it or not isn't my concern at the moment. BTW, the order of the items in the json output doesn't match the column order of the sub-select, with or without the patch. This seems ... odd. Is it intentional? regards, tom lane