Обсуждение: pg_get_viewdefs() indentation considered harmful
We're finding it more and more common for people to define partitioned table views with hundreds or thousands of union branches. pg_get_viewdefs indents each branch of the union by 8 spaces more than the previous branch. The net effect is that the size of the output is O(n^2) bytes just for the indentation spaces. If your union branches are about 5 lines long then you hit MaxAlloc after about 5,000 branches. And incidentally there's no CHECK_FOR_INTERRUPTS in that loop. I would actually suggest pg_dump should be able to pass a flag to disable indentation but then I noticed that all the code is already there. It's just that every single variation of pg_get_viewdef sets the indentation flag explicitly. I wonder if this was the intended behaviour. Shouldn't pg_get_viewdef(view, false) set indentation off? -- greg
Greg Stark <stark@mit.edu> writes: > We're finding it more and more common for people to define partitioned > table views with hundreds or thousands of union branches. Really? Given how poorly the system performs with that many inheritance children, I've got a hard time believing either that this is common or that ruleutils is your worst problem with it. > pg_get_viewdefs indents each branch of the union by 8 spaces more than > the previous branch. I think that's because the unions are a nested binary tree so far as the parsetree representation goes. We could probably teach ruleutils to flatten the display in common cases, but it might be a bit tricky to be sure we don't create any lies about unusual nestings. regards, tom lane
On Fri, Jan 24, 2014 at 6:54 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> pg_get_viewdefs indents each branch of the union by 8 spaces more than >> the previous branch. > > I think that's because the unions are a nested binary tree so far as the > parsetree representation goes. We could probably teach ruleutils to > flatten the display in common cases, but it might be a bit tricky to be > sure we don't create any lies about unusual nestings. That may be a worthwhile thing to do. But it strikes me that pg_dump, at least when not doing an SQL dump, really has no reason to ask for indentation at all. It's already asking for non-prettyprinted output, why not make non-prettyprinted also mean non-indented? Also, it also seems like a reasonable safeguard that if the underlying rule is larger than, say, 32kB or maybe 128kB you probably don't care about indentation. That would cover all the system views except a handful that seem to have peculiarly large pg_rewrite rules considering their SQL definition isn't especially large: daeqck898dvduj=> select ev_class::regclass, length(ev_action) rewrite_len,length(pg_get_viewdef(ev_class,true)) prettyprint_len, length(pg_get_viewdef(ev_class,false)) non_prettyprint_len from pg_rewrite order by 2 desc limit 20; ev_class | rewrite_len | prettyprint_len | non_prettyprint_len --------------------------------------------+-------------+-----------------+---------------------pg_seclabels | 138565 | 13528 | 13758information_schema.columns | 126787 | 6401 | 6642information_schema.usage_privileges | 93863 | 11653 | 11939information_schema.attributes | 83268 | 4264 | 4417information_schema.referential_constraints | 81762 | 2527 | 2653pg_statio_all_tables | 59617 | 1023 | 1061pg_stats | 59331 | 2803 | 2899information_schema.domains | 54611 | 3206 | 3320information_schema.element_types | 53049 | 5608 | 5762information_schema.routines | 52333 | 7852 | 8089information_schema.column_privileges | 49229 | 3883 | 3954pg_indexes | 46717 |458 | 486information_schema.check_constraints | 42132 | 1375 | 1443information_schema.tables | 37458 | 2122 | 2212pg_stat_all_indexes | 35426 |508 | 528pg_statio_all_indexes | 35412 |490 | 512information_schema.table_constraints | 31882 | 3098 | 3231information_schema.column_udt_usage | 31731 | 1034 | 1090information_schema.parameters | 30497 | 3640 | 3750pg_stat_all_tables | 27193 | 1367 | 1387 (20 rows) -- greg
Argh. Attached is a plain text file with that query data. I'll be switching back to Gnus any day now.
Вложения
Greg Stark <stark@mit.edu> writes: > But it strikes me that pg_dump, at least when not doing an SQL dump, > really has no reason to ask for indentation at all. It's already > asking for non-prettyprinted output, why not make non-prettyprinted > also mean non-indented? We do go to some effort to make pg_dump's output legible, so I'm not especially on board with this idea. regards, tom lane
On Fri, Jan 24, 2014 at 6:54 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Greg Stark <stark@mit.edu> writes: >> We're finding it more and more common for people to define partitioned >> table views with hundreds or thousands of union branches. > > Really? Given how poorly the system performs with that many inheritance > children, I've got a hard time believing either that this is common or > that ruleutils is your worst problem with it. Doesn't make it a bad idea to fix it. You may need hundreds or thousands of union branches for this to totally break the world, but you only need about five for it to be annoying. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Fri, Jan 24, 2014 at 8:49 PM, Robert Haas <robertmhaas@gmail.com> wrote: > On Fri, Jan 24, 2014 at 6:54 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Greg Stark <stark@mit.edu> writes: >>> We're finding it more and more common for people to define partitioned >>> table views with hundreds or thousands of union branches. >> >> Really? Given how poorly the system performs with that many inheritance >> children, I've got a hard time believing either that this is common or >> that ruleutils is your worst problem with it. > > Doesn't make it a bad idea to fix it. You may need hundreds or > thousands of union branches for this to totally break the world, but > you only need about five for it to be annoying. Indeed even aside from the performance questions, once you're indented 5-10 times the indention stops being useful at all. The query would probably be even more readable if we just made indentation modulo 40 so once you get too far indented it "wraps around" which is not unlike how humans actually indent things in this case. -- greg
On Fri, Jan 24, 2014 at 8:53 PM, Greg Stark <stark@mit.edu> wrote: > On Fri, Jan 24, 2014 at 8:49 PM, Robert Haas <robertmhaas@gmail.com> wrote: >> On Fri, Jan 24, 2014 at 6:54 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >>> Greg Stark <stark@mit.edu> writes: >>>> We're finding it more and more common for people to define partitioned >>>> table views with hundreds or thousands of union branches. >>> >>> Really? Given how poorly the system performs with that many inheritance >>> children, I've got a hard time believing either that this is common or >>> that ruleutils is your worst problem with it. >> >> Doesn't make it a bad idea to fix it. You may need hundreds or >> thousands of union branches for this to totally break the world, but >> you only need about five for it to be annoying. > > Indeed even aside from the performance questions, once you're indented > 5-10 times the indention stops being useful at all. The query would > probably be even more readable if we just made indentation modulo 40 > so once you get too far indented it "wraps around" which is not unlike > how humans actually indent things in this case. Ha! That seems a little crazy, but *capping* the indentation at some reasonable value might not be dumb. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes: > On Fri, Jan 24, 2014 at 8:53 PM, Greg Stark <stark@mit.edu> wrote: >> Indeed even aside from the performance questions, once you're indented >> 5-10 times the indention stops being useful at all. The query would >> probably be even more readable if we just made indentation modulo 40 >> so once you get too far indented it "wraps around" which is not unlike >> how humans actually indent things in this case. > Ha! That seems a little crazy, but *capping* the indentation at some > reasonable value might not be dumb. I could go for either of those approaches, if applied uniformly, and actually Greg's suggestion sounds a bit better: it seems more likely to preserve some readability in deeply nested constructs. With either approach you need to ask where the limit value is going to come from. Is it OK to just hard-wire a magic number, or do we need to expose a knob somewhere? regards, tom lane
On 01/25/2014 11:06 AM, Tom Lane wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> On Fri, Jan 24, 2014 at 8:53 PM, Greg Stark <stark@mit.edu> wrote: >>> Indeed even aside from the performance questions, once you're indented >>> 5-10 times the indention stops being useful at all. The query would >>> probably be even more readable if we just made indentation modulo 40 >>> so once you get too far indented it "wraps around" which is not unlike >>> how humans actually indent things in this case. >> Ha! That seems a little crazy, but *capping* the indentation at some >> reasonable value might not be dumb. > I could go for either of those approaches, if applied uniformly, and > actually Greg's suggestion sounds a bit better: it seems more likely > to preserve some readability in deeply nested constructs. > > With either approach you need to ask where the limit value is going > to come from. Is it OK to just hard-wire a magic number, or do we > need to expose a knob somewhere? > > Simply capping it is probably the best bang for the buck. I suspect most people would prefer to have "q1 union q2 union q3 union q4" with the subqueries all indented to the same level. But I understand the difficulties in doing so. A knob seems like overkill. I'd just hardwire some number, say three or four levels of indentation. cheers andrew
On Sat, Jan 25, 2014 at 01:02:36PM -0500, Andrew Dunstan wrote: > > On 01/25/2014 11:06 AM, Tom Lane wrote: > >Robert Haas <robertmhaas@gmail.com> writes: > >>On Fri, Jan 24, 2014 at 8:53 PM, Greg Stark <stark@mit.edu> wrote: > >>>Indeed even aside from the performance questions, once you're indented > >>>5-10 times the indention stops being useful at all. The query would > >>>probably be even more readable if we just made indentation modulo 40 > >>>so once you get too far indented it "wraps around" which is not unlike > >>>how humans actually indent things in this case. > >>Ha! That seems a little crazy, but *capping* the indentation at some > >>reasonable value might not be dumb. > >I could go for either of those approaches, if applied uniformly, and > >actually Greg's suggestion sounds a bit better: it seems more likely > >to preserve some readability in deeply nested constructs. > > > >With either approach you need to ask where the limit value is going > >to come from. Is it OK to just hard-wire a magic number, or do we > >need to expose a knob somewhere? > > > > > > > Simply capping it is probably the best bang for the buck. I suspect > most people would prefer to have "q1 union q2 union q3 union q4" > with the subqueries all indented to the same level. But I understand > the difficulties in doing so. > > A knob seems like overkill. I'd just hardwire some number, say three > or four levels of indentation. Did we address this? -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. +
I propose the attached patch. It wraps at 40 and also divides the indent level by half the std indent level. I tried a few different combinations and this is the one that produced the output I liked best. I attached the output for the pg_seclabels view (the only one the regression tests noticed changed) and a simple view consisting of many nested unions. $ git diff | filterdiff --format=context | tee /tmp/wrap-psql-indent-level.patch *** a/src/backend/utils/adt/ruleutils.c --- b/src/backend/utils/adt/ruleutils.c *************** *** 6398,6405 **** appendContextKeyword(deparse_context *context, const char *str, removeStringInfoSpaces(buf); /* ... then add a newline and some spaces */ appendStringInfoChar(buf, '\n'); ! appendStringInfoSpaces(buf, ! Max(context->indentLevel, 0) + indentPlus); appendStringInfoString(buf, str); --- 6398,6419 ---- removeStringInfoSpaces(buf); /* ... then add a newline and some spaces */ appendStringInfoChar(buf, '\n'); ! ! if (context->indentLevel <= 40) ! appendStringInfoSpaces(buf, ! Max(context->indentLevel, 0) + indentPlus); ! ! else ! { ! /* If we're indented > 40 characters try to conserve horizontal ! * space. Specifically it's important that the indentation not ! * grow unboundedly or else the size of some queries is ! * O(n^2). Wrapping modulo 40 guarantees at most a linear blowup ! * in size. */ ! unsigned wrapped_indent = context->indentLevel + indentPlus; ! wrapped_indent = (wrapped_indent / (PRETTYINDENT_STD/2)) % 40; ! appendStringInfoSpaces(buf, wrapped_indent); ! } appendStringInfoString(buf, str);
Вложения
Greg Stark <stark@mit.edu> writes: > I propose the attached patch. It wraps at 40 and also divides the > indent level by half the std indent level. I tried a few different > combinations and this is the one that produced the output I liked > best. I doubt you can do that (the half-size-step bit), at least not without a much larger patch than this: there are assorted places that just unconditionally append PRETTYINDENT_STD spaces, and would have to be taught to do something different. OTOH those places might need to be adjusted anyway. regards, tom lane
On Tue, Apr 29, 2014 at 7:46 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > I doubt you can do that (the half-size-step bit), at least not without > a much larger patch than this: there are assorted places that just > unconditionally append PRETTYINDENT_STD spaces, and would have to be > taught to do something different. OTOH those places might need to be > adjusted anyway. As far as I can see this is the only place that needs to be adjusted. That function handles pretty much all the indentation. The only other places that insert spaces just insert a fixed number in strings like CREATE FUNCTION before the LANGUAGE or CREATE RULE before the ON. Actually the only thing that might want to be adjusted is the indentation in the beginning of the setop (ruleutils.c:4720) which is what causes that long line of parentheses at the beginning of the example. I suppose in an ideal world it would start following the reduced spacing and wrap to new lines whenever the indentation goes back to the left. But I can't get too excited by it in the example and I'm not sure it's even intended to line up anyways. It just inserts STD spaces without a newline. -- greg
Greg Stark <stark@mit.edu> writes: > Actually the only thing that might want to be adjusted is the > indentation in the beginning of the setop (ruleutils.c:4720) which is > what causes that long line of parentheses at the beginning of the > example. I suppose in an ideal world it would start following the > reduced spacing and wrap to new lines whenever the indentation goes > back to the left. But I can't get too excited by it in the example and > I'm not sure it's even intended to line up anyways. It just inserts > STD spaces without a newline. Actually, the patch I posted a little bit ago gets rid of that. However, I'm still dubious about halving the step distance, because there are assorted places that adjust the indentation of specific keywords by distances that aren't a multiple of 2 (look for odd last arguments to appendContextKeyword). I'm not sure how that will look after we make such a change. Possibly it would work to only apply the scale factor to context->indentLevel, and not the indentPlus number? regards, tom lane
I wrote: > I'm still dubious about halving the step distance, because there are > assorted places that adjust the indentation of specific keywords by > distances that aren't a multiple of 2 (look for odd last arguments to > appendContextKeyword). I'm not sure how that will look after we make such > a change. Possibly it would work to only apply the scale factor to > context->indentLevel, and not the indentPlus number? I pushed a patch that does it that way, and also patches for the other items discussed in this thread. regards, tom lane
On Wed, Apr 30, 2014 at 6:47 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > I pushed a patch that does it that way, and also patches for the other > items discussed in this thread. Great! thanks a lot. This makes a really solid noticeable difference even in relatively simple cases and I know of users for whom this will make a big difference. That indentation code looks like a lot of work to maintain that I hadn't really been aware of before. I wonder if there's some way to combine it with the actual grammar so the input and output can be edited in the same place. -- greg
Hi all, Now that we're on the topic of view deparsing, what are your thoughts on making this less painful? local:marko=#* create view foov as select exists(select * from foo); CREATE VIEW local:marko=#* \d+ foov View "public.foov" Column | Type | Modifiers | Storage | Description --------+---------+-----------+---------+------------- exists | boolean | | plain | View definition: SELECT (EXISTS ( SELECT foo.way, foo.too, foo.many, foo.columns, foo.here FROM foo)) AS "exists"; I've switched to using SELECT 1 in EXISTS for this reason, but perhaps other people haven't yet done that.. Regards, Marko Tiikkaja
On 05/03/2014 09:17 AM, Marko Tiikkaja wrote: > Hi all, > > Now that we're on the topic of view deparsing, what are your thoughts > on making this less painful? > > local:marko=#* create view foov as select exists(select * from foo); > CREATE VIEW > local:marko=#* \d+ foov > View "public.foov" > Column | Type | Modifiers | Storage | Description > --------+---------+-----------+---------+------------- > exists | boolean | | plain | > View definition: > SELECT (EXISTS ( SELECT foo.way, > foo.too, > foo.many, > foo.columns, > foo.here > FROM foo)) AS "exists"; > > > I've switched to using SELECT 1 in EXISTS for this reason, but perhaps > other people haven't yet done that.. > > > I've done that for quite a few years. I think it's better style than using *. cheers andrew