Обсуждение: pg_get_viewdefs() indentation considered harmful

Поиск
Список
Период
Сортировка

pg_get_viewdefs() indentation considered harmful

От
Greg Stark
Дата:
We're finding it more and more common for people to define partitioned
table views with hundreds or thousands of union branches.
pg_get_viewdefs indents each branch of the union by 8 spaces more than
the previous branch. The net effect is that the size of the output is
O(n^2) bytes just for the indentation spaces. If your union branches
are about 5 lines long then you hit MaxAlloc after about 5,000
branches. And incidentally there's no CHECK_FOR_INTERRUPTS in that
loop.

I would actually suggest pg_dump should be able to pass a flag to
disable indentation but then I noticed that all the code is already
there. It's just that every single variation of pg_get_viewdef sets
the indentation flag explicitly. I wonder if this was the intended
behaviour. Shouldn't pg_get_viewdef(view, false) set indentation off?

-- 
greg



Re: pg_get_viewdefs() indentation considered harmful

От
Tom Lane
Дата:
Greg Stark <stark@mit.edu> writes:
> We're finding it more and more common for people to define partitioned
> table views with hundreds or thousands of union branches.

Really?  Given how poorly the system performs with that many inheritance
children, I've got a hard time believing either that this is common or
that ruleutils is your worst problem with it.

> pg_get_viewdefs indents each branch of the union by 8 spaces more than
> the previous branch.

I think that's because the unions are a nested binary tree so far as the
parsetree representation goes.  We could probably teach ruleutils to
flatten the display in common cases, but it might be a bit tricky to be
sure we don't create any lies about unusual nestings.
        regards, tom lane



Re: pg_get_viewdefs() indentation considered harmful

От
Greg Stark
Дата:
On Fri, Jan 24, 2014 at 6:54 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> pg_get_viewdefs indents each branch of the union by 8 spaces more than
>> the previous branch.
>
> I think that's because the unions are a nested binary tree so far as the
> parsetree representation goes.  We could probably teach ruleutils to
> flatten the display in common cases, but it might be a bit tricky to be
> sure we don't create any lies about unusual nestings.

That may be a worthwhile thing to do.

But it strikes me that pg_dump, at least when not doing an SQL dump,
really has no reason to ask for indentation at all. It's already
asking for non-prettyprinted output, why not make non-prettyprinted
also mean non-indented?

Also, it also seems like a reasonable safeguard that if the underlying
rule is larger than, say, 32kB or maybe 128kB you probably don't care
about indentation. That would cover all the system views except a
handful that seem to have peculiarly large pg_rewrite rules
considering their SQL definition isn't especially large:

daeqck898dvduj=> select ev_class::regclass, length(ev_action)
rewrite_len,length(pg_get_viewdef(ev_class,true)) prettyprint_len,
length(pg_get_viewdef(ev_class,false)) non_prettyprint_len from
pg_rewrite  order by 2 desc limit 20;                 ev_class                  | rewrite_len |
prettyprint_len | non_prettyprint_len
--------------------------------------------+-------------+-----------------+---------------------pg_seclabels
                    |      138565 |
 
13528 |               13758information_schema.columns                 |      126787 |
6401 |                6642information_schema.usage_privileges        |       93863 |
11653 |               11939information_schema.attributes              |       83268 |
4264 |                4417information_schema.referential_constraints |       81762 |
2527 |                2653pg_statio_all_tables                       |       59617 |
1023 |                1061pg_stats                                   |       59331 |
2803 |                2899information_schema.domains                 |       54611 |
3206 |                3320information_schema.element_types           |       53049 |
5608 |                5762information_schema.routines                |       52333 |
7852 |                8089information_schema.column_privileges       |       49229 |
3883 |                3954pg_indexes                                 |       46717 |458 |
486information_schema.check_constraints      |       42132 |
 
1375 |                1443information_schema.tables                  |       37458 |
2122 |                2212pg_stat_all_indexes                        |       35426 |508 |
528pg_statio_all_indexes                     |       35412 |490 |
512information_schema.table_constraints      |       31882 |
 
3098 |                3231information_schema.column_udt_usage        |       31731 |
1034 |                1090information_schema.parameters              |       30497 |
3640 |                3750pg_stat_all_tables                         |       27193 |
1367 |                1387
(20 rows)



-- 
greg



Re: pg_get_viewdefs() indentation considered harmful

От
Greg Stark
Дата:
Argh. Attached is a plain text file with that query data. I'll be
switching back to Gnus any day now.

Вложения

Re: pg_get_viewdefs() indentation considered harmful

От
Tom Lane
Дата:
Greg Stark <stark@mit.edu> writes:
> But it strikes me that pg_dump, at least when not doing an SQL dump,
> really has no reason to ask for indentation at all. It's already
> asking for non-prettyprinted output, why not make non-prettyprinted
> also mean non-indented?

We do go to some effort to make pg_dump's output legible, so I'm not
especially on board with this idea.
        regards, tom lane



Re: pg_get_viewdefs() indentation considered harmful

От
Robert Haas
Дата:
On Fri, Jan 24, 2014 at 6:54 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Greg Stark <stark@mit.edu> writes:
>> We're finding it more and more common for people to define partitioned
>> table views with hundreds or thousands of union branches.
>
> Really?  Given how poorly the system performs with that many inheritance
> children, I've got a hard time believing either that this is common or
> that ruleutils is your worst problem with it.

Doesn't make it a bad idea to fix it.  You may need hundreds or
thousands of union branches for this to totally break the world, but
you only need about five for it to be annoying.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: pg_get_viewdefs() indentation considered harmful

От
Greg Stark
Дата:
On Fri, Jan 24, 2014 at 8:49 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Fri, Jan 24, 2014 at 6:54 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Greg Stark <stark@mit.edu> writes:
>>> We're finding it more and more common for people to define partitioned
>>> table views with hundreds or thousands of union branches.
>>
>> Really?  Given how poorly the system performs with that many inheritance
>> children, I've got a hard time believing either that this is common or
>> that ruleutils is your worst problem with it.
>
> Doesn't make it a bad idea to fix it.  You may need hundreds or
> thousands of union branches for this to totally break the world, but
> you only need about five for it to be annoying.

Indeed even aside from the performance questions, once you're indented
5-10 times the indention stops being useful at all. The query would
probably be even more readable if we just made indentation modulo 40
so once you get too far indented it "wraps around" which is not unlike
how humans actually indent things in this case.


-- 
greg



Re: pg_get_viewdefs() indentation considered harmful

От
Robert Haas
Дата:
On Fri, Jan 24, 2014 at 8:53 PM, Greg Stark <stark@mit.edu> wrote:
> On Fri, Jan 24, 2014 at 8:49 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>> On Fri, Jan 24, 2014 at 6:54 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> Greg Stark <stark@mit.edu> writes:
>>>> We're finding it more and more common for people to define partitioned
>>>> table views with hundreds or thousands of union branches.
>>>
>>> Really?  Given how poorly the system performs with that many inheritance
>>> children, I've got a hard time believing either that this is common or
>>> that ruleutils is your worst problem with it.
>>
>> Doesn't make it a bad idea to fix it.  You may need hundreds or
>> thousands of union branches for this to totally break the world, but
>> you only need about five for it to be annoying.
>
> Indeed even aside from the performance questions, once you're indented
> 5-10 times the indention stops being useful at all. The query would
> probably be even more readable if we just made indentation modulo 40
> so once you get too far indented it "wraps around" which is not unlike
> how humans actually indent things in this case.

Ha!  That seems a little crazy, but *capping* the indentation at some
reasonable value might not be dumb.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: pg_get_viewdefs() indentation considered harmful

От
Tom Lane
Дата:
Robert Haas <robertmhaas@gmail.com> writes:
> On Fri, Jan 24, 2014 at 8:53 PM, Greg Stark <stark@mit.edu> wrote:
>> Indeed even aside from the performance questions, once you're indented
>> 5-10 times the indention stops being useful at all. The query would
>> probably be even more readable if we just made indentation modulo 40
>> so once you get too far indented it "wraps around" which is not unlike
>> how humans actually indent things in this case.

> Ha!  That seems a little crazy, but *capping* the indentation at some
> reasonable value might not be dumb.

I could go for either of those approaches, if applied uniformly, and
actually Greg's suggestion sounds a bit better: it seems more likely
to preserve some readability in deeply nested constructs.

With either approach you need to ask where the limit value is going
to come from.  Is it OK to just hard-wire a magic number, or do we
need to expose a knob somewhere?
        regards, tom lane



Re: pg_get_viewdefs() indentation considered harmful

От
Andrew Dunstan
Дата:
On 01/25/2014 11:06 AM, Tom Lane wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> On Fri, Jan 24, 2014 at 8:53 PM, Greg Stark <stark@mit.edu> wrote:
>>> Indeed even aside from the performance questions, once you're indented
>>> 5-10 times the indention stops being useful at all. The query would
>>> probably be even more readable if we just made indentation modulo 40
>>> so once you get too far indented it "wraps around" which is not unlike
>>> how humans actually indent things in this case.
>> Ha!  That seems a little crazy, but *capping* the indentation at some
>> reasonable value might not be dumb.
> I could go for either of those approaches, if applied uniformly, and
> actually Greg's suggestion sounds a bit better: it seems more likely
> to preserve some readability in deeply nested constructs.
>
> With either approach you need to ask where the limit value is going
> to come from.  Is it OK to just hard-wire a magic number, or do we
> need to expose a knob somewhere?
>
>             


Simply capping it is probably the best bang for the buck. I suspect most 
people would prefer to have  "q1 union q2 union q3 union q4" with the 
subqueries all indented to the same level. But I understand the 
difficulties in doing so.

A knob seems like overkill. I'd just hardwire some number, say three or 
four levels of indentation.

cheers

andrew



Re: pg_get_viewdefs() indentation considered harmful

От
Bruce Momjian
Дата:
On Sat, Jan 25, 2014 at 01:02:36PM -0500, Andrew Dunstan wrote:
> 
> On 01/25/2014 11:06 AM, Tom Lane wrote:
> >Robert Haas <robertmhaas@gmail.com> writes:
> >>On Fri, Jan 24, 2014 at 8:53 PM, Greg Stark <stark@mit.edu> wrote:
> >>>Indeed even aside from the performance questions, once you're indented
> >>>5-10 times the indention stops being useful at all. The query would
> >>>probably be even more readable if we just made indentation modulo 40
> >>>so once you get too far indented it "wraps around" which is not unlike
> >>>how humans actually indent things in this case.
> >>Ha!  That seems a little crazy, but *capping* the indentation at some
> >>reasonable value might not be dumb.
> >I could go for either of those approaches, if applied uniformly, and
> >actually Greg's suggestion sounds a bit better: it seems more likely
> >to preserve some readability in deeply nested constructs.
> >
> >With either approach you need to ask where the limit value is going
> >to come from.  Is it OK to just hard-wire a magic number, or do we
> >need to expose a knob somewhere?
> >
> >            
> 
> 
> Simply capping it is probably the best bang for the buck. I suspect
> most people would prefer to have  "q1 union q2 union q3 union q4"
> with the subqueries all indented to the same level. But I understand
> the difficulties in doing so.
> 
> A knob seems like overkill. I'd just hardwire some number, say three
> or four levels of indentation.

Did we address this?

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + Everyone has their own god. +



Re: pg_get_viewdefs() indentation considered harmful

От
Greg Stark
Дата:
I propose the attached patch. It wraps at 40 and also divides the
indent level by half the std indent level. I tried a few different
combinations and this is the one that produced the output I liked
best. I attached the output for the pg_seclabels view (the only one
the regression tests noticed changed) and a simple view consisting of
many nested unions.

$ git diff | filterdiff --format=context | tee /tmp/wrap-psql-indent-level.patch
*** a/src/backend/utils/adt/ruleutils.c
--- b/src/backend/utils/adt/ruleutils.c
***************
*** 6398,6405 **** appendContextKeyword(deparse_context *context,
const char *str,
  removeStringInfoSpaces(buf);
  /* ... then add a newline and some spaces */
  appendStringInfoChar(buf, '\n');
! appendStringInfoSpaces(buf,
!   Max(context->indentLevel, 0) + indentPlus);

  appendStringInfoString(buf, str);

--- 6398,6419 ----
  removeStringInfoSpaces(buf);
  /* ... then add a newline and some spaces */
  appendStringInfoChar(buf, '\n');
!
! if (context->indentLevel <= 40)
! appendStringInfoSpaces(buf,
!   Max(context->indentLevel, 0) + indentPlus);
!
! else
! {
! /* If we're indented > 40 characters try to conserve horizontal
! * space. Specifically it's important that the indentation not
! * grow unboundedly or else the size of some queries is
! * O(n^2). Wrapping modulo 40 guarantees at most a linear blowup
! * in size. */
! unsigned wrapped_indent = context->indentLevel + indentPlus;
! wrapped_indent = (wrapped_indent / (PRETTYINDENT_STD/2)) % 40;
! appendStringInfoSpaces(buf, wrapped_indent);
! }

  appendStringInfoString(buf, str);

Вложения

Re: pg_get_viewdefs() indentation considered harmful

От
Tom Lane
Дата:
Greg Stark <stark@mit.edu> writes:
> I propose the attached patch. It wraps at 40 and also divides the
> indent level by half the std indent level. I tried a few different
> combinations and this is the one that produced the output I liked
> best.

I doubt you can do that (the half-size-step bit), at least not without
a much larger patch than this: there are assorted places that just
unconditionally append PRETTYINDENT_STD spaces, and would have to be
taught to do something different.  OTOH those places might need to be
adjusted anyway.
        regards, tom lane



Re: pg_get_viewdefs() indentation considered harmful

От
Greg Stark
Дата:
On Tue, Apr 29, 2014 at 7:46 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I doubt you can do that (the half-size-step bit), at least not without
> a much larger patch than this: there are assorted places that just
> unconditionally append PRETTYINDENT_STD spaces, and would have to be
> taught to do something different.  OTOH those places might need to be
> adjusted anyway.

As far as I can see this is the only place that needs to be adjusted.
That function handles pretty much all the indentation. The only other
places that insert spaces just insert a fixed number in strings like
CREATE FUNCTION before the LANGUAGE or CREATE RULE before the ON.

Actually the only thing that might want to be adjusted is the
indentation in the beginning of the setop (ruleutils.c:4720) which is
what causes that long line of parentheses at the beginning of the
example. I suppose in an ideal world it would start following the
reduced spacing and wrap to new lines whenever the indentation goes
back to the left. But I can't get too excited by it in the example and
I'm not sure it's even intended to line up anyways. It just inserts
STD spaces without a newline.




-- 
greg



Re: pg_get_viewdefs() indentation considered harmful

От
Tom Lane
Дата:
Greg Stark <stark@mit.edu> writes:
> Actually the only thing that might want to be adjusted is the
> indentation in the beginning of the setop (ruleutils.c:4720) which is
> what causes that long line of parentheses at the beginning of the
> example. I suppose in an ideal world it would start following the
> reduced spacing and wrap to new lines whenever the indentation goes
> back to the left. But I can't get too excited by it in the example and
> I'm not sure it's even intended to line up anyways. It just inserts
> STD spaces without a newline.

Actually, the patch I posted a little bit ago gets rid of that.  However,
I'm still dubious about halving the step distance, because there are
assorted places that adjust the indentation of specific keywords by
distances that aren't a multiple of 2 (look for odd last arguments to
appendContextKeyword).  I'm not sure how that will look after we make such
a change.  Possibly it would work to only apply the scale factor to
context->indentLevel, and not the indentPlus number?
        regards, tom lane



Re: pg_get_viewdefs() indentation considered harmful

От
Tom Lane
Дата:
I wrote:
> I'm still dubious about halving the step distance, because there are
> assorted places that adjust the indentation of specific keywords by
> distances that aren't a multiple of 2 (look for odd last arguments to
> appendContextKeyword).  I'm not sure how that will look after we make such
> a change.  Possibly it would work to only apply the scale factor to
> context->indentLevel, and not the indentPlus number?

I pushed a patch that does it that way, and also patches for the other
items discussed in this thread.
        regards, tom lane



Re: pg_get_viewdefs() indentation considered harmful

От
Greg Stark
Дата:
On Wed, Apr 30, 2014 at 6:47 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I pushed a patch that does it that way, and also patches for the other
> items discussed in this thread.

Great! thanks a lot. This makes a really solid noticeable difference
even in relatively simple cases and I know of users for whom this will
make a big difference.

That indentation code looks like a lot of work to maintain that I
hadn't really been aware of before. I wonder if there's some way to
combine it with the actual grammar so the input and output can be
edited in the same place.

-- 
greg



Re: pg_get_viewdefs() indentation considered harmful

От
Marko Tiikkaja
Дата:
Hi all,

Now that we're on the topic of view deparsing, what are your thoughts on 
making this less painful?

local:marko=#* create view foov as select exists(select * from foo);
CREATE VIEW
local:marko=#* \d+ foov                  View "public.foov" Column |  Type   | Modifiers | Storage | Description
--------+---------+-----------+---------+------------- exists | boolean |           | plain   |
View definition: SELECT (EXISTS ( SELECT foo.way,            foo.too,            foo.many,            foo.columns,
     foo.here           FROM foo)) AS "exists";
 


I've switched to using SELECT 1 in EXISTS for this reason, but perhaps 
other people haven't yet done that..


Regards,
Marko Tiikkaja



Re: pg_get_viewdefs() indentation considered harmful

От
Andrew Dunstan
Дата:
On 05/03/2014 09:17 AM, Marko Tiikkaja wrote:
> Hi all,
>
> Now that we're on the topic of view deparsing, what are your thoughts 
> on making this less painful?
>
> local:marko=#* create view foov as select exists(select * from foo);
> CREATE VIEW
> local:marko=#* \d+ foov
>                   View "public.foov"
>  Column |  Type   | Modifiers | Storage | Description
> --------+---------+-----------+---------+-------------
>  exists | boolean |           | plain   |
> View definition:
>  SELECT (EXISTS ( SELECT foo.way,
>             foo.too,
>             foo.many,
>             foo.columns,
>             foo.here
>            FROM foo)) AS "exists";
>
>
> I've switched to using SELECT 1 in EXISTS for this reason, but perhaps 
> other people haven't yet done that..
>
>
>


I've done that for quite a few years. I think it's better style than 
using *.

cheers

andrew