Discussion: POC: converting Lists into arrays
For quite some years now there's been dissatisfaction with our List
data structure implementation.  Because it separately palloc's each
list cell, it chews up lots of memory, and it's none too cache-friendly
because the cells aren't necessarily adjacent.  Moreover, our typical
usage is to just build a list by repeated lappend's and never modify it,
so that the flexibility of having separately insertable/removable list
cells is usually wasted.
Every time this has come up, I've opined that the right fix is to jack
up the List API and drive a new implementation underneath, as we did
once before (cf commit d0b4399d81).  I thought maybe it was about time
to provide some evidence for that position, so attached is a POC patch
that changes Lists into expansible arrays, while preserving most of
their existing API.
The big-picture performance change is that this makes list_nth()
a cheap O(1) operation, while lappend() is still pretty cheap;
on the downside, lcons() becomes O(N), as does insertion or deletion
in the middle of a list.  But we don't use lcons() very much
(and maybe a lot of the existing usages aren't really necessary?),
while insertion/deletion in long lists is a vanishingly infrequent
operation.  Meanwhile, making list_nth() cheap is a *huge* win.
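To make that trade-off concrete, here is a minimal sketch of an array-backed list. It is not code from the patch: the IntVec type and the intvec_* functions are hypothetical names, the cells hold plain ints, and malloc/realloc stand in for palloc/repalloc.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical miniature of an array-backed list (not the patch's structs). */
typedef struct IntVec
{
    int     length;         /* number of cells in use */
    int     max_length;     /* allocated capacity */
    int    *elements;       /* contiguous cell array */
} IntVec;

/* Make room for one more cell, doubling the array when it is full. */
static void
intvec_enlarge(IntVec *v)
{
    if (v->length < v->max_length)
        return;
    int     newmax = (v->max_length > 0) ? v->max_length * 2 : 4;
    int    *p = realloc(v->elements, newmax * sizeof(int));

    assert(p != NULL);
    v->elements = p;
    v->max_length = newmax;
}

/* Append at the end: amortized O(1). */
static void
intvec_append(IntVec *v, int datum)
{
    intvec_enlarge(v);
    v->elements[v->length++] = datum;
}

/* Prepend at the front: O(N), because every existing cell must shift. */
static void
intvec_cons(IntVec *v, int datum)
{
    intvec_enlarge(v);
    memmove(v->elements + 1, v->elements, v->length * sizeof(int));
    v->elements[0] = datum;
    v->length++;
}

/* Fetch the n'th element: O(1), a plain array index. */
static int
intvec_nth(const IntVec *v, int n)
{
    assert(n >= 0 && n < v->length);
    return v->elements[n];
}
```

Note that both the append and the prepend path may relocate the whole array, which is the source of the stale-pointer hazard discussed next.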
The most critical thing that we lose by doing this is that when a
List is modified, all of its cells may need to move, which breaks
a lot of code that assumes it can insert or delete a cell while
hanging onto a pointer to a nearby cell.  In almost every case,
this takes the form of doing list insertions or deletions inside
a foreach() loop, and the pointer that's invalidated is the loop's
current-cell pointer.  Fortunately, with list_nth() now being so cheap,
we can replace these uses of foreach() with loops using an integer
index variable and fetching the next list element directly with
list_nth().  Most of these places were loops around list_delete_cell
calls, which I replaced with a new function list_delete_nth_cell
to go along with the emphasis on the integer index.
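As a rough illustration of that loop rewrite (a sketch under assumptions, not the patch's code): IntArrayList, delete_nth and remove_negatives are made-up names, and the fixed-size array only keeps the example short. The loop holds an integer index rather than a cell pointer, so nothing is invalidated when cells slide down after a deletion.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical fixed-capacity array list, just for the demo. */
typedef struct IntArrayList
{
    int     length;
    int     elements[16];
} IntArrayList;

/* Delete the n'th element by sliding the tail down one slot. */
static void
delete_nth(IntArrayList *l, int n)
{
    assert(n >= 0 && n < l->length);
    memmove(&l->elements[n], &l->elements[n + 1],
            (l->length - n - 1) * sizeof(int));
    l->length--;
}

/* Remove all negative values using an index loop instead of a cell pointer. */
static void
remove_negatives(IntArrayList *l)
{
    int     i = 0;

    while (i < l->length)
    {
        if (l->elements[i] < 0)
            delete_nth(l, i);   /* the next element now sits in slot i */
        else
            i++;                /* advance only when nothing was deleted */
    }
}
```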
I don't claim to have found every case where that could happen,
although I added debug support in list.c to force list contents
to move on every list modification, and this patch does pass
check-world with that support turned on.  I fear that some such
bugs remain, though.
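The kind of debug aid described here might look roughly like the following. This is only a sketch: DEMO_FORCE_MOVE, DemoList and demo_append are hypothetical names rather than the symbols used in list.c, and the moves counter exists only so the behavior can be observed.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical compile-time switch; the patch's real debug symbol may differ. */
#ifndef DEMO_FORCE_MOVE
#define DEMO_FORCE_MOVE 1
#endif

typedef struct DemoList
{
    int     length;
    int     max_length;
    int     moves;          /* demo-only counter of array relocations */
    int    *elements;
} DemoList;

/* Append, relocating the array on every call when DEMO_FORCE_MOVE is set. */
static void
demo_append(DemoList *l, int datum)
{
    int     full = (l->length >= l->max_length);

    if (full || DEMO_FORCE_MOVE)
    {
        int     newmax = l->max_length;
        int    *newarr;

        if (full)
            newmax = (newmax > 0) ? newmax * 2 : 4;
        newarr = malloc(newmax * sizeof(int));
        assert(newarr != NULL);
        if (l->elements != NULL)
        {
            memcpy(newarr, l->elements, l->length * sizeof(int));
            /* Scribble on the old array so stale pointers see garbage. */
            memset(l->elements, 0x7F, l->max_length * sizeof(int));
            free(l->elements);
        }
        l->elements = newarr;
        l->max_length = newmax;
        l->moves++;
    }
    l->elements[l->length++] = datum;
}
```

With the switch on, any code that keeps a cell pointer across a modification fails quickly instead of only when the array happens to grow.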
There is one big way in which I failed to preserve the old API
syntactically: lnext() now requires a pointer to the List as
well as the current ListCell, so that it can figure out where
the end of the cell array is.  That requires touching something
like 150 places that otherwise wouldn't have had to be touched,
which is annoying, even though most of those changes are trivial.
I thought about avoiding that by requiring Lists to keep a "sentinel"
value in the cell after the end of the active array, so that lnext()
could look for the sentinel to detect list end.  However, that idea
doesn't really work, because if the list array has been moved, the
spot where the sentinel had been could have been reallocated and
filled with something else.  So this'd offer no defense against the
possibility of a stale ListCell pointer, which is something that
we absolutely need defenses for.  As the patch stands we can have
quite a strong defense, because we can check whether the presented
ListCell pointer actually points into the list's current data array.
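A minimal sketch of that check, assuming a simplified layout: DemoList, DemoCell, demo_cell_in_list and demo_lnext are illustrative names, not the patch's definitions. The range test goes through uintptr_t so that comparing a possibly unrelated pointer stays well-defined in portable C.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct DemoCell
{
    int     value;
} DemoCell;

typedef struct DemoList
{
    int         length;     /* number of valid cells */
    DemoCell   *elements;   /* current cell array */
} DemoList;

/* Does c point into l's current cell array? */
static int
demo_cell_in_list(const DemoList *l, const DemoCell *c)
{
    uintptr_t   p = (uintptr_t) c;
    uintptr_t   lo = (uintptr_t) l->elements;
    uintptr_t   hi = (uintptr_t) (l->elements + l->length);

    return p >= lo && p < hi;
}

/* Advance to the next cell, or return NULL at the end of the array. */
static DemoCell *
demo_lnext(const DemoList *l, const DemoCell *c)
{
    assert(demo_cell_in_list(l, c));    /* catches stale cell pointers */
    if (c + 1 < l->elements + l->length)
        return (DemoCell *) (c + 1);
    return NULL;
}
```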
Another annoying consequence of lnext() needing a List pointer is that
the List arguments of foreach() and related macros are now evaluated
each time through the loop.  I could only find half a dozen places
where that was actually unsafe (all fixed in the draft patch), but
it's still bothersome.  I experimented with ways to avoid that, but
the only way I could get it to work was to define foreach like this:
#define foreach(cell, l) \
    for (const List *cell##__foreach = foreach_setup(l, &cell); \
         cell != NULL; \
         cell = lnext(cell##__foreach, cell))
static inline const List *
foreach_setup(const List *l, ListCell **cell)
{
    *cell = list_head(l);
    return l;
}
That works, but there are two problems.  The lesser one is that a
not-very-bright compiler might think that the "cell" variable has to
be forced into memory, because its address is taken.  The bigger one is
that this coding forces the "cell" variable to be exactly "ListCell *";
you can't add const or volatile qualifiers to it without getting
compiler warnings.  There are actually more places that'd be affected
by that than by the need to avoid multiple evaluations.  I don't think
the const markings would be a big deal to lose, and the two places in
do_autovacuum that need "volatile" (because of a nearby PG_TRY) could
be rewritten to not use foreach.  So I'm tempted to do that, but it's
not very pretty.  Maybe somebody can think of a better solution?
There's a lot of potential follow-on work that I've not touched yet:
1. This still involves at least two palloc's for every nonempty List,
because I kept the header and the data array separate.  Perhaps it's
worth allocating those in one palloc.  However, right now there's an
assumption that the header of a nonempty List doesn't move when you
change its contents; that's embedded in the API of the lappend_cell
functions, and more than likely there's code that depends on that
silently because it doesn't bother to store the results of other
List functions back into the appropriate pointer.  So if we do that
at all I think it'd be better tackled in a separate patch; and I'm
not really convinced it's worth the trouble and extra risk of bugs.
2. list_qsort() is now absolutely stupidly defined.  It should just
qsort the list's data array in-place.  But that requires an API
break for the caller-supplied comparator, since there'd be one less
level of indirection.  I think we should change this, but again it'd
be better done as an independent patch to make it more visible in the
git history.
3. There are a number of places where we've built flat arrays
paralleling Lists, such as the planner's simple_rte_array.  That's
pointless now and could be undone, buying some code simplicity.
Various other micro-optimizations could doubtless be done too;
I've not looked hard.
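On item 2, an in-place sort over the cell array could look something like this sketch. DemoCell, demo_sort and cmp_cells_as_strings are hypothetical names; the point is only that the comparator receives pointers to the cells themselves, which is the "one less level of indirection" mentioned above.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical cell type: a union, so the array can be sorted by value. */
typedef union DemoCell
{
    void   *ptr_value;
    int     int_value;
} DemoCell;

/* Comparator sees pointers to cells, not pointers to cell pointers. */
static int
cmp_cells_as_strings(const void *a, const void *b)
{
    const DemoCell *ca = a;
    const DemoCell *cb = b;

    return strcmp((const char *) ca->ptr_value,
                  (const char *) cb->ptr_value);
}

/* Sort the cell array in place with the C library's qsort(). */
static void
demo_sort(DemoCell *elements, int length,
          int (*cmp) (const void *, const void *))
{
    if (length > 1)
        qsort(elements, length, sizeof(DemoCell), cmp);
}
```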
I haven't attempted any performance measurements on this, but at
least in principle it should speed things up a bit, especially
for complex test cases involving longer Lists.  I don't have any
very suitable test cases at hand, anyway.
I think this is too late for v12, both because of the size of the
patch and because of the likelihood that it introduces a few bugs.
I'd like to consider pushing it early in the v13 cycle, though.
            regards, tom lane
			
Hello Tom,
> For quite some years now there's been dissatisfaction with our List
> data structure implementation.  Because it separately palloc's each
> list cell, it chews up lots of memory, and it's none too cache-friendly
> because the cells aren't necessarily adjacent.  Moreover, our typical
> usage is to just build a list by repeated lappend's and never modify it,
> so that the flexibility of having separately insertable/removable list
> cells is usually wasted.
>
> Every time this has come up, I've opined that the right fix is to jack
> up the List API and drive a new implementation underneath, as we did
> once before (cf commit d0b4399d81).  I thought maybe it was about time
> to provide some evidence for that position, so attached is a POC patch
> that changes Lists into expansible arrays, while preserving most of
> their existing API.
My 0.02€ about this discussion (I assume that it is what you want): I had
the same issue in the past on a research project. I used a similar but
slightly different approach:
I did not touch the existing linked list implementation but provided
another data structure, which was a linked list of buckets (small arrays)
kept as a stack from the head, with buckets allocated on need but not freed
until the final deallocation. If pop was used extensively, a linked list
of freed buckets was kept, so that they could be reused. Some expensive
list-like functions were not provided, so the data structure could replace
lists in some but not all instances, which was fine. The developer then had
to choose which data structure was best for the use case, and critical
performance cases could be replaced.
Note that a "foreach" can be done reasonably cleanly with a
stack-allocated iterator & c99 struct initialization syntax, which is now
allowed in pg AFAICR, something like:
   typedef struct { ... } stack_iterator;
   #define foreach_stack(i, s) \
     for (stack_iterator i = SITER_INIT(s); SITER_GO_ON(&i); SITER_NEXT(&i))
Used with a simple pattern:
   foreach_stack(i, s)
   {
     item_type item = GET_ITEM(i);
     ...
   }
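For illustration only, one way those macros could be filled in is sketched below. Everything here is an assumption: the bucket layout, BUCKET_SIZE, siter_go_on and sum_stack are invented for the example and are not taken from the research project mentioned above.

```c
#include <assert.h>
#include <stddef.h>

#define BUCKET_SIZE 4           /* hypothetical bucket capacity */

/* Hypothetical bucket: a small array plus a link to the next bucket. */
typedef struct bucket
{
    struct bucket *next;
    int     used;               /* number of valid items in this bucket */
    int     items[BUCKET_SIZE];
} bucket;

typedef struct
{
    const bucket *b;            /* current bucket, NULL when exhausted */
    int     pos;                /* position within the current bucket */
} stack_iterator;

/* Skip exhausted or empty buckets; report whether an item is available. */
static int
siter_go_on(stack_iterator *it)
{
    while (it->b != NULL && it->pos >= it->b->used)
    {
        it->b = it->b->next;
        it->pos = 0;
    }
    return it->b != NULL;
}

#define SITER_INIT(s)   { (s), 0 }
#define SITER_GO_ON(i)  siter_go_on(i)
#define SITER_NEXT(i)   ((i)->pos++)
#define GET_ITEM(i)     ((i).b->items[(i).pos])

#define foreach_stack(i, s) \
    for (stack_iterator i = SITER_INIT(s); SITER_GO_ON(&i); SITER_NEXT(&i))

/* Example use: sum every item reachable from the head bucket. */
static int
sum_stack(const bucket *head)
{
    int     total = 0;

    foreach_stack(it, head)
        total += GET_ITEM(it);
    return total;
}
```

Because the iterator is declared inside the for statement, the container argument is evaluated only once and no address of a caller variable is taken.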
This approach is not as transparent as your approach, but the changes are
somewhat less extensive, and it provides choices instead of trying to make
one solution fit all use cases. Also, it allows revisiting the
pointer to reference choices on some functions with limited impact.
In particular, the data structure is used for a "string buffer"
implementation (like the PQExpBuffer stuff).
-- 
Fabien.
			
On Sat, Feb 23, 2019 at 9:24 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Every time this has come up, I've opined that the right fix is to jack
> up the List API and drive a new implementation underneath, as we did
> once before (cf commit d0b4399d81).  I thought maybe it was about time
> to provide some evidence for that position, so attached is a POC patch
> that changes Lists into expansible arrays, while preserving most of
> their existing API.
I'm not really convinced that this is the way to go.  The thing is,
any third-party code people have that uses a List may simply break.
If you kept the existing List and changed a bunch of existing code to
use a new Vector implementation, or Thomas's SimpleVector stuff, then
that wouldn't happen.  The reason why people - or at least me - have
been reluctant to accept that you can just jack up the API and drive a
new implementation underneath is that the new implementation will
involve breaking guarantees on which existing code relies; indeed,
your email makes it pretty clear that this is the case.  If you could
replace the existing implementation without breaking any code, that
would be a no-brainer but there's no real way to do that and get the
performance benefits you're seeking to obtain.
It is also perhaps worth mentioning that reimplementing a List as an
array means that it is... not a list.  That is a pretty odd state of
affairs, and to me is another sign that we want to leave the existing
thing alone and convert some/most/all core code to use a new thing.
-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes:
> On Sat, Feb 23, 2019 at 9:24 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Every time this has come up, I've opined that the right fix is to jack
>> up the List API and drive a new implementation underneath, as we did
>> once before (cf commit d0b4399d81).
> I'm not really convinced that this is the way to go.  The thing is,
> any third-party code people have that uses a List may simply break.
> If you kept the existing List and changed a bunch of existing code to
> use a new Vector implementation, or Thomas's SimpleVector stuff, then
> that wouldn't happen.
I'm not following your point here.  If we change key data structures
(i.e. parsetrees, plan trees, execution trees) to use some other list-ish
API, that *in itself* breaks everything that accesses those data
structures.  The approach I propose here isn't zero-breakage, but it
requires far fewer places to be touched than a complete API replacement
would do.
Just as with the dlist/slist stuff, inventing a new list API might be
acceptable for all-new data structures that didn't exist before, but
it isn't going to really help for code and data structures that've been
around for decades.
> If you could
> replace the existing implementation without breaking any code, that
> would be a no-brainer but there's no real way to do that and get the
> performance benefits you're seeking to obtain.
Yup.  So are you saying that we'll never redesign parsetrees again?
We break things regularly, as long as the cost/benefit justifies it.
> It is also perhaps worth mentioning that reimplementing a List as an
> array means that it is... not a list.  That is a pretty odd state of
> affairs, and to me is another sign that we want to leave the existing
> thing alone and convert some/most/all core code to use a new thing.
I completely disagree.  Your proposal is probably an order of magnitude
more painful than the approach I suggest here, while not really offering
any additional performance benefit (or if you think there would be some,
you haven't explained how).  Strictly on cost/benefit grounds, it isn't
ever going to happen that way.
            regards, tom lane
			
On Mon, Feb 25, 2019 at 1:17 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I'm not following your point here.  If we change key data structures
> (i.e. parsetrees, plan trees, execution trees) to use some other list-ish
> API, that *in itself* breaks everything that accesses those data
> structures.  The approach I propose here isn't zero-breakage, but it
> requires far fewer places to be touched than a complete API replacement
> would do.
Sure, but if you have third-party code that touches those things,
it'll fail to compile.  With your proposed approach, there seems to be
a risk that it will compile but not work.
> Yup.  So are you saying that we'll never redesign parsetrees again?
> We break things regularly, as long as the cost/benefit justifies it.
I'm mostly objecting to the degree that the breakage is *silent*.
> I completely disagree.  Your proposal is probably an order of magnitude
> more painful than the approach I suggest here, while not really offering
> any additional performance benefit (or if you think there would be some,
> you haven't explained how).  Strictly on cost/benefit grounds, it isn't
> ever going to happen that way.
Why would it be ten times more painful, exactly?
-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes:
> On Mon, Feb 25, 2019 at 1:17 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> I'm not following your point here.  If we change key data structures
>> (i.e. parsetrees, plan trees, execution trees) to use some other list-ish
>> API, that *in itself* breaks everything that accesses those data
>> structures.  The approach I propose here isn't zero-breakage, but it
>> requires far fewer places to be touched than a complete API replacement
>> would do.
> Sure, but if you have third-party code that touches those things,
> it'll fail to compile.  With your proposed approach, there seems to be
> a risk that it will compile but not work.
Failing to compile isn't really a benefit IMO.  Now, if we could avoid
the *semantic* differences (like whether it's safe to hold onto a pointer
into a List while doing FOO on the list), then we'd have something.
The biggest problem with what I'm proposing is that it doesn't always
manage to do that --- but any other implementation is going to break
such assumptions too.  I do not think that forcing cosmetic changes
on people is going to do much to help them revisit possibly-hidden
assumptions like those.  What will help is to provide debugging aids to
flush out such assumptions, which I've endeavored to do in this patch.
And I would say that any competing proposal is going to be a failure
unless it provides at-least-as-effective support for flushing out bugs
in naive updates of existing List-using code.
>> I completely disagree.  Your proposal is probably an order of magnitude
>> more painful than the approach I suggest here, while not really offering
>> any additional performance benefit (or if you think there would be some,
>> you haven't explained how).  Strictly on cost/benefit grounds, it isn't
>> ever going to happen that way.
> Why would it be ten times more painful, exactly?
Because it involves touching ten times more code (and that's a very
conservative estimate).  Excluding changes in pg_list.h + list.c,
what I posted touches approximately 600 lines of code (520 insertions,
644 deletions to be exact).  For comparison's sake, there are about
1800 uses of foreach in the tree, each of which would require at least
3 changes to replace (the foreach itself, the ListCell variable
declaration, and at least one lfirst() reference in the loop body).
So we've already blown past 5000 lines worth of changes if we want to
do it another way ... and that's just *one* component of the List API.
Nor is there any reason to think the changes would be any more mechanical
than what I had to do here.  (No fair saying that I already found the
trouble spots, either.  A different implementation would likely break
assumptions in different ways.)
If I said your proposal involved two orders of magnitude more work,
I might not be far off the mark.
            regards, tom lane
			
On Mon, Feb 25, 2019 at 10:59 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Because it involves touching ten times more code (and that's a very
> conservative estimate).  Excluding changes in pg_list.h + list.c,
> what I posted touches approximately 600 lines of code (520 insertions,
> 644 deletions to be exact).  For comparison's sake, there are about
> 1800 uses of foreach in the tree, each of which would require at least
> 3 changes to replace (the foreach itself, the ListCell variable
> declaration, and at least one lfirst() reference in the loop body).
If we knew that the list code was the bottleneck in a handful of
cases, then I'd come down on Robert's side here. It would then be
possible to update the relevant bottlenecked code in an isolated
fashion, while getting the lion's share of the benefit. However, I
don't think that that's actually possible. The costs of using Lists
everywhere are real and measurable, but it's also highly distributed.
At least, that's my recollection from previous discussion from several
years back. I remember talking about this with Andres in early 2016.
> So we've already blown past 5000 lines worth of changes if we want to
> do it another way ... and that's just *one* component of the List API.
If you want to stop third party code from compiling, you can find a
way to do that without really changing your approach. Nothing stops
you from changing some symbol names minimally, and then making
corresponding mechanical changes to all of the client code within the
tree. Third party code authors would then follow this example, with
the expectation that it's probably going to be a totally mechanical
process.
I'm not necessarily advocating that approach. I'm simply pointing out
that a compromise is quite possible.
> Nor is there any reason to think the changes would be any more mechanical
> than what I had to do here.  (No fair saying that I already found the
> trouble spots, either.  A different implementation would likely break
> assumptions in different ways.)
The idea of making a new vector/array implementation that is a more or
less drop in replacement for List seems okay to me. C++ has both a
std::list and a std::vector, and they support almost the same
interface. Obviously the situation is different here, since you're
retrofitting a new implementation with different performance
characteristics, rather than implementing both in a green field
situation. But it's not that different.
-- 
Peter Geoghegan
Hi,
On 2019-02-25 13:02:03 -0500, Robert Haas wrote:
> On Sat, Feb 23, 2019 at 9:24 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > Every time this has come up, I've opined that the right fix is to jack
> > up the List API and drive a new implementation underneath, as we did
> > once before (cf commit d0b4399d81).  I thought maybe it was about time
> > to provide some evidence for that position, so attached is a POC patch
> > that changes Lists into expansible arrays, while preserving most of
> > their existing API.
> I'm not really convinced that this is the way to go.  The thing is,
> any third-party code people have that uses a List may simply break.
> If you kept the existing List and changed a bunch of existing code to
> use a new Vector implementation, or Thomas's SimpleVector stuff, then
> that wouldn't happen.  The reason why people - or at least me - have
> been reluctant to accept that you can just jack up the API and drive a
> new implementation underneath is that the new implementation will
> involve breaking guarantees on which existing code relies; indeed,
> your email makes it pretty clear that this is the case.  If you could
> replace the existing implementation without breaking any code, that
> would be a no-brainer but there's no real way to do that and get the
> performance benefits you're seeking to obtain.
Yea, it'd be more convincing. I'm not convinced it'd be a no-brainer
though. Unless you've been hacking PG for a fair bit, the pg_list.h APIs
are very hard to understand / remember. Given this change essentially
requires auditing all code that uses List, ISTM we'd be much better off
also changing the API at the same time.  Yes that'll mean there'll be
vestigial uses nobody bothered to convert in extension etc, but that's
not that bad.
Greetings,
Andres Freund
Peter Geoghegan <pg@bowt.ie> writes:
> If we knew that the list code was the bottleneck in a handful of
> cases, then I'd come down on Robert's side here. It would then be
> possible to update the relevant bottlenecked code in an isolated
> fashion, while getting the lion's share of the benefit. However, I
> don't think that that's actually possible. The costs of using Lists
> everywhere are real and measurable, but it's also highly distributed.
> At least, that's my recollection from previous discussion from several
> years back. I remember talking about this with Andres in early 2016.
Yeah, that's exactly the point.  If we could replace some small number
of places with a vector-ish data structure and get all/most of the win,
then that would be the way to go about it.  But I'm pretty sure that
we aren't going to make much of an improvement without wholesale
changes.  Nor is it really all that attractive to have some pieces of
the parse/plan/execution tree data structures using one kind of list
while other places use another.  If we're to attack this at all,
I think we should go for a wholesale replacement.
Another way of looking at this is that if we expected that extensions
had a lot of private Lists, unrelated to these core data structures,
it might be worth preserving the List implementation so as not to cause
problems for such usage.  But I doubt that that's the case; or that
any such private lists are more likely to be broken by these API changes
than the core usage is (600 changes in however many lines we've got is
not a lot); or that people would really want to deal with two independent
list implementations with different behaviors just to avoid revisiting
some portions of their code while they're being forced to revisit others
anyway.
> If you want to stop third party code from compiling, you can find a
> way to do that without really changing your approach. Nothing stops
> you from changing some symbol names minimally, and then making
> corresponding mechanical changes to all of the client code within the
> tree. Third party code authors would then follow this example, with
> the expectation that it's probably going to be a totally mechanical
> process.
Yeah, if we expected that only mechanical changes would be needed, and
forcing certain syntax changes would be a good guide to what had to be
done, then this'd be a helpful way to proceed.  The lnext changes in
my proposed patch do indeed work like that.  But the part that's actually
hard is finding/fixing the places where you can't safely use lnext
anymore, and there's nothing very mechanical about that.  (Unless you want
to just forbid lnext altogether, which maybe is a reasonable thing to
contemplate, but I judged it overkill.)
BTW, I neglected to respond to Robert's earlier point that
>>> It is also perhaps worth mentioning that reimplementing a List as an
>>> array means that it is... not a list.  That is a pretty odd state of
>>> affairs
I think the reason we have Lisp-style lists all over the place has little
to do with whether those are ideal data structures, and a lot to do with
the fact that chunks of Postgres were originally written in Lisp, and
in that language using lists for everything is just How It's Done.
I don't have any problem with regarding that nomenclature as being mostly
a legacy thing, which is how I documented it in the proposed revision
to pg_list.h's header comment.
            regards, tom lane
			
Hi,
On 2019-02-25 13:59:36 -0500, Tom Lane wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
> > On Mon, Feb 25, 2019 at 1:17 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >> I'm not following your point here.  If we change key data structures
> >> (i.e. parsetrees, plan trees, execution trees) to use some other list-ish
> >> API, that *in itself* breaks everything that accesses those data
> >> structures.  The approach I propose here isn't zero-breakage, but it
> >> requires far fewer places to be touched than a complete API replacement
> >> would do.
> > Sure, but if you have third-party code that touches those things,
> > it'll fail to compile.  With your proposed approach, there seems to be
> > a risk that it will compile but not work.
> Failing to compile isn't really a benefit IMO.
It's a huge benefit. It's a lot of effort to look through all source
code for potential breakages. Especially if all list usages, rather
than some planner detail that comparatively few extensions touch, needs
to be audited.
> >> I completely disagree.  Your proposal is probably an order of magnitude
> >> more painful than the approach I suggest here, while not really offering
> >> any additional performance benefit (or if you think there would be some,
> >> you haven't explained how).  Strictly on cost/benefit grounds, it isn't
> >> ever going to happen that way.
> > Why would it be ten times more painful, exactly?
> Because it involves touching ten times more code (and that's a very
> conservative estimate).  Excluding changes in pg_list.h + list.c,
> what I posted touches approximately 600 lines of code (520 insertions,
> 644 deletions to be exact).  For comparison's sake, there are about
> 1800 uses of foreach in the tree, each of which would require at least
> 3 changes to replace (the foreach itself, the ListCell variable
> declaration, and at least one lfirst() reference in the loop body).
> So we've already blown past 5000 lines worth of changes if we want to
> do it another way ... and that's just *one* component of the List API.
> Nor is there any reason to think the changes would be any more mechanical
> than what I had to do here.  (No fair saying that I already found the
> trouble spots, either.  A different implementation would likely break
> assumptions in different ways.)
FWIW, rewrites of this kind can be quite nicely automated using
coccinelle [1]. One sometimes needs to do a bit of mop-up with variable
names, but otherwise it should be mostly complete.
Greetings,
Andres Freund
[1] http://coccinelle.lip6.fr/
Andres Freund <andres@anarazel.de> writes:
> Yea, it'd be more convincing. I'm not convinced it'd be a no-brainer
> though. Unless you've been hacking PG for a fair bit, the pg_list.h APIs
> are very hard to understand / remember. Given this change essentially
> requires auditing all code that uses List, ISTM we'd be much better off
> also changing the API at the same time.  Yes that'll mean there'll be
> vestigial uses nobody bothered to convert in extension etc, but that's
> not that bad.
The pain factor for back-patching is alone a strong reason for not just
randomly replacing the List API with different spellings.
            regards, tom lane
			
Andres Freund <andres@anarazel.de> writes:
> FWIW, rewrites of this kind can be quite nicely automated using
> coccinelle [1]. One sometimes needs to do a bit of mop-up with variable
> names, but otherwise it should be mostly complete.
I'm getting slightly annoyed by arguments that reject a live, workable
patch in favor of pie-in-the-sky proposals.  Both you and Robert seem
to be advocating solutions that don't exist and would take a very large
amount of work to create.  If you think differently, let's see a patch.
            regards, tom lane
			
Hi,
On 2019-02-25 11:59:06 -0800, Peter Geoghegan wrote:
> On Mon, Feb 25, 2019 at 10:59 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > Because it involves touching ten times more code (and that's a very
> > conservative estimate).  Excluding changes in pg_list.h + list.c,
> > what I posted touches approximately 600 lines of code (520 insertions,
> > 644 deletions to be exact).  For comparison's sake, there are about
> > 1800 uses of foreach in the tree, each of which would require at least
> > 3 changes to replace (the foreach itself, the ListCell variable
> > declaration, and at least one lfirst() reference in the loop body).
> If we knew that the list code was the bottleneck in a handful of
> cases, then I'd come down on Robert's side here. It would then be
> possible to update the relevant bottlenecked code in an isolated
> fashion, while getting the lion's share of the benefit. However, I
> don't think that that's actually possible. The costs of using Lists
> everywhere are real and measurable, but it's also highly distributed.
> At least, that's my recollection from previous discussion from several
> years back. I remember talking about this with Andres in early 2016.
It's distributed, but not *that* distributed. The largest source of
"cost" at execution time used to be all-over expression evaluation, but
that's gone now. That was a lot of places, but it's not outside of reach
of a targeted change. Now it's targetlist handling, which'd have to
change together with plan time, where it's a large issue.
> > So we've already blown past 5000 lines worth of changes if we want to
> > do it another way ... and that's just *one* component of the List API.
> If you want to stop third party code from compiling, you can find a
> way to do that without really changing your approach. Nothing stops
> you from changing some symbol names minimally, and then making
> corresponding mechanical changes to all of the client code within the
> tree. Third party code authors would then follow this example, with
> the expectation that it's probably going to be a totally mechanical
> process.
> I'm not necessarily advocating that approach. I'm simply pointing out
> that a compromise is quite possible.
That breaks extension code using lists unnecessarily though. And given
that there's semantic change, I don't think it's an entirely mechanical
process.
Greetings,
Andres Freund
Hi,
On 2019-02-25 16:03:43 -0500, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > FWIW, rewrites of this kind can be quite nicely automated using
> > coccinelle [1]. One sometimes needs to do a bit of mop-up with variable
> > names, but otherwise it should be mostly complete.
> I'm getting slightly annoyed by arguments that reject a live, workable
> patch in favor of pie-in-the-sky proposals.  Both you and Robert seem
> to be advocating solutions that don't exist and would take a very large
> amount of work to create.  If you think differently, let's see a patch.
Uhm, we're talking about an invasive proposal from two weekend days
ago. It seems far from crazy to voice our concerns with the silent
breakage you propose. Nor, even if we were obligated to work on an
alternative approach, which we aren't, would it be realistic for us to
have written an alternative implementation within the last few hours,
while also working on our own priorities.
I'm actually quite interested in this topic, both in the sense that it's
great to see work, and in the sense that I'm willing to help with the
effort.
Greetings,
Andres Freund
On Mon, Feb 25, 2019 at 1:04 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I'm getting slightly annoyed by arguments that reject a live, workable
> patch in favor of pie-in-the-sky proposals. Both you and Robert seem
> to be advocating solutions that don't exist and would take a very large
> amount of work to create. If you think differently, let's see a patch.

ISTM that we should separate the question of whether or not the List
API needs to continue to work without needing to change code in third
party extensions from the question of whether or not the List API
needs to be replaced whole cloth. These are not exactly independent
questions, but they don't necessarily need to be discussed all at
once.

Andres said that he doesn't like the pg_list.h API. It's not pretty,
but is it really that bad?

The List implementation claims to be generic, but it's not actually
that generic. It has to work as a Node. It's not quite fair to say
that it's confusing without acknowledging that pg_list.h is special to
query processing.

--
Peter Geoghegan

Hi,

On 2019-02-25 13:21:30 -0800, Peter Geoghegan wrote:
> ISTM that we should separate the question of whether or not the List
> API needs to continue to work without needing to change code in third
> party extensions from the question of whether or not the List API
> needs to be replaced whole cloth. These are not exactly independent
> questions, but they don't necessarily need to be discussed all at
> once.

I'm not convinced by that - if we are happy with the list API, not
duplicating code would be a stronger argument than if we actually are
unhappy. It makes no sense to go around and replace the same code twice
in a row if we also think other changes should be made (at the same
time, we obviously ought not to do too much at once, otherwise we'll
never get anywhere).

> Andres said that he doesn't like the pg_list.h API. It's not pretty,
> but is it really that bad?

Yes. The function names alone confound anybody new to postgres, we tend
to forget that after a few years. A lot of the function return types are
basically unpredictable without reading the code, the number of builtin
types is pretty restrictive, and there's no typesafety around the choice
of actually stored.

Greetings,

Andres Freund

On 2/25/19 10:03 PM, Andres Freund wrote:
> Hi,
>
> On 2019-02-25 11:59:06 -0800, Peter Geoghegan wrote:
>> On Mon, Feb 25, 2019 at 10:59 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> Because it involves touching ten times more code (and that's a very
>>> conservative estimate). Excluding changes in pg_list.h + list.c,
>>> what I posted touches approximately 600 lines of code (520 insertions,
>>> 644 deletions to be exact). For comparison's sake, there are about
>>> 1800 uses of foreach in the tree, each of which would require at least
>>> 3 changes to replace (the foreach itself, the ListCell variable
>>> declaration, and at least one lfirst() reference in the loop body).
>>
>> If we knew that the list code was the bottleneck in a handful of
>> cases, then I'd come down on Robert's side here. It would then be
>> possible to update the relevant bottlenecked code in an isolated
>> fashion, while getting the lion's share of the benefit. However, I
>> don't think that that's actually possible. The costs of using Lists
>> everywhere are real and measurable, but it's also highly distributed.
>> At least, that's my recollection from previous discussion from several
>> years back. I remember talking about this with Andres in early 2016.
>
> It's distributed, but not *that* distributed. The largest source of
> "cost" at execution time used to be all-over expression evaluation, but
> that's gone now. That was a lot of places, but it's not outside of reach
> of a targeted change. Now it's targetlist handling, which'd have to
> change together with plan time, where it's a large issue.
>

So let's say we want to measure the improvement this patch gives us.
What would be some reasonable (and corner) cases to benchmark? I do have
some ideas, but as you've been looking at this in the past, perhaps you
have something better.

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

On Mon, Feb 25, 2019 at 1:31 PM Andres Freund <andres@anarazel.de> wrote:
> > Andres said that he doesn't like the pg_list.h API. It's not pretty,
> > but is it really that bad?
>
> Yes. The function names alone confound anybody new to postgres, we tend
> to forget that after a few years. A lot of the function return types are
> basically unpredictable without reading the code, the number of builtin
> types is pretty restrictive, and there's no typesafety around the choice
> of actually stored.

But a lot of those restrictions are a consequence of needing what
amount to support functions in places as distant from pg_list.h as
pg_stat_statements.c, or the parser, or outfuncs.c. I'm not saying
that we couldn't do better here, but the design is constrained by
this. If you add a support for a new datatype, where does that leave
stored rules? Seems ticklish to me, at the very least.

--
Peter Geoghegan

Hi,

On 2019-02-25 22:35:37 +0100, Tomas Vondra wrote:
> So let's say we want to measure the improvement this patch gives us.
> What would be some reasonable (and corner) cases to benchmark? I do have
> some ideas, but as you've been looking at this in the past, perhaps you
> have something better.

I think queries over tables with a fair number of columns very easily
stress test the list overhead around targetlists - I don't have a
profile lying around, but the overhead of targetlist processing
(ExecTypeFromTL etc) at execution time clearly shows up. Larger
individual expressions can easily show up via eval_const_expressions()
etc and ExecInitExpr().

Both probably can be separated into benchmarks with prepared statements
(ExecTypeFromTL() and ExecInitExpr() will show up, but planner work
obviously not), and non-prepared benchmarks will stress the planner
more. I suspect there's also a few planner benefits with large numbers
of paths, but I don't quite remember the profiles well enough to
construct a benchmark from memory.

Greetings,

Andres Freund

Hi,

On 2019-02-25 13:41:48 -0800, Peter Geoghegan wrote:
> On Mon, Feb 25, 2019 at 1:31 PM Andres Freund <andres@anarazel.de> wrote:
> > > Andres said that he doesn't like the pg_list.h API. It's not pretty,
> > > but is it really that bad?
> >
> > Yes. The function names alone confound anybody new to postgres, we tend
> > to forget that after a few years. A lot of the function return types are
> > basically unpredictable without reading the code, the number of builtin
> > types is pretty restrictive, and there's no typesafety around the choice
> > of actually stored.
>
> But a lot of those restrictions are a consequence of needing what
> amount to support functions in places as distant from pg_list.h as
> pg_stat_statements.c, or the parser, or outfuncs.c.

Those could trivially support distinguishing at least between lists
containing pointer, int, oid, or node. But even optionally doing more
than that would be fairly easy. It's not like those modules don't
currently know the types of elements they're dealing with?

> If you add a support for a new datatype, where does that leave
> stored rules?

We don't maintain stored rules across major versions (they break due to
a lot of changes), so I don't quite understand that problem.

Greetings,

Andres Freund

On Mon, Feb 25, 2019 at 1:51 PM Andres Freund <andres@anarazel.de> wrote:
> Those could trivially support distinguishing at least between lists
> containing pointer, int, oid, or node. But even optionally doing more
> than that would be fairly easy. It's not like those modules don't
> currently know the types of elements they're dealing with?
>
> > If you add a support for a new datatype, where does that leave
> > stored rules?
>
> We don't maintain stored rules across major versions (they break due to
> a lot of changes), so I don't quite understand that problem.

The point is that the implicit need to have support for serializing
and deserializing everything is something that constrains the design,
and must also constrain the design of any successor data structure.
The contents of pg_list.[ch] are not why it's a PITA to add support
for a new datatype. Also, most of the time the Lists are lists of
nodes, which is essentially an abstract base type for heterogeneous
types anyway. I don't really get what you mean about type safety,
because you haven't spelled it out in a way that acknowledges all of
this.

--
Peter Geoghegan

Hi,
On 2019-02-23 21:24:40 -0500, Tom Lane wrote:
> For quite some years now there's been dissatisfaction with our List
> data structure implementation.  Because it separately palloc's each
> list cell, it chews up lots of memory, and it's none too cache-friendly
> because the cells aren't necessarily adjacent.
Indeed. Might be worthwhile to note that linked lists, even if stored in
adjacent memory, are *still* not very friendly for out-of-order CPUs, as
they introduce a dependency between fetching the pointer to the next
element, and processing the next element. Whereas for arrays etc CPUs
start executing instructions for the next element, before finishing the
last one.
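To make that dependency concrete, here is a minimal sketch in C. DemoCell and both function names are hypothetical and exist only for this illustration; they are not PostgreSQL's ListCell or any pg_list.h function. In the linked traversal the address of the next cell is only known after the current cell has been loaded, while in the array traversal every address follows from the base pointer and the index.

```c
#include <stddef.h>

/* Hypothetical minimal cell for illustration; not PostgreSQL's ListCell. */
typedef struct DemoCell
{
    int              value;
    struct DemoCell *next;
} DemoCell;

/*
 * Linked traversal: the load of cell->next must complete before the
 * next iteration can even compute which address to fetch.
 */
static long
demo_sum_linked(const DemoCell *cell)
{
    long        sum = 0;

    for (; cell != NULL; cell = cell->next)
        sum += cell->value;
    return sum;
}

/*
 * Array traversal: the address of element i + 1 depends only on the
 * base pointer and i, so it is known without loading element i.
 */
static long
demo_sum_array(const int *values, size_t n)
{
    long        sum = 0;

    for (size_t i = 0; i < n; i++)
        sum += values[i];
    return sum;
}
```

Both functions compute the same result; any speed difference would have to be measured on real hardware, and this sketch makes no claim about its size.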
> Every time this has come up, I've opined that the right fix is to jack
> up the List API and drive a new implementation underneath, as we did
> once before (cf commit d0b4399d81).
Btw, should we remove the ENABLE_LIST_COMPAT stuff independent of this
discussion? Seems like we could even just do that for 12.
> The big-picture performance change is that this makes list_nth()
> a cheap O(1) operation, while lappend() is still pretty cheap;
> on the downside, lcons() becomes O(N), as does insertion or deletion
> in the middle of a list.  But we don't use lcons() very much
> (and maybe a lot of the existing usages aren't really necessary?),
> while insertion/deletion in long lists is a vanishingly infrequent
> operation.  Meanwhile, making list_nth() cheap is a *huge* win.
Right.
> The most critical thing that we lose by doing this is that when a
> List is modified, all of its cells may need to move, which breaks
> a lot of code that assumes it can insert or delete a cell while
> hanging onto a pointer to a nearby cell.
We could probably "fix" both this, and the cost of making such
modifications, by having more of a list-of-arrays style
representation. When adding/removing middle-of-the-"list" elements, we
could chop that array into two (by modifying metadata, not freeing), and
shove the new data into a new array in between.  But I don't think that'd
overall be a win, even if it'd get us out of the silent API breakage
business.
> Another annoying consequence of lnext() needing a List pointer is that
> the List arguments of foreach() and related macros are now evaluated
> each time through the loop.  I could only find half a dozen places
> where that was actually unsafe (all fixed in the draft patch), but
> it's still bothersome.  I experimented with ways to avoid that, but
> the only way I could get it to work was to define foreach like this:
Yea, that problem is why the ilist stuff has the iterator
datastructure. That was before we allowed variable defs in for
though...
> #define foreach(cell, l) \
>     for (const List *cell##__foreach = foreach_setup(l, &cell); \
>          cell != NULL; \
>          cell = lnext(cell##__foreach, cell))
> 
> static inline const List *
> foreach_setup(const List *l, ListCell **cell)
> {
>     *cell = list_head(l);
>     return l;
> }
> 
> That works, but there are two problems.  The lesser one is that a
> not-very-bright compiler might think that the "cell" variable has to
> be forced into memory, because its address is taken.
I don't think that's a huge problem. I don't think there are any
platforms we really care about with such compilers? And I can't imagine
that being the only performance problem on such a platform.
> The bigger one is
> that this coding forces the "cell" variable to be exactly "ListCell *";
> you can't add const or volatile qualifiers to it without getting
> compiler warnings.  There are actually more places that'd be affected
> by that than by the need to avoid multiple evaluations.  I don't think
> the const markings would be a big deal to lose, and the two places in
> do_autovacuum that need "volatile" (because of a nearby PG_TRY) could
> be rewritten to not use foreach.  So I'm tempted to do that, but it's
> not very pretty.
Hm, that's a bit ugly, indeed.
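For readers who want to try the quoted trick in isolation, here is a self-contained sketch. DemoList, DemoCell and every demo_ name are assumptions made for this example (a bare array-backed list, not the POC's actual structures); only the shape of the macro follows the quoted definition. A counter shows that the list expression is evaluated once, no matter how many cells are visited.

```c
#include <stddef.h>

/* Hypothetical array-backed list; not the POC patch's real layout. */
typedef struct DemoCell
{
    int         value;
} DemoCell;

typedef struct DemoList
{
    int         length;
    DemoCell   *elements;
} DemoList;

static inline DemoCell *
demo_list_head(const DemoList *l)
{
    return (l != NULL && l->length > 0) ? &l->elements[0] : NULL;
}

/* With cells in an array, lnext() needs the list to find the array end. */
static inline DemoCell *
demo_lnext(const DemoList *l, const DemoCell *c)
{
    c++;
    return (c < &l->elements[l->length]) ? (DemoCell *) c : NULL;
}

/* Same shape as the quoted foreach_setup(): set the cell, hand back the list. */
static inline const DemoList *
demo_foreach_setup(const DemoList *l, DemoCell **cell)
{
    *cell = demo_list_head(l);
    return l;
}

#define demo_foreach(cell, l) \
    for (const DemoList *cell##__foreach = demo_foreach_setup(l, &cell); \
         cell != NULL; \
         cell = demo_lnext(cell##__foreach, cell))

/* Counts how often the list expression handed to the macro is evaluated. */
static int  demo_eval_count = 0;

static const DemoList *
demo_counted(const DemoList *l)
{
    demo_eval_count++;
    return l;
}

static int
demo_sum(const DemoList *l)
{
    DemoCell   *cell;
    int         sum = 0;

    demo_foreach(cell, demo_counted(l))
        sum += cell->value;
    return sum;
}
```

Declaring the loop variable as `const DemoCell *cell` in demo_sum() would make `&cell` the wrong pointer type for demo_foreach_setup(), which is the qualifier restriction described in the quoted text.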
> Maybe somebody can think of a better solution?
We could cast away const & volatile on most compilers, and do better on
gcc & clang, I guess. We could use typeof() and similar games to add the
relevant qualifiers. Or alternatively, also optionally of course, use
C11 _Generic trickery for defining the type.  But that seems
unsatisfying (but safe, I think).
> There's a lot of potential follow-on work that I've not touched yet:
> 
> 1. This still involves at least two palloc's for every nonempty List,
> because I kept the header and the data array separate.  Perhaps it's
> worth allocating those in one palloc.  However, right now there's an
> assumption that the header of a nonempty List doesn't move when you
> change its contents; that's embedded in the API of the lappend_cell
> functions, and more than likely there's code that depends on that
> silently because it doesn't bother to store the results of other
> List functions back into the appropriate pointer.  So if we do that
> at all I think it'd be better tackled in a separate patch; and I'm
> not really convinced it's worth the trouble and extra risk of bugs.
Hm, I think if we force external code to audit their code, we better
also do this. This is a significant number of allocations, and I don't
think it'd be good to spread this out over two releases.
Greetings,
Andres Freund
			
		Andres Freund <andres@anarazel.de> writes:
> Btw, should we remove the ENABLE_LIST_COMPAT stuff independent of this
> discussion? Seems like we could even just do that for 12.
+1.  I took it out in the POC patch, but I see no very good reason
not to do it sooner than that.
>> The most critical thing that we lose by doing this is that when a
>> List is modified, all of its cells may need to move, which breaks
>> a lot of code that assumes it can insert or delete a cell while
>> hanging onto a pointer to a nearby cell.
> We could probably "fix" both this, and the cost of making such
> modifications, by having more of a list-of-arrays style
> representation. When adding/removing middle-of-the-"list" elements, we
> could chop that array into two (by modifying metadata, not freeing), and
> shove the new data into a new array in between.  But I don't think that'd
> overall be a win, even if it'd get us out of the silent API breakage
> business.
Yeah, I'm afraid that would still leave us with pretty expensive
primitives.
>> 1. This still involves at least two palloc's for every nonempty List,
>> because I kept the header and the data array separate.  Perhaps it's
>> worth allocating those in one palloc.
> Hm, I think if we force external code to audit their code, we better
> also do this. This is a significant number of allocations, and I don't
> think it'd be good to spread this out over two releases.
If we choose to do it, I'd agree with doing both in the same major release
cycle, so that extensions see just one breakage.  But I think it'd still
best be developed as a follow-on patch.
I had an idea that perhaps is worth considering --- upthread I rejected
the idea of deleting lnext() entirely, but what if we did so?  We could
redefine foreach() to not use it:
#define foreach(cell, l) \
    for (int cell##__index = 0; \
         (cell = list_nth_cell(l, cell##__index)) != NULL; \
         cell##__index++)
We'd need to fix list_nth_cell() to return NULL not choke on an index
equal to (or past?) the array end, but that's easy.
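A standalone sketch of that idea follows. The DemoList layout and all demo_ names are assumptions for illustration only; the macro mirrors the definition above, and demo_list_nth_cell() returns NULL for any index at or past the end, as described. The counter makes visible that the list expression is now evaluated on every loop test.

```c
#include <stddef.h>

/* Hypothetical array-backed list; not the POC patch's real layout. */
typedef struct DemoCell
{
    int         value;
} DemoCell;

typedef struct DemoList
{
    int         length;
    DemoCell   *elements;
} DemoList;

/* Returns NULL instead of failing when n is at or past the array end. */
static inline DemoCell *
demo_list_nth_cell(const DemoList *l, int n)
{
    if (l == NULL || n < 0 || n >= l->length)
        return NULL;
    return &l->elements[n];
}

#define demo_foreach(cell, l) \
    for (int cell##__index = 0; \
         (cell = demo_list_nth_cell(l, cell##__index)) != NULL; \
         cell##__index++)

/* Counts how often the list expression handed to the macro is evaluated. */
static int  demo_list_evals = 0;

static const DemoList *
demo_counted(const DemoList *l)
{
    demo_list_evals++;
    return l;
}

static int
demo_sum(const DemoList *l)
{
    DemoCell   *cell;
    int         sum = 0;

    demo_foreach(cell, demo_counted(l))
        sum += cell->value;
    return sum;
}
```

For a three-element list the list expression is evaluated four times here (three successful fetches plus the final one that returns NULL), so it has to be free of side effects and stable across iterations.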
I think this would go a very long way towards eliminating the hazards
associated with iterating around a list-modification operation.
On the downside, it's hard to see how to merge it with the other idea
for evaluating the List reference only once, so we'd still have the
hazard that the list ref had better be a stable expression.  But that's
likely to be much easier to audit for than what the POC patch asks
people to do (maybe there's a way to check it mechanically, even?).
Also, any code that does contain explicit use of lnext() is likely
in need of rethinking anyhow, so taking it away would help answer
the objection about making problems easy to identify.
            regards, tom lane
			
		I wrote:
> Andres Freund <andres@anarazel.de> writes:
>>> 1. This still involves at least two palloc's for every nonempty List,
>>> because I kept the header and the data array separate.  Perhaps it's
>>> worth allocating those in one palloc.
>> Hm, I think if we force external code to audit their code, we better
>> also do this. This is a significant number of allocations, and I don't
>> think it'd be good to spread this out over two releases.
> If we choose to do it, I'd agree with doing both in the same major release
> cycle, so that extensions see just one breakage.  But I think it'd still
> best be developed as a follow-on patch.
By the by ... this idea actively breaks the mechanism I'd proposed for
preserving foreach's behavior of evaluating the List reference only once.
If we save a hidden copy of whatever the user says the List reference
is, and then he assigns a new value to it mid-loop, we're screwed if
the list header can move.
Now do you see why I'm a bit afraid of this?  Perhaps it's worth doing,
but it's going to introduce a whole new set of code breakages that are
going to be just as hard to audit for as anything else discussed in
this thread.  (Example: outer function creates a nonempty list, and
passes it down to some child function that appends to the list, and
there's no provision for returning the possibly-modified list header
pointer back up.)  I'm not really convinced that saving one more palloc
per List is worth it.
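The hazard in that example can be sketched outside PostgreSQL as follows. This is a toy under stated assumptions: DemoList and the demo_ functions are hypothetical, realloc() stands in for repalloc(), and header plus data share one allocation through a C99 flexible array member. Because growing the array may move the whole chunk, every caller has to store the returned pointer.

```c
#include <stdlib.h>

/* Hypothetical single-chunk list: header and data live in one allocation. */
typedef struct DemoList
{
    int         length;
    int         capacity;
    int         values[];       /* flexible array member */
} DemoList;

/*
 * Appends a value and returns the list, which may have been relocated.
 * Aborts on allocation failure to keep the sketch short.
 */
static DemoList *
demo_lappend_int(DemoList *list, int value)
{
    if (list == NULL || list->length == list->capacity)
    {
        int         oldlen = (list == NULL) ? 0 : list->length;
        int         newcap = (list == NULL) ? 1 : list->capacity * 2;
        DemoList   *grown;

        grown = realloc(list, sizeof(DemoList) + newcap * sizeof(int));
        if (grown == NULL)
            abort();
        grown->length = oldlen;
        grown->capacity = newcap;
        list = grown;           /* the header itself may have moved */
    }
    list->values[list->length++] = value;
    return list;
}

/*
 * A child that appends must hand the possibly-moved header back.  A
 * "void child(DemoList *list)" variant that drops the return value would
 * leave the caller holding a pointer that realloc() may have freed.
 */
static DemoList *
demo_child_append(DemoList *list)
{
    return demo_lappend_int(list, 42);
}
```

A caller would write `list = demo_child_append(list);`. Code that calls an appending function and ignores the result is the kind of breakage described above, and nothing in the type system flags it.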
            regards, tom lane
			
Hi,

On 2019-02-25 18:41:17 -0500, Tom Lane wrote:
> I wrote:
> > Andres Freund <andres@anarazel.de> writes:
> >>> 1. This still involves at least two palloc's for every nonempty List,
> >>> because I kept the header and the data array separate. Perhaps it's
> >>> worth allocating those in one palloc.
>
> >> Hm, I think if we force external code to audit their code, we better
> >> also do this. This is a significant number of allocations, and I don't
> >> think it'd be good to spread this out over two releases.
>
> > If we choose to do it, I'd agree with doing both in the same major release
> > cycle, so that extensions see just one breakage. But I think it'd still
> > best be developed as a follow-on patch.
>
> By the by ... this idea actively breaks the mechanism I'd proposed for
> preserving foreach's behavior of evaluating the List reference only once.
> If we save a hidden copy of whatever the user says the List reference
> is, and then he assigns a new value to it mid-loop, we're screwed if
> the list header can move.

Hm, I wonder if that's necessary / whether we can just work around user
visible breakage at a small cost. I think I'm mostly concerned with two
allocations for the very common case of small (1-3 entries) lists. We
could just allocate the first array together with the header, and not
free that if the list grows beyond that point. That'd mean we'd only do
separate allocations once they actually amortize over a number of
allocations.

Greetings,

Andres Freund

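One way to picture that suggestion is the sketch below. It is only an illustration under assumptions: DemoList, DEMO_INLINE_CELLS (set to 4 here, an arbitrary choice) and the demo_ functions are hypothetical, and malloc()/realloc() stand in for palloc()/repalloc(). The first few cells live in the same allocation as the header; once the list outgrows them, the data moves to a separate chunk while the header stays where it is.

```c
#include <stdlib.h>
#include <string.h>

#define DEMO_INLINE_CELLS 4     /* hypothetical size of the inline array */

/* Hypothetical list whose first cells share the header's allocation. */
typedef struct DemoList
{
    int         length;
    int         capacity;
    int        *elements;       /* inline_cells or a separate chunk */
    int         inline_cells[DEMO_INLINE_CELLS];
} DemoList;

static DemoList *
demo_list_make(void)
{
    DemoList   *l = malloc(sizeof(DemoList));

    if (l == NULL)
        abort();
    l->length = 0;
    l->capacity = DEMO_INLINE_CELLS;
    l->elements = l->inline_cells;
    return l;
}

/* Appends in place; the header never moves, so no return value is needed. */
static void
demo_lappend_int(DemoList *l, int value)
{
    if (l->length == l->capacity)
    {
        int         newcap = l->capacity * 2;
        int        *grown;

        if (l->elements == l->inline_cells)
        {
            /* First overflow: copy out; the inline space is simply left unused. */
            grown = malloc(newcap * sizeof(int));
            if (grown == NULL)
                abort();
            memcpy(grown, l->inline_cells, l->length * sizeof(int));
        }
        else
        {
            grown = realloc(l->elements, newcap * sizeof(int));
            if (grown == NULL)
                abort();
        }
        l->elements = grown;
        l->capacity = newcap;
    }
    l->elements[l->length++] = value;
}

static void
demo_list_free(DemoList *l)
{
    if (l->elements != l->inline_cells)
        free(l->elements);
    free(l);
}
```

Short lists cost one allocation, and pointers to the header stay valid across growth. The price is the inline space that goes unused once a list has grown past it.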
Hi,

On 2019-02-25 17:55:46 -0800, Andres Freund wrote:
> Hm, I wonder if that's necessary / whether we can just work around user
> visible breakage at a small cost. I think I'm mostly concerned with two
> allocations for the very common case of small (1-3 entries) lists. We
> could just allocate the first array together with the header, and not
> free that if the list grows beyond that point. That'd mean we'd only do
> separate allocations once they actually amortize over a number of
> allocations.

Btw, if we actually were going to go for always allocating header + data
together (and thus incurring the problems you mention upthread), we ought
to store the members as a FLEXIBLE_ARRAY_MEMBER together with the
list. Probably not worth it, but reducing the number of pointer
indirections for "list" accesses would be quite neat.

Greetings,

Andres Freund

On Sun, 24 Feb 2019 at 15:24, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I haven't attempted any performance measurements on this, but at
> least in principle it should speed things up a bit, especially
> for complex test cases involving longer Lists. I don't have any
> very suitable test cases at hand, anyway.

I've not yet looked at the code, but I thought I'd give this a quick
benchmark.

Using the attached patch (as text file so as not to upset the CFbot),
which basically just measures and logs the time taken to run
pg_plan_query. Using this, I ran make installcheck 3 times unpatched
and same again with the patch. I pulled the results of each run into a
spreadsheet and took the average of each of the 3 runs then took the
sum of the total average planning time over the 20334 individual
results.

Results patched atop of 067786cea:

Total average time unpatched: 0.739808667 seconds
Total average time patched:   0.748144333 seconds.

Surprisingly it took 1.13% longer. I did these tests on an AWS
md5.large instance.

If required, I can send the entire spreadsheet. It's about 750 KB.

--
David Rowley                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

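The 1.13% figure follows directly from the two totals quoted in this message. A small helper (the name demo_percent_change is made up for this illustration) shows the arithmetic:

```c
/* Percentage by which "after" exceeds "before"; negative if it is smaller. */
static double
demo_percent_change(double before, double after)
{
    return (after - before) / before * 100.0;
}
```

Calling demo_percent_change(0.739808667, 0.748144333) gives roughly 1.127, which rounds to the reported 1.13%.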
Attachments
David Rowley <david.rowley@2ndquadrant.com> writes:
> Using the attached patch (as text file so as not to upset the CFbot),
> which basically just measures and logs the time taken to run
> pg_plan_query. ...
> Surprisingly it took 1.13% longer.  I did these tests on an AWS
> md5.large instance.
Interesting.  Seems to suggest that maybe the cases I discounted
as being infrequent aren't so infrequent?  Another possibility
is that the new coding adds more cycles to foreach() loops than
I'd hoped for.
Anyway, it's just a POC; the main point at this stage is to be
able to make such comparisons at all.  If it turns out that we
*can't* make this into a win, then all that bellyaching about
how inefficient Lists are was misinformed ...
            regards, tom lane
			
		I wrote:
> I had an idea that perhaps is worth considering --- upthread I rejected
> the idea of deleting lnext() entirely, but what if we did so?  We could
> redefine foreach() to not use it:
> #define foreach(cell, l) \
>     for (int cell##__index = 0; \
>          (cell = list_nth_cell(l, cell##__index)) != NULL; \
>          cell##__index++)
> I think this would go a very long way towards eliminating the hazards
> associated with iterating around a list-modification operation.
I spent some time trying to modify my patch to work that way, but
abandoned the idea before completing it, because it became pretty
clear that it is a bad idea.  There are at least two issues:
1. In many places, the patch as I had it only required adding an
additional parameter to lnext() calls.  Removing lnext() altogether is
far more invasive, requiring restructuring loop logic in many places
where we otherwise wouldn't need to.  Since most of the point of this
proposal is to not have a larger patch footprint than we have to, that
seemed like a step in the wrong direction.
2. While making foreach() work this way might possibly help in avoiding
writing bad code in the first place, a loop of this form is really
just about as vulnerable to being broken by list insertions/deletions
as what I had before.  If you don't make suitable adjustments to the
integer index after an insertion/deletion then you're liable to skip
over, or double-visit, some list entries; and there's nothing here to
help you notice that you need to do that.  Worse, doing things like
this utterly destroys our chance of detecting such errors, because
there's no state being kept that's in any way checkable.
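Point 2 is easy to reproduce with a toy array list. Everything in the sketch below (DemoList with a fixed eight-slot array, the demo_ functions) is hypothetical and only serves to show the skipped entry; it is not how the patch stores lists.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical fixed-capacity array list, just large enough for the demo. */
typedef struct DemoList
{
    int         length;
    int         values[8];
} DemoList;

/* Removes entry n by sliding the tail down one slot. */
static void
demo_delete_nth(DemoList *l, int n)
{
    memmove(&l->values[n], &l->values[n + 1],
            (size_t) (l->length - n - 1) * sizeof(int));
    l->length--;
}

/*
 * Naive loop: after a deletion the following entry slides into slot i,
 * but i is still incremented, so that entry is never examined.
 */
static void
demo_delete_all_naive(DemoList *l, int victim)
{
    for (int i = 0; i < l->length; i++)
    {
        if (l->values[i] == victim)
            demo_delete_nth(l, i);
    }
}

/* Adjusted loop: step the index back so slot i is examined again. */
static void
demo_delete_all_adjusted(DemoList *l, int victim)
{
    for (int i = 0; i < l->length; i++)
    {
        if (l->values[i] == victim)
        {
            demo_delete_nth(l, i);
            i--;
        }
    }
}
```

Starting from {1, 2, 2, 3} and deleting every 2, the naive loop leaves {1, 2, 3} while the adjusted loop leaves {1, 3}. Both compile without warnings, and nothing at run time distinguishes the broken one.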
I was driven to realize point 2 by noticing, while trying to get rid
of some lnext calls, that I'd mostly failed in the v1 patch to fix
loops that contain embedded list_delete() calls other than
list_delete_cell().  This is because the debug support I'd added failed
to force relocation of lists after a deletion (unlike the insertion
cases).  It won't take much to add such relocation, and I'll go do that;
but with an integer-index-based loop implementation we've got no chance
of having debug support that could catch failure to update the loop index.
So I think that idea is a failure, and going forward with the v1
approach has better chances.
I did find a number of places where getting rid of explicit lnext()
calls led to just plain cleaner code.  Most of these were places that
could be using forboth() or forthree() and just weren't.  There's
also several places that are crying out for a forfour() macro, so
I'm not sure why we've stubbornly refused to provide it.  I'm a bit
inclined to just fix those things in the name of code readability,
independent of this patch.
I also noticed that there's quite a lot of places that are testing
lnext(cell) for being NULL or not.  What that really means is whether
this cell is last in the list or not, so maybe readability would be
improved by defining a macro along the lines of list_cell_is_last().
Any thoughts about that?
            regards, tom lane
			
		I wrote:
> I did find a number of places where getting rid of explicit lnext()
> calls led to just plain cleaner code.  Most of these were places that
> could be using forboth() or forthree() and just weren't.  There's
> also several places that are crying out for a forfour() macro, so
> I'm not sure why we've stubbornly refused to provide it.  I'm a bit
> inclined to just fix those things in the name of code readability,
> independent of this patch.
0001 below does this.  I found a couple of places that could use
forfive(), as well.  I think this is a clear legibility and
error-proofing win, and we should just push it.
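For readers without the patch at hand, a four-list iterator in this style could look roughly like the sketch below. It is not the macro from 0001: DemoList, DemoCell and all demo_ names are hypothetical stand-ins on a plain linked list, and stopping as soon as any one list runs out is an assumption of this sketch.

```c
#include <stddef.h>

/* Hypothetical linked cell and list; stand-ins for ListCell and List. */
typedef struct DemoCell
{
    int              value;
    struct DemoCell *next;
} DemoCell;

typedef struct DemoList
{
    DemoCell   *head;
} DemoList;

#define demo_list_head(l)   ((l) ? (l)->head : NULL)
#define demo_lnext(lc)      ((lc)->next)

/* Advances four cells in lockstep; ends when any list is exhausted. */
#define demo_forfour(c1, l1, c2, l2, c3, l3, c4, l4) \
    for ((c1) = demo_list_head(l1), (c2) = demo_list_head(l2), \
         (c3) = demo_list_head(l3), (c4) = demo_list_head(l4); \
         (c1) != NULL && (c2) != NULL && (c3) != NULL && (c4) != NULL; \
         (c1) = demo_lnext(c1), (c2) = demo_lnext(c2), \
         (c3) = demo_lnext(c3), (c4) = demo_lnext(c4))

/* Example use: sum of the products of corresponding entries. */
static int
demo_sum_products(DemoList *a, DemoList *b, DemoList *c, DemoList *d)
{
    DemoCell   *ca,
               *cb,
               *cc,
               *cd;
    int         total = 0;

    demo_forfour(ca, a, cb, b, cc, c, cd, d)
        total += ca->value * cb->value * cc->value * cd->value;
    return total;
}
```

Compared with chasing three of the lists by hand inside a foreach(), the loop body no longer has to remember to advance each cell, which is the error-proofing benefit mentioned above.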
> I also noticed that there's quite a lot of places that are testing
> lnext(cell) for being NULL or not.  What that really means is whether
> this cell is last in the list or not, so maybe readability would be
> improved by defining a macro along the lines of list_cell_is_last().
> Any thoughts about that?
0002 below does this.  I'm having a hard time deciding whether this
part is a good idea or just code churn.  It might be more readable
(especially to newbies) but I can't evaluate that very objectively.
I'm particularly unsure about whether we need two macros; though the
way I initially tried it with just list_cell_is_last() seemed kind of
double-negatively confusing in the places where the test needs to be
not-last.  Also, are these macro names too long, and if so what would
be better?
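As a rough picture of what such a pair of macros could look like on a plain linked cell, consider the sketch below. DemoCell and the demo_ prefix are hypothetical; the name of the second macro in particular is an assumption made for this example, since only list_cell_is_last() is named above.

```c
#include <stddef.h>

/* Hypothetical linked cell; a stand-in for ListCell. */
typedef struct DemoCell
{
    int              value;
    struct DemoCell *next;
} DemoCell;

#define demo_lnext(lc)                  ((lc)->next)

/* Say what the test means instead of comparing lnext() with NULL. */
#define demo_list_cell_is_last(lc)      (demo_lnext(lc) == NULL)
#define demo_list_cell_is_not_last(lc)  (demo_lnext(lc) != NULL)

/*
 * Typical caller: count how many separators a "1, 2, 3" style printout
 * needs, which is one after every cell except the last.
 */
static int
demo_count_separators(const DemoCell *head)
{
    int         n = 0;

    for (const DemoCell *lc = head; lc != NULL; lc = demo_lnext(lc))
    {
        if (demo_list_cell_is_not_last(lc))
            n++;
    }
    return n;
}
```

With only the first macro, the test in the loop would read `!demo_list_cell_is_last(lc)`, which is the double-negative form described above.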
Also: if we accept either or both of these, should we back-patch the
macro additions, so that these new macros will be available for use
in back-patched code?  I'm not sure that forfour/forfive have enough
use-cases to worry about that; but the is-last macros would have a
better case for that, I think.
            regards, tom lane
diff --git a/src/backend/access/common/tupdesc.c b/src/backend/access/common/tupdesc.c
index 47e80ae..832c3e9 100644
--- a/src/backend/access/common/tupdesc.c
+++ b/src/backend/access/common/tupdesc.c
@@ -902,23 +902,12 @@ BuildDescFromLists(List *names, List *types, List *typmods, List *collations)
     desc = CreateTemplateTupleDesc(natts);
     attnum = 0;
-
-    l2 = list_head(types);
-    l3 = list_head(typmods);
-    l4 = list_head(collations);
-    foreach(l1, names)
+    forfour(l1, names, l2, types, l3, typmods, l4, collations)
     {
         char       *attname = strVal(lfirst(l1));
-        Oid            atttypid;
-        int32        atttypmod;
-        Oid            attcollation;
-
-        atttypid = lfirst_oid(l2);
-        l2 = lnext(l2);
-        atttypmod = lfirst_int(l3);
-        l3 = lnext(l3);
-        attcollation = lfirst_oid(l4);
-        l4 = lnext(l4);
+        Oid            atttypid = lfirst_oid(l2);
+        int32        atttypmod = lfirst_int(l3);
+        Oid            attcollation = lfirst_oid(l4);
         attnum++;
diff --git a/src/backend/executor/execExpr.c b/src/backend/executor/execExpr.c
index db3777d..7cbf9d3 100644
--- a/src/backend/executor/execExpr.c
+++ b/src/backend/executor/execExpr.c
@@ -1683,7 +1683,6 @@ ExecInitExprRec(Expr *node, ExprState *state,
                            *l_opfamily,
                            *l_inputcollid;
                 ListCell   *lc;
-                int            off;
                 /*
                  * Iterate over each field, prepare comparisons.  To handle
@@ -1695,20 +1694,11 @@ ExecInitExprRec(Expr *node, ExprState *state,
                 Assert(list_length(rcexpr->opfamilies) == nopers);
                 Assert(list_length(rcexpr->inputcollids) == nopers);
-                off = 0;
-                for (off = 0,
-                     l_left_expr = list_head(rcexpr->largs),
-                     l_right_expr = list_head(rcexpr->rargs),
-                     l_opno = list_head(rcexpr->opnos),
-                     l_opfamily = list_head(rcexpr->opfamilies),
-                     l_inputcollid = list_head(rcexpr->inputcollids);
-                     off < nopers;
-                     off++,
-                     l_left_expr = lnext(l_left_expr),
-                     l_right_expr = lnext(l_right_expr),
-                     l_opno = lnext(l_opno),
-                     l_opfamily = lnext(l_opfamily),
-                     l_inputcollid = lnext(l_inputcollid))
+                forfive(l_left_expr, rcexpr->largs,
+                        l_right_expr, rcexpr->rargs,
+                        l_opno, rcexpr->opnos,
+                        l_opfamily, rcexpr->opfamilies,
+                        l_inputcollid, rcexpr->inputcollids)
                 {
                     Expr       *left_expr = (Expr *) lfirst(l_left_expr);
                     Expr       *right_expr = (Expr *) lfirst(l_right_expr);
diff --git a/src/backend/executor/nodeIndexscan.c b/src/backend/executor/nodeIndexscan.c
index 324356e..8b29437 100644
--- a/src/backend/executor/nodeIndexscan.c
+++ b/src/backend/executor/nodeIndexscan.c
@@ -1332,12 +1332,12 @@ ExecIndexBuildScanKeys(PlanState *planstate, Relation index,
         {
             /* (indexkey, indexkey, ...) op (expression, expression, ...) */
             RowCompareExpr *rc = (RowCompareExpr *) clause;
-            ListCell   *largs_cell = list_head(rc->largs);
-            ListCell   *rargs_cell = list_head(rc->rargs);
-            ListCell   *opnos_cell = list_head(rc->opnos);
-            ListCell   *collids_cell = list_head(rc->inputcollids);
             ScanKey        first_sub_key;
             int            n_sub_key;
+            ListCell   *largs_cell;
+            ListCell   *rargs_cell;
+            ListCell   *opnos_cell;
+            ListCell   *collids_cell;
             Assert(!isorderby);
@@ -1346,19 +1346,22 @@ ExecIndexBuildScanKeys(PlanState *planstate, Relation index,
             n_sub_key = 0;
             /* Scan RowCompare columns and generate subsidiary ScanKey items */
-            while (opnos_cell != NULL)
+            forfour(largs_cell, rc->largs, rargs_cell, rc->rargs,
+                    opnos_cell, rc->opnos, collids_cell, rc->inputcollids)
             {
                 ScanKey        this_sub_key = &first_sub_key[n_sub_key];
                 int            flags = SK_ROW_MEMBER;
                 Datum        scanvalue;
                 Oid            inputcollation;
+                leftop = (Expr *) lfirst(largs_cell);
+                rightop = (Expr *) lfirst(rargs_cell);
+                opno = lfirst_oid(opnos_cell);
+                inputcollation = lfirst_oid(collids_cell);
+
                 /*
                  * leftop should be the index key Var, possibly relabeled
                  */
-                leftop = (Expr *) lfirst(largs_cell);
-                largs_cell = lnext(largs_cell);
-
                 if (leftop && IsA(leftop, RelabelType))
                     leftop = ((RelabelType *) leftop)->arg;
@@ -1374,9 +1377,6 @@ ExecIndexBuildScanKeys(PlanState *planstate, Relation index,
                  * We have to look up the operator's associated btree support
                  * function
                  */
-                opno = lfirst_oid(opnos_cell);
-                opnos_cell = lnext(opnos_cell);
-
                 if (index->rd_rel->relam != BTREE_AM_OID ||
                     varattno < 1 || varattno > indnkeyatts)
                     elog(ERROR, "bogus RowCompare index qualification");
@@ -1398,15 +1398,9 @@ ExecIndexBuildScanKeys(PlanState *planstate, Relation index,
                     elog(ERROR, "missing support function %d(%u,%u) in opfamily %u",
                          BTORDER_PROC, op_lefttype, op_righttype, opfamily);
-                inputcollation = lfirst_oid(collids_cell);
-                collids_cell = lnext(collids_cell);
-
                 /*
                  * rightop is the constant or variable comparison value
                  */
-                rightop = (Expr *) lfirst(rargs_cell);
-                rargs_cell = lnext(rargs_cell);
-
                 if (rightop && IsA(rightop, RelabelType))
                     rightop = ((RelabelType *) rightop)->arg;
diff --git a/src/backend/optimizer/plan/subselect.c b/src/backend/optimizer/plan/subselect.c
index c721054..555c91f 100644
--- a/src/backend/optimizer/plan/subselect.c
+++ b/src/backend/optimizer/plan/subselect.c
@@ -1680,9 +1680,7 @@ convert_EXISTS_to_ANY(PlannerInfo *root, Query *subselect,
      */
     tlist = testlist = paramids = NIL;
     resno = 1;
-    /* there's no "forfour" so we have to chase one of the lists manually */
-    cc = list_head(opcollations);
-    forthree(lc, leftargs, rc, rightargs, oc, opids)
+    forfour(lc, leftargs, rc, rightargs, oc, opids, cc, opcollations)
     {
         Node       *leftarg = (Node *) lfirst(lc);
         Node       *rightarg = (Node *) lfirst(rc);
@@ -1690,7 +1688,6 @@ convert_EXISTS_to_ANY(PlannerInfo *root, Query *subselect,
         Oid            opcollation = lfirst_oid(cc);
         Param       *param;
-        cc = lnext(cc);
         param = generate_new_exec_param(root,
                                         exprType(rightarg),
                                         exprTypmod(rightarg),
diff --git a/src/backend/optimizer/prep/prepunion.c b/src/backend/optimizer/prep/prepunion.c
index 55eeb51..eb815c2 100644
--- a/src/backend/optimizer/prep/prepunion.c
+++ b/src/backend/optimizer/prep/prepunion.c
@@ -1130,17 +1130,14 @@ generate_setop_tlist(List *colTypes, List *colCollations,
     TargetEntry *tle;
     Node       *expr;
-    /* there's no forfour() so we must chase one list manually */
-    rtlc = list_head(refnames_tlist);
-    forthree(ctlc, colTypes, cclc, colCollations, itlc, input_tlist)
+    forfour(ctlc, colTypes, cclc, colCollations,
+            itlc, input_tlist, rtlc, refnames_tlist)
     {
         Oid            colType = lfirst_oid(ctlc);
         Oid            colColl = lfirst_oid(cclc);
         TargetEntry *inputtle = (TargetEntry *) lfirst(itlc);
         TargetEntry *reftle = (TargetEntry *) lfirst(rtlc);
-        rtlc = lnext(rtlc);
-
         Assert(inputtle->resno == resno);
         Assert(reftle->resno == resno);
         Assert(!inputtle->resjunk);
diff --git a/src/backend/parser/analyze.c b/src/backend/parser/analyze.c
index e3544ef..5efd86e 100644
--- a/src/backend/parser/analyze.c
+++ b/src/backend/parser/analyze.c
@@ -831,18 +831,14 @@ transformInsertStmt(ParseState *pstate, InsertStmt *stmt)
      */
     rte = pstate->p_target_rangetblentry;
     qry->targetList = NIL;
-    icols = list_head(icolumns);
-    attnos = list_head(attrnos);
-    foreach(lc, exprList)
+    Assert(list_length(exprList) <= list_length(icolumns));
+    forthree(lc, exprList, icols, icolumns, attnos, attrnos)
     {
         Expr       *expr = (Expr *) lfirst(lc);
-        ResTarget  *col;
-        AttrNumber    attr_num;
+        ResTarget  *col = lfirst_node(ResTarget, icols);
+        AttrNumber    attr_num = (AttrNumber) lfirst_int(attnos);
         TargetEntry *tle;
-        col = lfirst_node(ResTarget, icols);
-        attr_num = (AttrNumber) lfirst_int(attnos);
-
         tle = makeTargetEntry(expr,
                               attr_num,
                               col->name,
@@ -851,9 +848,6 @@ transformInsertStmt(ParseState *pstate, InsertStmt *stmt)
         rte->insertedCols = bms_add_member(rte->insertedCols,
                                            attr_num - FirstLowInvalidHeapAttributeNumber);
-
-        icols = lnext(icols);
-        attnos = lnext(attnos);
     }
     /* Process ON CONFLICT, if any. */
@@ -950,19 +944,16 @@ transformInsertRow(ParseState *pstate, List *exprlist,
      * Prepare columns for assignment to target table.
      */
     result = NIL;
-    icols = list_head(icolumns);
-    attnos = list_head(attrnos);
-    foreach(lc, exprlist)
+    forthree(lc, exprlist, icols, icolumns, attnos, attrnos)
     {
         Expr       *expr = (Expr *) lfirst(lc);
-        ResTarget  *col;
-
-        col = lfirst_node(ResTarget, icols);
+        ResTarget  *col = lfirst_node(ResTarget, icols);
+        int            attno = lfirst_int(attnos);
         expr = transformAssignedExpr(pstate, expr,
                                      EXPR_KIND_INSERT_TARGET,
                                      col->name,
-                                     lfirst_int(attnos),
+                                     attno,
                                      col->indirection,
                                      col->location);
@@ -991,9 +982,6 @@ transformInsertRow(ParseState *pstate, List *exprlist,
         }
         result = lappend(result, expr);
-
-        icols = lnext(icols);
-        attnos = lnext(attnos);
     }
     return result;
@@ -1699,11 +1687,11 @@ transformSetOperationStmt(ParseState *pstate, SelectStmt *stmt)
     qry->targetList = NIL;
     targetvars = NIL;
     targetnames = NIL;
-    left_tlist = list_head(leftmostQuery->targetList);
-    forthree(lct, sostmt->colTypes,
-             lcm, sostmt->colTypmods,
-             lcc, sostmt->colCollations)
+    forfour(lct, sostmt->colTypes,
+            lcm, sostmt->colTypmods,
+            lcc, sostmt->colCollations,
+            left_tlist, leftmostQuery->targetList)
     {
         Oid            colType = lfirst_oid(lct);
         int32        colTypmod = lfirst_int(lcm);
@@ -1729,7 +1717,6 @@ transformSetOperationStmt(ParseState *pstate, SelectStmt *stmt)
         qry->targetList = lappend(qry->targetList, tle);
         targetvars = lappend(targetvars, var);
         targetnames = lappend(targetnames, makeString(colName));
-        left_tlist = lnext(left_tlist);
     }
     /*
@@ -2201,10 +2188,9 @@ determineRecursiveColTypes(ParseState *pstate, Node *larg, List *nrtargetlist)
      * dummy result expressions of the non-recursive term.
      */
     targetList = NIL;
-    left_tlist = list_head(leftmostQuery->targetList);
     next_resno = 1;
-    foreach(nrtl, nrtargetlist)
+    forboth(nrtl, nrtargetlist, left_tlist, leftmostQuery->targetList)
     {
         TargetEntry *nrtle = (TargetEntry *) lfirst(nrtl);
         TargetEntry *lefttle = (TargetEntry *) lfirst(left_tlist);
@@ -2218,7 +2204,6 @@ determineRecursiveColTypes(ParseState *pstate, Node *larg, List *nrtargetlist)
                               colName,
                               false);
         targetList = lappend(targetList, tle);
-        left_tlist = lnext(left_tlist);
     }
     /* Now build CTE's output column info using dummy targetlist */
diff --git a/src/backend/parser/parse_func.c b/src/backend/parser/parse_func.c
index 5222231..654ee80 100644
--- a/src/backend/parser/parse_func.c
+++ b/src/backend/parser/parse_func.c
@@ -2124,13 +2124,12 @@ LookupFuncWithArgs(ObjectType objtype, ObjectWithArgs *func, bool noError)
                                FUNC_MAX_ARGS,
                                FUNC_MAX_ARGS)));
-    args_item = list_head(func->objargs);
-    for (i = 0; i < argcount; i++)
+    i = 0;
+    foreach(args_item, func->objargs)
     {
         TypeName   *t = (TypeName *) lfirst(args_item);
-        argoids[i] = LookupTypeNameOid(NULL, t, noError);
-        args_item = lnext(args_item);
+        argoids[i++] = LookupTypeNameOid(NULL, t, noError);
     }
     /*
diff --git a/src/backend/utils/adt/ruleutils.c b/src/backend/utils/adt/ruleutils.c
index 1258092..85055bb 100644
--- a/src/backend/utils/adt/ruleutils.c
+++ b/src/backend/utils/adt/ruleutils.c
@@ -9811,31 +9811,18 @@ get_tablefunc(TableFunc *tf, deparse_context *context, bool showimplicit)
         ListCell   *l5;
         int            colnum = 0;
-        l2 = list_head(tf->coltypes);
-        l3 = list_head(tf->coltypmods);
-        l4 = list_head(tf->colexprs);
-        l5 = list_head(tf->coldefexprs);
-
         appendStringInfoString(buf, " COLUMNS ");
-        foreach(l1, tf->colnames)
+        forfive(l1, tf->colnames, l2, tf->coltypes, l3, tf->coltypmods,
+                l4, tf->colexprs, l5, tf->coldefexprs)
         {
             char       *colname = strVal(lfirst(l1));
-            Oid            typid;
-            int32        typmod;
-            Node       *colexpr;
-            Node       *coldefexpr;
-            bool        ordinality = tf->ordinalitycol == colnum;
+            Oid            typid = lfirst_oid(l2);
+            int32        typmod = lfirst_int(l3);
+            Node       *colexpr = (Node *) lfirst(l4);
+            Node       *coldefexpr = (Node *) lfirst(l5);
+            bool        ordinality = (tf->ordinalitycol == colnum);
             bool        notnull = bms_is_member(colnum, tf->notnulls);
-            typid = lfirst_oid(l2);
-            l2 = lnext(l2);
-            typmod = lfirst_int(l3);
-            l3 = lnext(l3);
-            colexpr = (Node *) lfirst(l4);
-            l4 = lnext(l4);
-            coldefexpr = (Node *) lfirst(l5);
-            l5 = lnext(l5);
-
             if (colnum > 0)
                 appendStringInfoString(buf, ", ");
             colnum++;
@@ -10349,12 +10336,11 @@ get_from_clause_coldeflist(RangeTblFunction *rtfunc,
     appendStringInfoChar(buf, '(');
-    /* there's no forfour(), so must chase one list the hard way */
     i = 0;
-    l4 = list_head(rtfunc->funccolnames);
-    forthree(l1, rtfunc->funccoltypes,
-             l2, rtfunc->funccoltypmods,
-             l3, rtfunc->funccolcollations)
+    forfour(l1, rtfunc->funccoltypes,
+            l2, rtfunc->funccoltypmods,
+            l3, rtfunc->funccolcollations,
+            l4, rtfunc->funccolnames)
     {
         Oid            atttypid = lfirst_oid(l1);
         int32        atttypmod = lfirst_int(l2);
@@ -10378,7 +10364,6 @@ get_from_clause_coldeflist(RangeTblFunction *rtfunc,
             appendStringInfo(buf, " COLLATE %s",
                              generate_collation_name(attcollation));
-        l4 = lnext(l4);
         i++;
     }
diff --git a/src/include/nodes/pg_list.h b/src/include/nodes/pg_list.h
index 4624604..8dd22e7 100644
--- a/src/include/nodes/pg_list.h
+++ b/src/include/nodes/pg_list.h
@@ -205,6 +205,32 @@ list_length(const List *l)
          (cell1) != NULL && (cell2) != NULL && (cell3) != NULL;        \
          (cell1) = lnext(cell1), (cell2) = lnext(cell2), (cell3) = lnext(cell3))
+/*
+ * forfour -
+ *      the same for four lists
+ */
+#define forfour(cell1, list1, cell2, list2, cell3, list3, cell4, list4) \
+    for ((cell1) = list_head(list1), (cell2) = list_head(list2), \
+         (cell3) = list_head(list3), (cell4) = list_head(list4); \
+         (cell1) != NULL && (cell2) != NULL && \
+         (cell3) != NULL && (cell4) != NULL; \
+         (cell1) = lnext(cell1), (cell2) = lnext(cell2), \
+         (cell3) = lnext(cell3), (cell4) = lnext(cell4))
+
+/*
+ * forfive -
+ *      the same for five lists
+ */
+#define forfive(cell1, list1, cell2, list2, cell3, list3, cell4, list4, cell5, list5) \
+    for ((cell1) = list_head(list1), (cell2) = list_head(list2), \
+         (cell3) = list_head(list3), (cell4) = list_head(list4), \
+         (cell5) = list_head(list5); \
+         (cell1) != NULL && (cell2) != NULL && (cell3) != NULL && \
+         (cell4) != NULL && (cell5) != NULL; \
+         (cell1) = lnext(cell1), (cell2) = lnext(cell2), \
+         (cell3) = lnext(cell3), (cell4) = lnext(cell4), \
+         (cell5) = lnext(cell5))
+
 extern List *lappend(List *list, void *datum);
 extern List *lappend_int(List *list, int datum);
 extern List *lappend_oid(List *list, Oid datum);
diff --git a/contrib/postgres_fdw/deparse.c b/contrib/postgres_fdw/deparse.c
index 92a0ab6..e8de6ac 100644
--- a/contrib/postgres_fdw/deparse.c
+++ b/contrib/postgres_fdw/deparse.c
@@ -2617,7 +2617,7 @@ deparseFuncExpr(FuncExpr *node, deparse_expr_cxt *context)
     {
         if (!first)
             appendStringInfoString(buf, ", ");
-        if (use_variadic && lnext(arg) == NULL)
+        if (use_variadic && list_cell_is_last(arg))
             appendStringInfoString(buf, "VARIADIC ");
         deparseExpr((Expr *) lfirst(arg), context);
         first = false;
@@ -2945,7 +2945,7 @@ deparseAggref(Aggref *node, deparse_expr_cxt *context)
                 first = false;
                 /* Add VARIADIC */
-                if (use_variadic && lnext(arg) == NULL)
+                if (use_variadic && list_cell_is_last(arg))
                     appendStringInfoString(buf, "VARIADIC ");
                 deparseExpr((Expr *) n, context);
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 1831ea8..c3d898e 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -254,7 +254,7 @@ ExplainQuery(ParseState *pstate, ExplainStmt *stmt, const char *queryString,
                             queryString, params, queryEnv);
             /* Separate plans with an appropriate separator */
-            if (lnext(l) != NULL)
+            if (list_cell_is_not_last(l))
                 ExplainSeparatePlans(es);
         }
     }
diff --git a/src/backend/commands/prepare.c b/src/backend/commands/prepare.c
index a98c836..f7a1333 100644
--- a/src/backend/commands/prepare.c
+++ b/src/backend/commands/prepare.c
@@ -692,7 +692,7 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
         /* No need for CommandCounterIncrement, as ExplainOnePlan did it */
         /* Separate plans with an appropriate separator */
-        if (lnext(p) != NULL)
+        if (list_cell_is_not_last(p))
             ExplainSeparatePlans(es);
     }
diff --git a/src/backend/commands/seclabel.c b/src/backend/commands/seclabel.c
index 9db8228..e0cc3e9 100644
--- a/src/backend/commands/seclabel.c
+++ b/src/backend/commands/seclabel.c
@@ -58,7 +58,7 @@ ExecSecLabelStmt(SecLabelStmt *stmt)
             ereport(ERROR,
                     (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
                      errmsg("no security label providers have been loaded")));
-        if (lnext(list_head(label_provider_list)) != NULL)
+        if (list_length(label_provider_list) > 1)
             ereport(ERROR,
                     (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
                      errmsg("must specify provider when multiple security label providers have been loaded")));
diff --git a/src/backend/commands/tsearchcmds.c b/src/backend/commands/tsearchcmds.c
index 8e5eec2..7c8990a 100644
--- a/src/backend/commands/tsearchcmds.c
+++ b/src/backend/commands/tsearchcmds.c
@@ -1558,7 +1558,7 @@ serialize_deflist(List *deflist)
             appendStringInfoChar(&buf, ch);
         }
         appendStringInfoChar(&buf, '\'');
-        if (lnext(l) != NULL)
+        if (list_cell_is_not_last(l))
             appendStringInfoString(&buf, ", ");
     }
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index 65302fe..7232552 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -218,7 +218,7 @@ _outList(StringInfo str, const List *node)
         if (IsA(node, List))
         {
             outNode(str, lfirst(lc));
-            if (lnext(lc))
+            if (list_cell_is_not_last(lc))
                 appendStringInfoChar(str, ' ');
         }
         else if (IsA(node, IntList))
diff --git a/src/backend/nodes/print.c b/src/backend/nodes/print.c
index 4b9e141..8fb8634 100644
--- a/src/backend/nodes/print.c
+++ b/src/backend/nodes/print.c
@@ -410,7 +410,7 @@ print_expr(const Node *expr, const List *rtable)
         foreach(l, e->args)
         {
             print_expr(lfirst(l), rtable);
-            if (lnext(l))
+            if (list_cell_is_not_last(l))
                 printf(",");
         }
         printf(")");
@@ -453,7 +453,7 @@ print_pathkeys(const List *pathkeys, const List *rtable)
             print_expr((Node *) mem->em_expr, rtable);
         }
         printf(")");
-        if (lnext(i))
+        if (list_cell_is_not_last(i))
             printf(", ");
     }
     printf(")\n");
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 0debac7..0c14af2 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -3721,7 +3721,7 @@ print_restrictclauses(PlannerInfo *root, List *clauses)
         RestrictInfo *c = lfirst(l);
         print_expr((Node *) c->clause, root->parse->rtable);
-        if (lnext(l))
+        if (list_cell_is_not_last(l))
             printf(", ");
     }
 }
diff --git a/src/backend/optimizer/path/indxpath.c b/src/backend/optimizer/path/indxpath.c
index 3434219..bd94800 100644
--- a/src/backend/optimizer/path/indxpath.c
+++ b/src/backend/optimizer/path/indxpath.c
@@ -1558,7 +1558,7 @@ choose_bitmap_and(PlannerInfo *root, RelOptInfo *rel, List *paths)
                 /* reject new path, remove it from paths list */
                 paths = list_delete_cell(paths, lnext(lastcell), lastcell);
             }
-            Assert(lnext(lastcell) == NULL);
+            Assert(list_cell_is_last(lastcell));
         }
         /* Keep the cheapest AND-group (or singleton) */
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index bc81535..e6a7d00 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -4581,7 +4581,7 @@ create_one_window_path(PlannerInfo *root,
                                              -1.0);
         }
-        if (lnext(l))
+        if (list_cell_is_not_last(l))
         {
             /*
              * Add the current WindowFuncs to the output target for this
diff --git a/src/backend/optimizer/plan/subselect.c b/src/backend/optimizer/plan/subselect.c
index 555c91f..0f8663f 100644
--- a/src/backend/optimizer/plan/subselect.c
+++ b/src/backend/optimizer/plan/subselect.c
@@ -565,7 +565,7 @@ build_subplan(PlannerInfo *root, Plan *plan, PlannerInfo *subroot,
         {
             ptr += sprintf(ptr, "$%d%s",
                            lfirst_int(lc),
-                           lnext(lc) ? "," : ")");
+                           list_cell_is_not_last(lc) ? "," : ")");
         }
     }
diff --git a/src/backend/optimizer/util/tlist.c b/src/backend/optimizer/util/tlist.c
index 14d1c67..a5e7207 100644
--- a/src/backend/optimizer/util/tlist.c
+++ b/src/backend/optimizer/util/tlist.c
@@ -997,7 +997,7 @@ split_pathtarget_at_srfs(PlannerInfo *root,
         List       *level_srfs = (List *) lfirst(lc1);
         PathTarget *ntarget;
-        if (lnext(lc1) == NULL)
+        if (list_cell_is_last(lc1))
         {
             ntarget = target;
         }
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index 0279013..96077ec 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -15509,7 +15509,7 @@ makeColumnRef(char *colname, List *indirection,
         else if (IsA(lfirst(l), A_Star))
         {
             /* We only allow '*' at the end of a ColumnRef */
-            if (lnext(l) != NULL)
+            if (list_cell_is_not_last(l))
                 parser_yyerror("improper use of \"*\"");
         }
         nfields++;
@@ -15698,7 +15698,7 @@ check_indirection(List *indirection, core_yyscan_t yyscanner)
     {
         if (IsA(lfirst(l), A_Star))
         {
-            if (lnext(l) != NULL)
+            if (list_cell_is_not_last(l))
                 parser_yyerror("improper use of \"*\"");
         }
     }
diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c
index def6c03..ab32e59 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -355,7 +355,7 @@ perform_base_backup(basebackup_options *opt)
              */
             if (opt->includewal && ti->path == NULL)
             {
-                Assert(lnext(lc) == NULL);
+                Assert(list_cell_is_last(lc));
             }
             else
                 pq_putemptymessage('c');    /* CopyDone */
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 8b4d94c..82f275e 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -1224,7 +1224,7 @@ exec_simple_query(const char *query_string)
         PortalDrop(portal, false);
-        if (lnext(parsetree_item) == NULL)
+        if (list_cell_is_last(parsetree_item))
         {
             /*
              * If this is the last parsetree of the query string, close down
diff --git a/src/backend/tcop/pquery.c b/src/backend/tcop/pquery.c
index 7f15933..9a31ff1 100644
--- a/src/backend/tcop/pquery.c
+++ b/src/backend/tcop/pquery.c
@@ -1334,7 +1334,7 @@ PortalRunMulti(Portal portal,
          * Increment command counter between queries, but not after the last
          * one.
          */
-        if (lnext(stmtlist_item) != NULL)
+        if (list_cell_is_not_last(stmtlist_item))
             CommandCounterIncrement();
         /*
diff --git a/src/backend/tcop/utility.c b/src/backend/tcop/utility.c
index 6ec795f..3a67d4d 100644
--- a/src/backend/tcop/utility.c
+++ b/src/backend/tcop/utility.c
@@ -1070,7 +1070,7 @@ ProcessUtilitySlow(ParseState *pstate,
                         }
                         /* Need CCI between commands */
-                        if (lnext(l) != NULL)
+                        if (list_cell_is_not_last(l))
                             CommandCounterIncrement();
                     }
@@ -1151,7 +1151,7 @@ ProcessUtilitySlow(ParseState *pstate,
                             }
                             /* Need CCI between commands */
-                            if (lnext(l) != NULL)
+                            if (list_cell_is_not_last(l))
                                 CommandCounterIncrement();
                         }
diff --git a/src/backend/utils/adt/ruleutils.c b/src/backend/utils/adt/ruleutils.c
index 85055bb..564bd49 100644
--- a/src/backend/utils/adt/ruleutils.c
+++ b/src/backend/utils/adt/ruleutils.c
@@ -2722,7 +2722,7 @@ pg_get_functiondef(PG_FUNCTION_ARGS)
                         char       *curname = (char *) lfirst(lc);
                         simple_quote_literal(&buf, curname);
-                        if (lnext(lc))
+                        if (list_cell_is_not_last(lc))
                             appendStringInfoString(&buf, ", ");
                     }
                 }
@@ -8081,7 +8081,7 @@ get_rule_expr(Node *node, deparse_context *context,
                         appendStringInfo(buf, "hashed %s", splan->plan_name);
                     else
                         appendStringInfoString(buf, splan->plan_name);
-                    if (lnext(lc))
+                    if (list_cell_is_not_last(lc))
                         appendStringInfoString(buf, " or ");
                 }
                 appendStringInfoChar(buf, ')');
@@ -9189,7 +9189,7 @@ get_func_expr(FuncExpr *expr, deparse_context *context,
     {
         if (nargs++ > 0)
             appendStringInfoString(buf, ", ");
-        if (use_variadic && lnext(l) == NULL)
+        if (use_variadic && list_cell_is_last(l))
             appendStringInfoString(buf, "VARIADIC ");
         get_rule_expr((Node *) lfirst(l), context, true);
     }
diff --git a/src/include/nodes/pg_list.h b/src/include/nodes/pg_list.h
index 8dd22e7..a35772d 100644
--- a/src/include/nodes/pg_list.h
+++ b/src/include/nodes/pg_list.h
@@ -133,6 +133,9 @@ list_length(const List *l)
 #define llast_oid(l)            lfirst_oid(list_tail(l))
 #define llast_node(type,l)        castNode(type, llast(l))
+#define list_cell_is_last(l)        (lnext(l) == NULL)
+#define list_cell_is_not_last(l)    (lnext(l) != NULL)
+
 /*
  * Convenience macros for building fixed-length lists
  */
			
On Wed, Feb 27, 2019 at 3:27 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> 0001 below does this.  I found a couple of places that could use
> forfive(), as well.  I think this is a clear legibility and
> error-proofing win, and we should just push it.

It sounds like some of these places might need a bigger restructuring
- i.e. to iterate over a list/vector of structs with 5 members instead
of iterating over five lists in parallel.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Robert Haas <robertmhaas@gmail.com> writes:
> On Wed, Feb 27, 2019 at 3:27 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> 0001 below does this.  I found a couple of places that could use
>> forfive(), as well.  I think this is a clear legibility and
>> error-proofing win, and we should just push it.
> It sounds like some of these places might need a bigger restructuring
> - i.e. to iterate over a list/vector of structs with 5 members instead
> of iterating over five lists in parallel.
Meh.  Most of them are iterating over parsetree substructure, eg the
components of a RowCompareExpr.  So we could not change them without
pretty extensive infrastructure changes including a catversion bump.
Also, while the separated substructure is indeed a pain in the rear
in some places, it's actually better for other uses.  Two examples
related to RowCompareExpr:
* match_rowcompare_to_indexcol can check whether all the left-hand
or right-hand expressions are nonvolatile with one easy call to
contain_volatile_functions on the respective list.  To do the
same with a single list of sub-structs, it'd need bespoke code
for each case to run through the list and consider only the correct
subexpression of each sub-struct.
* expand_indexqual_rowcompare can deal with commuted-clause cases just
by swapping the list pointers at the start, it doesn't have to think
about it over again for each pair of elements.
So I'm not that excited about refactoring the data representation
for these.  I'm content (for now) with getting these places in line
with the coding convention we use elsewhere for similar cases.
            regards, tom lane
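
To make the two RowCompareExpr points above concrete, here is a minimal self-contained sketch. It is not PostgreSQL code: the `Cell` type, the `forboth_cells` macro, and the functions `any_negative` and `count_less` are hypothetical stand-ins (for ListCell, forboth(), contain_volatile_functions() and a commutable comparison, respectively). It only illustrates that, with separate lists, a whole-list check is a single call on one list, and a commuted clause is handled by one pointer swap before the loop.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical cons-cell list; not the real pg_list.h types. */
typedef struct Cell
{
    int          value;
    struct Cell *next;
} Cell;

/* Lockstep iteration over two lists; stops at the end of the shorter one. */
#define forboth_cells(c1, l1, c2, l2) \
    for ((c1) = (l1), (c2) = (l2); \
         (c1) != NULL && (c2) != NULL; \
         (c1) = (c1)->next, (c2) = (c2)->next)

/* Whole-list check, playing the role of contain_volatile_functions(). */
static bool
any_negative(const Cell *list)
{
    for (; list != NULL; list = list->next)
    {
        if (list->value < 0)
            return true;
    }
    return false;
}

/* Count positions where left < right, optionally with the sides commuted. */
static int
count_less(Cell *left, Cell *right, bool commuted)
{
    Cell       *lc;
    Cell       *rc;
    int         n = 0;

    if (commuted)
    {
        /* One swap up front; the loop body never has to think about it. */
        Cell       *tmp = left;

        left = right;
        right = tmp;
    }

    forboth_cells(lc, left, rc, right)
    {
        assert(lc != NULL && rc != NULL);
        if (lc->value < rc->value)
            n++;
    }
    return n;
}
```

With a single list of two-member structs, `any_negative` would need one variant per member, and the swap would have to be repeated for every element inside the loop.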
			
		On 2019-Feb-27, Tom Lane wrote:
> I'm particularly unsure about whether we need two macros; though the
> way I initially tried it with just list_cell_is_last() seemed kind of
> double-negatively confusing in the places where the test needs to be
> not-last.  Also, are these macro names too long, and if so what would
> be better?
I think "!list_cell_is_last()" is just as readable, if not more, than
the "is_not_last" locution:
        appendStringInfoChar(&buf, '\'');
        if (!list_cell_is_last(l))
            appendStringInfoString(&buf, ", ");
I'd go with a single macro.
+1 for backpatching the new macros, too.  I suspect extension authors
are going to need to provide compatibility versions anyway, to be
compilable against older minors.
-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
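
As a rough illustration of the single-macro style suggested above, the sketch below joins names with ", " separators and uses only a negated is-last test. Everything in it is hypothetical and independent of pg_list.h: the `Cell` type, the `cell_is_last` macro, and the `join_names` helper are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical cons cell; stands in for ListCell in this sketch only. */
typedef struct Cell
{
    const char  *name;
    struct Cell *next;
} Cell;

/* The only macro: callers write !cell_is_last(c) for the not-last case. */
#define cell_is_last(c)  ((c)->next == NULL)

/* Write "a, b, c" into buf, truncating silently if buf is too small. */
static void
join_names(const Cell *head, char *buf, size_t buflen)
{
    const Cell *c;

    assert(buf != NULL);
    if (buflen == 0)
        return;
    buf[0] = '\0';

    for (c = head; c != NULL; c = c->next)
    {
        /* strncat always terminates; leave room for the trailing NUL. */
        strncat(buf, c->name, buflen - strlen(buf) - 1);
        if (!cell_is_last(c))
            strncat(buf, ", ", buflen - strlen(buf) - 1);
    }
}
```

The negation reads naturally at the call site, which is the argument for not carrying a second `is_not_last` macro.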
			
On Thu, 28 Feb 2019 at 09:26, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> I wrote:
> > I did find a number of places where getting rid of explicit lnext()
> > calls led to just plain cleaner code.  Most of these were places that
> > could be using forboth() or forthree() and just weren't.  There's
> > also several places that are crying out for a forfour() macro, so
> > I'm not sure why we've stubbornly refused to provide it.  I'm a bit
> > inclined to just fix those things in the name of code readability,
> > independent of this patch.
>
> 0001 below does this.  I found a couple of places that could use
> forfive(), as well.  I think this is a clear legibility and
> error-proofing win, and we should just push it.

I've looked over this and I agree that it's a good idea.  Reducing the
number of lnext() usages seems like a good idea in order to reduce the
footprint of the main patch.

The only thing of interest that I saw during the review was the fact
that you've chosen to assign colexpr and coldefexpr before the
continue in get_tablefunc().  We may not end up using those values if
we find an ordinality column.  I'm pretty sure it's not worth breaking
the mould for that case though, but just noting it anyway.

> > I also noticed that there's quite a lot of places that are testing
> > lnext(cell) for being NULL or not.  What that really means is whether
> > this cell is last in the list or not, so maybe readability would be
> > improved by defining a macro along the lines of list_cell_is_last().
> > Any thoughts about that?
>
> 0002 below does this.  I'm having a hard time deciding whether this
> part is a good idea or just code churn.  It might be more readable
> (especially to newbies) but I can't evaluate that very objectively.
> I'm particularly unsure about whether we need two macros; though the
> way I initially tried it with just list_cell_is_last() seemed kind of
> double-negatively confusing in the places where the test needs to be
> not-last.  Also, are these macro names too long, and if so what would
> be better?

I'm less decided on this. Having this now means you'll need to break
the signature of the macro the same way as you'll need to break
lnext(). It's perhaps easier to explain in the release notes about
lnext() having changed so that extension authors can go fix their code
(probably they'll know already from compile failures, but ....). On
the other hand, if the list_cell_is_last() is new, then there will be
no calls to that in extensions anyway.  Maybe it's better to do it at
the same time as the List reimplementation to ensure nobody needs to
change anything twice?

> Also: if we accept either or both of these, should we back-patch the
> macro additions, so that these new macros will be available for use
> in back-patched code?  I'm not sure that forfour/forfive have enough
> use-cases to worry about that; but the is-last macros would have a
> better case for that, I think.

I see no reason not to put forfour() and forfive() in the back branches.

-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

David Rowley <david.rowley@2ndquadrant.com> writes:
> On Thu, 28 Feb 2019 at 09:26, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> 0002 below does this.  I'm having a hard time deciding whether this
>> part is a good idea or just code churn.  It might be more readable
>> (especially to newbies) but I can't evaluate that very objectively.
> I'm less decided on this. Having this now means you'll need to break
> the signature of the macro the same way as you'll need to break
> lnext(). It's perhaps easier to explain in the release notes about
> lnext() having changed so that extension authors can go fix their code
> (probably they'll know already from compile failures, but ....). On
> the other hand, if the list_cell_is_last() is new, then there will be
> no calls to that in extensions anyway.  Maybe it's better to do it at
> the same time as the List reimplementation to ensure nobody needs to
> change anything twice?
Yeah, I was considering the idea of setting up the macro as
"list_cell_is_last(list, cell)" from the get-go, with the first
argument just going unused for the moment.  That would be a good
way to back-patch it if we go through with this.  On the other hand,
if we end up not pulling the trigger on the main patch, that'd
look pretty silly ...
            regards, tom lane
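
The two-argument form Tom describes can be sketched as follows. This is only an illustration on toy types: `ToyCell`, `ToyList`, `toy_cell_is_last`, `toy_cell_is_not_last` and `toy_count_separators` are hypothetical names, not the contents of pg_list.h or of the 0002 patch. The point is that the `list` argument is accepted but unused while cells are still linked, so callers would not have to change if an array-based implementation later needs it.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins for the pre-patch linked-list types (hypothetical). */
typedef struct ToyCell
{
    int             value;
    struct ToyCell *next;
} ToyCell;

typedef struct ToyList
{
    int      length;
    ToyCell *head;
    ToyCell *tail;
} ToyList;

/*
 * "list" is evaluated only to silence unused-argument warnings; the
 * answer today comes entirely from the cell's next pointer.
 */
#define toy_cell_is_last(list, cell) \
    ((void) (list), (cell)->next == NULL)
#define toy_cell_is_not_last(list, cell) \
    (!toy_cell_is_last(list, cell))

/* Count the separators a caller would emit: one after each non-last cell. */
static int
toy_count_separators(const ToyList *list)
{
    int n = 0;

    for (const ToyCell *c = list->head; c != NULL; c = c->next)
    {
        if (toy_cell_is_not_last(list, c))
            n++;
    }
    return n;
}
```

With an array-backed list the same macro could compare the cell address against the last array slot, which is where the currently unused first argument would come into play.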
			
On Tue, 26 Feb 2019 at 18:34, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> David Rowley <david.rowley@2ndquadrant.com> writes:
> > Using the attached patch (as text file so as not to upset the CFbot),
> > which basically just measures and logs the time taken to run
> > pg_plan_query. ...
> > Surprisingly it took 1.13% longer.  I did these tests on an AWS
> > md5.large instance.
>
> Interesting.  Seems to suggest that maybe the cases I discounted
> as being infrequent aren't so infrequent?  Another possibility
> is that the new coding adds more cycles to foreach() loops than
> I'd hoped for.

I went and had a few adventures with this patch to see if I could
figure out why the small ~1% regression exists.  Profiling did not
prove very useful as I saw none of the list functions come up.  I had
suspected it was the lcons() calls being expensive because they need
to push the elements up one place each time, not something that'll
scale well with larger lists.  After changing things so that a new
"first" element index in the List would allow new_head_cell() to just
move everything to the end of the list and mark the start of the
actual data...  I discovered that slowed things down further... Likely
due to all the additional arithmetic work required to find the first
element.

I then tried hacking at the foreach() macro after wondering if the
lnext() call was somehow making things difficult for the compiler to
predict what cell would come next.  I experimented with the following
monstrosity:

for ((cell) = list_head(l);
     ((cell) && (cell) < &((List *) l)->elements[((List *) l)->first + ((List *) l)->length]) ||
     (cell = NULL) != NULL;
     cell++)

it made things worse again...  It ended up much more ugly than I
thought it would have as I had to account for an empty list being NIL
and the fact that we need to make cell NULL after the loop is over.

I tried a few other things... I didn't agree with your memmove() in
list_concat().  I think memcpy() is fine, even when the list pointers
are the same since we never overwrite any live cell values.  Strangely
I found memcpy slower than memmove... ?

The only thing that I did to manage to speed the patch up was to ditch
the additional NULL test in lnext().  I don't see why that's required
since lnext(NULL) would have crashed with the old implementation.
Removing this changed the 1.13% regression into a ~0.8% regression,
which at least does show that the foreach() implementation can have an
effect on performance.

> Anyway, it's just a POC; the main point at this stage is to be
> able to make such comparisons at all.  If it turns out that we
> *can't* make this into a win, then all that bellyaching about
> how inefficient Lists are was misinformed ...

My primary concern is how much we bend over backwards because
list_nth() performance is not O(1).  I know from my work on
partitioning that ExecInitRangeTable()'s building of
es_range_table_array has a huge impact for PREPAREd plans for simple
PK lookup SELECT queries to partitioned tables with a large number of
partitions, where only 1 of which will survive run-time pruning.  I
could get the execution speed of such a query with 300 partitions to
within 94% of the non-partitioned version if the rtable could be
looked up O(1) in the executor natively, (that some changes to
ExecCheckRTPerms() to have it skip rtable entries that don't require
permission checks.).

Perhaps if we're not going to see gains from the patch alone then
we'll need to tag on some of the additional stuff that will take
advantage of list_nth() being fast and test the performance of it all
again.

Attached is the (mostly worthless) series of hacks I made to your
patch.  It might save someone some time if they happened to wonder the
same thing as I did.

-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
Attachments
David Rowley <david.rowley@2ndquadrant.com> writes:
> I went and had a few adventures with this patch to see if I could
> figure out why the small ~1% regression exists.
Thanks for poking around!
> ... I had
> suspected it was the lcons() calls being expensive because then need
> to push the elements up one place each time, not something that'll
> scale well with larger lists.
I just did some looking around at lcons() calls, and it's hard to identify
any that seem like they would be performance-critical.  I did notice a
number of places that think that lcons'ing an item onto a list, and later
stripping it off with list_delete_first, is efficient.  With the new
implementation it's far cheaper to lappend and then list_truncate instead,
at least if the list is long.  If the list order matters then that's not
an option, but I found some places where it doesn't matter so we could get
an easy win.  Still, it wasn't obvious that this would move the needle at
all.
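
The cost difference described in the paragraph above can be made concrete with a toy array-backed list that counts how many element slots each operation has to shift. This is a sketch under stated assumptions: `ArrList` and the `arr_*` and `moves_*` functions are hypothetical stand-ins, not the patch's list.c code; only the shape of the operations (prepend shifts everything, append and truncate shift nothing) is taken from the discussion.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct ArrList
{
    int   length;
    int   capacity;
    int  *elements;
    long  moves;        /* instrumentation: element slots shifted so far */
} ArrList;

static void
arr_init(ArrList *l)
{
    l->length = 0;
    l->capacity = 0;
    l->elements = NULL;
    l->moves = 0;
}

/* Grow the backing array geometrically so that appends stay cheap. */
static void
arr_reserve(ArrList *l, int needed)
{
    if (needed > l->capacity)
    {
        int  newcap = l->capacity ? l->capacity * 2 : 8;
        int *p;

        while (newcap < needed)
            newcap *= 2;
        p = realloc(l->elements, (size_t) newcap * sizeof(int));
        assert(p != NULL);
        l->elements = p;
        l->capacity = newcap;
    }
}

/* Append at the tail: no existing element moves. */
static void
arr_lappend(ArrList *l, int v)
{
    arr_reserve(l, l->length + 1);
    l->elements[l->length++] = v;
}

/* Prepend at the head: every existing element shifts up one slot. */
static void
arr_lcons(ArrList *l, int v)
{
    arr_reserve(l, l->length + 1);
    memmove(l->elements + 1, l->elements, (size_t) l->length * sizeof(int));
    l->moves += l->length;
    l->elements[0] = v;
    l->length++;
}

/* Remove the head: every remaining element shifts down one slot. */
static void
arr_delete_first(ArrList *l)
{
    assert(l->length > 0);
    memmove(l->elements, l->elements + 1,
            (size_t) (l->length - 1) * sizeof(int));
    l->moves += l->length - 1;
    l->length--;
}

/* Drop elements from the tail: just adjust the length. */
static void
arr_truncate(ArrList *l, int new_length)
{
    if (new_length < l->length)
        l->length = new_length;
}

/* Push then pop one item at the head of an n-element list. */
static long
moves_push_pop_at_head(int n)
{
    ArrList l;
    long    m;

    arr_init(&l);
    for (int i = 0; i < n; i++)
        arr_lappend(&l, i);
    l.moves = 0;
    arr_lcons(&l, -1);
    arr_delete_first(&l);
    m = l.moves;
    free(l.elements);
    return m;
}

/* Push then pop one item at the tail of an n-element list. */
static long
moves_push_pop_at_tail(int n)
{
    ArrList l;
    long    m;

    arr_init(&l);
    for (int i = 0; i < n; i++)
        arr_lappend(&l, i);
    l.moves = 0;
    arr_lappend(&l, -1);
    arr_truncate(&l, l.length - 1);
    m = l.moves;
    free(l.elements);
    return m;
}
```

For a 1000-element list the head variant shifts 2000 slots per push/pop pair and the tail variant shifts none, which is the asymmetry that makes the swap attractive only where list order does not matter.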
> I then tried hacking at the foreach() macro after wondering if the
> lnext() call was somehow making things difficult for the compiler to
> predict what cell would come next.
Yeah, my gut feeling right now is that foreach() is producing crummy
code, though it's not obvious why it would need to.  Maybe some
micro-optimization is called for.  But I've not had time to pursue it.
> The only thing that I did to manage to speed the patch up was to ditch
> the additional NULL test in lnext().  I don't see why that's required
> since lnext(NULL) would have crashed with the old implementation.
Hmmm ...
> Perhaps if we're not going to see gains from the patch alone then
> we'll need to tag on some of the additional stuff that will take
> advantage of list_nth() being fast and test the performance of it all
> again.
Yeah, evaluating this patch in complete isolation is a bit unfair.
Still, it'd be nice to hold the line in advance of any follow-on
improvements.
            regards, tom lane

>>>>> "David" == David Rowley <david.rowley@2ndquadrant.com> writes:

 David> I went and had a few adventures with this patch to see if I
 David> could figure out why the small ~1% regression exists.

Just changing the number of instructions (even in a completely unrelated
place that's not called during the test) can generate performance
variations of this size, even when there's no real difference.

To get a reliable measurement of timing changes less than around 3%,
what you have to do is this: pick some irrelevant function and add
something like an asm directive that inserts a variable number of NOPs,
and do a series of test runs with different values.

See http://tinyurl.com/op9qg8a for an example of the kind of variation
that one can get; this plot records timing runs where each different
padding size was tested 3 times (non-consecutively, so you can see how
repeatable the test result is for each size), each timing is actually
the average of the last 10 of 11 consecutive runs of the test.

To establish a 1% performance benefit or regression you need to show
that there's still a difference _AFTER_ taking this kind of
spooky-action-at-a-distance into account.  For example, in the test
shown at the link, if a substantive change to the code moved the upper
and lower bounds of the output from (6091,6289) to (6030,6236) then one
would be justified in claiming it as a 1% improvement.

Such is the reality of modern CPUs.

-- 
Andrew (irc:RhodiumToad)
Andrew Gierth <andrew@tao11.riddles.org.uk> writes:
> To get a reliable measurement of timing changes less than around 3%,
> what you have to do is this: pick some irrelevant function and add
> something like an asm directive that inserts a variable number of NOPs,
> and do a series of test runs with different values.

Good point.  If you're looking at a microbenchmark that only exercises
a small amount of code, it can be way worse than that.  I was reminded
of this the other day while fooling with the problem discussed in
https://www.postgresql.org/message-id/flat/6970.1545327857@sss.pgh.pa.us
in which we were spending huge amounts of time in a tight loop in
match_eclasses_to_foreign_key_col.  I normally run with --enable-cassert
unless I'm trying to collect performance data; so I rebuilt with
--disable-cassert, and was bemused to find out that that test case ran
circa 20% *slower* in the non-debug build.  This is silly on its face,
and even more so when you notice that match_eclasses_to_foreign_key_col
itself contains no Asserts and so its machine code is unchanged by the
switch.  (I went to the extent of comparing .s files to verify this.)
So that had to have been down to alignment/cacheline issues triggered
by moving said function around.  I doubt the case would be exactly
reproducible on different hardware or toolchain, but another platform
would likely show similar issues on some case or other.

tl;dr: even a 20% difference might be nothing more than an artifact.

            regards, tom lane
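
The padding experiment Andrew describes might look roughly like the sketch below. Everything named here is an assumption made for illustration: `PAD_NOPS`, `benchmark_padding` and `pct_improvement` are hypothetical, and the inline assembly assumes GCC or Clang with an assembler that accepts the `.rept`/`.endr` directives. The idea is to rebuild with several values of `PAD_NOPS`, rerun the benchmark for each, and compare the lower and upper bounds of the resulting timing ranges rather than single runs.

```c
#include <assert.h>

/* Hypothetical knob: rebuild with -DPAD_NOPS=0, 8, 16, ... and rerun. */
#ifndef PAD_NOPS
#define PAD_NOPS 16
#endif

#define PAD_STR_(x) #x
#define PAD_STR(x)  PAD_STR_(x)

void benchmark_padding(void);

/*
 * An otherwise irrelevant function whose only job is to change code size,
 * which shifts the addresses of whatever the linker lays out after it.
 * It has external linkage so the compiler does not discard it.
 */
void
benchmark_padding(void)
{
    __asm__ volatile(".rept " PAD_STR(PAD_NOPS) "\n\tnop\n\t.endr");
}

/* Percentage by which a timing bound moved downward (positive = faster). */
static double
pct_improvement(double before, double after)
{
    assert(before > 0.0);
    return (before - after) * 100.0 / before;
}
```

Applied to the bounds quoted in the message, the lower bound moves from 6091 to 6030 (about 1.0%) and the upper bound from 6289 to 6236 (about 0.84%), so the whole range shifts by roughly the claimed 1%.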
Re: XML/XPath issues: text/CDATA in XMLTABLE, XPath evaluated with wrong context
From: Ramanarayana
Date:

Hi,

I have tested the three issues fixed in patch 001. The array indexes issue is still there. Running the following query returns ERROR: more than one value returned by column XPath expression

SELECT xmltable.*
FROM (SELECT data FROM xmldata) x,
LATERAL XMLTABLE('/ROWS/ROW'
PASSING data
COLUMNS
country_name text PATH 'COUNTRY_NAME/text()' NOT NULL,
size_text float PATH 'SIZE/text()',
size_text_1 float PATH 'SIZE/text()[1]',
size_text_2 float PATH 'SIZE/text()[2]',
"SIZE" float, size_xml xml PATH 'SIZE')
Cheers
Ram 4.0
Re: XML/XPath issues: text/CDATA in XMLTABLE, XPath evaluated with wrong context
From: Pavel Stehule
Date:

On Thu, 28 Feb 2019 at 9:58, Ramanarayana <raam.soft@gmail.com> wrote:

Hi,
I have tested the three issues fixed in patch 001. The array indexes issue is still there. The other two issues are resolved by this patch.
Running the following query returns ERROR: more than one value returned by column XPath expression

SELECT xmltable.*
FROM (SELECT data FROM xmldata) x,
LATERAL XMLTABLE('/ROWS/ROW'
PASSING data
COLUMNS
country_name text PATH 'COUNTRY_NAME/text()' NOT NULL,
size_text float PATH 'SIZE/text()',
size_text_1 float PATH 'SIZE/text()[1]',
size_text_2 float PATH 'SIZE/text()[2]',
"SIZE" float, size_xml xml PATH 'SIZE')

Which patches did you use?

Regards
Pavel
-- 
Cheers
Ram 4.0
Re: XML/XPath issues: text/CDATA in XMLTABLE, XPath evaluated with wrong context
From: Ramanarayana
Date:

Hi,
I applied the following patches
Can you let me know what fix is made in patch 002? I will test that as well.
Regards,
Ram.
On Thu, 28 Feb 2019 at 15:01, Pavel Stehule <pavel.stehule@gmail.com> wrote:
On Thu, 28 Feb 2019 at 9:58, Ramanarayana <raam.soft@gmail.com> wrote:

Hi,
I have tested the three issues fixed in patch 001. The array indexes issue is still there. The other two issues are resolved by this patch.
Running the following query returns ERROR: more than one value returned by column XPath expression

SELECT xmltable.*
FROM (SELECT data FROM xmldata) x,
LATERAL XMLTABLE('/ROWS/ROW'
PASSING data
COLUMNS
country_name text PATH 'COUNTRY_NAME/text()' NOT NULL,
size_text float PATH 'SIZE/text()',
size_text_1 float PATH 'SIZE/text()[1]',
size_text_2 float PATH 'SIZE/text()[2]',
"SIZE" float, size_xml xml PATH 'SIZE')

Which patches did you use?

Regards
Pavel

-- 
Cheers
Ram 4.0

Cheers
Ram 4.0
Re: XML/XPath issues: text/CDATA in XMLTABLE, XPath evaluated with wrong context
From: Pavel Stehule
Date:

On Thu, 28 Feb 2019 at 10:49, Ramanarayana <raam.soft@gmail.com> wrote:

Hi,
I applied the following patches
Can you let me know what fix is made in patch 002? I will test that as well.

I'm afraid this patch set was not finished, and it is not in the current commitfest.
Please check this set instead: https://commitfest.postgresql.org/22/1872/
Regards
Pavel
Regards,
Ram.

On Thu, 28 Feb 2019 at 15:01, Pavel Stehule <pavel.stehule@gmail.com> wrote:
On Thu, 28 Feb 2019 at 9:58, Ramanarayana <raam.soft@gmail.com> wrote:

Hi,
I have tested the three issues fixed in patch 001. The array indexes issue is still there. The other two issues are resolved by this patch.
Running the following query returns ERROR: more than one value returned by column XPath expression

SELECT xmltable.*
FROM (SELECT data FROM xmldata) x,
LATERAL XMLTABLE('/ROWS/ROW'
PASSING data
COLUMNS
country_name text PATH 'COUNTRY_NAME/text()' NOT NULL,
size_text float PATH 'SIZE/text()',
size_text_1 float PATH 'SIZE/text()[1]',
size_text_2 float PATH 'SIZE/text()[2]',
"SIZE" float, size_xml xml PATH 'SIZE')

Which patches did you use?

Regards
Pavel

-- 
Cheers
Ram 4.0

-- 
Cheers
Ram 4.0
Re: XML/XPath issues: text/CDATA in XMLTABLE, XPath evaluated with wrong context
From: Pavel Stehule
Date:

On Thu, 28 Feb 2019 at 10:31, Pavel Stehule <pavel.stehule@gmail.com> wrote:
On Thu, 28 Feb 2019 at 9:58, Ramanarayana <raam.soft@gmail.com> wrote:

Hi,
I have tested the three issues fixed in patch 001. The array indexes issue is still there. The other two issues are resolved by this patch.
Running the following query returns ERROR: more than one value returned by column XPath expression

SELECT xmltable.*
FROM (SELECT data FROM xmldata) x,
LATERAL XMLTABLE('/ROWS/ROW'
PASSING data
COLUMNS
country_name text PATH 'COUNTRY_NAME/text()' NOT NULL,
size_text float PATH 'SIZE/text()',
size_text_1 float PATH 'SIZE/text()[1]',
size_text_2 float PATH 'SIZE/text()[2]',
"SIZE" float, size_xml xml PATH 'SIZE')
I tested xmltable-xpath-result-processing-bugfix-6.patch
and it is working
postgres=# SELECT  xmltable.* 
postgres-# FROM (SELECT data FROM xmldata) x,
postgres-# LATERAL XMLTABLE('/ROWS/ROW'
postgres(# PASSING data
postgres(# COLUMNS
postgres(# country_name text PATH 'COUNTRY_NAME/text()' NOT NULL,
postgres(# size_text float PATH 'SIZE/text()',
postgres(# size_text_1 float PATH 'SIZE/text()[1]',
postgres(# size_text_2 float PATH 'SIZE/text()[2]',
postgres(# "SIZE" float, size_xml xml PATH 'SIZE') ;
┌──────────────┬───────────┬─────────────┬─────────────┬──────┬────────────────────────────┐
│ country_name │ size_text │ size_text_1 │ size_text_2 │ SIZE │ size_xml │
╞══════════════╪═══════════╪═════════════╪═════════════╪══════╪════════════════════════════╡
│ Australia │ ∅ │ ∅ │ ∅ │ ∅ │ ∅ │
│ China │ ∅ │ ∅ │ ∅ │ ∅ │ ∅ │
│ HongKong │ ∅ │ ∅ │ ∅ │ ∅ │ ∅ │
│ India │ ∅ │ ∅ │ ∅ │ ∅ │ ∅ │
│ Japan │ ∅ │ ∅ │ ∅ │ ∅ │ ∅ │
│ Singapore │ 791 │ 791 │ ∅ │ 791 │ <SIZE unit="km">791</SIZE> │
└──────────────┴───────────┴─────────────┴─────────────┴──────┴────────────────────────────┘
(6 rows)
Regards
Pavel

Which patches did you use?

Regards
Pavel

-- 
Cheers
Ram 4.0
David Rowley <david.rowley@2ndquadrant.com> writes:
> On Thu, 28 Feb 2019 at 09:26, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> 0001 below does this.  I found a couple of places that could use
>> forfive(), as well.  I think this is a clear legibility and
>> error-proofing win, and we should just push it.
> I've looked over this and I agree that it's a good idea.  Reducing the
> number of lnext() usages seems like a good idea in order to reduce the
> footprint of the main patch.
I've pushed that; thanks for reviewing!
>> 0002 below does this.  I'm having a hard time deciding whether this
>> part is a good idea or just code churn.  It might be more readable
>> (especially to newbies) but I can't evaluate that very objectively.
> I'm less decided on this.
Yeah, I think I'm just going to drop that idea.  People didn't seem
very sold on list_cell_is_last() being a readability improvement,
and it certainly does nothing to reduce the footprint of the main
patch.
I now need to rebase the main patch over what I pushed; off to do
that next.
            regards, tom lane
			
		Here's a rebased version of the main patch.
David Rowley <david.rowley@2ndquadrant.com> writes:
> The only thing that I did to manage to speed the patch up was to ditch
> the additional NULL test in lnext().  I don't see why that's required
> since lnext(NULL) would have crashed with the old implementation.
I adopted this idea.  I think at one point where I was fooling with
different implementations for foreach(), it was necessary that lnext()
be cool with a NULL input; but as things stand now, it's not.
I haven't done anything else in the performance direction, but am
planning to play with that next.
I did run through all the list_delete_foo callers and fix the ones
that were still busted.  I also changed things so that with
DEBUG_LIST_MEMORY_USAGE enabled, list deletions would move the data
arrays around, in hopes of catching more stale-pointer problems.
Depressingly, check-world still passed with that added, even before
I'd fixed the bugs I found by inspection.  This does not speak well
for the coverage of our regression tests.
            regards, tom lane
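
The two ideas in this message, index-based deletion loops and a debug mode that deliberately relocates the data array on every deletion, can be sketched together on a toy list. This is an assumption-laden illustration: `IntList`, `intlist_make`, `intlist_delete_nth`, `intlist_delete_evens` and `TOY_DEBUG_MOVE_ON_DELETE` are hypothetical names and do not reproduce the patch's list_delete_nth_cell or its DEBUG_LIST_MEMORY_USAGE code.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical switch standing in for a debug-build option. */
#define TOY_DEBUG_MOVE_ON_DELETE 1

typedef struct IntList
{
    int  length;
    int *elements;
} IntList;

static IntList *
intlist_make(const int *vals, int n)
{
    IntList *l = malloc(sizeof(IntList));

    assert(l != NULL);
    l->length = n;
    l->elements = malloc((size_t) (n > 0 ? n : 1) * sizeof(int));
    assert(l->elements != NULL);
    memcpy(l->elements, vals, (size_t) n * sizeof(int));
    return l;
}

/* Delete the n'th element (zero-based). */
static void
intlist_delete_nth(IntList *l, int n)
{
    assert(n >= 0 && n < l->length);
#if TOY_DEBUG_MOVE_ON_DELETE
    {
        /*
         * Copy the survivors into a fresh array and free the old one, so
         * any pointer a caller kept into the old array is now stale and
         * likely to be caught by memory-checking tools.
         */
        int  newlen = l->length - 1;
        int *fresh = malloc((size_t) (newlen > 0 ? newlen : 1) * sizeof(int));

        assert(fresh != NULL);
        memcpy(fresh, l->elements, (size_t) n * sizeof(int));
        memcpy(fresh + n, l->elements + n + 1,
               (size_t) (l->length - n - 1) * sizeof(int));
        free(l->elements);
        l->elements = fresh;
    }
#else
    /* Normal build: close the gap in place. */
    memmove(l->elements + n, l->elements + n + 1,
            (size_t) (l->length - n - 1) * sizeof(int));
#endif
    l->length--;
}

/* Delete every even value using an integer index, not a cell pointer. */
static void
intlist_delete_evens(IntList *l)
{
    int i = 0;

    while (i < l->length)
    {
        if (l->elements[i] % 2 == 0)
            intlist_delete_nth(l, i);   /* next element slid into slot i */
        else
            i++;
    }
}
```

Because the loop holds only an index and re-reads `l->elements` on each iteration, it stays correct whether or not the array is relocated, which is the property the stale-pointer fixes are after.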
			
Attachments
Re: XML/XPath issues: text/CDATA in XMLTABLE, XPath evaluated with wrong context
From: Ramanarayana
Date:

Hi,
The statement below needs to be executed before running the query to replicate the issue:
update xmldata set data = regexp_replace(data::text, '791', '<!--ah-->7<!--oh-->9<!--uh-->1')::xml;  
On Thu, 28 Feb 2019 at 17:55, Pavel Stehule <pavel.stehule@gmail.com> wrote:
On Thu, 28 Feb 2019 at 10:31, Pavel Stehule <pavel.stehule@gmail.com> wrote:
On Thu, 28 Feb 2019 at 9:58, Ramanarayana <raam.soft@gmail.com> wrote:

Hi,
I have tested the three issues fixed in patch 001. The array indexes issue is still there. The other two issues are resolved by this patch.
Running the following query returns ERROR: more than one value returned by column XPath expression

SELECT xmltable.*
FROM (SELECT data FROM xmldata) x,
LATERAL XMLTABLE('/ROWS/ROW'
PASSING data
COLUMNS
country_name text PATH 'COUNTRY_NAME/text()' NOT NULL,
size_text float PATH 'SIZE/text()',
size_text_1 float PATH 'SIZE/text()[1]',
size_text_2 float PATH 'SIZE/text()[2]',
"SIZE" float, size_xml xml PATH 'SIZE')

I tested xmltable-xpath-result-processing-bugfix-6.patch
and it is working

postgres=# SELECT xmltable.*
postgres-# FROM (SELECT data FROM xmldata) x,
postgres-# LATERAL XMLTABLE('/ROWS/ROW'
postgres(# PASSING data
postgres(# COLUMNS
postgres(# country_name text PATH 'COUNTRY_NAME/text()' NOT NULL,
postgres(# size_text float PATH 'SIZE/text()',
postgres(# size_text_1 float PATH 'SIZE/text()[1]',
postgres(# size_text_2 float PATH 'SIZE/text()[2]',
postgres(# "SIZE" float, size_xml xml PATH 'SIZE') ;
┌──────────────┬───────────┬─────────────┬─────────────┬──────┬────────────────────────────┐
│ country_name │ size_text │ size_text_1 │ size_text_2 │ SIZE │ size_xml │
╞══════════════╪═══════════╪═════════════╪═════════════╪══════╪════════════════════════════╡
│ Australia │ ∅ │ ∅ │ ∅ │ ∅ │ ∅ │
│ China │ ∅ │ ∅ │ ∅ │ ∅ │ ∅ │
│ HongKong │ ∅ │ ∅ │ ∅ │ ∅ │ ∅ │
│ India │ ∅ │ ∅ │ ∅ │ ∅ │ ∅ │
│ Japan │ ∅ │ ∅ │ ∅ │ ∅ │ ∅ │
│ Singapore │ 791 │ 791 │ ∅ │ 791 │ <SIZE unit="km">791</SIZE> │
└──────────────┴───────────┴─────────────┴─────────────┴──────┴────────────────────────────┘
(6 rows)

Regards
Pavel

Which patches did you use?

Regards
Pavel

-- 
Cheers
Ram 4.0

Cheers
Ram 4.0
Re: XML/XPath issues: text/CDATA in XMLTABLE, XPath evaluated with wrong context
From: Chapman Flack
Date:

Hi, thanks for checking the patches!

On 02/28/19 19:36, Ramanarayana wrote:
> The below statement needs to be executed before running the query to
> replicate the issue
>
> update xmldata set data = regexp_replace(data::text, '791',
> '<!--ah-->7<!--oh-->9<!--uh-->1')::xml;

If you are applying that update (and there is a SIZE element originally
791), and then receiving a "more than one value returned by column XPath
expression" error, I believe you are seeing documented, correct behavior.

Your update changes the content of that SIZE element to have three
comment nodes and three text nodes.

The query then contains this column spec:

size_text float PATH 'SIZE/text()'

where the target SQL column type is 'float' and the path expression will
return an XML result consisting of the three text nodes.

As documented, "An XML result assigned to a column of any other type may
not have more than one node, or an error is raised."

So I think this behavior is correct.

If you do any more testing (thank you for taking the interest, by the way!),
could you please add your comments, not to this email thread, but to [1]?

[1]
https://www.postgresql.org/message-id/3e8eab9e-7289-6c23-5e2c-153cccea2257%40anastigmatix.net

That's the one that is registered to the commitfest entry, so comments made
on this thread might be overlooked.

Thanks!
-Chap
Here's a v3 incorporating Andres' idea of trying to avoid a separate
palloc for the list cell array.  In a 64-bit machine we can have up
to five ListCells in the initial allocation without actually increasing
space consumption at all compared to the old code.  So only when a List
grows larger than that do we need more than one palloc.

I'm still having considerable difficulty convincing myself that this
is enough of a win to justify the bug hazards we'll introduce, though.
On test cases like "pg_bench -S" it seems to be pretty much within the
noise level of being the same speed as HEAD.  I did see a nice improvement
in the test case described in
https://www.postgresql.org/message-id/6970.1545327857@sss.pgh.pa.us
but considering that that's still mostly a tight loop in
match_eclasses_to_foreign_key_col, it doesn't seem very interesting
as an overall figure of merit.

I wonder what test cases Andres has been looking at that convince
him that we need a reimplementation of Lists.

            regards, tom lane
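
The single-allocation idea in the v3 message can be sketched with a flexible array member: the header and a few initial cells come from one allocation, and a separate array is allocated only once the list outgrows them. This is a toy under stated assumptions: `InlineList`, the `inl_*` functions and the `INLINE_CELLS` constant (set to five only to echo the figure in the message) are hypothetical, and plain malloc stands in for palloc.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define INLINE_CELLS 5      /* echoes the "up to five" figure; an assumption */

typedef struct InlineList
{
    int     length;
    int     max_length;
    void  **elements;               /* inline cells or a separate array */
    void   *initial_elements[];     /* allocated together with the header */
} InlineList;

/* One allocation covers the header plus the first INLINE_CELLS cells. */
static InlineList *
inl_new(void)
{
    InlineList *l = malloc(sizeof(InlineList) +
                           INLINE_CELLS * sizeof(void *));

    assert(l != NULL);
    l->length = 0;
    l->max_length = INLINE_CELLS;
    l->elements = l->initial_elements;
    return l;
}

static int
inl_uses_inline_storage(InlineList *l)
{
    return l->elements == l->initial_elements;
}

/* Append; a second allocation happens only past the inline capacity. */
static void
inl_append(InlineList *l, void *datum)
{
    if (l->length == l->max_length)
    {
        int    newmax = l->max_length * 2;
        void **bigger = malloc((size_t) newmax * sizeof(void *));

        assert(bigger != NULL);
        memcpy(bigger, l->elements, (size_t) l->length * sizeof(void *));
        if (!inl_uses_inline_storage(l))
            free(l->elements);
        l->elements = bigger;
        l->max_length = newmax;
    }
    l->elements[l->length++] = datum;
}

static void
inl_free(InlineList *l)
{
    if (!inl_uses_inline_storage(l))
        free(l->elements);
    free(l);
}
```

Note that the list header never moves in this layout, only the cell array does, so pointers to the list itself stay valid while pointers to individual cells may not.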
Attachments
Hi,

On 2019-03-02 18:11:43 -0500, Tom Lane wrote:
> I wonder what test cases Andres has been looking at that convince
> him that we need a reimplementation of Lists.

My main observation was from when the expression evaluation was using
lists all over. List iteration overhead was very substantial there. But
that's not a problem anymore, because all of those are gone now due to
the expression rewrite.  I personally wasn't actually advocating for a
new list implementation, I was/am advocating that we should move some
tasks over to a more optimized representation.

I still regularly see list overhead matter in production workloads. A
lot of it being memory allocator overhead, which is why I'm concerned
with a rewrite that doesn't reduce the number of memory allocations. And
a lot of it is stuff that you won't see in pgbench - e.g. there's a lot
of production queries that join a bunch of tables with a few dozen
columns, where e.g. all the targetlists are much longer than what you'd
see in pgbench -S.

Greetings,

Andres Freund
Andres Freund <andres@anarazel.de> writes:
> On 2019-03-02 18:11:43 -0500, Tom Lane wrote:
>> I wonder what test cases Andres has been looking at that convince
>> him that we need a reimplementation of Lists.
> My main observation was from when the expression evaluation was using
> lists all over. List iteration overhead was very substantial there. But
> that's not a problem anymore, because all of those are gone now due to
> the expression rewrite.  I personally wasn't actually advocating for a
> new list implementation, I was/am advocating that we should move some
> tasks over to a more optimized representation.
I doubt that you'll get far with that; if this experiment is anything
to go by, it's going to be really hard to make the case that twiddling
the representation of widely-known data structures is worth the work
and bug hazards.
> I still regularly see list overhead matter in production workloads. A
> lot of it being memory allocator overhead, which is why I'm concerned
> with a rewrite that doesn't reduce the number of memory allocations.
Well, I did that in the v3 patch, and it still hasn't moved the needle
noticeably in any test case I've tried.  At this point I'm really
struggling to see a reason why we shouldn't just mark this patch rejected
and move on.  If you have test cases that suggest differently, please
show them don't just handwave.
The cases I've been looking at suggest to me that we'd make far
more progress on the excessive-palloc'ing front if we could redesign
things to reduce unnecessary copying of parsetrees.  Right now the
planner does an awful lot of copying because of fear of unwanted
modifications of multiply-linked subtrees.  I suspect that we could
reduce that overhead with some consistently enforced rules about
not scribbling on input data structures; but it'd take a lot of work
to get there, and I'm afraid it'd be easily broken :-(
            regards, tom lane
			
On Mon, 4 Mar 2019 at 07:29, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Andres Freund <andres@anarazel.de> writes:
> > I still regularly see list overhead matter in production workloads. A
> > lot of it being memory allocator overhead, which is why I'm concerned
> > with a rewrite that doesn't reduce the number of memory allocations.
>
> Well, I did that in the v3 patch, and it still hasn't moved the needle
> noticeably in any test case I've tried.  At this point I'm really
> struggling to see a reason why we shouldn't just mark this patch rejected
> and move on.  If you have test cases that suggest differently, please
> show them don't just handwave.

I think we discussed this before, but... if this patch is not a win by
itself (and we've already seen it's not really causing much in the way
of regression, if any), then we need to judge it on what else we can
do to exploit the new performance characteristics of List.  For
example list_nth() is now deadly fast.

My primary interest here is getting rid of a few places where we build
an array version of some List so that we can access the Nth element
more quickly.  What goes on in ExecInitRangeTable() is not particularly
great for queries to partitioned tables with a large number of
partitions where only one survives run-time pruning.  I've hacked
together a patch to show you what wins we can have with the new list
implementation.  Using the attached, (renamed to .txt to not upset
CFbot) I get:

setup:

create table hashp (a int, b int) partition by hash (a);
select 'create table hashp'||x||' partition of hashp for values with (modulus 10000, remainder '||x||');' from generate_Series(0,9999) x;
\gexec
alter table hashp add constraint hashp_pkey PRIMARY KEY (a);

postgresql.conf

plan_cache_mode = force_generic_plan
max_parallel_workers_per_gather=0
max_locks_per_transaction=256

bench.sql

\set p random(1,10000)
select * from hashp where a = :p;

master:

tps = 189.499654 (excluding connections establishing)
tps = 195.102743 (excluding connections establishing)
tps = 194.338813 (excluding connections establishing)

your List reimplementation v3 + attached

tps = 12852.003735 (excluding connections establishing)
tps = 12791.834617 (excluding connections establishing)
tps = 12691.515641 (excluding connections establishing)

The attached does include [1], but even with just that the performance
is not as good as with the arraylist plus the follow-on exploits I
added.  Now that we have a much faster bms_next_member() some form of
what's in there might be okay.

A profile shows that in this workload we're still spending 42% of the
12k TPS in hash_seq_search().  That's due to LockReleaseAll() having a
hard time of it due to the bloated lock table from having to build the
generic plan with 10k partitions.  [2] aims to fix that, so likely
we'll be closer to 18k TPS, or about 100x faster.

In fact, I should test that...

tps = 18763.977940 (excluding connections establishing)
tps = 18589.531558 (excluding connections establishing)
tps = 19011.295770 (excluding connections establishing)

Yip, about 100x.

I think these are worthy goals to aspire to.

[1] https://commitfest.postgresql.org/22/1897/
[2] https://commitfest.postgresql.org/22/1993/

-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
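
The reason code keeps side arrays for Nth-element access, as David describes, is visible in a minimal comparison of the two lookups. The sketch below uses hypothetical toy types and functions (`LinkCell`, `linked_nth`, `array_nth`); it is not the executor code, and the hop counter exists only to make the work visible.

```c
#include <assert.h>
#include <stddef.h>

typedef struct LinkCell
{
    int              value;
    struct LinkCell *next;
} LinkCell;

/*
 * Linked list: fetching element n means following n next-pointers.
 * *hops reports how many links were traversed.
 */
static int
linked_nth(const LinkCell *head, int n, int *hops)
{
    const LinkCell *c = head;

    *hops = 0;
    while (n-- > 0)
    {
        assert(c != NULL);
        c = c->next;
        (*hops)++;
    }
    assert(c != NULL);
    return c->value;
}

/* Array-backed list: one bounds check and one indexed load. */
static int
array_nth(const int *elements, int length, int n)
{
    assert(n >= 0 && n < length);
    return elements[n];
}
```

With thousands of range-table entries, a lookup that walks links costs time proportional to the index on every call, which is why a cheap indexed fetch removes the incentive to build and maintain a parallel array.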
Attachments
On Sun, Mar 3, 2019 at 1:29 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > My main observation was from when the expression evaluation was using
> > lists all over. List iteration overhead was very substantial there. But
> > that's not a problem anymore, because all of those are gone now due to
> > the expression rewrite.  I personally wasn't actually advocating for a
> > new list implementation, I was/am advocating that we should move some
> > tasks over to a more optimized representation.
>
> I doubt that you'll get far with that; if this experiment is anything
> to go by, it's going to be really hard to make the case that twiddling
> the representation of widely-known data structures is worth the work
> and bug hazards.

I'm befuddled by this comment.  Andres is arguing that we shouldn't go
do a blind search-and-replace, but rather change certain things, and
you're saying that's going to be really hard because twiddling the
representation of widely-known data structures is really hard.  But if
we only change certain things, we don't *need* to twiddle the
representation of a widely-known data structure.  We just add a new
one and convert the things that benefit from it, like I proposed
upthread (and promptly got told I was wrong).

I think the reason why you're not seeing a performance benefit is
because the problem is not that lists are generically a more expensive
data structure than arrays, but that there are cases when they are
more expensive than arrays.  If you only ever push/pop at the front,
of course a list is going to be better.  If you often look up elements
by index, of course an array is going to be better.  If you change
every case where the code currently uses a list to use something else
instead, then you're changing both the winning and losing cases.
Yeah, changing things individually is more work, but that's how you
get the wins without incurring the losses.

I think David's results go in this direction, too.  Code that was
written on the assumption that list_nth() is slow is going to avoid
using it as much as possible, and therefore no benefit is to be
expected from making it fast.  If the author had written the same code
assuming that the underlying data structure was an array rather than a
list, they might have picked a different algorithm which, as David's
results show, could be a lot faster in some cases.  But it's not going
to come from just changing the way lists work internally; it's going
to come from redesigning the algorithms that are using lists to do
something better instead, as Andres's example of linearized expression
evaluation also shows.

> The cases I've been looking at suggest to me that we'd make far
> more progress on the excessive-palloc'ing front if we could redesign
> things to reduce unnecessary copying of parsetrees.  Right now the
> planner does an awful lot of copying because of fear of unwanted
> modifications of multiply-linked subtrees.  I suspect that we could
> reduce that overhead with some consistently enforced rules about
> not scribbling on input data structures; but it'd take a lot of work
> to get there, and I'm afraid it'd be easily broken :-(

I think that's a separate but also promising thing to attack, and I
agree that it'd take a lot of work to get there.  I don't think that
the problem with either parse-tree-copying or list usage is that no
performance benefits are to be had; I think it's that the amount of
work required to get those benefits is pretty large.  If it were
otherwise, somebody likely would have done it before now.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes:
> I think the reason why you're not seeing a performance benefit is
> because the problem is not that lists are generically a more expensive
> data structure than arrays, but that there are cases when they are
> more expensive than arrays.  If you only ever push/pop at the front,
> of course a list is going to be better.  If you often look up elements
> by index, of course an array is going to be better.  If you change
> every case where the code currently uses a list to use something else
> instead, then you're changing both the winning and losing cases.
I don't think this argument is especially on-point, because what I'm
actually seeing is just that there aren't any list operations that
are expensive enough to make much of an overall difference in
typical queries.  To the extent that an array reimplementation
reduces the palloc traffic, it'd take some load off that subsystem,
but apparently you need not-typical queries to really notice.
(And, if the real motivation is aggregate palloc savings, then yes you
really do want to replace everything...)
> Yeah, changing things individually is more work, but that's how you
> get the wins without incurring the losses.
The concern I have is mostly about the use of lists as core infrastructure
in parsetree, plantree, etc data structures.  I think any idea that we'd
replace those piecemeal is borderline insane: it's simply not worth it
from a notational and bug-risk standpoint to glue together some parts of
those structures differently from the way other parts are glued together.
            regards, tom lane
			
		Hi,
On 2019-03-02 18:11:43 -0500, Tom Lane wrote:
> On test cases like "pg_bench -S" it seems to be pretty much within the
> noise level of being the same speed as HEAD.
I think that might be because its bottleneck is just elsewhere
(e.g. very context switch heavy, very few lists of any length).
FWIW, even just taking context switches out of the equation leads to
a ~5-6% benefit in a simple statement:
DO $f$BEGIN FOR i IN 1..500000 LOOP EXECUTE $s$SELECT aid, bid, abalance, filler FROM pgbench_accounts WHERE aid = 2045530;$s$;END LOOP;END;$f$;
 
master:
+    6.05%  postgres  postgres            [.] AllocSetAlloc
+    5.52%  postgres  postgres            [.] base_yyparse
+    2.51%  postgres  postgres            [.] palloc
+    1.82%  postgres  postgres            [.] hash_search_with_hash_value
+    1.61%  postgres  postgres            [.] core_yylex
+    1.57%  postgres  postgres            [.] SearchCatCache1
+    1.43%  postgres  postgres            [.] expression_tree_walker.part.4
+    1.09%  postgres  postgres            [.] check_stack_depth
+    1.08%  postgres  postgres            [.] MemoryContextAllocZeroAligned
patch v3:
+    5.77%  postgres  postgres            [.] base_yyparse
+    4.88%  postgres  postgres            [.] AllocSetAlloc
+    1.95%  postgres  postgres            [.] hash_search_with_hash_value
+    1.89%  postgres  postgres            [.] core_yylex
+    1.64%  postgres  postgres            [.] SearchCatCache1
+    1.46%  postgres  postgres            [.] expression_tree_walker.part.0
+    1.45%  postgres  postgres            [.] palloc
+    1.18%  postgres  postgres            [.] check_stack_depth
+    1.13%  postgres  postgres            [.] MemoryContextAllocZeroAligned
+    1.04%  postgres  libc-2.28.so        [.] _int_malloc
+    1.01%  postgres  postgres            [.] nocachegetattr
And even just pgbenching the EXECUTEd statement above gives me a
reproducible ~3.5% gain when using -M simple, and ~3% when using -M
prepared.
Note that when not using prepared statements (a pretty important
workload, especially as long as we don't have a pooling solution that
actually allows using prepared statements across connections), even after
the patch most of the allocator overhead is still from list allocations,
but it's near exclusively just the "create a new list" case:
+    5.77%  postgres  postgres            [.] base_yyparse
-    4.88%  postgres  postgres            [.] AllocSetAlloc
   - 80.67% AllocSetAlloc
      - 68.85% AllocSetAlloc
         - 57.65% palloc
            - 50.30% new_list (inlined)
               - 37.34% lappend
                  + 12.66% pull_var_clause_walker
                  + 8.83% build_index_tlist (inlined)
                  + 8.80% make_pathtarget_from_tlist
                  + 8.73% get_quals_from_indexclauses (inlined)
                  + 8.73% distribute_restrictinfo_to_rels
                  + 8.68% RewriteQuery
                  + 8.56% transformTargetList
                  + 8.46% make_rel_from_joinlist
                  + 4.36% pg_plan_queries
                  + 4.30% add_rte_to_flat_rtable (inlined)
                  + 4.29% build_index_paths
                  + 4.23% match_clause_to_index (inlined)
                  + 4.22% expression_tree_mutator
                  + 4.14% transformFromClause
                  + 1.02% get_index_paths
               + 17.35% list_make1_impl
               + 16.56% list_make1_impl (inlined)
               + 15.87% lcons
               + 11.31% list_copy (inlined)
               + 1.58% lappend_oid
            + 12.90% expression_tree_mutator
            + 9.73% get_relation_info
            + 4.71% bms_copy (inlined)
            + 2.44% downcase_identifier
            + 2.43% heap_tuple_untoast_attr
            + 2.37% add_rte_to_flat_rtable (inlined)
            + 1.69% btbeginscan
            + 1.65% CreateTemplateTupleDesc
            + 1.61% core_yyalloc (inlined)
            + 1.59% heap_copytuple
            + 1.54% text_to_cstring (inlined)
            + 0.84% ExprEvalPushStep (inlined)
            + 0.84% ExecInitRangeTable
            + 0.84% scanner_init
            + 0.83% ExecInitRangeTable
            + 0.81% CreateQueryDesc
            + 0.81% _bt_search
            + 0.77% ExecIndexBuildScanKeys
            + 0.66% RelationGetIndexScan
            + 0.65% make_pathtarget_from_tlist
Given how hard it is to improve performance with as flatly distributed
costs as the above profiles, I actually think these are quite promising
results.
I'm not even convinced that it makes all that much sense to measure
end-to-end performance here; it might be worthwhile to measure with a
debugging function that allows exercising parsing, parse-analysis,
rewrite etc at configurable loop counts. Given the relatively evenly
distributed profiles we're going to have to make a few different
improvements to make headway, and it's hard to see benefits of
individual ones if you look at the overall numbers.
Greetings,
Andres Freund
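The "configurable loop counts" idea above could look roughly like the following C sketch. This is only an illustration under stated assumptions: demo_time_stage, demo_stage_fn and demo_count_stage are hypothetical names, no such debugging function is described in this thread beyond the suggestion itself, and the stage being timed here is a trivial counter rather than the parser or planner. It uses the standard clock() call, so it measures CPU time with coarse resolution.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical callback type: one pipeline stage to be exercised in isolation. */
typedef void (*demo_stage_fn) (void *arg);

/*
 * Run one stage 'loops' times and return the average CPU seconds per call.
 * Isolating a single stage keeps a small improvement from drowning in the
 * cost of everything else a full query execution would do.
 */
double
demo_time_stage(demo_stage_fn stage, void *arg, long loops)
{
    clock_t start;
    clock_t end;

    assert(stage != NULL && loops > 0);
    start = clock();
    for (long i = 0; i < loops; i++)
        stage(arg);
    end = clock();
    return (double) (end - start) / CLOCKS_PER_SEC / (double) loops;
}

/* Trivial stand-in stage: bumps a counter so the loop has something to call. */
void
demo_count_stage(void *arg)
{
    long *counter = arg;

    (*counter)++;
}
```

A real harness would pass a stage that runs, say, only parse analysis on a fixed query string, and would compare the per-call figure before and after a patch.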
			
Hi,
On 2019-03-03 13:29:04 -0500, Tom Lane wrote:
> The cases I've been looking at suggest to me that we'd make far
> more progress on the excessive-palloc'ing front if we could redesign
> things to reduce unnecessary copying of parsetrees. Right now the
> planner does an awful lot of copying because of fear of unwanted
> modifications of multiply-linked subtrees. I suspect that we could
> reduce that overhead with some consistently enforced rules about
> not scribbling on input data structures; but it'd take a lot of work
> to get there, and I'm afraid it'd be easily broken :-(
Given the difficulty of this task, isn't your patch actually a *good*
attack on the problem? It makes copying lists considerably cheaper. As
you say, a more principled answer to this problem is hard, so attacking
it from the "make the constant factors smaller" side doesn't seem
crazy?
Greetings,
Andres Freund
Hi,
On 2019-03-04 13:11:35 -0500, Tom Lane wrote:
> The concern I have is mostly about the use of lists as core infrastructure
> in parsetree, plantree, etc data structures. I think any idea that we'd
> replace those piecemeal is borderline insane: it's simply not worth it
> from a notational and bug-risk standpoint to glue together some parts of
> those structures differently from the way other parts are glued together.
I don't buy this. I think e.g. redesigning the way we represent
targetlists would be good (it's e.g. insane that we recompute
descriptors out of them all the time), and would reduce their allocator
costs.
Greetings,
Andres Freund
Andres Freund <andres@anarazel.de> writes:
> I don't buy this. I think e.g. redesigning the way we represent
> targetlists would be good (it's e.g. insane that we recompute
> descriptors out of them all the time), and would reduce their allocator
> costs.
Maybe we're not on the same page here, but it seems to me that that'd be
addressable with pretty localized changes (eg, adding more fields to
TargetEntry, or keeping a pre-instantiated output tupdesc in each Plan
node).  But if the concern is about the amount of palloc bandwidth going
into List cells, we're not going to be able to improve that with localized
data structure changes; it'll take something like the patch I've proposed.
I *have* actually done some tests of the sort you proposed, driving
just the planner and not any of the rest of the system, but I still
didn't find much evidence of big wins.  I find it interesting that
you get different results.
            regards, tom lane
			
On Mon, Mar 4, 2019 at 12:44:41PM -0500, Robert Haas wrote:
> I think that's a separate but also promising thing to attack, and I
> agree that it'd take a lot of work to get there. I don't think that
> the problem with either parse-tree-copying or list usage is that no
> performance benefits are to be had; I think it's that the amount of
> work required to get those benefits is pretty large. If it were
> otherwise, somebody likely would have done it before now.
Stupid question, but do we use any kind of reference counter to know if
two subsystems look at a structure, and a copy is required?
--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com
+ As you are, so once was I. As I am, so you will be. +
+ Ancient Roman grave inscription +
On Mon, Mar 4, 2019 at 01:11:35PM -0500, Tom Lane wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
> > I think the reason why you're not seeing a performance benefit is
> > because the problem is not that lists are generically a more expensive
> > data structure than arrays, but that there are cases when they are
> > more expensive than arrays. If you only ever push/pop at the front,
> > of course a list is going to be better. If you often look up elements
> > by index, of course an array is going to be better. If you change
> > every case where the code currently uses a list to use something else
> > instead, then you're changing both the winning and losing cases.
>
> I don't think this argument is especially on-point, because what I'm
> actually seeing is just that there aren't any list operations that
> are expensive enough to make much of an overall difference in
> typical queries. To the extent that an array reimplementation
> reduces the palloc traffic, it'd take some load off that subsystem,
> but apparently you need not-typical queries to really notice.
> (And, if the real motivation is aggregate palloc savings, then yes you
> really do want to replace everything...)
Could it be that allocating List* structures near the structures they
point to is enough of a benefit in terms of cache hits that it is a
loss when moving to a List* array?
--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com
+ As you are, so once was I. As I am, so you will be. +
+ Ancient Roman grave inscription +
On Mon, Mar 4, 2019 at 2:04 PM Bruce Momjian <bruce@momjian.us> wrote:
> On Mon, Mar 4, 2019 at 12:44:41PM -0500, Robert Haas wrote:
> > I think that's a separate but also promising thing to attack, and I
> > agree that it'd take a lot of work to get there. I don't think that
> > the problem with either parse-tree-copying or list usage is that no
> > performance benefits are to be had; I think it's that the amount of
> > work required to get those benefits is pretty large. If it were
> > otherwise, somebody likely would have done it before now.
>
> Stupid question, but do we use any kind of reference counter to know if
> two subsystems look at a structure, and a copy is required?
No, but I wonder if we could use Valgrind to enforce rules about who
has the right to scribble on what, when. That could make it a lot
easier to impose a new rule.
--
Peter Geoghegan
Hi,
On 2019-03-04 16:28:40 -0500, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > I don't buy this. I think e.g. redesigning the way we represent
> > targetlists would be good (it's e.g. insane that we recompute
> > descriptors out of them all the time), and would reduce their allocator
> > costs.
>
> Maybe we're not on the same page here, but it seems to me that that'd be
> addressable with pretty localized changes (eg, adding more fields to
> TargetEntry, or keeping a pre-instantiated output tupdesc in each Plan
> node). But if the concern is about the amount of palloc bandwidth going
> into List cells, we're not going to be able to improve that with localized
> data structure changes; it'll take something like the patch I've proposed.
What I'm saying is that it'd be reasonable to replace the use of list
for targetlists with 'list2' without a wholesale replacement of all the
list code, and it'd give us benefits.
> I find it interesting that you get different results.
What I reported weren't vanilla pgbench -S results, so there's that
difference. If you measure the DO loop based test I posted, do you see
a difference?
Greetings,
Andres Freund
On Tue, 5 Mar 2019 at 11:11, Andres Freund <andres@anarazel.de> wrote:
> On 2019-03-04 16:28:40 -0500, Tom Lane wrote:
> > Andres Freund <andres@anarazel.de> writes:
> > > I don't buy this. I think e.g. redesigning the way we represent
> > > targetlists would be good (it's e.g. insane that we recompute
> > > descriptors out of them all the time), and would reduce their allocator
> > > costs.
> >
> > Maybe we're not on the same page here, but it seems to me that that'd be
> > addressable with pretty localized changes (eg, adding more fields to
> > TargetEntry, or keeping a pre-instantiated output tupdesc in each Plan
> > node). But if the concern is about the amount of palloc bandwidth going
> > into List cells, we're not going to be able to improve that with localized
> > data structure changes; it'll take something like the patch I've proposed.
>
> What I'm saying is that it'd be reasonable to replace the use of list
> for targetlists with 'list2' without a wholesale replacement of all the
> list code, and it'd give us benefits.
So you think targetlists are the only case to benefit from an array
based list? (Ignoring the fact that I already showed another case)
When we discover the next thing to benefit, then the replacement will
be piecemeal, just the way Tom would rather not do it. I personally
don't want to be up against huge resistance when I discover that
turning a single usage of a List into List2 is better. We'll need to
consider backpatching pain / API breakage *every single time*.
A while ago I did have a go at changing some List implementations for
my then-proposed ArrayList and it was beyond a nightmare, as each time
I changed one I realised I needed to change another. In the end, I
just gave up. Think of all the places we have forboth() and
forthree(); we'll need to either provide a set of macros that take
various combinations of List and List2 or do some conversion
beforehand. With respect, if you don't believe me, please take my
ArrayList patch [1] and have a go at changing targetlists to use
ArrayLists all the way from the parser through to the executor. I'll
be interested in the diff stat once you're done.
It's true that linked lists are certainly better for some stuff;
list_concat() is going to get slower, lcons() too, but likely we can
have a bonus lcons() elimination round at some point. I see quite a
few of them that look like they could be changed to lappend(). I also
just feel that if we insist on more here then we'll get about nothing.
I'm also blocked on my partition performance improvement goals on
list_nth() being O(N), so I'm keen to see progress here and do what I
can to help with that. With list_concat() I find that pretty scary
anyway. Using it means we can have a valid list that does not get its
length updated when someone appends a new item. Most users of that do
list_copy() to sidestep that and other issues... which likely is
something we'd want to rip out with Tom's patch.
[1] https://www.postgresql.org/message-id/CAKJS1f_2SnXhPVa6eWjzy2O9A=ocwgd0Cj-LQeWpGtrWqbUSDA@mail.gmail.com
--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Hi,
On 2019-03-05 12:42:47 +1300, David Rowley wrote:
> So you think targetlists are the only case to benefit from an array
> based list? (Ignoring the fact that I already showed another case)
No, that's not what I'm trying to say at all. I think there's plenty of
cases where it'd be beneficial. In this subthread we're just arguing
whether it's somewhat feasible to not change everything, and I'm still
fairly convinced that's possible; but I'm not arguing that that's the
best way.
> It's true that linked lists are certainly better for some stuff;
> list_concat() is going to get slower, lcons() too, but likely we can
> have a bonus lcons() elimination round at some point. I see quite a
> few of them that look like they could be changed to lappend(). I also
> just feel that if we insist on more here then we'll get about nothing.
> I'm also blocked on my partition performance improvement goals on
> list_nth() being O(N), so I'm keen to see progress here and do what I
> can to help with that. With list_concat() I find that pretty scary
> anyway. Using it means we can have a valid list that does not get its
> length updated when someone appends a new item. Most users of that do
> list_copy() to sidestep that and other issues... which likely is
> something we'd want to rip out with Tom's patch.
Yes, I think you have a point that progress here would be good and that
it's worth some pain. But the names will make even less sense if we just
shunt in an array based approach under the already obscure list
API. Obviously the individual pain of that is fairly small, but over the
years and everybody reading PG code, it's also substantial. So I'm torn.
Greetings,
Andres Freund
On Tue, 5 Mar 2019 at 12:54, Andres Freund <andres@anarazel.de> wrote:
> Yes, I think you have a point that progress here would be good and that
> it's worth some pain. But the names will make even less sense if we just
> shunt in an array based approach under the already obscure list
> API.
If we feel strongly about fixing that then probably it would be as
simple as renaming the functions and adding some macros with the old
names and insisting that all new or changed code use the functions and
not the macro wrappers. That could be followed up by a final sweep in
N years time when the numbers have dwindled to a low enough level. All
that code mustn't be getting modified anyway, so not much chance of
backpatching pain. I see length() finally died in a similar way in
Tom's patch.
Perhaps doing this would have people consider lcons more carefully
before they use it over lappend.
--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
David Rowley <david.rowley@2ndquadrant.com> writes:
> ...  With list_concat() I find that pretty scary
> anyway. Using it means we can have a valid list that does not get its
> length updated when someone appends a new item. Most users of that do
> list_copy() to sidestep that and other issues... which likely is
> something we'd want to rip out with Tom's patch.
Yeah, it's a bit OT for this patch, but I'd noticed the prevalence of
locutions like list_concat(list_copy(list1), list2), and been thinking
of proposing that we add some new primitives with, er, less ad-hoc
behavior.  The patch at hand already changes the semantics of list_concat
in a somewhat saner direction, but I think there is room for a version
of list_concat that treats both its inputs as const Lists.
            regards, tom lane
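For illustration, a concatenation primitive that treats both inputs as const might look like the C sketch below. This is an assumption-laden toy, not the patch's code: IntList and intlist_concat_copy are hypothetical names, the elements are plain ints, and malloc stands in for palloc. The point is only that the result gets fresh storage, so callers no longer need the list_concat(list_copy(list1), list2) idiom to protect the first input.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical array-backed list of ints. */
typedef struct IntList
{
    int length;
    int *elements;
} IntList;

/*
 * Return a newly allocated list holding a's elements followed by b's.
 * Neither input is modified.  Returns NULL if allocation fails; the
 * caller owns the result and frees result->elements and result.
 */
IntList *
intlist_concat_copy(const IntList *a, const IntList *b)
{
    IntList *result;
    int total;

    assert(a != NULL && b != NULL);
    total = a->length + b->length;

    result = malloc(sizeof(IntList));
    if (result == NULL)
        return NULL;

    /* Allocate at least one slot so the pointer is valid for empty inputs. */
    result->elements = malloc(sizeof(int) * (size_t) (total > 0 ? total : 1));
    if (result->elements == NULL)
    {
        free(result);
        return NULL;
    }

    if (a->length > 0)
        memcpy(result->elements, a->elements,
               sizeof(int) * (size_t) a->length);
    if (b->length > 0)
        memcpy(result->elements + a->length, b->elements,
               sizeof(int) * (size_t) b->length);
    result->length = total;
    return result;
}
```

With an array representation this costs one allocation for the header and one for the elements, regardless of how long the inputs are.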
			
		David Rowley <david.rowley@2ndquadrant.com> writes:
> On Tue, 5 Mar 2019 at 12:54, Andres Freund <andres@anarazel.de> wrote:
>> Yes, I think you have a point that progress here would be good and that
>> it's worth some pain. But the names will make even less sense if we just
>> shunt in an array based approach under the already obscure list
>> API.
> If we feel strongly about fixing that then probably it would be as
> simple as renaming the functions and adding some macros with the old
> names and insisting that all new or changed code use the functions and
> not the macro wrappers.
Meh ... Neil Conway already did a round of that back in 2004 or whenever,
and I'm not especially excited about another round.  I'm not really
following Andres' aversion to the list API --- it's not any more obscure
than a whole lot of things in Postgres.  (Admittedly, as somebody who
dabbled in Lisp decades ago, I might be more used to it than some folks.)
            regards, tom lane
			
		Here's a new version of the Lists-as-arrays patch.  It's rebased up to
HEAD, and I also realized that I could fix the problem with multiple
evaluation of the List arguments of foreach etc. by using structure
assignment.  So that gets rid of a large chunk of the semantic gotchas
that were in the previous patch.  You still have to be careful about
code that deletes list entries within a foreach() over the list ---
but nearly all such code is using list_delete_cell, which means
you'll have to touch it anyway because of the API change for that
function.
Previously, the typical logic for deletion-within-a-loop involved
either advancing or not advancing a "prev" pointer that was used
with list_delete_cell.  The way I've recoded that here changes those
loops to use an integer list index that gets incremented or not.
Now, it turns out that the new formulation of foreach() is really
strictly equivalent to
    for (int pos = 0; pos < list_length(list); pos++)
    {
        whatever-type item = list_nth(list, pos);
        ...
    }
which means that it could cope fine with deletion of the current
list element if we were to provide some supported way of not
incrementing the list index counter.  That is, instead of
code that looks more or less like this:
    for (int pos = 0; pos < list_length(list); pos++)
    {
        whatever-type item = list_nth(list, pos);
        ...
        if (delete_cur)
        {
            list = list_delete_nth_cell(list, pos);
            pos--;   /* keep loop in sync with deletion */
        }
    }
we could write, say:
    foreach(lc, list)
    {
        whatever-type item = lfirst(lc);
        ...
        if (delete_cur)
        {
            list = list_delete_cell(list, lc);
            foreach_backup(lc); /* keep loop in sync with deletion */
        }
    }
which is the same thing under the hood.  I'm not quite sure if that way
is better or not.  It's more magical than explicitly manipulating a list
index, but it's also shorter and therefore less subject to typos.
            regards, tom lane
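The index-based deletion loop described above can be tried out in isolation with the following self-contained C sketch. It is a toy under stated assumptions, not the patch itself: DemoList, demo_delete_nth and demo_delete_evens are hypothetical names, the list is a fixed-capacity int array, and "delete_cur" is replaced by a simple even-number test.

```c
#include <assert.h>
#include <string.h>

#define DEMO_MAX 16

/* Hypothetical fixed-capacity array list of ints. */
typedef struct DemoList
{
    int length;
    int elements[DEMO_MAX];
} DemoList;

/* Remove the element at position n, sliding later elements down by one. */
void
demo_delete_nth(DemoList *list, int n)
{
    assert(n >= 0 && n < list->length);
    memmove(&list->elements[n], &list->elements[n + 1],
            (size_t) (list->length - n - 1) * sizeof(int));
    list->length--;
}

/* Delete every even value while iterating by integer index. */
void
demo_delete_evens(DemoList *list)
{
    for (int pos = 0; pos < list->length; pos++)
    {
        if (list->elements[pos] % 2 == 0)
        {
            demo_delete_nth(list, pos);
            pos--;              /* re-examine the element that slid into this slot */
        }
    }
}
```

Forgetting the pos-- line would silently skip the element following each deleted one, which is the kind of typo a foreach_backup-style helper would be meant to rule out.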
			
		Attachments
On Sat, 25 May 2019 at 12:53, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Now, it turns out that the new formulation of foreach() is really
> strictly equivalent to
>
>         for (int pos = 0; pos < list_length(list); pos++)
>         {
>                 whatever-type item = list_nth(list, pos);
>                 ...
>         }
>
> which means that it could cope fine with deletion of the current
> list element if we were to provide some supported way of not
> incrementing the list index counter.  That is, instead of
> code that looks more or less like this:
>
>         for (int pos = 0; pos < list_length(list); pos++)
>         {
>                 whatever-type item = list_nth(list, pos);
>                 ...
>                 if (delete_cur)
>                 {
>                         list = list_delete_nth_cell(list, pos);
>                         pos--;   /* keep loop in sync with deletion */
>                 }
>         }
>
> we could write, say:
>
>         foreach(lc, list)
>         {
>                 whatever-type item = lfirst(lc);
>                 ...
>                 if (delete_cur)
>                 {
>                         list = list_delete_cell(list, lc);
>                         foreach_backup(lc); /* keep loop in sync with deletion */
>                 }
>         }
>
> which is the same thing under the hood.  I'm not quite sure if that way
> is better or not.  It's more magical than explicitly manipulating a list
> index, but it's also shorter and therefore less subject to typos.
If we're doing an API break for this, wouldn't it be better to come up
with something that didn't have to shuffle list elements around every
time one is deleted?
For example, we could have a foreach_delete() that instead of taking a
pointer to a ListCell, it took a ListDeleteIterator which contained a
ListCell pointer and a Bitmapset, then just have a macro that marks a
list item as deleted (list_delete_current(di)) and have a final
cleanup at the end of the loop.
The cleanup operation can still use memmove, but just only move up
until the next bms_next_member on the deleted set, something like
(handwritten and untested):
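void
list_finalize_delete(List *list, ListDeleteIterator *di)
{
    int srcpos, curr, tarpos;
    /* Zero the source and target list position markers */
    srcpos = tarpos = 0;
    curr = -1;
    /* Move down each run of surviving cells that precedes a deleted index */
    while ((curr = bms_next_member(di->deleted, curr)) >= 0)
    {
        int n = curr - srcpos;
        if (n > 0)
        {
            memmove(&list->elements[tarpos], &list->elements[srcpos],
                                n * sizeof(ListCell));
            tarpos += n;
        }
        srcpos = curr + 1;
    }
    /* Also keep the cells that follow the last deleted index */
    if (list->length > srcpos)
    {
        int n = list->length - srcpos;
        memmove(&list->elements[tarpos], &list->elements[srcpos],
                                n * sizeof(ListCell));
        tarpos += n;
    }
    list->length = tarpos;
}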
Or maybe we should worry about having the list in an inconsistent
state during the loop?  e.g if the list is getting passed into a
function call to do something.
--
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
			
		I wrote:
> Here's a new version of the Lists-as-arrays patch.
The cfbot noticed a set-but-not-used variable that my compiler hadn't
whined about.  Here's a v5 to pacify it.  No other changes.
            regards, tom lane
			
		Attachments
David Rowley <david.rowley@2ndquadrant.com> writes:
> If we're doing an API break for this, wouldn't it be better to come up
> with something that didn't have to shuffle list elements around every
> time one is deleted?
Agreed that as long as there's an API break anyway, we could consider
more extensive changes for this use-case.  But ...
> For example, we could have a foreach_delete() that instead of taking a
> pointer to a ListCell, it took a ListDeleteIterator which contained a
> ListCell pointer and a Bitmapset, then just have a macro that marks a
> list item as deleted (list_delete_current(di)) and have a final
> cleanup at the end of the loop.
... I'm not quite sold on this particular idea.  The amount of added
bitmapset manipulation overhead seems rather substantial in comparison
to the memmove work saved.  It might win for cases involving very
long lists with many entries being deleted in one operation, but
I don't think that's a common scenario for us.  It's definitely a
loss when there's just one item to be deleted, which I think is a
common case.  (Of course, callers expecting that could just not
use this multi-delete API.)
> Or maybe we should worry about having the list in an inconsistent
> state during the loop?  e.g if the list is getting passed into a
> function call to do something.
Not following that?  If I understand your idea correctly, the list
doesn't actually get changed until the cleanup step.  If we pass it to
another operation that independently deletes some members meanwhile,
that's trouble; but it'd be trouble for the existing code, and for my
version of the patch too.
FWIW, I don't really see a need to integrate this idea into the
loop logic as such.  You could just define it as "make a bitmap
of the list indexes to delete, then call
list = list_delete_multi(list, bitmapset)".  It would be
helpful perhaps if we provided official access to the current
list index that the foreach macro is maintaining internally.
            regards, tom lane
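A rough, self-contained C sketch of the "collect the indexes, then delete them all at once" idea follows. It rests on several assumptions that are not in the thread: demo_delete_multi is a hypothetical name, a plain bool array stands in for the Bitmapset, and the list is a bare int array with an explicit length. It is meant to show only the single-pass compaction, not a proposed API.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Compact 'elements' in place, dropping every position whose entry in
 * 'deleted' is true.  Each surviving element is moved at most once.
 * Returns the new length.
 */
int
demo_delete_multi(int *elements, int length, const bool *deleted)
{
    int tarpos = 0;

    assert(length >= 0);
    for (int srcpos = 0; srcpos < length; srcpos++)
    {
        if (!deleted[srcpos])
            elements[tarpos++] = elements[srcpos];
    }
    return tarpos;
}
```

Compared with calling a delete-nth routine once per doomed element, this touches each element once in total, which is the saving weighed above against the extra bookkeeping for the set of indexes.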
			
On Sat, May 25, 2019 at 11:48:47AM -0400, Tom Lane wrote:
> I wrote:
> > Here's a new version of the Lists-as-arrays patch.
>
> The cfbot noticed a set-but-not-used variable that my compiler hadn't
> whined about. Here's a v5 to pacify it. No other changes.
Have you tested the performance impact?
--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com
+ As you are, so once was I. As I am, so you will be. +
+ Ancient Roman grave inscription +
On Fri, 14 Jun 2019 at 13:54, Bruce Momjian <bruce@momjian.us> wrote:
>
> On Sat, May 25, 2019 at 11:48:47AM -0400, Tom Lane wrote:
> > I wrote:
> > > Here's a new version of the Lists-as-arrays patch.
> >
> > The cfbot noticed a set-but-not-used variable that my compiler hadn't
> > whined about. Here's a v5 to pacify it. No other changes.
>
> Have you tested the performance impact?
I did some and posted earlier in the thread:
https://postgr.es/m/CAKJS1f8h2vs8M0cgFsgfivfkjvudU5-MZO1gJB2uf0m8_9VCpQ@mail.gmail.com
It came out only slightly slower over the whole regression test run,
which I now think is surprisingly good considering how much we've
tuned the code over the years with the assumption that List is a
singly linked list. We'll be able to get rid of things like
PlannerInfo's simple_rte_array and append_rel_array along with
EState's es_range_table_array. I'm particularly happy about getting
rid of es_range_table_array since initialising a plan with many
partitions ends up costing quite a bit just to build that array.
Run-time pruning might end up pruning all but one of those, so getting
rid of something that's done per partition is pretty good. (There's
also the locking, but that's another problem).
--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
David Rowley <david.rowley@2ndquadrant.com> writes:
> On Fri, 14 Jun 2019 at 13:54, Bruce Momjian <bruce@momjian.us> wrote:
>> Have you tested the performance impact?
> I did some and posted earlier in the thread:
> https://postgr.es/m/CAKJS1f8h2vs8M0cgFsgfivfkjvudU5-MZO1gJB2uf0m8_9VCpQ@mail.gmail.com
> It came out only slightly slower over the whole regression test run,
> which I now think is surprisingly good considering how much we've
> tuned the code over the years with the assumption that List is a
> singly linked list. We'll be able to get rid of things like
> PlannerInfo's simple_rte_array and append_rel_array along with
> EState's es_range_table_array.
Yeah.  I have not made any attempt at all in the current patch to
re-tune the code, or clean up places that are maintaining parallel
Lists and arrays (such as the ones David mentions).  So it's not
entirely fair to take the current state of the patch as representative
of where performance would settle once we've bought into the new
method.
            regards, tom lane
Hi,
On 5/25/19 11:48 AM, Tom Lane wrote:
> The cfbot noticed a set-but-not-used variable that my compiler hadn't
> whined about. Here's a v5 to pacify it. No other changes.
>
This needs a rebase. After that check-world passes w/ and w/o
-DDEBUG_LIST_MEMORY_USAGE.
There is some unneeded MemoryContext stuff in async.c's
pg_listening_channels() which should be cleaned up.
Thanks for working on this, as the API is more explicit now about what
is going on.
Best regards,
Jesper
Jesper Pedersen <jesper.pedersen@redhat.com> writes:
> This needs a rebase. After that check-world passes w/ and w/o 
> -DDEBUG_LIST_MEMORY_USAGE.
Yup, here's a rebase against HEAD (and I also find that check-world shows
no problems).  This is pretty much of a pain to maintain, since it changes
the API for lnext() which is, um, a bit invasive.  I'd like to make a
decision pretty quickly on whether we're going to do this, and either
commit this patch or abandon it.
> There is some unneeded MemoryContext stuff in async.c's 
> pg_listening_channels() which should be cleaned up.
Yeah, there's a fair amount of follow-on cleanup that could be undertaken
afterwards, but I've wanted to keep the patch's footprint as small as
possible for the moment.  Assuming we pull the trigger, I'd then go look
at removing the planner's duplicative lists+arrays for RTEs and such as
the first cleanup step.  But thanks for the pointer to async.c, I'll
check that too.
            regards, tom lane
			
Attachments
Hi,
On 7/1/19 2:44 PM, Tom Lane wrote:
> Yup, here's a rebase against HEAD (and I also find that check-world shows
> no problems).
Thanks - no further comments.
> This is pretty much of a pain to maintain, since it changes
> the API for lnext() which is, um, a bit invasive. I'd like to make a
> decision pretty quickly on whether we're going to do this, and either
> commit this patch or abandon it.
>
IMHO it is an improvement over the existing API.
>> There is some unneeded MemoryContext stuff in async.c's
>> pg_listening_channels() which should be cleaned up.
>
> Yeah, there's a fair amount of follow-on cleanup that could be undertaken
> afterwards, but I've wanted to keep the patch's footprint as small as
> possible for the moment. Assuming we pull the trigger, I'd then go look
> at removing the planner's duplicative lists+arrays for RTEs and such as
> the first cleanup step. But thanks for the pointer to async.c, I'll
> check that too.
>
Yeah, I only called out the async.c change, as that function isn't
likely to change in any of the follow up patches.
Best regards,
 Jesper
I spent some time experimenting with the idea mentioned upthread of
adding a macro to support deletion of a foreach loop's current element
(adjusting the loop's state behind the scenes).  This turns out to work
really well: it reduces the complexity of fixing existing loops around
element deletions quite a bit.  Whereas in existing code you have to not
use foreach() at all, and you have to track both the next list element and
the previous undeleted element, now you can use foreach() and you don't
have to mess with extra variables at all.
A good example appears in the trgm_regexp.c changes below.  Typically
we've coded such loops with a handmade expansion of foreach, like
    prev = NULL;
    cell = list_head(state->enterKeys);
    while (cell)
    {
        TrgmStateKey *existingKey = (TrgmStateKey *) lfirst(cell);
        next = lnext(cell);
        if (need to delete)
            state->enterKeys = list_delete_cell(state->enterKeys,
                            cell, prev);
        else
            prev = cell;
        cell = next;
    }
My previous patch would have had you replace this with a loop using
an integer list-position index.  You can still do that if you like,
but it's less change to convert the loop to a foreach(), drop the
prev/next variables, and replace the list_delete_cell call with
foreach_delete_current:
    foreach(cell, state->enterKeys)
    {
        TrgmStateKey *existingKey = (TrgmStateKey *) lfirst(cell);
        if (need to delete)
            state->enterKeys = foreach_delete_current(state->enterKeys,
                                cell);
    }
So I think this is a win, and attached is v7.
            regards, tom lane
			
Attachments
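To illustrate why deleting inside a loop needs the loop state adjusted once a list is an array, here is a minimal, self-contained sketch. It is not the patch's macro: DemoVec, demovec_delete_nth and demovec_remove_evens are hypothetical names, and a fixed-size int array stands in for List. The only point is that after deleting slot i the following element slides into slot i, so the index has to step back before the loop increments it.

```c
#include <assert.h>
#include <string.h>

#define DEMOVEC_MAX 16

/* Hypothetical stand-in for an array-backed list of ints. */
typedef struct DemoVec
{
    int length;
    int elements[DEMOVEC_MAX];
} DemoVec;

/* Delete the element at position n; later elements shift down one slot. */
static void
demovec_delete_nth(DemoVec *v, int n)
{
    assert(n >= 0 && n < v->length);
    memmove(&v->elements[n], &v->elements[n + 1],
            (v->length - 1 - n) * sizeof(int));
    v->length--;
}

/* Remove every even value while iterating over the same array. */
static void
demovec_remove_evens(DemoVec *v)
{
    for (int i = 0; i < v->length; i++)
    {
        if (v->elements[i] % 2 == 0)
        {
            demovec_delete_nth(v, i);
            /* The next element now occupies slot i, so revisit that slot. */
            i--;
        }
    }
}
```

Hiding the index adjustment behind a macro, as described above, keeps that bookkeeping out of each call site.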
On Tue, Jul 2, 2019 at 1:27 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> So I think this is a win, and attached is v7.
Not related to the diff v6..v7, but shouldn't we throw additionally a memset() with '\0' before calling pfree():
+
+    newelements = (ListCell *)
+        MemoryContextAlloc(GetMemoryChunkContext(list),
+                           new_max_len * sizeof(ListCell));
+    memcpy(newelements, list->elements,
+           list->length * sizeof(ListCell));
+    pfree(list->elements);
+    list->elements = newelements;
Or is this somehow ensured by debug pfree() implementation or does it work differently together with Valgrind?
Otherwise it seems that the calling code can still be hanging onto a list element from a freed chunk and (rather) happily accessing it, as opposed to an almost certain crash if that chunk is zeroed before returning from enlarge_list().
--
Alex
Oleksandr Shulgin <oleksandr.shulgin@zalando.de> writes:
> Not related to the diff v6..v7, but shouldn't we throw additionally a
> memset() with '\0' before calling pfree():
I don't see the point of that.  In debug builds CLOBBER_FREED_MEMORY will
take care of it, and in non-debug builds I don't see why we'd expend
the cycles.
            regards, tom lane
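The exchange above can be sketched in isolation: enlarge an array by allocating new storage, copying, and releasing the old chunk, with the old chunk overwritten only when a debug switch is defined. This is an illustration under assumptions, not PostgreSQL code: it uses malloc/free rather than palloc/pfree, DEMO_CLOBBER_FREED_MEMORY, IntArray, demo_free and int_array_enlarge are hypothetical names, and the 0x7F fill byte is an assumed pattern.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical debug switch; remove the define to skip the wipe. */
#define DEMO_CLOBBER_FREED_MEMORY 1

typedef struct IntArray
{
    int  length;      /* slots in use */
    int  max_length;  /* slots allocated */
    int *elements;
} IntArray;

/* Free a chunk, overwriting it first when the debug switch is on. */
static void
demo_free(void *ptr, size_t size)
{
    if (ptr == NULL)
        return;
#ifdef DEMO_CLOBBER_FREED_MEMORY
    /* A production version must also keep the compiler from eliding this. */
    memset(ptr, 0x7F, size);
#endif
    free(ptr);
}

/* Move the elements to a larger allocation; the IntArray header stays put. */
static void
int_array_enlarge(IntArray *arr, int new_max)
{
    int *newelements;

    assert(new_max > arr->max_length);
    newelements = malloc(new_max * sizeof(int));
    if (newelements == NULL)
        abort();
    memcpy(newelements, arr->elements, arr->length * sizeof(int));
    demo_free(arr->elements, arr->max_length * sizeof(int));
    arr->elements = newelements;
    arr->max_length = new_max;
}
```

With the switch on, a caller that kept a pointer into the old chunk reads the fill pattern instead of plausible stale data, which is what makes such bugs visible in debug builds without costing cycles elsewhere.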
			
		On Tue, 2 Jul 2019 at 11:27, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> My previous patch would have had you replace this with a loop using
> an integer list-position index.  You can still do that if you like,
> but it's less change to convert the loop to a foreach(), drop the
> prev/next variables, and replace the list_delete_cell call with
> foreach_delete_current:
>
>         foreach(cell, state->enterKeys)
>         {
>                 TrgmStateKey *existingKey = (TrgmStateKey *) lfirst(cell);
>
>                 if (need to delete)
>                         state->enterKeys = foreach_delete_current(state->enterKeys,
>                                                                 cell);
>         }
>
> So I think this is a win, and attached is v7.
It's pretty nice to get rid of those. I also like how you've handled the
changes in SRFs.
I've now read over the entire patch and have noted down the following:
1. MergeAttributes() could do with a comment mentioning why you're not
using foreach() on the outer loop. I almost missed the
list_delete_nth_cell() call that's a few branches deep in the outer
loop.
2. In expandTupleDesc(), couldn't you just change the following:
    int i;
    for (i = 0; i < offset; i++)
    {
        if (aliascell)
            aliascell = lnext(eref->colnames, aliascell);
    }
to:
    aliascell = offset < list_length(eref->colnames) ?
        list_nth_cell(eref->colnames, offset) : NULL;
3. Worth Assert(list != NIL); in new_head_cell() and new_tail_cell() ?
4. Do you think it would be a good idea to document that the 'pos' arg
in list_insert_nth and co must be <= list_length()? I know you've
mentioned that in insert_new_cell, but that's static and
list_insert_nth is not. I think it would be better not to have to
chase down comments of static functions to find out how to use an
external function.
5. Why does enlarge_list() return List *? Can't it just return void?
I noticed this after looking at add_new_cell_after() and reading your
cautionary comment and then looking at lappend_cell(). At first, it
seemed that lappend_cell() could end up reallocating List to make way
for the new cell, but from looking at enlarge_list() it seems we
always maintain the original allocation of the header. So why bother
returning List * in that function?
6. Is there a reason to use memmove() in list_concat() rather than
just memcpy()? I don't quite believe the comment you've written. As
far as I can see you're not overwriting any useful memory so the order
of the copy should not matter.
7. The last part of the following comment might not age well.
/*
* Note: unlike the individual-list-cell deletion functions, we never make
* any effort to move the surviving list cells to new storage.  This is
* because none of them can move in this operation, so it's the same as
* the old implementation in terms of what callers may assume.
*/
The old comment about extending the list seems more fitting.
9. I see your XXX comment in list_qsort(), but wouldn't it be better
to just invent list_qsort_internal() as a static function and just
have it qsort the list in-place, then make list_qsort just return
list_qsort_internal(list_copy(list)); and keep the XXX comment so that
the fixup would just remove the list_copy()? That way, when it comes
to fixing that inefficiency we can just cut and paste the internal
implementation into list_qsort(). It'll be much less churn, especially
if you put the internal version directly below the external one.
10. I wonder if we can reduce a bit of pain for extension authors by
back patching a macro that wraps up a lnext() call adding a dummy
argument for the List.  That way they don't have to deal with their
own pre-processor version dependent code. Downsides are we'd need to
keep the macro into the future, however, it's just 1 line of code...
I also did some more benchmarking of the patch. Basically, I patched
with the attached file (converted to .txt not to upset the CFbot) then
ran make installcheck. This was done on an AWS m5d.large instance.
The patch runs the planner 10 times then LOGs the average time of
those 10 runs. Taking the sum of those averages I see:
Master: 0.482479 seconds
Patched: 0.471949 seconds
Which makes the patched version 2.2% faster than master on that run.
I've resisted attaching the spreadsheet since there are almost 22k
data points per run.
Apart from the 10 points above, I think the patch is good to go.
I also agree with keeping the further improvements like getting rid of
the needless list_copy() in list_concat() calls as a followup patch. I
also agree with Tom about moving quickly with this one.  Reviewing it
in detail took me a long time, I'd really rather not do it again after
leaving it to rot for a while.
-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
			
Attachments
On Tue, Jul 2, 2019 at 5:12 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Oleksandr Shulgin <oleksandr.shulgin@zalando.de> writes:
> > Not related to the diff v6..v7, but shouldn't we throw additionally a
> > memset() with '\0' before calling pfree():
>
> I don't see the point of that. In debug builds CLOBBER_FREED_MEMORY will
> take care of it, and in non-debug builds I don't see why we'd expend
> the cycles.
This is what I was wondering about, thanks for providing a reference.
--
Alex
David Rowley <david.rowley@2ndquadrant.com> writes:
> I've now read over the entire patch and have noted down the following:
Thanks for the review!  Attached is a v8 responding to most of your
comments.  Anything not quoted below I just accepted.
> 1. MergeAttributes() could do with a comment mentioning why you're not
> using foreach() on the outer loop.
Check.  I also got rid of the Assert added by f7e954ad1, as it seems
not to add any clarity in view of this comment.
> 2. In expandTupleDesc(), couldn't you just change the following:
Done, although this seems like the sort of follow-on optimization
that I wanted to leave for later.
> 3. Worth Assert(list != NIL); in new_head_cell() and new_tail_cell() ?
I don't think so.  They're internal functions, and anyway they'll
crash very handily on a NIL pointer.
> 5. Why does enlarge_list() return List *? Can't it just return void?
Also done.  I had had some idea of maintaining flexibility, but
considering that we still need the property that a List's header never
moves (as long as it stays nonempty), there's no circumstance where
enlarge_list could validly move the header.
> 9. I see your XXX comment in list_qsort(), but wouldn't it be better
> to just invent list_qsort_internal() as a static function and just
> have it qsort the list in-place, then make list_qsort just return
> list_qsort_internal(list_copy(list)); and keep the XXX comment so that
> the fixup would just remove the list_copy()?
I don't really see the point of doing more than the minimum possible
work on list_qsort in this patch.  The big churn from changing it
is going to be in adjusting the callers' comparator functions for one
less level of indirection, and I'd just as soon rewrite list_qsort
in that patch not this one.
> 10. I wonder if we can reduce a bit of pain for extension authors by
> back patching a macro that wraps up a lnext() call adding a dummy
> argument for the List.
I was wondering about a variant of that yesterday; specifically,
I thought of naming the new 2-argument function list_next() not lnext().
Then we could add "#define list_next(l,c) lnext(c)" in the back branches.
This would simplify back-patching code that used the new definition, and
it might save some effort for extension authors who are trying to maintain
cross-branch code.  On the other hand, it's more keystrokes forevermore,
and I'm not entirely convinced that code that's using lnext() isn't likely
to need other adjustments anyway.  So I didn't pull the trigger on that,
but if people like the idea I'd be okay with doing it like that.
> I also agree with keeping the further improvements like getting rid of
> the needless list_copy() in list_concat() calls as a followup patch. I
> also agree with Tom about moving quickly with this one.  Reviewing it
> in detail took me a long time, I'd really rather not do it again after
> leaving it to rot for a while.
Indeed.  I don't want to expend a lot of effort keeping it in sync
with master over a long period, either.  Opinions?
            regards, tom lane
			
Attachments
On 2019-Jul-03, Tom Lane wrote:
> David Rowley <david.rowley@2ndquadrant.com> writes:
> > 10. I wonder if we can reduce a bit of pain for extension authors by
> > back patching a macro that wraps up a lnext() call adding a dummy
> > argument for the List.
>
> I was wondering about a variant of that yesterday; specifically,
> I thought of naming the new 2-argument function list_next() not lnext().
> Then we could add "#define list_next(l,c) lnext(c)" in the back branches.
> This would simplify back-patching code that used the new definition, and
> it might save some effort for extension authors who are trying to maintain
> cross-branch code. On the other hand, it's more keystrokes forevermore,
> and I'm not entirely convinced that code that's using lnext() isn't likely
> to need other adjustments anyway. So I didn't pull the trigger on that,
> but if people like the idea I'd be okay with doing it like that.
I was thinking about this issue too earlier today, and my conclusion is
that the way you have it in v7 is fine, because lnext() callsites are
not that numerous, so the cost to third-party code authors is not that
high; the other arguments trump this consideration IMO. I say this as
someone who curses every time he has to backpatch things across the
heap_open / table_open change -- but there are a lot more calls of
that.
> Indeed. I don't want to expend a lot of effort keeping it in sync
> with master over a long period, either. Opinions?
Yeah, let's get it done soon.
-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
David Rowley <david.rowley@2ndquadrant.com> writes:
> I also did some more benchmarking of the patch. ...
> Which makes the patched version 2.2% faster than master on that run.
BTW, further on the subject of performance --- I'm aware of at least
these topics for follow-on patches:
* Fix places that are maintaining arrays parallel to Lists for
access-speed reasons (at least simple_rte_array, append_rel_array,
es_range_table_array).
* Look at places using lcons/list_delete_first to maintain FIFO lists.
The patch makes these O(N^2) for long lists.  If we can reverse the list
order and use lappend/list_truncate instead, it'd be better.  Possibly in
some places the list ordering is critical enough to make this impractical,
but I suspect it's an easy win in most.
* Rationalize places that are using combinations of list_copy and
list_concat, probably by inventing an additional list-concatenation
primitive that modifies neither input.
* Adjust API for list_qsort(), as discussed, to save indirections and
avoid constructing an intermediate pointer array.  I also seem to recall
one place in the planner that's avoiding using list_qsort by manually
flattening a list into an array, qsort'ing, and rebuilding the list :-(
I don't think that any one of these fixes would move the needle very
much on "typical" simple workloads, but it's reasonable to hope that in
aggregate they'd make for a noticeable improvement.  In the meantime,
I'm gratified that the initial patch at least doesn't seem to have lost
any ground.
            regards, tom lane
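The lcons/list_delete_first point in the list above can be illustrated with a small stand-alone sketch. It assumes a fixed-size int array in place of List, and WorkQueue, workq_pop_front, workq_pop_back and workq_reverse are hypothetical names. Removing from the front shifts every survivor on each call, so draining N items costs O(N^2) moves overall; reversing once and removing from the tail yields the same items in the same order with no shifting.

```c
#include <assert.h>
#include <string.h>

#define WORKQ_MAX 16

/* Hypothetical stand-in for an array-backed list of ints. */
typedef struct WorkQueue
{
    int length;
    int items[WORKQ_MAX];
} WorkQueue;

/* Analogue of deleting the first cell: O(N), all survivors shift down. */
static int
workq_pop_front(WorkQueue *q)
{
    int first;

    assert(q->length > 0);
    first = q->items[0];
    memmove(&q->items[0], &q->items[1], (q->length - 1) * sizeof(int));
    q->length--;
    return first;
}

/* Analogue of truncating the tail: O(1), nothing moves. */
static int
workq_pop_back(WorkQueue *q)
{
    assert(q->length > 0);
    return q->items[--q->length];
}

/* Reverse once up front so the tail holds what used to be the head. */
static void
workq_reverse(WorkQueue *q)
{
    for (int i = 0, j = q->length - 1; i < j; i++, j--)
    {
        int tmp = q->items[i];

        q->items[i] = q->items[j];
        q->items[j] = tmp;
    }
}
```

Whether a given call site can tolerate the reversed storage order is exactly the case-by-case question raised above; the sketch only shows the cost difference between the two removal strategies.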
			
On Thu, 4 Jul 2019 at 06:15, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> David Rowley <david.rowley@2ndquadrant.com> writes:
> > I've now read over the entire patch and have noted down the following:
>
> Thanks for the review! Attached is a v8 responding to most of your
> comments. Anything not quoted below I just accepted.
Thanks for the speedy turnaround. I've looked at v8, as far as a diff
between the two patches and I'm happy.
I've marked as ready for committer.
-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
David Rowley <david.rowley@2ndquadrant.com> writes:
> Thanks for the speedy turnaround. I've looked at v8, as far as a diff
> between the two patches and I'm happy.
> I've marked as ready for committer.
So ... last chance for objections?
I see from the cfbot that v8 is already broken (new call of lnext
to be fixed).  Don't really want to keep chasing a moving target,
so unless I hear objections I'm going to adjust the additional
spot(s) and commit this pretty soon, like tomorrow or Monday.
            regards, tom lane
			
> On 13 Jul 2019, at 18:32, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I see from the cfbot that v8 is already broken (new call of lnext
> to be fixed). Don't really want to keep chasing a moving target,
> so unless I hear objections I'm going to adjust the additional
> spot(s) and commit this pretty soon, like tomorrow or Monday.
I just confirmed that fixing the recently introduced callsite not handled in
the patch still passes tests etc. +1 on this.
cheers ./daniel
Daniel Gustafsson <daniel@yesql.se> writes:
>> On 13 Jul 2019, at 18:32, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> I see from the cfbot that v8 is already broken (new call of lnext
>> to be fixed).  Don't really want to keep chasing a moving target,
>> so unless I hear objections I'm going to adjust the additional
>> spot(s) and commit this pretty soon, like tomorrow or Monday.
> I just confirmed that fixing the recently introduced callsite not handled in
> the patch still passes tests etc. +1 on this.
Thanks for checking!  I've now pushed this, with a bit of additional
cleanup and comment-improvement in pg_list.h and list.c.
            regards, tom lane
			
		I wrote:
> BTW, further on the subject of performance --- I'm aware of at least
> these topics for follow-on patches:
> ...
> * Adjust API for list_qsort(), as discussed, to save indirections and
> avoid constructing an intermediate pointer array.  I also seem to recall
> one place in the planner that's avoiding using list_qsort by manually
> flattening a list into an array, qsort'ing, and rebuilding the list :-(
Here's a proposed patch for that.  There are only two existing calls
of list_qsort(), it turns out.  I didn't find the described spot in
the planner (I believe I was thinking of choose_bitmap_and(), but its
usage isn't quite as described).  However, I found about four other
places that were doing pretty much exactly that, so the attached
also simplifies those places to use list_qsort().
(There are some other places that could perhaps be changed also,
but it would require more invasive hacking than I wanted to do here,
with less-clear benefits.)
A possibly controversial point is that I made list_qsort() sort the
given list in-place, rather than building a new list as it has
historically.  For every single one of the existing and new callers,
copying the input list is wasteful, because they were just leaking
the input list anyway.  But perhaps somebody feels that we should
preserve the "const input" property?  I thought that changing the
function to return void, as done here, might be a good idea to
ensure that callers notice its API changed --- otherwise they'd
only get a warning about incompatible signature of the passed
function pointer, which they might not notice; in fact I'm not
totally sure all compilers would even give such a warning.
If there are no complaints about that, I'm just going to go ahead
and push this --- it seems simple enough to not need much review.
            regards, tom lane
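As a rough illustration of the in-place approach described above, the following stand-alone sketch sorts an array of cells directly with qsort(), so the comparator receives pointers to the cells themselves and needs only one level of indirection. It is a mock-up, not the patch: DemoCell, DemoList, demo_list_sort and cmp_cstring_cells are hypothetical names, and unlike the patch the comparator is declared with const void * arguments so that no function-pointer cast is needed.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a list cell: a union of the payload types. */
typedef union DemoCell
{
    void *ptr_value;
    int   int_value;
} DemoCell;

/* Hypothetical stand-in for an array-backed list. */
typedef struct DemoList
{
    int       length;
    DemoCell *elements;
} DemoList;

/* Comparator: each argument points at a cell, not at a pointer to a cell. */
static int
cmp_cstring_cells(const void *a, const void *b)
{
    const char *sa = ((const DemoCell *) a)->ptr_value;
    const char *sb = ((const DemoCell *) b)->ptr_value;

    return strcmp(sa, sb);
}

/* Sort the cell array in place; the caller's list is modified, not copied. */
static void
demo_list_sort(DemoList *list, int (*cmp) (const void *, const void *))
{
    /* Nothing to do for an empty or one-element list. */
    if (list != NULL && list->length > 1)
        qsort(list->elements, list->length, sizeof(DemoCell), cmp);
}
```

Because the function returns nothing, a caller written against a copying API (result = sort(list, ...)) fails to compile rather than silently misbehaving, which is the motivation given above for the void return type.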
diff --git a/src/backend/nodes/list.c b/src/backend/nodes/list.c
index ecc2911..8a7eb6a 100644
--- a/src/backend/nodes/list.c
+++ b/src/backend/nodes/list.c
@@ -1421,43 +1421,22 @@ list_copy_deep(const List *oldlist)
 /*
  * Sort a list as though by qsort.
  *
- * A new list is built and returned.  Like list_copy, this doesn't make
- * fresh copies of any pointed-to data.
+ * The list is sorted in-place (this is a change from pre-v13 Postgres).
  *
- * The comparator function receives arguments of type ListCell **.
- * (XXX that's really inefficient now, but changing it seems like
- * material for a standalone patch.)
+ * The comparator function is declared to receive arguments of type
+ * const ListCell *; this allows it to use lfirst() and variants
+ * without casting its arguments.
  */
-List *
-list_qsort(const List *list, list_qsort_comparator cmp)
+void
+list_qsort(List *list, list_qsort_comparator cmp)
 {
-    int            len = list_length(list);
-    ListCell  **list_arr;
-    List       *newlist;
-    ListCell   *cell;
-    int            i;
-
-    /* Empty list is easy */
-    if (len == 0)
-        return NIL;
-
-    /* We have to make an array of pointers to satisfy the API */
-    list_arr = (ListCell **) palloc(sizeof(ListCell *) * len);
-    i = 0;
-    foreach(cell, list)
-        list_arr[i++] = cell;
-
-    qsort(list_arr, len, sizeof(ListCell *), cmp);
-
-    /* Construct new list (this code is much like list_copy) */
-    newlist = new_list(list->type, len);
-
-    for (i = 0; i < len; i++)
-        newlist->elements[i] = *list_arr[i];
+    typedef int (*qsort_comparator) (const void *a, const void *b);
+    int            len;
-    /* Might as well free the workspace array */
-    pfree(list_arr);
+    check_list_invariants(list);
-    check_list_invariants(newlist);
-    return newlist;
+    /* Nothing to do if there's less than two elements */
+    len = list_length(list);
+    if (len > 1)
+        qsort(list->elements, len, sizeof(ListCell), (qsort_comparator) cmp);
 }
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index 67254c2..286d8de 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -52,8 +52,8 @@ typedef enum
 #define STD_FUZZ_FACTOR 1.01
 static List *translate_sub_tlist(List *tlist, int relid);
-static int    append_total_cost_compare(const void *a, const void *b);
-static int    append_startup_cost_compare(const void *a, const void *b);
+static int    append_total_cost_compare(const ListCell *a, const ListCell *b);
+static int    append_startup_cost_compare(const ListCell *a, const ListCell *b);
 static List *reparameterize_pathlist_by_child(PlannerInfo *root,
                                               List *pathlist,
                                               RelOptInfo *child_rel);
@@ -1239,9 +1239,8 @@ create_append_path(PlannerInfo *root,
          */
         Assert(pathkeys == NIL);
-        subpaths = list_qsort(subpaths, append_total_cost_compare);
-        partial_subpaths = list_qsort(partial_subpaths,
-                                      append_startup_cost_compare);
+        list_qsort(subpaths, append_total_cost_compare);
+        list_qsort(partial_subpaths, append_startup_cost_compare);
     }
     pathnode->first_partial_path = list_length(subpaths);
     pathnode->subpaths = list_concat(subpaths, partial_subpaths);
@@ -1303,10 +1302,10 @@ create_append_path(PlannerInfo *root,
  * (This is to avoid getting unpredictable results from qsort.)
  */
 static int
-append_total_cost_compare(const void *a, const void *b)
+append_total_cost_compare(const ListCell *a, const ListCell *b)
 {
-    Path       *path1 = (Path *) lfirst(*(ListCell **) a);
-    Path       *path2 = (Path *) lfirst(*(ListCell **) b);
+    Path       *path1 = (Path *) lfirst(a);
+    Path       *path2 = (Path *) lfirst(b);
     int            cmp;
     cmp = compare_path_costs(path1, path2, TOTAL_COST);
@@ -1324,10 +1323,10 @@ append_total_cost_compare(const void *a, const void *b)
  * (This is to avoid getting unpredictable results from qsort.)
  */
 static int
-append_startup_cost_compare(const void *a, const void *b)
+append_startup_cost_compare(const ListCell *a, const ListCell *b)
 {
-    Path       *path1 = (Path *) lfirst(*(ListCell **) a);
-    Path       *path2 = (Path *) lfirst(*(ListCell **) b);
+    Path       *path1 = (Path *) lfirst(a);
+    Path       *path2 = (Path *) lfirst(b);
     int            cmp;
     cmp = compare_path_costs(path1, path2, STARTUP_COST);
diff --git a/src/backend/parser/parse_agg.c b/src/backend/parser/parse_agg.c
index 8dc3793..7f65b09 100644
--- a/src/backend/parser/parse_agg.c
+++ b/src/backend/parser/parse_agg.c
@@ -1722,11 +1722,12 @@ expand_groupingset_node(GroupingSet *gs)
     return result;
 }
+/* list_qsort comparator to sort sub-lists by length */
 static int
-cmp_list_len_asc(const void *a, const void *b)
+cmp_list_len_asc(const ListCell *a, const ListCell *b)
 {
-    int            la = list_length(*(List *const *) a);
-    int            lb = list_length(*(List *const *) b);
+    int            la = list_length((const List *) lfirst(a));
+    int            lb = list_length((const List *) lfirst(b));
     return (la > lb) ? 1 : (la == lb) ? 0 : -1;
 }
@@ -1797,27 +1798,8 @@ expand_grouping_sets(List *groupingSets, int limit)
         result = new_result;
     }
-    if (list_length(result) > 1)
-    {
-        int            result_len = list_length(result);
-        List      **buf = palloc(sizeof(List *) * result_len);
-        List      **ptr = buf;
-
-        foreach(lc, result)
-        {
-            *ptr++ = lfirst(lc);
-        }
-
-        qsort(buf, result_len, sizeof(List *), cmp_list_len_asc);
-
-        result = NIL;
-        ptr = buf;
-
-        while (result_len-- > 0)
-            result = lappend(result, *ptr++);
-
-        pfree(buf);
-    }
+    /* Now sort the lists by length */
+    list_qsort(result, cmp_list_len_asc);
     return result;
 }
diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c
index d8c370e..2924ec0 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -70,7 +70,7 @@ static void base_backup_cleanup(int code, Datum arg);
 static void perform_base_backup(basebackup_options *opt);
 static void parse_basebackup_options(List *options, basebackup_options *opt);
 static void SendXlogRecPtrResult(XLogRecPtr ptr, TimeLineID tli);
-static int    compareWalFileNames(const void *a, const void *b);
+static int    compareWalFileNames(const ListCell *a, const ListCell *b);
 static void throttle(size_t increment);
 static bool is_checksummed_file(const char *fullpath, const char *filename);
@@ -379,13 +379,10 @@ perform_base_backup(basebackup_options *opt)
         struct stat statbuf;
         List       *historyFileList = NIL;
         List       *walFileList = NIL;
-        char      **walFiles;
-        int            nWalFiles;
         char        firstoff[MAXFNAMELEN];
         char        lastoff[MAXFNAMELEN];
         DIR           *dir;
         struct dirent *de;
-        int            i;
         ListCell   *lc;
         TimeLineID    tli;
@@ -428,24 +425,17 @@ perform_base_backup(basebackup_options *opt)
         CheckXLogRemoved(startsegno, ThisTimeLineID);
         /*
-         * Put the WAL filenames into an array, and sort. We send the files in
-         * order from oldest to newest, to reduce the chance that a file is
-         * recycled before we get a chance to send it over.
+         * Sort the WAL filenames.  We want to send the files in order from
+         * oldest to newest, to reduce the chance that a file is recycled
+         * before we get a chance to send it over.
          */
-        nWalFiles = list_length(walFileList);
-        walFiles = palloc(nWalFiles * sizeof(char *));
-        i = 0;
-        foreach(lc, walFileList)
-        {
-            walFiles[i++] = lfirst(lc);
-        }
-        qsort(walFiles, nWalFiles, sizeof(char *), compareWalFileNames);
+        list_qsort(walFileList, compareWalFileNames);
         /*
          * There must be at least one xlog file in the pg_wal directory, since
          * we are doing backup-including-xlog.
          */
-        if (nWalFiles < 1)
+        if (walFileList == NIL)
             ereport(ERROR,
                     (errmsg("could not find any WAL files")));
@@ -453,7 +443,8 @@ perform_base_backup(basebackup_options *opt)
          * Sanity check: the first and last segment should cover startptr and
          * endptr, with no gaps in between.
          */
-        XLogFromFileName(walFiles[0], &tli, &segno, wal_segment_size);
+        XLogFromFileName((char *) linitial(walFileList),
+                         &tli, &segno, wal_segment_size);
         if (segno != startsegno)
         {
             char        startfname[MAXFNAMELEN];
@@ -463,12 +454,13 @@ perform_base_backup(basebackup_options *opt)
             ereport(ERROR,
                     (errmsg("could not find WAL file \"%s\"", startfname)));
         }
-        for (i = 0; i < nWalFiles; i++)
+        foreach(lc, walFileList)
         {
+            char       *walFileName = (char *) lfirst(lc);
             XLogSegNo    currsegno = segno;
             XLogSegNo    nextsegno = segno + 1;
-            XLogFromFileName(walFiles[i], &tli, &segno, wal_segment_size);
+            XLogFromFileName(walFileName, &tli, &segno, wal_segment_size);
             if (!(nextsegno == segno || currsegno == segno))
             {
                 char        nextfname[MAXFNAMELEN];
@@ -489,15 +481,16 @@ perform_base_backup(basebackup_options *opt)
         }
         /* Ok, we have everything we need. Send the WAL files. */
-        for (i = 0; i < nWalFiles; i++)
+        foreach(lc, walFileList)
         {
+            char       *walFileName = (char *) lfirst(lc);
             FILE       *fp;
             char        buf[TAR_SEND_SIZE];
             size_t        cnt;
             pgoff_t        len = 0;
-            snprintf(pathbuf, MAXPGPATH, XLOGDIR "/%s", walFiles[i]);
-            XLogFromFileName(walFiles[i], &tli, &segno, wal_segment_size);
+            snprintf(pathbuf, MAXPGPATH, XLOGDIR "/%s", walFileName);
+            XLogFromFileName(walFileName, &tli, &segno, wal_segment_size);
             fp = AllocateFile(pathbuf, "rb");
             if (fp == NULL)
@@ -527,7 +520,7 @@ perform_base_backup(basebackup_options *opt)
                 CheckXLogRemoved(segno, tli);
                 ereport(ERROR,
                         (errcode_for_file_access(),
-                         errmsg("unexpected WAL file size \"%s\"", walFiles[i])));
+                         errmsg("unexpected WAL file size \"%s\"", walFileName)));
             }
             /* send the WAL file itself */
@@ -555,7 +548,7 @@ perform_base_backup(basebackup_options *opt)
                 CheckXLogRemoved(segno, tli);
                 ereport(ERROR,
                         (errcode_for_file_access(),
-                         errmsg("unexpected WAL file size \"%s\"", walFiles[i])));
+                         errmsg("unexpected WAL file size \"%s\"", walFileName)));
             }
             /* wal_segment_size is a multiple of 512, so no need for padding */
@@ -568,7 +561,7 @@ perform_base_backup(basebackup_options *opt)
              * walreceiver.c always doing an XLogArchiveForceDone() after a
              * complete segment.
              */
-            StatusFilePath(pathbuf, walFiles[i], ".done");
+            StatusFilePath(pathbuf, walFileName, ".done");
             sendFileWithContent(pathbuf, "");
         }
@@ -618,14 +611,14 @@ perform_base_backup(basebackup_options *opt)
 }
 /*
- * qsort comparison function, to compare log/seg portion of WAL segment
+ * list_qsort comparison function, to compare log/seg portion of WAL segment
  * filenames, ignoring the timeline portion.
  */
 static int
-compareWalFileNames(const void *a, const void *b)
+compareWalFileNames(const ListCell *a, const ListCell *b)
 {
-    char       *fna = *((char **) a);
-    char       *fnb = *((char **) b);
+    char       *fna = (char *) lfirst(a);
+    char       *fnb = (char *) lfirst(b);
     return strcmp(fna + 8, fnb + 8);
 }
diff --git a/src/backend/replication/logical/reorderbuffer.c b/src/backend/replication/logical/reorderbuffer.c
index 591377d..d7ee8eb 100644
--- a/src/backend/replication/logical/reorderbuffer.c
+++ b/src/backend/replication/logical/reorderbuffer.c
@@ -3378,13 +3378,13 @@ TransactionIdInArray(TransactionId xid, TransactionId *xip, Size num)
 }
 /*
- * qsort() comparator for sorting RewriteMappingFiles in LSN order.
+ * list_qsort() comparator for sorting RewriteMappingFiles in LSN order.
  */
 static int
-file_sort_by_lsn(const void *a_p, const void *b_p)
+file_sort_by_lsn(const ListCell *a_p, const ListCell *b_p)
 {
-    RewriteMappingFile *a = *(RewriteMappingFile **) a_p;
-    RewriteMappingFile *b = *(RewriteMappingFile **) b_p;
+    RewriteMappingFile *a = (RewriteMappingFile *) lfirst(a_p);
+    RewriteMappingFile *b = (RewriteMappingFile *) lfirst(b_p);
     if (a->lsn < b->lsn)
         return -1;
@@ -3404,8 +3404,6 @@ UpdateLogicalMappings(HTAB *tuplecid_data, Oid relid, Snapshot snapshot)
     struct dirent *mapping_de;
     List       *files = NIL;
     ListCell   *file;
-    RewriteMappingFile **files_a;
-    size_t        off;
     Oid            dboid = IsSharedRelation(relid) ? InvalidOid : MyDatabaseId;
     mapping_dir = AllocateDir("pg_logical/mappings");
@@ -3459,21 +3457,12 @@ UpdateLogicalMappings(HTAB *tuplecid_data, Oid relid, Snapshot snapshot)
     }
     FreeDir(mapping_dir);
-    /* build array we can easily sort */
-    files_a = palloc(list_length(files) * sizeof(RewriteMappingFile *));
-    off = 0;
-    foreach(file, files)
-    {
-        files_a[off++] = lfirst(file);
-    }
-
     /* sort files so we apply them in LSN order */
-    qsort(files_a, list_length(files), sizeof(RewriteMappingFile *),
-          file_sort_by_lsn);
+    list_qsort(files, file_sort_by_lsn);
-    for (off = 0; off < list_length(files); off++)
+    foreach(file, files)
     {
-        RewriteMappingFile *f = files_a[off];
+        RewriteMappingFile *f = (RewriteMappingFile *) lfirst(file);
         elog(DEBUG1, "applying mapping: \"%s\" in %u", f->fname,
              snapshot->subxip[0]);
diff --git a/src/backend/rewrite/rowsecurity.c b/src/backend/rewrite/rowsecurity.c
index 300af6f..2815300 100644
--- a/src/backend/rewrite/rowsecurity.c
+++ b/src/backend/rewrite/rowsecurity.c
@@ -63,9 +63,9 @@ static void get_policies_for_relation(Relation relation,
                                       List **permissive_policies,
                                       List **restrictive_policies);
-static List *sort_policies_by_name(List *policies);
+static void sort_policies_by_name(List *policies);
-static int    row_security_policy_cmp(const void *a, const void *b);
+static int    row_security_policy_cmp(const ListCell *a, const ListCell *b);
 static void add_security_quals(int rt_index,
                                List *permissive_policies,
@@ -470,7 +470,7 @@ get_policies_for_relation(Relation relation, CmdType cmd, Oid user_id,
      * We sort restrictive policies by name so that any WCOs they generate are
      * checked in a well-defined order.
      */
-    *restrictive_policies = sort_policies_by_name(*restrictive_policies);
+    sort_policies_by_name(*restrictive_policies);
     /*
      * Then add any permissive or restrictive policies defined by extensions.
@@ -488,7 +488,7 @@ get_policies_for_relation(Relation relation, CmdType cmd, Oid user_id,
          * always check all built-in restrictive policies, in name order,
          * before checking restrictive policies added by hooks, in name order.
          */
-        hook_policies = sort_policies_by_name(hook_policies);
+        sort_policies_by_name(hook_policies);
         foreach(item, hook_policies)
         {
@@ -522,43 +522,20 @@ get_policies_for_relation(Relation relation, CmdType cmd, Oid user_id,
  * This is not necessary for permissive policies, since they are all combined
  * together using OR into a single WithCheckOption check.
  */
-static List *
+static void
 sort_policies_by_name(List *policies)
 {
-    int            npol = list_length(policies);
-    RowSecurityPolicy *pols;
-    ListCell   *item;
-    int            ii = 0;
-
-    if (npol <= 1)
-        return policies;
-
-    pols = (RowSecurityPolicy *) palloc(sizeof(RowSecurityPolicy) * npol);
-
-    foreach(item, policies)
-    {
-        RowSecurityPolicy *policy = (RowSecurityPolicy *) lfirst(item);
-
-        pols[ii++] = *policy;
-    }
-
-    qsort(pols, npol, sizeof(RowSecurityPolicy), row_security_policy_cmp);
-
-    policies = NIL;
-    for (ii = 0; ii < npol; ii++)
-        policies = lappend(policies, &pols[ii]);
-
-    return policies;
+    list_qsort(policies, row_security_policy_cmp);
 }
 /*
  * qsort comparator to sort RowSecurityPolicy entries by name
  */
 static int
-row_security_policy_cmp(const void *a, const void *b)
+row_security_policy_cmp(const ListCell *a, const ListCell *b)
 {
-    const RowSecurityPolicy *pa = (const RowSecurityPolicy *) a;
-    const RowSecurityPolicy *pb = (const RowSecurityPolicy *) b;
+    const RowSecurityPolicy *pa = (const RowSecurityPolicy *) lfirst(a);
+    const RowSecurityPolicy *pb = (const RowSecurityPolicy *) lfirst(b);
     /* Guard against NULL policy names from extensions */
     if (pa->policy_name == NULL)
diff --git a/src/include/nodes/pg_list.h b/src/include/nodes/pg_list.h
index 8b1e4fb..8ed0a64 100644
--- a/src/include/nodes/pg_list.h
+++ b/src/include/nodes/pg_list.h
@@ -570,7 +570,7 @@ extern List *list_copy(const List *list);
 extern List *list_copy_tail(const List *list, int nskip);
 extern List *list_copy_deep(const List *oldlist);
-typedef int (*list_qsort_comparator) (const void *a, const void *b);
-extern List *list_qsort(const List *list, list_qsort_comparator cmp);
+typedef int (*list_qsort_comparator) (const ListCell *a, const ListCell *b);
+extern void list_qsort(List *list, list_qsort_comparator cmp);
 #endif                            /* PG_LIST_H */
			
I wrote:
> [ list_qsort-API-change.patch ]
Also, here's a follow-on patch that cleans up some crufty code in
heap.c and relcache.c to use list_qsort, rather than manually doing
insertions into a list that's kept ordered.  The existing comments
argue that that's faster than qsort for small N, but I think that's
a bit questionable.  And anyway, that code is definitely O(N^2) if
N isn't so small, while this replacement logic is O(N log N).
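As a rough illustration of the append-then-sort-then-deduplicate approach, here is a minimal standalone sketch on a plain C array. It is not the patch's code: `Oid` is a stand-in typedef and `sort_and_dedup_oids` is a hypothetical helper, though the squeeze loop follows the same idea as the proposed list_deduplicate_oid().

```c
#include <assert.h>
#include <stdlib.h>

typedef unsigned int Oid;       /* stand-in for PostgreSQL's Oid (assumption) */

/* qsort comparator: ascending order, using comparisons rather than
 * subtraction so that large unsigned values cannot wrap around */
int
oid_cmp(const void *a, const void *b)
{
    Oid         v1 = *(const Oid *) a;
    Oid         v2 = *(const Oid *) b;

    if (v1 < v2)
        return -1;
    if (v1 > v2)
        return 1;
    return 0;
}

/*
 * Hypothetical helper: sort the array, then squeeze out adjacent
 * duplicates in place.  Returns the new logical length.
 * The sort is O(N log N) and the squeeze pass is O(N).
 */
size_t
sort_and_dedup_oids(Oid *oids, size_t len)
{
    size_t      i = 0;

    if (len <= 1)
        return len;

    qsort(oids, len, sizeof(Oid), oid_cmp);

    /* i indexes the last kept element, j scans ahead of it */
    for (size_t j = 1; j < len; j++)
    {
        if (oids[i] != oids[j])
            oids[++i] = oids[j];
    }
    return i + 1;
}
```

A caller would append every candidate OID unconditionally and call the helper once at the end, rather than searching for the insertion point on each append.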
This incidentally removes the only two calls of lappend_cell_oid.
There are no callers of lappend_cell_int, and only one of lappend_cell,
and that one would be noticeably cleaner if it were rewritten to use
list_insert_nth instead.  So I'm a bit tempted to do so and then nuke
all three of those functions, which would at least make some tiny dent
in Andres' unhappiness with the messiness of the List API.  Thoughts?
            regards, tom lane
diff --git a/src/backend/catalog/heap.c b/src/backend/catalog/heap.c
index ab8a475..43d4252 100644
--- a/src/backend/catalog/heap.c
+++ b/src/backend/catalog/heap.c
@@ -125,7 +125,6 @@ static void SetRelationNumChecks(Relation rel, int numchecks);
 static Node *cookConstraint(ParseState *pstate,
                             Node *raw_constraint,
                             char *relname);
-static List *insert_ordered_unique_oid(List *list, Oid datum);
 /* ----------------------------------------------------------------
@@ -3384,55 +3383,19 @@ heap_truncate_find_FKs(List *relationIds)
         if (!list_member_oid(relationIds, con->confrelid))
             continue;
-        /* Add referencer unless already in input or result list */
+        /* Add referencer to result, unless present in input list */
         if (!list_member_oid(relationIds, con->conrelid))
-            result = insert_ordered_unique_oid(result, con->conrelid);
+            result = lappend_oid(result, con->conrelid);
     }
     systable_endscan(fkeyScan);
     table_close(fkeyRel, AccessShareLock);
-    return result;
-}
+    /* Now sort and de-duplicate the result list */
+    list_qsort(result, list_oid_cmp);
+    list_deduplicate_oid(result);
-/*
- * insert_ordered_unique_oid
- *        Insert a new Oid into a sorted list of Oids, preserving ordering,
- *        and eliminating duplicates
- *
- * Building the ordered list this way is O(N^2), but with a pretty small
- * constant, so for the number of entries we expect it will probably be
- * faster than trying to apply qsort().  It seems unlikely someone would be
- * trying to truncate a table with thousands of dependent tables ...
- */
-static List *
-insert_ordered_unique_oid(List *list, Oid datum)
-{
-    ListCell   *prev;
-
-    /* Does the datum belong at the front? */
-    if (list == NIL || datum < linitial_oid(list))
-        return lcons_oid(datum, list);
-    /* Does it match the first entry? */
-    if (datum == linitial_oid(list))
-        return list;            /* duplicate, so don't insert */
-    /* No, so find the entry it belongs after */
-    prev = list_head(list);
-    for (;;)
-    {
-        ListCell   *curr = lnext(list, prev);
-
-        if (curr == NULL || datum < lfirst_oid(curr))
-            break;                /* it belongs after 'prev', before 'curr' */
-
-        if (datum == lfirst_oid(curr))
-            return list;        /* duplicate, so don't insert */
-
-        prev = curr;
-    }
-    /* Insert datum into list after 'prev' */
-    lappend_cell_oid(list, prev, datum);
-    return list;
+    return result;
 }
 /*
diff --git a/src/backend/nodes/list.c b/src/backend/nodes/list.c
index 8a7eb6a..c16b220 100644
--- a/src/backend/nodes/list.c
+++ b/src/backend/nodes/list.c
@@ -1298,6 +1298,34 @@ list_concat_unique_oid(List *list1, const List *list2)
 }
 /*
+ * Remove adjacent duplicates in a list of OIDs.
+ *
+ * It is caller's responsibility to have sorted the list to bring duplicates
+ * together, perhaps via list_qsort(list, list_oid_cmp).
+ */
+void
+list_deduplicate_oid(List *list)
+{
+    int            len;
+
+    Assert(IsOidList(list));
+    len = list_length(list);
+    if (len > 1)
+    {
+        ListCell   *elements = list->elements;
+        int            i = 0;
+
+        for (int j = 1; j < len; j++)
+        {
+            if (elements[i].oid_value != elements[j].oid_value)
+                elements[++i].oid_value = elements[j].oid_value;
+        }
+        list->length = i + 1;
+    }
+    check_list_invariants(list);
+}
+
+/*
  * Free all storage in a list, and optionally the pointed-to elements
  */
 static void
@@ -1440,3 +1468,19 @@ list_qsort(List *list, list_qsort_comparator cmp)
     if (len > 1)
         qsort(list->elements, len, sizeof(ListCell), (qsort_comparator) cmp);
 }
+
+/*
+ * list_qsort comparator for sorting a list into ascending OID order.
+ */
+int
+list_oid_cmp(const ListCell *p1, const ListCell *p2)
+{
+    Oid            v1 = lfirst_oid(p1);
+    Oid            v2 = lfirst_oid(p2);
+
+    if (v1 < v2)
+        return -1;
+    if (v1 > v2)
+        return 1;
+    return 0;
+}
diff --git a/src/backend/utils/cache/relcache.c b/src/backend/utils/cache/relcache.c
index 3ca9a9f..13ce2b6 100644
--- a/src/backend/utils/cache/relcache.c
+++ b/src/backend/utils/cache/relcache.c
@@ -285,7 +285,6 @@ static TupleDesc GetPgIndexDescriptor(void);
 static void AttrDefaultFetch(Relation relation);
 static void CheckConstraintFetch(Relation relation);
 static int    CheckConstraintCmp(const void *a, const void *b);
-static List *insert_ordered_oid(List *list, Oid datum);
 static void InitIndexAmRoutine(Relation relation);
 static void IndexSupportInitialize(oidvector *indclass,
                                    RegProcedure *indexSupport,
@@ -4387,8 +4386,8 @@ RelationGetIndexList(Relation relation)
         if (!index->indislive)
             continue;
-        /* Add index's OID to result list in the proper order */
-        result = insert_ordered_oid(result, index->indexrelid);
+        /* add index's OID to result list */
+        result = lappend_oid(result, index->indexrelid);
         /*
          * Invalid, non-unique, non-immediate or predicate indexes aren't
@@ -4413,6 +4412,9 @@ RelationGetIndexList(Relation relation)
     table_close(indrel, AccessShareLock);
+    /* Sort the result list into OID order, per API spec. */
+    list_qsort(result, list_oid_cmp);
+
     /* Now save a copy of the completed list in the relcache entry. */
     oldcxt = MemoryContextSwitchTo(CacheMemoryContext);
     oldlist = relation->rd_indexlist;
@@ -4494,13 +4496,16 @@ RelationGetStatExtList(Relation relation)
     {
         Oid            oid = ((Form_pg_statistic_ext) GETSTRUCT(htup))->oid;
-        result = insert_ordered_oid(result, oid);
+        result = lappend_oid(result, oid);
     }
     systable_endscan(indscan);
     table_close(indrel, AccessShareLock);
+    /* Sort the result list into OID order, per API spec. */
+    list_qsort(result, list_oid_cmp);
+
     /* Now save a copy of the completed list in the relcache entry. */
     oldcxt = MemoryContextSwitchTo(CacheMemoryContext);
     oldlist = relation->rd_statlist;
@@ -4516,39 +4521,6 @@ RelationGetStatExtList(Relation relation)
 }
 /*
- * insert_ordered_oid
- *        Insert a new Oid into a sorted list of Oids, preserving ordering
- *
- * Building the ordered list this way is O(N^2), but with a pretty small
- * constant, so for the number of entries we expect it will probably be
- * faster than trying to apply qsort().  Most tables don't have very many
- * indexes...
- */
-static List *
-insert_ordered_oid(List *list, Oid datum)
-{
-    ListCell   *prev;
-
-    /* Does the datum belong at the front? */
-    if (list == NIL || datum < linitial_oid(list))
-        return lcons_oid(datum, list);
-    /* No, so find the entry it belongs after */
-    prev = list_head(list);
-    for (;;)
-    {
-        ListCell   *curr = lnext(list, prev);
-
-        if (curr == NULL || datum < lfirst_oid(curr))
-            break;                /* it belongs after 'prev', before 'curr' */
-
-        prev = curr;
-    }
-    /* Insert datum into list after 'prev' */
-    lappend_cell_oid(list, prev, datum);
-    return list;
-}
-
-/*
  * RelationGetPrimaryKeyIndex -- get OID of the relation's primary key index
  *
  * Returns InvalidOid if there is no such index.
diff --git a/src/include/nodes/pg_list.h b/src/include/nodes/pg_list.h
index 8ed0a64..33f39ba 100644
--- a/src/include/nodes/pg_list.h
+++ b/src/include/nodes/pg_list.h
@@ -563,6 +563,8 @@ extern List *list_concat_unique_ptr(List *list1, const List *list2);
 extern List *list_concat_unique_int(List *list1, const List *list2);
 extern List *list_concat_unique_oid(List *list1, const List *list2);
+extern void list_deduplicate_oid(List *list);
+
 extern void list_free(List *list);
 extern void list_free_deep(List *list);
@@ -573,4 +575,6 @@ extern List *list_copy_deep(const List *oldlist);
 typedef int (*list_qsort_comparator) (const ListCell *a, const ListCell *b);
 extern void list_qsort(List *list, list_qsort_comparator cmp);
+extern int    list_oid_cmp(const ListCell *p1, const ListCell *p2);
+
 #endif                            /* PG_LIST_H */
			
On Tue, 16 Jul 2019 at 07:49, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> A possibly controversial point is that I made list_qsort() sort the
> given list in-place, rather than building a new list as it has
> historically.  For every single one of the existing and new callers,
> copying the input list is wasteful, because they were just leaking
> the input list anyway.  But perhaps somebody feels that we should
> preserve the "const input" property?  I thought that changing the
> function to return void, as done here, might be a good idea to
> ensure that callers notice its API changed --- otherwise they'd
> only get a warning about incompatible signature of the passed
> function pointer, which they might not notice; in fact I'm not
> totally sure all compilers would even give such a warning.
>
> If there's not complaints about that, I'm just going to go ahead
> and push this --- it seems simple enough to not need much review.
The only thoughts I have so far here are that it's a shame that the
function got called list_qsort() and not just list_sort().  I don't
see why callers need to know anything about the sort algorithm that's
being used.  If we're going to break compatibility for this, should we
rename the function too?
--
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
David Rowley <david.rowley@2ndquadrant.com> writes:
> The only thoughts I have so far here are that it's a shame that the
> function got called list_qsort() and not just list_sort().  I don't
> see why callers need to know anything about the sort algorithm that's
> being used.
Meh.  list_qsort() is quicksort only to the extent that qsort()
is quicksort, which in our current implementation is a bit of a
lie already --- and, I believe, it's much more of a lie in some
versions of libc.  I don't really think of either name as promising
anything about the underlying sort algorithm.  What they do share
is an API based on a callback comparison function, and if you are
looking for uses of those, it's a lot easier to grep for "qsort"
than some more-generic term.
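To make the callback-comparator API concrete, here is a small standalone sketch of an in-place sort over an array of cells where the comparator receives cell pointers, as list_qsort's comparators do after the API change. Everything here is a simplified stand-in: `Cell`, `cells_sort`, and `cmp_wal_names` are hypothetical names, and the sketch routes through a trampoline to keep the qsort() call strictly type-correct, whereas the patch simply casts the function pointer. The trampoline's static variable makes this version non-reentrant.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Simplified stand-in for ListCell (assumption, not the real definition) */
typedef union Cell
{
    void       *ptr_value;
    int         int_value;
} Cell;

typedef int (*cell_comparator) (const Cell *a, const Cell *b);

/* Comparator currently in use; set by cells_sort() before calling qsort() */
static cell_comparator active_cmp;

/* Adapts the typed comparator to the signature qsort() expects */
static int
cell_cmp_trampoline(const void *a, const void *b)
{
    return active_cmp((const Cell *) a, (const Cell *) b);
}

/* Sort the cell array in place; nothing is copied or returned */
void
cells_sort(Cell *cells, size_t len, cell_comparator cmp)
{
    if (len > 1)
    {
        active_cmp = cmp;
        qsort(cells, len, sizeof(Cell), cell_cmp_trampoline);
    }
}

/*
 * Example comparator in the style of compareWalFileNames: the argument
 * is a pointer to the cell itself, so the payload is fetched from the
 * cell rather than by dereferencing a pointer-to-pointer.  The first
 * 8 characters (the timeline) are skipped.
 */
int
cmp_wal_names(const Cell *a, const Cell *b)
{
    const char *fna = (const char *) a->ptr_value;
    const char *fnb = (const char *) b->ptr_value;

    return strcmp(fna + 8, fnb + 8);
}
```

The point of the typed signature is that a comparator written for plain qsort() over an array of pointers no longer compiles cleanly against it, which helps callers notice the change.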
            regards, tom lane
			
On 7/15/19 11:07 PM, Tom Lane wrote:
> David Rowley <david.rowley@2ndquadrant.com> writes:
>> The only thoughts I have so far here are that it's a shame that the
>> function got called list_qsort() and not just list_sort().  I don't
>> see why callers need to know anything about the sort algorithm that's
>> being used.
>
> Meh.  list_qsort() is quicksort only to the extent that qsort()
> is quicksort, which in our current implementation is a bit of a
> lie already --- and, I believe, it's much more of a lie in some
> versions of libc.  I don't really think of either name as promising
> anything about the underlying sort algorithm.  What they do share
> is an API based on a callback comparison function, and if you are
> looking for uses of those, it's a lot easier to grep for "qsort"
> than some more-generic term.
I agree with David -- list_sort() is better.  I don't think "sort" is
such a common stem that searching is a big issue, especially with modern
code indexing tools.
--
-David
david@pgmasters.net
David Steele <david@pgmasters.net> writes:
> On 7/15/19 11:07 PM, Tom Lane wrote:
>> David Rowley <david.rowley@2ndquadrant.com> writes:
>>> The only thoughts I have so far here are that it's a shame that the
>>> function got called list_qsort() and not just list_sort().
> I agree with David -- list_sort() is better.  I don't think "sort" is 
> such a common stem that searching is a big issue, especially with modern 
> code indexing tools.
OK, I'm outvoted, will do it that way.
            regards, tom lane
			
On Tue, Jul 16, 2019 at 10:44 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> David Steele <david@pgmasters.net> writes:
> > On 7/15/19 11:07 PM, Tom Lane wrote:
> >> David Rowley <david.rowley@2ndquadrant.com> writes:
> >>> The only thoughts I have so far here are that it's a shame that the
> >>> function got called list_qsort() and not just list_sort().
>
> > I agree with David -- list_sort() is better.  I don't think "sort" is
> > such a common stem that searching is a big issue, especially with modern
> > code indexing tools.
>
> OK, I'm outvoted, will do it that way.
I cast my vote in the other direction i.e. for sticking with qsort.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

On Tue, Jul 16, 2019 at 9:01 AM Robert Haas <robertmhaas@gmail.com> wrote:
> I cast my vote in the other direction i.e. for sticking with qsort.
I do too.
--
Peter Geoghegan
Robert Haas <robertmhaas@gmail.com> writes:
> On Tue, Jul 16, 2019 at 10:44 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> OK, I'm outvoted, will do it that way.
> I cast my vote in the other direction i.e. for sticking with qsort.
Didn't see this until after pushing a commit that uses "list_sort".
While composing that commit message another argument occurred to me,
which is that renaming makes it absolutely sure that any external
callers will notice they have an API change to deal with, no matter
how forgiving their compiler is.  Also, if somebody really really
doesn't want to cope with the change, they can now make their own
version of list_qsort (stealing it out of 1cff1b95a) and the core
code won't pose a conflict.
So I'm good with "list_sort" at this point.
            regards, tom lane
			
I wrote:
> * Rationalize places that are using combinations of list_copy and
> list_concat, probably by inventing an additional list-concatenation
> primitive that modifies neither input.
I poked around to see what we have in this department.  There seem to
be several identifiable use-cases:
* Concat two Lists that are freshly built, or at least not otherwise
referenced.  In the old code, list_concat serves fine, leaking the
second List's header but not any of its cons cells.  As of 1cff1b95a,
the second List's storage is leaked completely.  We could imagine
inventing a list_concat variant that list_free's its second input,
but I'm unconvinced that that's worth the trouble.  Few if any
callers can't stand to leak any storage, and if there are any where
it seems worth the trouble, adding an explicit list_free seems about
as good as calling a variant of list_concat.  (If we do want such a
variant, we need a name for it.  list_join, maybe, by analogy to
bms_join?)
* Concat two lists where there exist other pointers to the second list,
but it's okay if the lists share cons cells afterwards.  As of the
new code, they don't actually share any storage, which seems strictly
better.  I don't think we need to do anything here, except go around
and adjust the comments that explain that that's what's happening.
* Concat two lists where there exist other pointers to the second list,
and it's not okay to share storage.  This is currently implemented by
list_copy'ing the second argument, but we can just drop that (and
adjust comments where needed).
* Concat two lists where we mustn't modify either input list.
This is currently implemented by list_copy'ing both arguments.
I'm inclined to replace this pattern with a function like
"list_concat_copy(const List *, const List *)", although settling
on a suitable name might be difficult.
* There's a small number of places that list_copy the first argument
but not the second.  I believe that all of these are either of the form
"x = list_concat(list_copy(y), x)", ie replacing the only reference to
the second argument, or are relying on the "it's okay to share storage"
assumption to not copy a second argument that has other references.
I think we can just replace these with list_concat_copy.  We'll leak
the second argument's storage in the cases where another list is being
prepended onto a working list, but I doubt it's worth fussing over.
(But, if this is repeated a lot of times, maybe it is worth fussing
over?  Conceivably you could leak O(N^2) storage while building a
long working list, if you prepend many shorter lists onto it.)
* Note that some places are applying copyObject() not list_copy().
In these places the idea is to make duplicates of pointed-to structures
not just the list proper.  These should be left alone, I think.
When the copyObject is applied to the second argument, we're leaking
the top-level List in the copy result, but again it's not worth
fussing over.
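For the "mustn't modify either input" case, a non-destructive concatenation could look roughly like the following standalone sketch. It uses a toy array-backed `IntList` rather than the real List type, and `intlist_make`, `intlist_concat_copy`, and `intlist_free` are hypothetical names chosen for illustration; only the `(const List *, const List *)` shape comes from the proposal above.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy array-backed list; NULL represents the empty list, like NIL */
typedef struct IntList
{
    int         length;
    int        *elements;
} IntList;

static int
intlist_length(const IntList *list)
{
    return list ? list->length : 0;
}

/* Build a list from n values; returns NULL when n <= 0 */
IntList *
intlist_make(const int *vals, int n)
{
    IntList    *list;

    if (n <= 0)
        return NULL;
    list = malloc(sizeof(IntList));
    if (list == NULL)
        abort();
    list->elements = malloc(n * sizeof(int));
    if (list->elements == NULL)
        abort();
    memcpy(list->elements, vals, n * sizeof(int));
    list->length = n;
    return list;
}

/*
 * Return a freshly allocated list holding a's elements followed by b's.
 * Neither input is modified, and the result shares no storage with them.
 */
IntList *
intlist_concat_copy(const IntList *a, const IntList *b)
{
    int         la = intlist_length(a);
    int         lb = intlist_length(b);
    IntList    *result;

    if (la + lb == 0)
        return NULL;
    result = malloc(sizeof(IntList));
    if (result == NULL)
        abort();
    result->elements = malloc((la + lb) * sizeof(int));
    if (result->elements == NULL)
        abort();
    if (la > 0)
        memcpy(result->elements, a->elements, la * sizeof(int));
    if (lb > 0)
        memcpy(result->elements + la, b->elements, lb * sizeof(int));
    result->length = la + lb;
    return result;
}

void
intlist_free(IntList *list)
{
    if (list)
    {
        free(list->elements);
        free(list);
    }
}
```

Compared with copying both inputs and then concatenating, this allocates the result once at its final size, so there is no intermediate copy of the second list to leak.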
Comments?
            regards, tom lane
			
I wrote:
> * Look at places using lcons/list_delete_first to maintain FIFO lists.
> The patch makes these O(N^2) for long lists.  If we can reverse the list
> order and use lappend/list_truncate instead, it'd be better.  Possibly in
> some places the list ordering is critical enough to make this impractical,
> but I suspect it's an easy win in most.
Attached are two patches that touch all the places where it seemed like
an easy win to stop using lcons and/or list_delete_first.
0001 adds list_delete_last() as a mirror image to list_delete_first(),
and changes all the places where it seemed 100% safe to do so (ie,
there's no behavioral change because the list order is demonstrably
immaterial).
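The cost difference between removing at the two ends of an array-backed list can be seen in a small standalone sketch. `IntVec` and its functions are hypothetical stand-ins, not the List implementation: the point is only that deleting the last element adjusts the length, while deleting the first has to shift every remaining element down.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy expansible array (assumption: stands in for an array-based List) */
typedef struct IntVec
{
    int         length;
    int         capacity;
    int        *elements;
} IntVec;

/* Append at the end: amortized O(1), analogous to lappend() */
void
intvec_append(IntVec *v, int value)
{
    if (v->length == v->capacity)
    {
        int         newcap = (v->capacity > 0) ? v->capacity * 2 : 4;
        int        *newelems = realloc(v->elements, newcap * sizeof(int));

        if (newelems == NULL)
            abort();
        v->elements = newelems;
        v->capacity = newcap;
    }
    v->elements[v->length++] = value;
}

/* Remove the last element: O(1), no data is moved */
void
intvec_delete_last(IntVec *v)
{
    assert(v->length > 0);
    v->length--;
}

/* Remove the first element: O(N), all remaining elements shift down */
void
intvec_delete_first(IntVec *v)
{
    assert(v->length > 0);
    memmove(v->elements, v->elements + 1,
            (v->length - 1) * sizeof(int));
    v->length--;
}
```

Code that pushes and pops at the same end, such as a stack of "currently being processed" OIDs, behaves identically with either end, so using append plus delete-last there changes nothing observable.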
0002 changes some additional places where it's maybe a bit less safe,
ie there's a potential for user-visible behavioral change because
processing will occur in a different order.  In particular, the proposed
change in execExpr.c causes aggregates and window functions that are in
the same plan node to be executed in a different order than before ---
but it seems to me that this order is saner.  (Note the change in the
expected regression results, in a test that's intentionally sensitive to
the execution order.)  And anyway when did we guarantee anything about
that?
I refrained from changing lcons to lappend in get_relation_info, because
that demonstrably causes the planner to change its choices when two
indexes look equally attractive, and probably people would complain
about that.  I think that the other changes proposed in 0002 are pretty
harmless --- for example, in get_tables_to_cluster the order depends
initially on the results of a seqscan of pg_index, so anybody who's
expecting stability is in for rude surprises anyhow.  Also, the proposed
changes in plancat.c, parse_agg.c, selfuncs.c almost certainly have no
user-visible effect, but maybe there could be changes at the
roundoff-error level due to processing estimates in a different order?
There are a bunch of places that are using list_delete_first to remove
the next-to-process entry from a List used as a queue.  In principle,
we could invert the order of those queues and then use list_delete_last,
but I thought this would probably be too confusing: it's natural to
think of the front of the list as being the head of the queue.  I doubt
that any of those queues get long enough for it to be a serious
performance problem to leave them as-is.
(Actually, I doubt that any of these changes will really move the
performance needle in the real world.  It's more a case of wanting
the code to present good examples not bad ones.)
Thoughts?  Anybody want to object to any of the changes in 0002?
            regards, tom lane
diff --git a/src/backend/access/gist/gist.c b/src/backend/access/gist/gist.c
index dfb51f6..169bf6f 100644
--- a/src/backend/access/gist/gist.c
+++ b/src/backend/access/gist/gist.c
@@ -1323,8 +1323,6 @@ static void
 gistfinishsplit(GISTInsertState *state, GISTInsertStack *stack,
                 GISTSTATE *giststate, List *splitinfo, bool unlockbuf)
 {
-    ListCell   *lc;
-    List       *reversed;
     GISTPageSplitInfo *right;
     GISTPageSplitInfo *left;
     IndexTuple    tuples[2];
@@ -1339,14 +1337,6 @@ gistfinishsplit(GISTInsertState *state, GISTInsertStack *stack,
      * left. Finally insert the downlink for the last new page and update the
      * downlink for the original page as one operation.
      */
-
-    /* for convenience, create a copy of the list in reverse order */
-    reversed = NIL;
-    foreach(lc, splitinfo)
-    {
-        reversed = lcons(lfirst(lc), reversed);
-    }
-
     LockBuffer(stack->parent->buffer, GIST_EXCLUSIVE);
     gistFindCorrectParent(state->r, stack);
@@ -1354,10 +1344,10 @@ gistfinishsplit(GISTInsertState *state, GISTInsertStack *stack,
      * insert downlinks for the siblings from right to left, until there are
      * only two siblings left.
      */
-    while (list_length(reversed) > 2)
+    for (int pos = list_length(splitinfo) - 1; pos > 1; pos--)
     {
-        right = (GISTPageSplitInfo *) linitial(reversed);
-        left = (GISTPageSplitInfo *) lsecond(reversed);
+        right = (GISTPageSplitInfo *) list_nth(splitinfo, pos);
+        left = (GISTPageSplitInfo *) list_nth(splitinfo, pos - 1);
         if (gistinserttuples(state, stack->parent, giststate,
                              &right->downlink, 1,
@@ -1371,11 +1361,10 @@ gistfinishsplit(GISTInsertState *state, GISTInsertStack *stack,
             gistFindCorrectParent(state->r, stack);
         }
         /* gistinserttuples() released the lock on right->buf. */
-        reversed = list_delete_first(reversed);
     }
-    right = (GISTPageSplitInfo *) linitial(reversed);
-    left = (GISTPageSplitInfo *) lsecond(reversed);
+    right = (GISTPageSplitInfo *) lsecond(splitinfo);
+    left = (GISTPageSplitInfo *) linitial(splitinfo);
     /*
      * Finally insert downlink for the remaining right page and update the
diff --git a/src/backend/catalog/heap.c b/src/backend/catalog/heap.c
index 6c3ff76..032fab9 100644
--- a/src/backend/catalog/heap.c
+++ b/src/backend/catalog/heap.c
@@ -633,7 +633,7 @@ CheckAttributeType(const char *attname,
                      errmsg("composite type %s cannot be made a member of itself",
                             format_type_be(atttypid))));
-        containing_rowtypes = lcons_oid(atttypid, containing_rowtypes);
+        containing_rowtypes = lappend_oid(containing_rowtypes, atttypid);
         relation = relation_open(get_typ_typrelid(atttypid), AccessShareLock);
@@ -653,7 +653,7 @@ CheckAttributeType(const char *attname,
         relation_close(relation, AccessShareLock);
-        containing_rowtypes = list_delete_first(containing_rowtypes);
+        containing_rowtypes = list_delete_last(containing_rowtypes);
     }
     else if (OidIsValid((att_typelem = get_element_type(atttypid))))
     {
diff --git a/src/backend/commands/lockcmds.c b/src/backend/commands/lockcmds.c
index 417d595..bae3b38 100644
--- a/src/backend/commands/lockcmds.c
+++ b/src/backend/commands/lockcmds.c
@@ -281,11 +281,11 @@ LockViewRecurse(Oid reloid, LOCKMODE lockmode, bool nowait, List *ancestor_views
     context.nowait = nowait;
     context.viewowner = view->rd_rel->relowner;
     context.viewoid = reloid;
-    context.ancestor_views = lcons_oid(reloid, ancestor_views);
+    context.ancestor_views = lappend_oid(ancestor_views, reloid);
     LockViewRecurse_walker((Node *) viewquery, &context);
-    ancestor_views = list_delete_oid(ancestor_views, reloid);
+    (void) list_delete_last(context.ancestor_views);
     table_close(view, NoLock);
 }
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 0c0ddd5..fc1c4df 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -14480,6 +14480,11 @@ register_on_commit_action(Oid relid, OnCommitAction action)
     oc->creating_subid = GetCurrentSubTransactionId();
     oc->deleting_subid = InvalidSubTransactionId;
+    /*
+     * We use lcons() here so that ON COMMIT actions are processed in reverse
+     * order of registration.  That might not be essential but it seems
+     * reasonable.
+     */
     on_commits = lcons(oc, on_commits);
     MemoryContextSwitchTo(oldcxt);
diff --git a/src/backend/nodes/list.c b/src/backend/nodes/list.c
index 5584fa8..9163464 100644
--- a/src/backend/nodes/list.c
+++ b/src/backend/nodes/list.c
@@ -827,6 +827,30 @@ list_delete_first(List *list)
 }
 /*
+ * Delete the last element of the list.
+ *
+ * This is the opposite of list_delete_first(), but is noticeably cheaper
+ * with a long list, since no data need be moved.
+ */
+List *
+list_delete_last(List *list)
+{
+    check_list_invariants(list);
+
+    if (list == NIL)
+        return NIL;                /* would an error be better? */
+
+    /* list_truncate won't free list if it goes to empty, but this should */
+    if (list_length(list) <= 1)
+    {
+        list_free(list);
+        return NIL;
+    }
+
+    return list_truncate(list, list_length(list) - 1);
+}
+
+/*
  * Generate the union of two lists. This is calculated by copying
  * list1 via list_copy(), then adding to it all the members of list2
  * that aren't already in list1.
diff --git a/src/backend/optimizer/util/clauses.c b/src/backend/optimizer/util/clauses.c
index f0e789f..99dbf8d 100644
--- a/src/backend/optimizer/util/clauses.c
+++ b/src/backend/optimizer/util/clauses.c
@@ -881,11 +881,9 @@ is_parallel_safe(PlannerInfo *root, Node *node)
         foreach(l, proot->init_plans)
         {
             SubPlan    *initsubplan = (SubPlan *) lfirst(l);
-            ListCell   *l2;
-            foreach(l2, initsubplan->setParam)
-                context.safe_param_ids = lcons_int(lfirst_int(l2),
-                                                   context.safe_param_ids);
+            context.safe_param_ids = list_concat(context.safe_param_ids,
+                                                 initsubplan->setParam);
         }
     }
@@ -1015,6 +1013,7 @@ max_parallel_hazard_walker(Node *node, max_parallel_hazard_context *context)
                                               context->safe_param_ids);
         if (max_parallel_hazard_walker(subplan->testexpr, context))
             return true;        /* no need to restore safe_param_ids */
+        list_free(context->safe_param_ids);
         context->safe_param_ids = save_safe_param_ids;
         /* we must also check args, but no special Param treatment there */
         if (max_parallel_hazard_walker((Node *) subplan->args, context))
@@ -4185,8 +4184,8 @@ add_function_defaults(List *args, HeapTuple func_tuple)
     ndelete = nargsprovided + list_length(defaults) - funcform->pronargs;
     if (ndelete < 0)
         elog(ERROR, "not enough default arguments");
-    while (ndelete-- > 0)
-        defaults = list_delete_first(defaults);
+    if (ndelete > 0)
+        defaults = list_copy_tail(defaults, ndelete);
     /* And form the combined argument list, not modifying the input list */
     return list_concat(list_copy(args), defaults);
@@ -4701,9 +4700,9 @@ inline_function(Oid funcid, Oid result_type, Oid result_collid,
      * Recursively try to simplify the modified expression.  Here we must add
      * the current function to the context list of active functions.
      */
-    context->active_fns = lcons_oid(funcid, context->active_fns);
+    context->active_fns = lappend_oid(context->active_fns, funcid);
     newexpr = eval_const_expressions_mutator(newexpr, context);
-    context->active_fns = list_delete_first(context->active_fns);
+    context->active_fns = list_delete_last(context->active_fns);
     error_context_stack = sqlerrcontext.previous;
diff --git a/src/backend/rewrite/rewriteHandler.c b/src/backend/rewrite/rewriteHandler.c
index 5b047d1..93b6784 100644
--- a/src/backend/rewrite/rewriteHandler.c
+++ b/src/backend/rewrite/rewriteHandler.c
@@ -1973,7 +1973,7 @@ fireRIRrules(Query *parsetree, List *activeRIRs)
                             (errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
                              errmsg("infinite recursion detected in rules for relation \"%s\"",
                                     RelationGetRelationName(rel))));
-                activeRIRs = lcons_oid(RelationGetRelid(rel), activeRIRs);
+                activeRIRs = lappend_oid(activeRIRs, RelationGetRelid(rel));
                 foreach(l, locks)
                 {
@@ -1986,7 +1986,7 @@ fireRIRrules(Query *parsetree, List *activeRIRs)
                                                   activeRIRs);
                 }
-                activeRIRs = list_delete_first(activeRIRs);
+                activeRIRs = list_delete_last(activeRIRs);
             }
         }
@@ -2059,7 +2059,7 @@ fireRIRrules(Query *parsetree, List *activeRIRs)
                              errmsg("infinite recursion detected in policy for relation \"%s\"",
                                     RelationGetRelationName(rel))));
-                activeRIRs = lcons_oid(RelationGetRelid(rel), activeRIRs);
+                activeRIRs = lappend_oid(activeRIRs, RelationGetRelid(rel));
                 /*
                  * get_row_security_policies just passed back securityQuals
@@ -2084,7 +2084,7 @@ fireRIRrules(Query *parsetree, List *activeRIRs)
                 expression_tree_walker((Node *) withCheckOptions,
                                        fireRIRonSubLink, (void *) activeRIRs);
-                activeRIRs = list_delete_first(activeRIRs);
+                activeRIRs = list_delete_last(activeRIRs);
             }
             /*
@@ -3711,7 +3711,7 @@ RewriteQuery(Query *parsetree, List *rewrite_events)
             rev = (rewrite_event *) palloc(sizeof(rewrite_event));
             rev->relation = RelationGetRelid(rt_entry_relation);
             rev->event = event;
-            rewrite_events = lcons(rev, rewrite_events);
+            rewrite_events = lappend(rewrite_events, rev);
             foreach(n, product_queries)
             {
@@ -3722,7 +3722,7 @@ RewriteQuery(Query *parsetree, List *rewrite_events)
                 rewritten = list_concat(rewritten, newstuff);
             }
-            rewrite_events = list_delete_first(rewrite_events);
+            rewrite_events = list_delete_last(rewrite_events);
         }
         /*
diff --git a/src/include/nodes/pg_list.h b/src/include/nodes/pg_list.h
index 71dc4dc..1463408 100644
--- a/src/include/nodes/pg_list.h
+++ b/src/include/nodes/pg_list.h
@@ -531,6 +531,7 @@ extern List *list_delete_ptr(List *list, void *datum);
 extern List *list_delete_int(List *list, int datum);
 extern List *list_delete_oid(List *list, Oid datum);
 extern List *list_delete_first(List *list);
+extern List *list_delete_last(List *list);
 extern List *list_delete_nth_cell(List *list, int n);
 extern List *list_delete_cell(List *list, ListCell *cell);
diff --git a/src/backend/commands/cluster.c b/src/backend/commands/cluster.c
index ebaec4f..cedb4ee 100644
--- a/src/backend/commands/cluster.c
+++ b/src/backend/commands/cluster.c
@@ -1566,7 +1566,7 @@ get_tables_to_cluster(MemoryContext cluster_context)
         rvtc = (RelToCluster *) palloc(sizeof(RelToCluster));
         rvtc->tableOid = index->indrelid;
         rvtc->indexOid = index->indexrelid;
-        rvs = lcons(rvtc, rvs);
+        rvs = lappend(rvs, rvtc);
         MemoryContextSwitchTo(old_context);
     }
diff --git a/src/backend/commands/typecmds.c b/src/backend/commands/typecmds.c
index e9c8873..89887b8 100644
--- a/src/backend/commands/typecmds.c
+++ b/src/backend/commands/typecmds.c
@@ -2999,7 +2999,7 @@ get_rels_with_domain(Oid domainOid, LOCKMODE lockmode)
             rtc->rel = rel;
             rtc->natts = 0;
             rtc->atts = (int *) palloc(sizeof(int) * RelationGetNumberOfAttributes(rel));
-            result = lcons(rtc, result);
+            result = lappend(result, rtc);
         }
         /*
diff --git a/src/backend/executor/execExpr.c b/src/backend/executor/execExpr.c
index e4e0575..6d09f2a 100644
--- a/src/backend/executor/execExpr.c
+++ b/src/backend/executor/execExpr.c
@@ -786,7 +786,7 @@ ExecInitExprRec(Expr *node, ExprState *state,
                 {
                     AggState   *aggstate = (AggState *) state->parent;
-                    aggstate->aggs = lcons(astate, aggstate->aggs);
+                    aggstate->aggs = lappend(aggstate->aggs, astate);
                     aggstate->numaggs++;
                 }
                 else
@@ -834,7 +834,7 @@ ExecInitExprRec(Expr *node, ExprState *state,
                     WindowAggState *winstate = (WindowAggState *) state->parent;
                     int            nfuncs;
-                    winstate->funcs = lcons(wfstate, winstate->funcs);
+                    winstate->funcs = lappend(winstate->funcs, wfstate);
                     nfuncs = ++winstate->numfuncs;
                     if (wfunc->winagg)
                         winstate->numaggs++;
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index 6ea625a..98e9948 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -419,6 +419,13 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent,
             index_close(indexRelation, NoLock);
+            /*
+             * We've historically used lcons() here.  It'd make more sense to
+             * use lappend(), but that causes the planner to change behavior
+             * in cases where two indexes seem equally attractive.  For now,
+             * stick with lcons() --- few tables should have so many indexes
+             * that the O(N^2) behavior of lcons() is really a problem.
+             */
             indexinfos = lcons(info, indexinfos);
         }
@@ -1339,7 +1346,7 @@ get_relation_statistics(RelOptInfo *rel, Relation relation)
             info->kind = STATS_EXT_NDISTINCT;
             info->keys = bms_copy(keys);
-            stainfos = lcons(info, stainfos);
+            stainfos = lappend(stainfos, info);
         }
         if (statext_is_kind_built(dtup, STATS_EXT_DEPENDENCIES))
@@ -1351,7 +1358,7 @@ get_relation_statistics(RelOptInfo *rel, Relation relation)
             info->kind = STATS_EXT_DEPENDENCIES;
             info->keys = bms_copy(keys);
-            stainfos = lcons(info, stainfos);
+            stainfos = lappend(stainfos, info);
         }
         if (statext_is_kind_built(dtup, STATS_EXT_MCV))
@@ -1363,7 +1370,7 @@ get_relation_statistics(RelOptInfo *rel, Relation relation)
             info->kind = STATS_EXT_MCV;
             info->keys = bms_copy(keys);
-            stainfos = lcons(info, stainfos);
+            stainfos = lappend(stainfos, info);
         }
         ReleaseSysCache(htup);
diff --git a/src/backend/parser/parse_agg.c b/src/backend/parser/parse_agg.c
index 19e3164..354030e 100644
--- a/src/backend/parser/parse_agg.c
+++ b/src/backend/parser/parse_agg.c
@@ -1132,7 +1132,7 @@ parseCheckAggregates(ParseState *pstate, Query *qry)
         if (expr == NULL)
             continue;            /* probably cannot happen */
-        groupClauses = lcons(expr, groupClauses);
+        groupClauses = lappend(groupClauses, expr);
     }
     /*
diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index 66449b8..7eba59e 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -3201,7 +3201,7 @@ estimate_num_groups(PlannerInfo *root, List *groupExprs, double input_rows,
          * Split the list of varinfos in two - one for the current rel, one
          * for remaining Vars on other rels.
          */
-        relvarinfos = lcons(varinfo1, relvarinfos);
+        relvarinfos = lappend(relvarinfos, varinfo1);
         for_each_cell(l, varinfos, list_second_cell(varinfos))
         {
             GroupVarInfo *varinfo2 = (GroupVarInfo *) lfirst(l);
@@ -3209,12 +3209,12 @@ estimate_num_groups(PlannerInfo *root, List *groupExprs, double input_rows,
             if (varinfo2->rel == varinfo1->rel)
             {
                 /* varinfos on current rel */
-                relvarinfos = lcons(varinfo2, relvarinfos);
+                relvarinfos = lappend(relvarinfos, varinfo2);
             }
             else
             {
                 /* not time to process varinfo2 yet */
-                newvarinfos = lcons(varinfo2, newvarinfos);
+                newvarinfos = lappend(newvarinfos, varinfo2);
             }
         }
diff --git a/src/test/regress/expected/aggregates.out b/src/test/regress/expected/aggregates.out
index ef8eec3..be4ddf8 100644
--- a/src/test/regress/expected/aggregates.out
+++ b/src/test/regress/expected/aggregates.out
@@ -2030,10 +2030,10 @@ NOTICE:  avg_transfn called with 3
 -- this should not share the state due to different input columns.
 select my_avg(one),my_sum(two) from (values(1,2),(3,4)) t(one,two);
-NOTICE:  avg_transfn called with 2
 NOTICE:  avg_transfn called with 1
-NOTICE:  avg_transfn called with 4
+NOTICE:  avg_transfn called with 2
 NOTICE:  avg_transfn called with 3
+NOTICE:  avg_transfn called with 4
  my_avg | my_sum
 --------+--------
       2 |      6
			
On Wed, 17 Jul 2019 at 11:06, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> 0002 changes some additional places where it's maybe a bit less safe,
> ie there's a potential for user-visible behavioral change because
> processing will occur in a different order.  In particular, the proposed
> change in execExpr.c causes aggregates and window functions that are in
> the same plan node to be executed in a different order than before ---
> but it seems to me that this order is saner.  (Note the change in the
> expected regression results, in a test that's intentionally sensitive to
> the execution order.)  And anyway when did we guarantee anything about
> that?
I've only looked at 0002. Here are my thoughts:
get_tables_to_cluster:
Looks fine. It's a heap scan. Any previous order was accidental, so if
it causes issues then we might need to think of using a more
well-defined order for CLUSTER;
get_rels_with_domain:
This is a static function. Changing the order of the list seems to
only really affect the error message that a failed domain constraint
validation could emit. Perhaps this might break someone else's tests,
but they should just be able to update their expected results.
ExecInitExprRec:
As you mention, the order of aggregate evaluation is reversed. I agree
that the new order is saner. I can't think why we'd be doing it in
backwards order beforehand.
get_relation_statistics:
RelationGetStatExtList does not seem to pay much attention to the
order it returns its results, so I don't think the order we apply
extended statistics was that well defined before. We always attempt to
use the stats with the most matching columns in
choose_best_statistics(), so I think
for people to be affected they'd either need multiple stats with the
same sets of columns or a complex clause that equally well matches two
sets of stats, and in that case the other columns would be matched to
the other stats later... I'd better check that... erm... actually
that's not true. I see statext_mcv_clauselist_selectivity() makes no
attempt to match the clause list to another set of stats after finding
the first best match. I think it likely should do that.
estimate_multivariate_ndistinct() seems to have an XXX comment
mentioning thoughts about the stability of which stats are used, but
nothing is done.
parseCheckAggregates:
I can't see any user-visible change to this one. Not even in error messages.
estimate_num_groups:
Similar to get_relation_statistics(), I see that
estimate_multivariate_ndistinct() is only called once and we don't
attempt to match up the remaining clauses with more stats. I can't
imagine swapping lcons for lappend here will upset anyone. The
behaviour does not look well defined already. I think we should likely
change the "if (estimate_multivariate_ndistinct(root, rel,
&relvarinfos," to "while ...", then drop the else. Not for this patch
though...
--
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
> On 17 Jul 2019, at 01:06, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> There are a bunch of places that are using list_delete_first to remove
> the next-to-process entry from a List used as a queue.  In principle,
> we could invert the order of those queues and then use list_delete_last,
> but I thought this would probably be too confusing: it's natural to
> think of the front of the list as being the head of the queue.  I doubt
> that any of those queues get long enough for it to be a serious
> performance problem to leave them as-is.
For cases where an Oid list is copied and then head elements immediately
removed, as in fetch_search_path, couldn’t we instead use a counter and
list_copy_tail to avoid repeated list_delete_first calls?  Something like the
attached poc.
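The attached POC is not reproduced in this archive, so what follows is only a rough sketch of the idea as described: count how many head elements should be dropped, then copy the tail once, rather than copying everything and shifting the array on each head deletion. IntVec and all of its helpers are hypothetical stand-ins invented for illustration; they are not PostgreSQL's List API, and the "skip a sentinel value" rule is an assumption chosen only to have something to count.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy array-backed integer list; a hypothetical stand-in, NOT PostgreSQL's List. */
typedef struct IntVec
{
    int     length;
    int    *elements;
} IntVec;

/* Build a vector holding a copy of src[0..n-1]. */
static IntVec *
intvec_make(const int *src, int n)
{
    IntVec *v = malloc(sizeof(IntVec));

    assert(v != NULL);
    v->length = n;
    v->elements = malloc(sizeof(int) * (n > 0 ? n : 1));
    assert(v->elements != NULL);
    if (n > 0)
        memcpy(v->elements, src, sizeof(int) * n);
    return v;
}

/* Rough analogue of list_copy_tail(): copy all but the first nskip items. */
static IntVec *
intvec_copy_tail(const IntVec *v, int nskip)
{
    if (nskip < 0)
        nskip = 0;
    if (nskip >= v->length)
        return intvec_make(NULL, 0);
    return intvec_make(v->elements + nskip, v->length - nskip);
}

/* Rough analogue of list_delete_first(): shifts every remaining item down. */
static void
intvec_delete_first(IntVec *v)
{
    if (v->length == 0)
        return;
    memmove(v->elements, v->elements + 1, sizeof(int) * (v->length - 1));
    v->length--;
}

/*
 * Counter-based variant: count leading entries equal to skip_value
 * (an assumed rule for this sketch), then copy the tail in one step.
 */
static IntVec *
copy_without_leading(const IntVec *v, int skip_value)
{
    int nskip = 0;

    while (nskip < v->length && v->elements[nskip] == skip_value)
        nskip++;
    return intvec_copy_tail(v, nskip);
}
```

The counter version does one copy of exactly the surviving elements, while the delete-first version copies everything and then moves the remainder once per deleted head. The trade-off is that it only works when head deletion is the sole modification needed.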
> +List *
> +list_delete_last(List *list)
> +{
> +    check_list_invariants(list);
> +
> +    if (list == NIL)
> +        return NIL;                /* would an error be better? */
Since we’ve allowed list_delete_first on NIL for a long time, it seems
reasonable to do the same for list_delete_last even though it’s hard to come up
with a good usecase for deleting the last element without inspecting the list
(a stack growing from the bottom perhaps?).  It reads better to check for NIL
before calling check_list_invariants though IMO.
Looking mainly at 0001 for now, I agree that the order is insignificant.
cheers ./daniel
			
		Attachments
David Rowley <david.rowley@2ndquadrant.com> writes:
> I've only looked at 0002. Here are my thoughts:
Thanks for looking!
> get_tables_to_cluster:
> Looks fine. It's a heap scan. Any previous order was accidental, so if
> it causes issues then we might need to think of using a more
> well-defined order for CLUSTER;
Check.
> get_rels_with_domain:
> This is a static function. Changing the order of the list seems to
> only really affect the error message that a failed domain constraint
> validation could emit. Perhaps this might break someone else's tests,
> but they should just be able to update their expected results.
Also, this is already dependent on the order of pg_depend entries,
so it's not terribly stable anyhow.
> get_relation_statistics:
> RelationGetStatExtList does not seem to pay much attention to the
> order it returns its results, so I don't think the order we apply
> extended statistics was that well defined before. We always attempt to
> use the stats with the most matching columns in
> choose_best_statistics(), so I think
> for people to be affected they'd either need multiple stats with the same
> sets of columns or a complex clause that equally well matches two sets
> of stats, and in that case the other columns would be matched to the
> other stats later... I'd better check that... erm... actually that's
> not true. I see statext_mcv_clauselist_selectivity() makes no attempt
> to match the clause list to another set of stats after finding the
> first best match. I think it likely should do that.
> estimate_multivariate_ndistinct() seems to have an XXX comment
> mentioning thoughts about the stability of which stats are used, but
> nothing is done.
I figured that (a) this hasn't been around so long that anybody's
expectations are frozen, and (b) if there is a meaningful difference in
results then it's probably incumbent on the extstats code to do better.
That seems to match your conclusions.  But I don't see any regression
test changes from making this change, so at least in simple cases it
doesn't matter.
(As you say, any extstats changes that we conclude are needed should
be a separate patch.)
            regards, tom lane
			
Daniel Gustafsson <daniel@yesql.se> writes:
> For cases where an Oid list is copied and then head elements immediately
> removed, as in fetch_search_path, couldn’t we instead use a counter and
> list_copy_tail to avoid repeated list_delete_first calls?
Perhaps, but I'm having a hard time getting excited about it.
I don't think there's any evidence that fetch_search_path is a
performance issue.  Also, this coding requires that the *only*
changes be deletion of head elements, whereas as it stands,
once we've copied the list we can do what we like.
> Since we’ve allowed list_delete_first on NIL for a long time, it seems
> reasonable to do the same for list_delete_last even though it’s hard to come up
> with a good usecase for deleting the last element without inspecting the list
> (a stack growing from the bottom perhaps?
Yeah, I intentionally made the edge cases the same.  There's room to argue
that both functions should error out on NIL, instead.  I've not looked
into that though, and would consider it material for a separate patch.
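To make the edge cases concrete, here is a small sketch of the push-at-the-end, pop-at-the-end stack pattern that the patch uses for things like active_fns, with the same tolerance for an empty input. OidStack, stack_push, stack_pop and stack_contains are names invented for this illustration; they only imitate the behavior described above and are not the real lappend_oid() or list_delete_last().

```c
#include <assert.h>
#include <stdlib.h>

/* Toy growable stack of OID-like values; hypothetical, NOT PostgreSQL's List. */
typedef struct OidStack
{
    int             length;
    int             capacity;
    unsigned int   *elements;
} OidStack;

/* Push at the end (cheap, like lappend); NULL plays the role of NIL. */
static OidStack *
stack_push(OidStack *s, unsigned int value)
{
    if (s == NULL)
    {
        s = malloc(sizeof(OidStack));
        assert(s != NULL);
        s->length = 0;
        s->capacity = 4;
        s->elements = malloc(sizeof(unsigned int) * s->capacity);
        assert(s->elements != NULL);
    }
    else if (s->length == s->capacity)
    {
        s->capacity *= 2;
        s->elements = realloc(s->elements,
                              sizeof(unsigned int) * s->capacity);
        assert(s->elements != NULL);
    }
    s->elements[s->length++] = value;
    return s;
}

/*
 * Pop from the end without moving any data.  An empty input is tolerated
 * and returned as-is, and removing the only element frees the whole
 * stack, mirroring the edge cases chosen for list_delete_last().
 */
static OidStack *
stack_pop(OidStack *s)
{
    if (s == NULL)
        return NULL;
    if (s->length <= 1)
    {
        free(s->elements);
        free(s);
        return NULL;
    }
    s->length--;
    return s;
}

/* Membership test, as a recursion detector would use it. */
static int
stack_contains(const OidStack *s, unsigned int value)
{
    if (s == NULL)
        return 0;
    for (int i = 0; i < s->length; i++)
    {
        if (s->elements[i] == value)
            return 1;
    }
    return 0;
}
```

As with the real functions, callers have to keep using the returned pointer, since both the push and the pop can hand back something other than what was passed in.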
> Looking mainly at 0001 for now, I agree that the order is insignificant.
Thanks for looking!
            regards, tom lane
			
On Wed, 17 Jul 2019 at 11:06, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> (Actually, I doubt that any of these changes will really move the
> performance needle in the real world.  It's more a case of wanting
> the code to present good examples not bad ones.)
In spirit with the above, I'd quite like to fix a small bad example
that I ended up with in nodeAppend.c and nodeMergeappend.c for
run-time partition pruning.
The code in question performs a loop over a list and checks
bms_is_member() on each element and only performs an action if the
member is present in the Bitmapset.
It would seem much more efficient just to perform a bms_next_member()
type loop then just fetch the list item with list_nth(), at least this
is certainly the case when only a small number of the list items are
indexed by the Bitmapset. With these two loops in particular, when a
large number of list items are in the set the cost of the work goes up
greatly, so it does not seem unreasonable to optimise the case for
when just a few match.
A quick test shows that it's hardly groundbreaking performance-wise,
but test 1 does seem measurable above the noise.
-- Setup
plan_cache_mode = force_generic_plan
max_locks_per_transaction = 256
create table ht (a int primary key, b int, c int) partition by hash (a);
select 'create table ht' || x::text || ' partition of ht for values with (modulus 8192, remainder ' || (x)::text || ');' from generate_series(0,8191) x;
\gexec
-- Test 1: Just one member in the Bitmapset.
test1.sql:
\set p 1
select * from ht where a = :p
Master:
$ pgbench -n -f test1.sql -T 60 -M prepared postgres
tps = 297.267191 (excluding connections establishing)
tps = 298.276797 (excluding connections establishing)
tps = 296.264459 (excluding connections establishing)
tps = 298.968037 (excluding connections establishing)
tps = 298.575684 (excluding connections establishing)
Patched:
$ pgbench -n -f test1.sql -T 60 -M prepared postgres
tps = 300.924254 (excluding connections establishing)
tps = 299.360196 (excluding connections establishing)
tps = 300.197024 (excluding connections establishing)
tps = 299.741215 (excluding connections establishing)
tps = 299.748088 (excluding connections establishing)
0.71% faster
-- Test 2: when all list items are found in the Bitmapset.
test2.sql:
select * from ht;
Master:
$ pgbench -n -f test2.sql -T 60 -M prepared postgres
tps = 12.526578 (excluding connections establishing)
tps = 12.528046 (excluding connections establishing)
tps = 12.491347 (excluding connections establishing)
tps = 12.538292 (excluding connections establishing)
tps = 12.528959 (excluding connections establishing)
Patched:
$ pgbench -n -f test2.sql -T 60 -M prepared postgres
tps = 12.503670 (excluding connections establishing)
tps = 12.516133 (excluding connections establishing)
tps = 12.404925 (excluding connections establishing)
tps = 12.514567 (excluding connections establishing)
tps = 12.541484 (excluding connections establishing)
0.21% slower
With that removed the slowness of test 1 is almost entirely in
AcquireExecutorLocks() and ExecCheckRTPerms(). We'd be up close to
about 30k tps instead of 300 tps if there was some solution to those
problems. I think it makes sense to remove the inefficient loops and
leave just the final two bottlenecks, in the meantime.
Patch attached.
--
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
Attachments
David Rowley <david.rowley@2ndquadrant.com> writes:
> On Wed, 17 Jul 2019 at 11:06, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> (Actually, I doubt that any of these changes will really move the
>> performance needle in the real world.  It's more a case of wanting
>> the code to present good examples not bad ones.)
> In spirit with the above, I'd quite like to fix a small bad example
> that I ended up with in nodeAppend.c and nodeMergeappend.c for
> run-time partition pruning.
I didn't test the patch, but just by eyeball it looks sane,
and I concur it should win if the bitmap is sparse.
One small question is whether it loses if most of the subplans
are present in the bitmap.  I imagine that would be close enough
to break-even, but it might be worth trying to test to be sure.
(I'd think about breaking out just the loops in question and
testing them stand-alone, or else putting in an outer loop to
repeat them, since as you say the surrounding work probably
dominates.)
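For anyone who wants to try the loops in isolation, the two shapes being compared look roughly like the sketch below. It is a stand-alone toy: the bitset is a plain array of 64-bit words, bitset_next_member() is a hypothetical helper that returns -1 when the set is exhausted, and plain array indexing stands in for list_nth(). None of this is the real Bitmapset or List code.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: smallest member greater than prevbit, or -1 if
 * there is none.  Whole zero words are skipped, which is what makes the
 * sparse case cheap.
 */
static int
bitset_next_member(const uint64_t *words, int nwords, int prevbit)
{
    int bit = prevbit + 1;

    while (bit < nwords * 64)
    {
        uint64_t w = words[bit / 64] >> (bit % 64);

        if (w == 0)
        {
            bit = (bit / 64 + 1) * 64;  /* nothing left in this word */
            continue;
        }
        while ((w & 1) == 0)
        {
            w >>= 1;
            bit++;
        }
        return bit;
    }
    return -1;
}

/*
 * Old shape: walk every item and test membership for each one.
 * The bitset must have at least (nitems + 63) / 64 words.
 */
static long
visit_by_scan(const int *items, int nitems, const uint64_t *words)
{
    long sum = 0;

    for (int i = 0; i < nitems; i++)
    {
        if (words[i / 64] & (UINT64_C(1) << (i % 64)))
            sum += items[i];
    }
    return sum;
}

/* New shape: walk only the set members and fetch each item by index. */
static long
visit_by_members(const int *items, int nitems,
                 const uint64_t *words, int nwords)
{
    long sum = 0;
    int  i = -1;

    while ((i = bitset_next_member(words, nwords, i)) >= 0 && i < nitems)
        sum += items[i];
    return sum;
}
```

Both functions visit the same items. The second does work proportional to the number of bitset words plus the number of members, so it should help most when few bits are set. Wrapping each call in an outer repeat loop with a timer would give the kind of stand-alone comparison suggested above.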
            regards, tom lane
			
I wrote:
>> * Rationalize places that are using combinations of list_copy and
>> list_concat, probably by inventing an additional list-concatenation
>> primitive that modifies neither input.
> I poked around to see what we have in this department.  There seem to
> be several identifiable use-cases:
> [ ... analysis ... ]
Here's a proposed patch based on that.  I added list_concat_copy()
and then simplified callers as appropriate.
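As a rough illustration of what the new primitive does, the sketch below builds a fresh array from two inputs without modifying either. PtrVec and its helpers are hypothetical names made up for this example, with NULL standing in for NIL. The real list_concat_copy() in the patch operates on List and ListCell and also checks list invariants, so treat this only as a model of the semantics.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy array-backed pointer list; NULL stands in for NIL.  Hypothetical. */
typedef struct PtrVec
{
    int     length;
    void  **elements;
} PtrVec;

/* Allocate a vector with room for exactly 'length' (> 0) pointers. */
static PtrVec *
ptrvec_new(int length)
{
    PtrVec *v = malloc(sizeof(PtrVec));

    assert(v != NULL && length > 0);
    v->length = length;
    v->elements = malloc(sizeof(void *) * length);
    assert(v->elements != NULL);
    return v;
}

/* Shallow copy: new array, same pointed-to objects. */
static PtrVec *
ptrvec_copy(const PtrVec *src)
{
    PtrVec *v;

    if (src == NULL)
        return NULL;
    v = ptrvec_new(src->length);
    memcpy(v->elements, src->elements, sizeof(void *) * src->length);
    return v;
}

/*
 * Model of list_concat_copy(): neither input is modified, and the result
 * is built with one allocation and two block copies instead of copying
 * the first input and then growing it.
 */
static PtrVec *
ptrvec_concat_copy(const PtrVec *a, const PtrVec *b)
{
    PtrVec *result;

    if (a == NULL)
        return ptrvec_copy(b);
    if (b == NULL)
        return ptrvec_copy(a);

    result = ptrvec_new(a->length + b->length);
    memcpy(result->elements, a->elements, sizeof(void *) * a->length);
    memcpy(result->elements + a->length, b->elements,
           sizeof(void *) * b->length);
    return result;
}
```

The result shares the pointed-to objects with both inputs but none of their storage, which is why the second input no longer needs to be copied first.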
It turns out there are a *lot* of places where list_concat() callers
are now leaking the second input list (where before they just leaked
that list's header).  So I've got mixed emotions about the choice not
to add a variant function that list_free's the second input.  On the
other hand, the leakage probably amounts to nothing significant in
all or nearly all of these places, and I'm concerned about the
readability/understandability loss of having an extra version of
list_concat.  Anybody have an opinion about that?
Other than that point, I think this is pretty much good to go.
            regards, tom lane
diff --git a/contrib/postgres_fdw/deparse.c b/contrib/postgres_fdw/deparse.c
index 548ae66..50d1f18 100644
--- a/contrib/postgres_fdw/deparse.c
+++ b/contrib/postgres_fdw/deparse.c
@@ -1497,7 +1497,7 @@ deparseFromExprForRel(StringInfo buf, PlannerInfo *root, RelOptInfo *foreignrel,
             if (fpinfo->jointype == JOIN_INNER)
             {
                 *ignore_conds = list_concat(*ignore_conds,
-                                            list_copy(fpinfo->joinclauses));
+                                            fpinfo->joinclauses);
                 fpinfo->joinclauses = NIL;
             }
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index 033aeb2..b9f90e9 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -2667,8 +2667,8 @@ estimate_path_cost_size(PlannerInfo *root,
          * baserestrictinfo plus any extra join_conds relevant to this
          * particular path.
          */
-        remote_conds = list_concat(list_copy(remote_param_join_conds),
-                                   fpinfo->remote_conds);
+        remote_conds = list_concat_copy(remote_param_join_conds,
+                                        fpinfo->remote_conds);
         /*
          * Construct EXPLAIN query including the desired SELECT, FROM, and
@@ -5102,23 +5102,23 @@ foreign_join_ok(PlannerInfo *root, RelOptInfo *joinrel, JoinType jointype,
     {
         case JOIN_INNER:
             fpinfo->remote_conds = list_concat(fpinfo->remote_conds,
-                                               list_copy(fpinfo_i->remote_conds));
+                                               fpinfo_i->remote_conds);
             fpinfo->remote_conds = list_concat(fpinfo->remote_conds,
-                                               list_copy(fpinfo_o->remote_conds));
+                                               fpinfo_o->remote_conds);
             break;
         case JOIN_LEFT:
             fpinfo->joinclauses = list_concat(fpinfo->joinclauses,
-                                              list_copy(fpinfo_i->remote_conds));
+                                              fpinfo_i->remote_conds);
             fpinfo->remote_conds = list_concat(fpinfo->remote_conds,
-                                               list_copy(fpinfo_o->remote_conds));
+                                               fpinfo_o->remote_conds);
             break;
         case JOIN_RIGHT:
             fpinfo->joinclauses = list_concat(fpinfo->joinclauses,
-                                              list_copy(fpinfo_o->remote_conds));
+                                              fpinfo_o->remote_conds);
             fpinfo->remote_conds = list_concat(fpinfo->remote_conds,
-                                               list_copy(fpinfo_i->remote_conds));
+                                               fpinfo_i->remote_conds);
             break;
         case JOIN_FULL:
diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c
index fd29927..e0af665 100644
--- a/src/backend/commands/indexcmds.c
+++ b/src/backend/commands/indexcmds.c
@@ -526,8 +526,8 @@ DefineIndex(Oid relationId,
      * is part of the key columns, and anything equal to and over is part of
      * the INCLUDE columns.
      */
-    allIndexParams = list_concat(list_copy(stmt->indexParams),
-                                 list_copy(stmt->indexIncludingParams));
+    allIndexParams = list_concat_copy(stmt->indexParams,
+                                      stmt->indexIncludingParams);
     numberOfAttributes = list_length(allIndexParams);
     if (numberOfAttributes <= 0)
diff --git a/src/backend/executor/functions.c b/src/backend/executor/functions.c
index 64a9e58..83337c2 100644
--- a/src/backend/executor/functions.c
+++ b/src/backend/executor/functions.c
@@ -719,8 +719,7 @@ init_sql_fcache(FmgrInfo *finfo, Oid collation, bool lazyEvalOK)
                                                           fcache->pinfo,
                                                           NULL);
         queryTree_list = lappend(queryTree_list, queryTree_sublist);
-        flat_query_list = list_concat(flat_query_list,
-                                      list_copy(queryTree_sublist));
+        flat_query_list = list_concat(flat_query_list, queryTree_sublist);
     }
     check_sql_fn_statements(flat_query_list);
diff --git a/src/backend/nodes/list.c b/src/backend/nodes/list.c
index 9163464..6bf13ae 100644
--- a/src/backend/nodes/list.c
+++ b/src/backend/nodes/list.c
@@ -501,12 +501,15 @@ lcons_oid(Oid datum, List *list)
 }
 /*
- * Concatenate list2 to the end of list1, and return list1. list1 is
- * destructively changed, list2 is not. (However, in the case of pointer
- * lists, list1 and list2 will point to the same structures.) Callers
- * should be sure to use the return value as the new pointer to the
- * concatenated list: the 'list1' input pointer may or may not be the
- * same as the returned pointer.
+ * Concatenate list2 to the end of list1, and return list1.
+ *
+ * This is equivalent to lappend'ing each element of list2, in order, to list1.
+ * list1 is destructively changed, list2 is not.  (However, in the case of
+ * pointer lists, list1 and list2 will point to the same structures.)
+ *
+ * Callers should be sure to use the return value as the new pointer to the
+ * concatenated list: the 'list1' input pointer may or may not be the same
+ * as the returned pointer.
  */
 List *
 list_concat(List *list1, const List *list2)
@@ -535,6 +538,41 @@ list_concat(List *list1, const List *list2)
 }
 /*
+ * Form a new list by concatenating the elements of list1 and list2.
+ *
+ * Neither input list is modified.  (However, if they are pointer lists,
+ * the output list will point to the same structures.)
+ *
+ * This is equivalent to, but more efficient than,
+ * list_concat(list_copy(list1), list2).
+ * Note that some pre-v13 code might list_copy list2 as well, but that's
+ * pointless now.
+ */
+List *
+list_concat_copy(const List *list1, const List *list2)
+{
+    List       *result;
+    int            new_len;
+
+    if (list1 == NIL)
+        return list_copy(list2);
+    if (list2 == NIL)
+        return list_copy(list1);
+
+    Assert(list1->type == list2->type);
+
+    new_len = list1->length + list2->length;
+    result = new_list(list1->type, new_len);
+    memcpy(result->elements, list1->elements,
+           list1->length * sizeof(ListCell));
+    memcpy(result->elements + list1->length, list2->elements,
+           list2->length * sizeof(ListCell));
+
+    check_list_invariants(result);
+    return result;
+}
+
+/*
  * Truncate 'list' to contain no more than 'new_size' elements. This
  * modifies the list in-place! Despite this, callers should use the
  * pointer returned by this function to refer to the newly truncated
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index e9ee32b..db3a68a 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -1266,7 +1266,7 @@ set_append_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
         if (rel->part_scheme)
             rel->partitioned_child_rels =
                 list_concat(rel->partitioned_child_rels,
-                            list_copy(childrel->partitioned_child_rels));
+                            childrel->partitioned_child_rels);
         /*
          * Child is live, so add it to the live_childrels list for use below.
@@ -1347,9 +1347,8 @@ add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
                 component = root->simple_rel_array[relid];
                 Assert(component->part_scheme != NULL);
                 Assert(list_length(component->partitioned_child_rels) >= 1);
-                partrels =
-                    list_concat(partrels,
-                                list_copy(component->partitioned_child_rels));
+                partrels = list_concat(partrels,
+                                       component->partitioned_child_rels);
             }
             partitioned_rels = list_make1(partrels);
@@ -2048,8 +2047,7 @@ accumulate_append_subpath(Path *path, List **subpaths, List **special_subpaths)
         if (!apath->path.parallel_aware || apath->first_partial_path == 0)
         {
-            /* list_copy is important here to avoid sharing list substructure */
-            *subpaths = list_concat(*subpaths, list_copy(apath->subpaths));
+            *subpaths = list_concat(*subpaths, apath->subpaths);
             return;
         }
         else if (special_subpaths != NULL)
@@ -2072,8 +2070,7 @@ accumulate_append_subpath(Path *path, List **subpaths, List **special_subpaths)
     {
         MergeAppendPath *mpath = (MergeAppendPath *) path;
-        /* list_copy is important here to avoid sharing list substructure */
-        *subpaths = list_concat(*subpaths, list_copy(mpath->subpaths));
+        *subpaths = list_concat(*subpaths, mpath->subpaths);
         return;
     }
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index 3a9a994..bc6bc99 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -4443,8 +4443,7 @@ get_parameterized_baserel_size(PlannerInfo *root, RelOptInfo *rel,
      * restriction clauses.  Note that we force the clauses to be treated as
      * non-join clauses during selectivity estimation.
      */
-    allclauses = list_concat(list_copy(param_clauses),
-                             rel->baserestrictinfo);
+    allclauses = list_concat_copy(param_clauses, rel->baserestrictinfo);
     nrows = rel->tuples *
         clauselist_selectivity(root,
                                allclauses,
diff --git a/src/backend/optimizer/path/indxpath.c b/src/backend/optimizer/path/indxpath.c
index 5f339fd..37b257c 100644
--- a/src/backend/optimizer/path/indxpath.c
+++ b/src/backend/optimizer/path/indxpath.c
@@ -656,7 +656,7 @@ get_join_index_paths(PlannerInfo *root, RelOptInfo *rel,
             }
         }
-        /* Add restriction clauses (this is nondestructive to rclauseset) */
+        /* Add restriction clauses */
         clauseset.indexclauses[indexcol] =
             list_concat(clauseset.indexclauses[indexcol],
                         rclauseset->indexclauses[indexcol]);
@@ -1204,8 +1204,7 @@ build_paths_for_OR(PlannerInfo *root, RelOptInfo *rel,
             {
                 /* Form all_clauses if not done already */
                 if (all_clauses == NIL)
-                    all_clauses = list_concat(list_copy(clauses),
-                                              other_clauses);
+                    all_clauses = list_concat_copy(clauses, other_clauses);
                 if (!predicate_implied_by(index->indpred, all_clauses, false))
                     continue;    /* can't use it at all */
@@ -1270,7 +1269,7 @@ generate_bitmap_or_paths(PlannerInfo *root, RelOptInfo *rel,
      * We can use both the current and other clauses as context for
      * build_paths_for_OR; no need to remove ORs from the lists.
      */
-    all_clauses = list_concat(list_copy(clauses), other_clauses);
+    all_clauses = list_concat_copy(clauses, other_clauses);
     foreach(lc, clauses)
     {
@@ -1506,8 +1505,7 @@ choose_bitmap_and(PlannerInfo *root, RelOptInfo *rel, List *paths)
         pathinfo = pathinfoarray[i];
         paths = list_make1(pathinfo->path);
         costsofar = bitmap_scan_cost_est(root, rel, pathinfo->path);
-        qualsofar = list_concat(list_copy(pathinfo->quals),
-                                list_copy(pathinfo->preds));
+        qualsofar = list_concat_copy(pathinfo->quals, pathinfo->preds);
         clauseidsofar = bms_copy(pathinfo->clauseids);
         for (j = i + 1; j < npaths; j++)
@@ -1543,10 +1541,8 @@ choose_bitmap_and(PlannerInfo *root, RelOptInfo *rel, List *paths)
             {
                 /* keep new path in paths, update subsidiary variables */
                 costsofar = newcost;
-                qualsofar = list_concat(qualsofar,
-                                        list_copy(pathinfo->quals));
-                qualsofar = list_concat(qualsofar,
-                                        list_copy(pathinfo->preds));
+                qualsofar = list_concat(qualsofar, pathinfo->quals);
+                qualsofar = list_concat(qualsofar, pathinfo->preds);
                 clauseidsofar = bms_add_members(clauseidsofar,
                                                 pathinfo->clauseids);
             }
@@ -1849,7 +1845,7 @@ find_indexpath_quals(Path *bitmapqual, List **quals, List **preds)
             *quals = lappend(*quals, iclause->rinfo->clause);
         }
-        *preds = list_concat(*preds, list_copy(ipath->indexinfo->indpred));
+        *preds = list_concat(*preds, ipath->indexinfo->indpred);
     }
     else
         elog(ERROR, "unrecognized node type: %d", nodeTag(bitmapqual));
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index c6b8553..355b03f 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -556,8 +556,8 @@ create_scan_plan(PlannerInfo *root, Path *best_path, int flags)
      * For paranoia's sake, don't modify the stored baserestrictinfo list.
      */
     if (best_path->param_info)
-        scan_clauses = list_concat(list_copy(scan_clauses),
-                                   best_path->param_info->ppi_clauses);
+        scan_clauses = list_concat_copy(scan_clauses,
+                                        best_path->param_info->ppi_clauses);
     /*
      * Detect whether we have any pseudoconstant quals to deal with.  Then, if
diff --git a/src/backend/optimizer/plan/initsplan.c b/src/backend/optimizer/plan/initsplan.c
index 73da0c2..274fea0 100644
--- a/src/backend/optimizer/plan/initsplan.c
+++ b/src/backend/optimizer/plan/initsplan.c
@@ -973,7 +973,6 @@ deconstruct_recurse(PlannerInfo *root, Node *jtnode, bool below_outer_join,
                 *postponed_qual_list = lappend(*postponed_qual_list, pq);
             }
         }
-        /* list_concat is nondestructive of its second argument */
         my_quals = list_concat(my_quals, (List *) j->quals);
         /*
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 36fefd9..6de182c 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -3567,10 +3567,6 @@ reorder_grouping_sets(List *groupingsets, List *sortclause)
             }
         }
-        /*
-         * Safe to use list_concat (which shares cells of the second arg)
-         * because we know that new_elems does not share cells with anything.
-         */
         previous = list_concat(previous, new_elems);
         gs->set = list_copy(previous);
@@ -4282,8 +4278,8 @@ consider_groupingsets_paths(PlannerInfo *root,
              */
             if (!rollup->hashable)
                 return;
-            else
-                sets_data = list_concat(sets_data, list_copy(rollup->gsets_data));
+
+            sets_data = list_concat(sets_data, rollup->gsets_data);
         }
         foreach(lc, sets_data)
         {
@@ -4468,7 +4464,7 @@ consider_groupingsets_paths(PlannerInfo *root,
                     {
                         if (bms_is_member(i, hash_items))
                             hash_sets = list_concat(hash_sets,
-                                                    list_copy(rollup->gsets_data));
+                                                    rollup->gsets_data);
                         else
                             rollups = lappend(rollups, rollup);
                         ++i;
@@ -5637,8 +5633,7 @@ make_pathkeys_for_window(PlannerInfo *root, WindowClause *wc,
                  errdetail("Window ordering columns must be of sortable datatypes.")));
     /* Okay, make the combined pathkeys */
-    window_sortclauses = list_concat(list_copy(wc->partitionClause),
-                                     list_copy(wc->orderClause));
+    window_sortclauses = list_concat_copy(wc->partitionClause, wc->orderClause);
     window_pathkeys = make_pathkeys_for_sortclauses(root,
                                                     window_sortclauses,
                                                     tlist);
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index dc11f09..85228e9 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -889,7 +889,7 @@ set_plan_refs(PlannerInfo *root, Plan *plan, int rtoffset)
                 splan->resultRelIndex = list_length(root->glob->resultRelations);
                 root->glob->resultRelations =
                     list_concat(root->glob->resultRelations,
-                                list_copy(splan->resultRelations));
+                                splan->resultRelations);
                 /*
                  * If the main target relation is a partitioned table, also
diff --git a/src/backend/optimizer/prep/prepjointree.c b/src/backend/optimizer/prep/prepjointree.c
index 4fbc03f..3c0ddd6 100644
--- a/src/backend/optimizer/prep/prepjointree.c
+++ b/src/backend/optimizer/prep/prepjointree.c
@@ -2507,7 +2507,6 @@ reduce_outer_joins_pass2(Node *jtnode,
         pass_nonnullable_rels = find_nonnullable_rels(f->quals);
         pass_nonnullable_rels = bms_add_members(pass_nonnullable_rels,
                                                 nonnullable_rels);
-        /* NB: we rely on list_concat to not damage its second argument */
         pass_nonnullable_vars = find_nonnullable_vars(f->quals);
         pass_nonnullable_vars = list_concat(pass_nonnullable_vars,
                                             nonnullable_vars);
diff --git a/src/backend/optimizer/prep/prepqual.c b/src/backend/optimizer/prep/prepqual.c
index e9a9497..ee91957 100644
--- a/src/backend/optimizer/prep/prepqual.c
+++ b/src/backend/optimizer/prep/prepqual.c
@@ -328,12 +328,6 @@ pull_ands(List *andlist)
     {
         Node       *subexpr = (Node *) lfirst(arg);
-        /*
-         * Note: we can destructively concat the subexpression's arglist
-         * because we know the recursive invocation of pull_ands will have
-         * built a new arglist not shared with any other expr. Otherwise we'd
-         * need a list_copy here.
-         */
         if (is_andclause(subexpr))
             out_list = list_concat(out_list,
                                    pull_ands(((BoolExpr *) subexpr)->args));
@@ -360,12 +354,6 @@ pull_ors(List *orlist)
     {
         Node       *subexpr = (Node *) lfirst(arg);
-        /*
-         * Note: we can destructively concat the subexpression's arglist
-         * because we know the recursive invocation of pull_ors will have
-         * built a new arglist not shared with any other expr. Otherwise we'd
-         * need a list_copy here.
-         */
         if (is_orclause(subexpr))
             out_list = list_concat(out_list,
                                    pull_ors(((BoolExpr *) subexpr)->args));
diff --git a/src/backend/optimizer/util/clauses.c b/src/backend/optimizer/util/clauses.c
index 99dbf8d..1e465b2 100644
--- a/src/backend/optimizer/util/clauses.c
+++ b/src/backend/optimizer/util/clauses.c
@@ -1009,8 +1009,8 @@ max_parallel_hazard_walker(Node *node, max_parallel_hazard_context *context)
             max_parallel_hazard_test(PROPARALLEL_RESTRICTED, context))
             return true;
         save_safe_param_ids = context->safe_param_ids;
-        context->safe_param_ids = list_concat(list_copy(subplan->paramIds),
-                                              context->safe_param_ids);
+        context->safe_param_ids = list_concat_copy(context->safe_param_ids,
+                                                   subplan->paramIds);
         if (max_parallel_hazard_walker(subplan->testexpr, context))
             return true;        /* no need to restore safe_param_ids */
         list_free(context->safe_param_ids);
@@ -3697,18 +3697,12 @@ simplify_or_arguments(List *args,
         /* flatten nested ORs as per above comment */
         if (is_orclause(arg))
         {
-            List       *subargs = list_copy(((BoolExpr *) arg)->args);
+            List       *subargs = ((BoolExpr *) arg)->args;
+            List       *oldlist = unprocessed_args;
-            /* overly tense code to avoid leaking unused list header */
-            if (!unprocessed_args)
-                unprocessed_args = subargs;
-            else
-            {
-                List       *oldhdr = unprocessed_args;
-
-                unprocessed_args = list_concat(subargs, unprocessed_args);
-                pfree(oldhdr);
-            }
+            unprocessed_args = list_concat_copy(subargs, unprocessed_args);
+            /* perhaps-overly-tense code to avoid leaking old lists */
+            list_free(oldlist);
             continue;
         }
@@ -3718,14 +3712,14 @@ simplify_or_arguments(List *args,
         /*
          * It is unlikely but not impossible for simplification of a non-OR
          * clause to produce an OR.  Recheck, but don't be too tense about it
-         * since it's not a mainstream case. In particular we don't worry
-         * about const-simplifying the input twice.
+         * since it's not a mainstream case.  In particular we don't worry
+         * about const-simplifying the input twice, nor about list leakage.
          */
         if (is_orclause(arg))
         {
-            List       *subargs = list_copy(((BoolExpr *) arg)->args);
+            List       *subargs = ((BoolExpr *) arg)->args;
-            unprocessed_args = list_concat(subargs, unprocessed_args);
+            unprocessed_args = list_concat_copy(subargs, unprocessed_args);
             continue;
         }
@@ -3799,18 +3793,12 @@ simplify_and_arguments(List *args,
         /* flatten nested ANDs as per above comment */
         if (is_andclause(arg))
         {
-            List       *subargs = list_copy(((BoolExpr *) arg)->args);
+            List       *subargs = ((BoolExpr *) arg)->args;
+            List       *oldlist = unprocessed_args;
-            /* overly tense code to avoid leaking unused list header */
-            if (!unprocessed_args)
-                unprocessed_args = subargs;
-            else
-            {
-                List       *oldhdr = unprocessed_args;
-
-                unprocessed_args = list_concat(subargs, unprocessed_args);
-                pfree(oldhdr);
-            }
+            unprocessed_args = list_concat_copy(subargs, unprocessed_args);
+            /* perhaps-overly-tense code to avoid leaking old lists */
+            list_free(oldlist);
             continue;
         }
@@ -3820,14 +3808,14 @@ simplify_and_arguments(List *args,
         /*
          * It is unlikely but not impossible for simplification of a non-AND
          * clause to produce an AND.  Recheck, but don't be too tense about it
-         * since it's not a mainstream case. In particular we don't worry
-         * about const-simplifying the input twice.
+         * since it's not a mainstream case.  In particular we don't worry
+         * about const-simplifying the input twice, nor about list leakage.
          */
         if (is_andclause(arg))
         {
-            List       *subargs = list_copy(((BoolExpr *) arg)->args);
+            List       *subargs = ((BoolExpr *) arg)->args;
-            unprocessed_args = list_concat(subargs, unprocessed_args);
+            unprocessed_args = list_concat_copy(subargs, unprocessed_args);
             continue;
         }
@@ -4188,7 +4176,7 @@ add_function_defaults(List *args, HeapTuple func_tuple)
         defaults = list_copy_tail(defaults, ndelete);
     /* And form the combined argument list, not modifying the input list */
-    return list_concat(list_copy(args), defaults);
+    return list_concat_copy(args, defaults);
 }
 /*
diff --git a/src/backend/optimizer/util/orclauses.c b/src/backend/optimizer/util/orclauses.c
index 18ebc51..412a396 100644
--- a/src/backend/optimizer/util/orclauses.c
+++ b/src/backend/optimizer/util/orclauses.c
@@ -236,7 +236,7 @@ extract_or_clause(RestrictInfo *or_rinfo, RelOptInfo *rel)
         subclause = (Node *) make_ands_explicit(subclauses);
         if (is_orclause(subclause))
             clauselist = list_concat(clauselist,
-                                     list_copy(((BoolExpr *) subclause)->args));
+                                     ((BoolExpr *) subclause)->args);
         else
             clauselist = lappend(clauselist, subclause);
     }
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index 37d228c..5a4d696 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -1715,43 +1715,39 @@ build_joinrel_partition_info(RelOptInfo *joinrel, RelOptInfo *outer_rel,
      */
     for (cnt = 0; cnt < partnatts; cnt++)
     {
-        List       *outer_expr;
-        List       *outer_null_expr;
-        List       *inner_expr;
-        List       *inner_null_expr;
+        /* mark these const to enforce that we copy them properly */
+        const List *outer_expr = outer_rel->partexprs[cnt];
+        const List *outer_null_expr = outer_rel->nullable_partexprs[cnt];
+        const List *inner_expr = inner_rel->partexprs[cnt];
+        const List *inner_null_expr = inner_rel->nullable_partexprs[cnt];
         List       *partexpr = NIL;
         List       *nullable_partexpr = NIL;
-        outer_expr = list_copy(outer_rel->partexprs[cnt]);
-        outer_null_expr = list_copy(outer_rel->nullable_partexprs[cnt]);
-        inner_expr = list_copy(inner_rel->partexprs[cnt]);
-        inner_null_expr = list_copy(inner_rel->nullable_partexprs[cnt]);
-
         switch (jointype)
         {
             case JOIN_INNER:
-                partexpr = list_concat(outer_expr, inner_expr);
-                nullable_partexpr = list_concat(outer_null_expr,
-                                                inner_null_expr);
+                partexpr = list_concat_copy(outer_expr, inner_expr);
+                nullable_partexpr = list_concat_copy(outer_null_expr,
+                                                     inner_null_expr);
                 break;
             case JOIN_SEMI:
             case JOIN_ANTI:
-                partexpr = outer_expr;
-                nullable_partexpr = outer_null_expr;
+                partexpr = list_copy(outer_expr);
+                nullable_partexpr = list_copy(outer_null_expr);
                 break;
             case JOIN_LEFT:
-                partexpr = outer_expr;
-                nullable_partexpr = list_concat(inner_expr,
-                                                outer_null_expr);
+                partexpr = list_copy(outer_expr);
+                nullable_partexpr = list_concat_copy(inner_expr,
+                                                     outer_null_expr);
                 nullable_partexpr = list_concat(nullable_partexpr,
                                                 inner_null_expr);
                 break;
             case JOIN_FULL:
-                nullable_partexpr = list_concat(outer_expr,
-                                                inner_expr);
+                nullable_partexpr = list_concat_copy(outer_expr,
+                                                     inner_expr);
                 nullable_partexpr = list_concat(nullable_partexpr,
                                                 outer_null_expr);
                 nullable_partexpr = list_concat(nullable_partexpr,
diff --git a/src/backend/optimizer/util/tlist.c b/src/backend/optimizer/util/tlist.c
index 7ccb10e..d75796a 100644
--- a/src/backend/optimizer/util/tlist.c
+++ b/src/backend/optimizer/util/tlist.c
@@ -665,9 +665,8 @@ make_tlist_from_pathtarget(PathTarget *target)
  * copy_pathtarget
  *      Copy a PathTarget.
  *
- * The new PathTarget has its own List cells, but shares the underlying
- * target expression trees with the old one.  We duplicate the List cells
- * so that items can be added to one target without damaging the other.
+ * The new PathTarget has its own exprs List, but shares the underlying
+ * target expression trees with the old one.
  */
 PathTarget *
 copy_pathtarget(PathTarget *src)
diff --git a/src/backend/parser/parse_agg.c b/src/backend/parser/parse_agg.c
index 354030e..f418c61 100644
--- a/src/backend/parser/parse_agg.c
+++ b/src/backend/parser/parse_agg.c
@@ -1649,9 +1649,8 @@ expand_groupingset_node(GroupingSet *gs)
                         Assert(gs_current->kind == GROUPING_SET_SIMPLE);
-                        current_result
-                            = list_concat(current_result,
-                                          list_copy(gs_current->content));
+                        current_result = list_concat(current_result,
+                                                     gs_current->content);
                         /* If we are done with making the current group, break */
                         if (--i == 0)
@@ -1691,11 +1690,8 @@ expand_groupingset_node(GroupingSet *gs)
                         Assert(gs_current->kind == GROUPING_SET_SIMPLE);
                         if (mask & i)
-                        {
-                            current_result
-                                = list_concat(current_result,
-                                              list_copy(gs_current->content));
-                        }
+                            current_result = list_concat(current_result,
+                                                         gs_current->content);
                         mask <<= 1;
                     }
diff --git a/src/backend/parser/parse_clause.c b/src/backend/parser/parse_clause.c
index 2a6b2ff..260ccd4 100644
--- a/src/backend/parser/parse_clause.c
+++ b/src/backend/parser/parse_clause.c
@@ -1214,9 +1214,6 @@ transformFromClauseItem(ParseState *pstate, Node *n,
          *
          * Notice that we don't require the merged namespace list to be
          * conflict-free.  See the comments for scanNameSpaceForRefname().
-         *
-         * NB: this coding relies on the fact that list_concat is not
-         * destructive to its second argument.
          */
         lateral_ok = (j->jointype == JOIN_INNER || j->jointype == JOIN_LEFT);
         setNamespaceLateralState(l_namespace, true, lateral_ok);
@@ -2116,9 +2113,7 @@ flatten_grouping_sets(Node *expr, bool toplevel, bool *hasGroupingSets)
                     if (IsA(n1, GroupingSet) &&
                         ((GroupingSet *) n1)->kind == GROUPING_SET_SETS)
-                    {
                         result_set = list_concat(result_set, (List *) n2);
-                    }
                     else
                         result_set = lappend(result_set, n2);
                 }
diff --git a/src/backend/partitioning/partprune.c b/src/backend/partitioning/partprune.c
index 6d3751d..b8efd6b 100644
--- a/src/backend/partitioning/partprune.c
+++ b/src/backend/partitioning/partprune.c
@@ -642,8 +642,8 @@ gen_partprune_steps(RelOptInfo *rel, List *clauses, PartClauseTarget target,
         if (rel->relid != 1)
             ChangeVarNodes((Node *) partqual, 1, rel->relid, 0);
-        /* Use list_copy to avoid modifying the passed-in List */
-        clauses = list_concat(list_copy(clauses), partqual);
+        /* Make a copy to avoid modifying the passed-in List */
+        clauses = list_concat_copy(clauses, partqual);
     }
     /* Down into the rabbit-hole. */
@@ -1471,7 +1471,7 @@ gen_prune_steps_from_opexps(GeneratePruningStepsContext *context,
                                                           pc->keyno,
                                                           NULL,
                                                           prefix);
-                        opsteps = list_concat(opsteps, list_copy(pc_steps));
+                        opsteps = list_concat(opsteps, pc_steps);
                     }
                 }
                 break;
@@ -1542,7 +1542,7 @@ gen_prune_steps_from_opexps(GeneratePruningStepsContext *context,
                                                    pc->keyno,
                                                    nullkeys,
                                                    prefix);
-                        opsteps = list_concat(opsteps, list_copy(pc_steps));
+                        opsteps = list_concat(opsteps, pc_steps);
                     }
                 }
                 break;
diff --git a/src/backend/replication/syncrep.c b/src/backend/replication/syncrep.c
index 79c7c13..a21f7d3 100644
--- a/src/backend/replication/syncrep.c
+++ b/src/backend/replication/syncrep.c
@@ -865,8 +865,6 @@ SyncRepGetSyncStandbysPriority(bool *am_sync)
      */
     if (list_length(result) + list_length(pending) <= SyncRepConfig->num_sync)
     {
-        bool        needfree = (result != NIL && pending != NIL);
-
         /*
          * Set *am_sync to true if this walsender is in the pending list
          * because all pending standbys are considered as sync.
@@ -875,8 +873,7 @@ SyncRepGetSyncStandbysPriority(bool *am_sync)
             *am_sync = am_in_pending;
         result = list_concat(result, pending);
-        if (needfree)
-            pfree(pending);
+        list_free(pending);
         return result;
     }
diff --git a/src/backend/rewrite/rewriteHandler.c b/src/backend/rewrite/rewriteHandler.c
index 93b6784..8249f71 100644
--- a/src/backend/rewrite/rewriteHandler.c
+++ b/src/backend/rewrite/rewriteHandler.c
@@ -1041,11 +1041,11 @@ process_matched_tle(TargetEntry *src_tle,
             /* combine the two */
             memcpy(fstore, prior_expr, sizeof(FieldStore));
             fstore->newvals =
-                list_concat(list_copy(((FieldStore *) prior_expr)->newvals),
-                            list_copy(((FieldStore *) src_expr)->newvals));
+                list_concat_copy(((FieldStore *) prior_expr)->newvals,
+                                 ((FieldStore *) src_expr)->newvals);
             fstore->fieldnums =
-                list_concat(list_copy(((FieldStore *) prior_expr)->fieldnums),
-                            list_copy(((FieldStore *) src_expr)->fieldnums));
+                list_concat_copy(((FieldStore *) prior_expr)->fieldnums,
+                                 ((FieldStore *) src_expr)->fieldnums);
         }
         else
         {
diff --git a/src/backend/utils/adt/ruleutils.c b/src/backend/utils/adt/ruleutils.c
index 4ca0ed2..5869ba5 100644
--- a/src/backend/utils/adt/ruleutils.c
+++ b/src/backend/utils/adt/ruleutils.c
@@ -10089,7 +10089,7 @@ get_from_clause_item(Node *jtnode, Query *query, deparse_context *context)
                             RangeTblFunction *rtfunc = (RangeTblFunction *) lfirst(lc);
                             List       *args = ((FuncExpr *) rtfunc->funcexpr)->args;
-                            allargs = list_concat(allargs, list_copy(args));
+                            allargs = list_concat(allargs, args);
                         }
                         appendStringInfoString(buf, "UNNEST(");
diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index 7eba59e..1710129 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -5768,7 +5768,6 @@ add_predicate_to_index_quals(IndexOptInfo *index, List *indexQuals)
         if (!predicate_implied_by(oneQual, indexQuals, false))
             predExtraQuals = list_concat(predExtraQuals, oneQual);
     }
-    /* list_concat avoids modifying the passed-in indexQuals list */
     return list_concat(predExtraQuals, indexQuals);
 }
diff --git a/src/include/nodes/pg_list.h b/src/include/nodes/pg_list.h
index 1463408..409d840 100644
--- a/src/include/nodes/pg_list.h
+++ b/src/include/nodes/pg_list.h
@@ -519,6 +519,8 @@ extern List *lcons_int(int datum, List *list);
 extern List *lcons_oid(Oid datum, List *list);
 extern List *list_concat(List *list1, const List *list2);
+extern List *list_concat_copy(const List *list1, const List *list2);
+
 extern List *list_truncate(List *list, int new_size);
 extern bool list_member(const List *list, const void *datum);
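
For readers who want to see the list_concat_copy() contract outside the backend, here is a minimal stand-alone sketch. ToyList, toy_make and toy_concat_copy are hypothetical names invented for this illustration. It uses malloc instead of palloc and a plain int array instead of ListCell, so it is only an analogy to the patched list.c, not the actual implementation.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for an array-based List; not PostgreSQL's struct. */
typedef struct ToyList
{
    int     length;
    int    *elements;
} ToyList;

/* Build a list from n values (allocation failures are not handled here). */
static ToyList *
toy_make(const int *vals, int n)
{
    ToyList *l = malloc(sizeof(ToyList));

    l->length = n;
    l->elements = malloc(sizeof(int) * (n > 0 ? n : 1));
    if (n > 0)
        memcpy(l->elements, vals, sizeof(int) * n);
    return l;
}

/*
 * Form a new list holding list1's elements followed by list2's.
 * Neither input is modified, so neither needs to be copied first.
 */
static ToyList *
toy_concat_copy(const ToyList *list1, const ToyList *list2)
{
    ToyList *result = malloc(sizeof(ToyList));
    int      new_len = list1->length + list2->length;

    result->length = new_len;
    result->elements = malloc(sizeof(int) * (new_len > 0 ? new_len : 1));
    memcpy(result->elements, list1->elements,
           sizeof(int) * list1->length);
    memcpy(result->elements + list1->length, list2->elements,
           sizeof(int) * list2->length);
    return result;
}
```

The point of the sketch is that one allocation plus two memcpy calls replace the copy-then-append sequence, and the result owns its own element array, so later appends to it cannot disturb either input.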
			
		On Mon, 22 Jul 2019 at 02:45, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> One small question is whether it loses if most of the subplans
> are present in the bitmap.  I imagine that would be close enough
> to break-even, but it might be worth trying to test to be sure.
> (I'd think about breaking out just the loops in question and
> testing them stand-alone, or else putting in an outer loop to
> repeat them, since as you say the surrounding work probably
> dominates.)
My 2nd test was for when all subplans were present in the bitmap, and
it did show a very slight slowdown in that case. However, yeah, it
seems like a good idea to try it a million times to help show the
true cost.
I did:
int x = 0;
/* Patched version */
for (j = 0; j < 1000000; j++)
{
    i = -1;
    while ((i = bms_next_member(validsubplans, i)) >= 0)
    {
        Plan    *initNode = (Plan *) list_nth(node->appendplans, i);
        x++;
    }
}
/* Master version */
for (j = 0; j < 1000000; j++)
{
    ListCell *lc;
    i = 0;
    foreach(lc, node->appendplans)
    {
        Plan    *initNode;
        if (bms_is_member(i, validsubplans))
        {
            initNode = (Plan *) lfirst(lc);
            x++;
        }
        i++;    /* keep the subplan index in step with the list cell */
    }
}
elog(DEBUG1, "%d", x); /* stop the compiler optimizing away the loops */
I commented out each of the outer loops in turn before running the
test again.
plan_cache_mode = force_generic_plan
-- Test 1 (one matching subplan) --
prepare q1(int) as select * from ht where a = $1;
execute q1(1);
Master version:
Time: 14441.332 ms (00:14.441)
Time: 13829.744 ms (00:13.830)
Time: 13753.943 ms (00:13.754)
Patched version:
Time: 41.250 ms
Time: 40.976 ms
Time: 40.853 ms
-- Test 2 (all matching subplans (8192 of them)) --
prepare q2 as select * from ht;
execute q2;
Master version:
Time: 14825.304 ms (00:14.825)
Time: 14701.601 ms (00:14.702)
Time: 14650.969 ms (00:14.651)
Patched version:
Time: 44551.811 ms (00:44.552)
Time: 44357.915 ms (00:44.358)
Time: 43454.958 ms (00:43.455)
So the bms_next_member() loop is slower when the bitmapset is fully
populated with all subplans, but way faster when there's just 1
member.  In reality, the ExecInitNode() call drowns most of it out.
Plus a plan with more subnodes is going to take longer to execute and
then to shut down afterwards too.
-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
			
		David Rowley <david.rowley@2ndquadrant.com> writes:
> On Mon, 22 Jul 2019 at 02:45, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> One small question is whether it loses if most of the subplans
>> are present in the bitmap.  I imagine that would be close enough
>> to break-even, but it might be worth trying to test to be sure.
> ...
> -- Test 2 (all matching subplans (8192 of them)) --
> Master version:
> Time: 14825.304 ms (00:14.825)
> Time: 14701.601 ms (00:14.702)
> Time: 14650.969 ms (00:14.651)
> Patched version:
> Time: 44551.811 ms (00:44.552)
> Time: 44357.915 ms (00:44.358)
> Time: 43454.958 ms (00:43.455)
> So the bms_next_member() loop is slower when the bitmapset is fully
> populated with all subplans, but way faster when there's just 1
> member.
Interesting.  I wonder if bms_next_member() could be made any quicker?
Still, I agree that this is negligible compared to the actual work
needed per live subplan, and the fact that the cost scales per live
subplan is a good thing.  So no objection to this patch, but a mental
note to take another look at bms_next_member() someday.
            regards, tom lane
			
		On Mon, 22 Jul 2019 at 16:37, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> David Rowley <david.rowley@2ndquadrant.com> writes:
> > So the bms_next_member() loop is slower when the bitmapset is fully
> > populated with all subplans, but way faster when there's just 1
> > member.
>
> Interesting.  I wonder if bms_next_member() could be made any quicker?
I had a quick look earlier and the only thing I saw was maybe to do
the first loop differently from subsequent ones.  The "w &= mask;"
does nothing useful once we're past the first bitmapword that the loop
touches.  Not sure what the code would look like exactly yet, or how
much it would help.  I'll maybe experiment a bit later, but as separate
work from the other patch.
> Still, I agree that this is negligible compared to the actual work
> needed per live subplan, and the fact that the cost scales per live
> subplan is a good thing.  So no objection to this patch, but a mental
> note to take another look at bms_next_member() someday.
Thanks for having a look.  I'll have another look and will likely push
this a bit later on today if all is well.
-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
David Rowley <david.rowley@2ndquadrant.com> writes:
> On Mon, 22 Jul 2019 at 16:37, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Interesting.  I wonder if bms_next_member() could be made any quicker?
> I had a quick look earlier and the only thing I saw was maybe to do
> the first loop differently from subsequent ones.  The "w &= mask;"
> does nothing useful once we're past the first bitmapword that the loop
> touches.
Good thought, but it would only help when we're actually iterating to
later words, which happens just 1 out of 64 times in the fully-
populated-bitmap case.
Still, I think it might be worth pursuing to make the sparse-bitmap
case faster.
            regards, tom lane
			
		On Mon, 22 Jul 2019 at 08:01, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> I wrote:
> >> * Rationalize places that are using combinations of list_copy and
> >> list_concat, probably by inventing an additional list-concatenation
> >> primitive that modifies neither input.
>
> > I poked around to see what we have in this department.  There seem to
> > be several identifiable use-cases:
> > [ ... analysis ... ]
>
> Here's a proposed patch based on that.  I added list_concat_copy()
> and then simplified callers as appropriate.
I looked over this and only noted down one thing:
In estimate_path_cost_size, can you explain why list_concat_copy() is
needed here? I don't see remote_param_join_conds being used after
this, so might it be better to just get rid of remote_param_join_conds
and pass remote_conds to classifyConditions(), then just
list_concat()?
/*
 * The complete list of remote conditions includes everything from
 * baserestrictinfo plus any extra join_conds relevant to this
 * particular path.
 */
remote_conds = list_concat_copy(remote_param_join_conds,
                                fpinfo->remote_conds);
classifyConditions() seems to create new lists, so it does not appear
that you have to worry about modifying the existing lists.
> It turns out there are a *lot* of places where list_concat() callers
> are now leaking the second input list (where before they just leaked
> that list's header).  So I've got mixed emotions about the choice not
> to add a variant function that list_free's the second input.  On the
> other hand, the leakage probably amounts to nothing significant in
> all or nearly all of these places, and I'm concerned about the
> readability/understandability loss of having an extra version of
> list_concat.  Anybody have an opinion about that?
In some of these places, for example, the calls to
generate_join_implied_equalities_normal() and
generate_join_implied_equalities_broken(), I wonder, since these are
static functions if we could just change the function signature to
accept a List to append to.  This could save us from having to perform
any additional pallocs at all, so there'd be no need to free anything
or worry about any leaks.  The performance of the code would be
improved too.  There may be other cases where we can do similar, but I
wouldn't vote we change signatures of non-static functions for that.
If we do end up with another function, it might be nice to stay away
from using "concat" in the name. I think we might struggle if there
are too many variations on concat and there's a risk we'll use the
wrong one.  If we need this then perhaps something like
list_append_all() might be a better choice... I'm struggling to build
a strong opinion on this though. (I know that because I've deleted
this paragraph 3 times and started again, each time with a different
opinion.)
-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
David Rowley <david.rowley@2ndquadrant.com> writes:
> I looked over this and only noted down one thing:
> In estimate_path_cost_size, can you explain why list_concat_copy() is
> needed here? I don't see remote_param_join_conds being used after
> this, so might it be better to just get rid of remote_param_join_conds
> and pass remote_conds to classifyConditions(), then just
> list_concat()?
Hm, you're right, remote_param_join_conds is not used after that,
so we could just drop the existing list_copy() and make it
        remote_conds = list_concat(remote_param_join_conds,
                                   fpinfo->remote_conds);
I'm disinclined to change the API of classifyConditions(),
if that's what you were suggesting.
>> It turns out there are a *lot* of places where list_concat() callers
>> are now leaking the second input list (where before they just leaked
>> that list's header).  So I've got mixed emotions about the choice not
>> to add a variant function that list_free's the second input.
> In some of these places, for example, the calls to
> generate_join_implied_equalities_normal() and
> generate_join_implied_equalities_broken(), I wonder, since these are
> static functions if we could just change the function signature to
> accept a List to append to.
I'm pretty disinclined to do that, too.  Complicating function APIs
for marginal performance gains isn't something that leads to
understandable or maintainable code.
> If we do end up with another function, it might be nice to stay away
> from using "concat" in the name. I think we might struggle if there
> are too many variations on concat and there's a risk we'll use the
> wrong one.  If we need this then perhaps something like
> list_append_all() might be a better choice... I'm struggling to build
> a strong opinion on this though. (I know that because I've deleted
> this paragraph 3 times and started again, each time with a different
> opinion.)
Yeah, the name is really the sticking point here; if we could think
of a name that was easy to understand then the whole thing would be
much easier to accept.  The best I've been able to come up with is
"list_join", by analogy to bms_join for bitmapsets ... but that's
not great.
            regards, tom lane
			
		On 2019-Jul-22, Tom Lane wrote:
> David Rowley <david.rowley@2ndquadrant.com> writes:
> > If we do end up with another function, it might be nice to stay away
> > from using "concat" in the name. I think we might struggle if there
> > are too many variations on concat and there's a risk we'll use the
> > wrong one.  If we need this then perhaps something like
> > list_append_all() might be a better choice... I'm struggling to build
> > a strong opinion on this though. (I know that because I've deleted
> > this paragraph 3 times and started again, each time with a different
> > opinion.)
>
> Yeah, the name is really the sticking point here; if we could think
> of a name that was easy to understand then the whole thing would be
> much easier to accept.  The best I've been able to come up with is
> "list_join", by analogy to bms_join for bitmapsets ... but that's
> not great.
So with this patch we end up with:
list_union (copies list1, appends list2 element not already in list1)
list_concat_unique (appends list2 elements not already in list)
list_concat (appends all list2 elements)
list_concat_copy (copies list1, appends all list2 elements)
This seems a little random -- for example we end up with "union" being
the same as "concat_copy" except for the copy; and the analogy between
those two seems to exactly correspond to that between "concat_unique"
and "concat".
I would propose to use the name list_union, with flags
being "unique" (or "uniquify" if that's a word, or even just "all" which
seems obvious to people with a SQL background), and something that
suggests "copy_first".
Maybe we can offer a single name that does the four things, selecting
the exact semantics with boolean flags?  (We can provide the old names
as macros, to avoid unnecessarily breaking other code).  Also, perhaps
it would make sense to put them all closer in the source file.
-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> So with this patch we end up with:
> list_union (copies list1, appends list2 element not already in list1)
> list_concat_unique (appends list2 elements not already in list)
> list_concat (appends all list2 elements)
> list_concat_copy (copies list1, appends all list2 elements)
> This seems a little random -- for example we end up with "union" being
> the same as "concat_copy" except for the copy; and the analogy between
> those two seems to exactly correspond to that between "concat_unique"
> and "concat".
Yeah, list_concat_unique is kind of weird here.  Its header comment
even points out that it's much like list_union:
 * This is almost the same functionality as list_union(), but list1 is
 * modified in-place rather than being copied. However, callers of this
 * function may have strict ordering expectations -- i.e. that the relative
 * order of those list2 elements that are not duplicates is preserved.
I think that last sentence is bogus --- does anybody really think
people have been careful not to assume anything about the ordering
of list_union results?
> I would propose to use the name list_union, with flags
> being "unique" (or "uniquify" if that's a word, or even just "all" which
> seems obvious to people with a SQL background), and something that
> suggests "copy_first".
I really dislike using "union" for something that doesn't have the
same semantics as SQL's UNION (ie guaranteed duplicate elimination);
so I've never been that happy with "list_union" and "list_difference".
Propagating that into things that aren't doing any dup-elimination
at all seems very wrong.
Also, a big -1 for replacing these calls with something with
extra parameter(s).  That's going to be verbose, and not any
more readable, and probably slower because the called code
will have to figure out what to do.
Perhaps there's an argument for doing something to change the behavior
of list_union and list_difference and friends.  Not sure --- it could
be a foot-gun for back-patching.  I'm already worried about the risk
of back-patching code that assumes the new semantics of list_concat.
(Which might be a good argument for renaming it to something else?
Just not list_union, please.)
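For readers trying to keep the four variants straight, here is a toy
model on a growable int array.  It only mirrors the semantics as
summarized upthread and is not the list.c implementation; IntList and
all the il_* names are invented for this sketch, and assert() stands in
for real allocation-failure handling.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Toy growable array of ints; a stand-in, not PostgreSQL's List. */
typedef struct IntList
{
    int         len;
    int         cap;
    int        *items;
} IntList;

/* Build a new list holding a copy of vals[0..n-1]. */
static IntList *
il_make(const int *vals, int n)
{
    IntList    *l = malloc(sizeof(IntList));

    assert(l != NULL);
    l->len = n;
    l->cap = (n > 0) ? n : 1;
    l->items = malloc(sizeof(int) * l->cap);
    assert(l->items != NULL);
    if (n > 0)
        memcpy(l->items, vals, sizeof(int) * n);
    return l;
}

static void
il_append(IntList *l, int v)
{
    if (l->len == l->cap)
    {
        l->cap *= 2;
        l->items = realloc(l->items, sizeof(int) * l->cap);
        assert(l->items != NULL);
    }
    l->items[l->len++] = v;
}

static bool
il_member(const IntList *l, int v)
{
    for (int i = 0; i < l->len; i++)
        if (l->items[i] == v)
            return true;
    return false;
}

/* Like list_concat: append all of b to a, modifying a in place. */
static IntList *
il_concat(IntList *a, const IntList *b)
{
    int         n = b->len;     /* snapshot, in case a and b are the same */

    for (int i = 0; i < n; i++)
        il_append(a, b->items[i]);
    return a;
}

/* Like list_concat_unique: in place, skipping values already present. */
static IntList *
il_concat_unique(IntList *a, const IntList *b)
{
    int         n = b->len;

    for (int i = 0; i < n; i++)
        if (!il_member(a, b->items[i]))
            il_append(a, b->items[i]);
    return a;
}

/* Like list_concat_copy: neither input is modified. */
static IntList *
il_concat_copy(const IntList *a, const IntList *b)
{
    return il_concat(il_make(a->items, a->len), b);
}

/* Like list_union: copy a, then add values of b not already present. */
static IntList *
il_union(const IntList *a, const IntList *b)
{
    return il_concat_unique(il_make(a->items, a->len), b);
}
```

Seen this way, the "copy" variants are just the in-place variants applied
to a fresh copy of the first list, which is the symmetry Alvaro describes.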
            regards, tom lane
			
		Hi,
I was just looking at the diff for a fix, which adds a "ListCell *lc;"
to function scope, even though it's only needed in a pretty narrow
scope.
Unfortunately foreach(ListCell *lc, ...) doesn't work with the current
definition. Which I think isn't great, because the large scopes for loop
iteration variables imo makes the code harder to reason about.
I wonder if we could either have a different version of foreach() that
allows that, or find a way to make the above work. For the latter I
don't immediately have a good idea of how to accomplish that. For the
former it's easy enough if we either don't include the typename (thereby
looking more alien), or if we reference the name separately (making it
more complicated to use).
I also wonder if a foreach version that includes the typical
(Type *) var = (Type *) lfirst(lc);
or
(Type *) var = castNode(Type, lfirst(lc));
or
OpExpr       *hclause = lfirst_node(OpExpr, lc);
would make it nicer to use lists.
foreach_node_in(Type, name, list) could mean something like
foreach(ListCell *name##_cell, list)
{
    Type* name = lfirst_node(Type, name##_cell);
}
(using a hypothetical foreach that supports defining the ListCell in
scope, just for display simplicity's sake).
Greetings,
Andres Freund
			
		Hi,
On 2019-07-31 15:57:56 -0700, Andres Freund wrote:
> I also wonder if a foreach version that includes the typical
> (Type *) var = (Type *) lfirst(lc);
> or
> (Type *) var = castNode(Type, lfirst(lc));
> or
> OpExpr       *hclause = lfirst_node(OpExpr, lc);
> 
> would make it nicer to use lists.
> 
> foreach_node_in(Type, name, list) could mean something like
> 
> foreach(ListCell *name##_cell, list)
> {
>     Type* name = lfirst_node(Type, name##_cell);
> }
s/lfirst/linitial/ of course. Was looking at code that also used
lfirst...
Reminds me that one advantage of macros like the second one would also
be to reduce the use of the confusingly named linitial*(), helping newer
hackers.
Greetings,
Andres Freund
			
		Hi,
On 2019-07-31 16:00:47 -0700, Andres Freund wrote:
> On 2019-07-31 15:57:56 -0700, Andres Freund wrote:
> > I also wonder if a foreach version that includes the typical
> > (Type *) var = (Type *) lfirst(lc);
> > or
> > (Type *) var = castNode(Type, lfirst(lc));
> > or
> > OpExpr       *hclause = lfirst_node(OpExpr, lc);
> > 
> > would make it nicer to use lists.
> > 
> > foreach_node_in(Type, name, list) could mean something like
> > 
> > foreach(ListCell *name##_cell, list)
> > {
> >     Type* name = lfirst_node(Type, name##_cell);
> > }
> 
> s/lfirst/linitial/ of course. Was looking at code that also used
> lfirst...
Bullshit, of course.
/me performs a tactical withdrawal into his brown paper bag.
> Reminds me that one advantage of macros like the second one would also
> be to reduce the use of the confusingly named linitial*(), helping newer
> hackers.
But that point just had two consecutive embarrassing demonstrations...
- Andres
			
		On 01/08/2019 01:04, Andres Freund wrote:
> Hi,
> 
> On 2019-07-31 16:00:47 -0700, Andres Freund wrote:
>> On 2019-07-31 15:57:56 -0700, Andres Freund wrote:
>>> I also wonder if a foreach version that includes the typical
>>> (Type *) var = (Type *) lfirst(lc);
>>> or
>>> (Type *) var = castNode(Type, lfirst(lc));
>>> or
>>> OpExpr       *hclause = lfirst_node(OpExpr, lc);
>>>
>>> would make it nicer to use lists.
>>>
>>> foreach_node_in(Type, name, list) could mean something like
>>>
>>> foreach(ListCell *name##_cell, list)
>>> {
>>>      Type* name = lfirst_node(Type, name##_cell);
>>> }
>>
>> s/lfirst/linitial/ of course. Was looking at code that also used
>> lfirst...
> 
> Bullshit, of course.
> 
> /me performs a tactical withdrawal into his brown paper bag.
> 
> 
>> Reminds me that one advantage of macros like the second one would also
>> be to reduce the use of the confusingly named linitial*(), helping newer
>> hackers.
> 
> But that point just had two consecutive embarrassing demonstrations...
> 
Yeah, pg_list.h is one file I never close.
-- 
Petr Jelinek
2ndQuadrant - PostgreSQL Solutions for the Enterprise
https://www.2ndQuadrant.com/
			
		Andres Freund <andres@anarazel.de> writes:
> Unfortunately foreach(ListCell *lc, ...) doesn't work with the current
> definition. Which I think isn't great, because the large scopes for loop
> iteration variables imo makes the code harder to reason about.
Yeah, I tried to make that possible when I redid those macros, but
couldn't find a way :-(.  Even granting that we're willing to have
a different macro for this use-case, it doesn't seem easy, because
you can only put one <declaration> into the first element of a
for (;;).
That makes the other idea (of a foreach-ish macro declaring the
listcell value variable) problematic, too :-(.
One idea is that we could do something like
    foreach_variant(identifier, list_value)
    {
       type *v = (type *) lfirst_variant(identifier);
       ...
    }
where the "identifier" isn't actually a variable name but just something
we use to construct the ForEachState variable's name.  (The only reason
we need it is to avoid confusion in cases with nested foreach's.)  The
lfirst_variant macro would fetch the correct value just by looking
at the ForEachState, so there's no separate ListCell* variable at all.
            regards, tom lane
			
		I wrote:
> One idea is that we could do something like
>     foreach_variant(identifier, list_value)
>     {
>        type *v = (type *) lfirst_variant(identifier);
>        ...
>     }
> where the "identifier" isn't actually a variable name but just something
> we use to construct the ForEachState variable's name.  (The only reason
> we need it is to avoid confusion in cases with nested foreach's.)
On second thought, there seems no strong reason why you should need
to fetch the current value of a foreach-ish loop that's not the most
closely nested one.  So forget the dummy identifier, and consider
this straw-man proposal:
#define aforeach(list_value) ...
(I'm thinking "anonymous foreach", but bikeshedding welcome.)  This
is just like the current version of foreach(), except it uses a
fixed name for the ForEachState variable and doesn't attempt to
assign to a "cell" variable.
#define aforeach_current() ...
Retrieves the current value of the most-closely-nested aforeach
loop, based on knowing the fixed name of aforeach's loop variable.
This replaces "lfirst(lc)", and we'd also need aforeach_current_int()
and so on for the other variants of lfirst().
So usage would look like, say,
    aforeach(my_list)
    {
        type *my_value = (type *) aforeach_current();
        ...
    }
We'd also want aforeach_delete_current() and aforeach_current_index(),
to provide functionality equivalent to foreach_delete_current() and
foreach_current_index().
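As a rough illustration of how this could fit together, here is a toy
version that compiles on its own.  It is only a sketch under assumptions:
ToyList, ToyForEachState and the toy_* macro names are invented here, and
the real ForEachState and List are not used.  Nesting works because the
inner loop's state variable shadows the outer one.  One caveat of this
particular sketch: the list expression handed to an inner loop must not
itself use toy_aforeach_current(), since inside the initializer the name
already refers to the new, not yet initialized, state variable.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy array-based list of pointers; a stand-in for List. */
typedef struct ToyList
{
    int         length;
    void      **elements;
} ToyList;

/* Loop state, always declared under one fixed name by the macro. */
typedef struct ToyForEachState
{
    const ToyList *l;
    int         i;
} ToyForEachState;

#define toy_aforeach(lst) \
    for (ToyForEachState aforeach__state = { (lst), 0 }; \
         aforeach__state.l != NULL && \
         aforeach__state.i < aforeach__state.l->length; \
         aforeach__state.i++)

/* Current element and index of the most closely nested loop. */
#define toy_aforeach_current() \
    (aforeach__state.l->elements[aforeach__state.i])
#define toy_aforeach_current_index() \
    (aforeach__state.i)

/* Count the (a, b) pairs of equal strings, using nested loops. */
static int
count_common(const ToyList *a, const ToyList *b)
{
    int         n = 0;

    toy_aforeach(a)
    {
        const char *x = (const char *) toy_aforeach_current();

        toy_aforeach(b)
        {
            const char *y = (const char *) toy_aforeach_current();

            if (strcmp(x, y) == 0)
                n++;
        }
    }
    return n;
}

/* Return the position of target in names, or -1 if it is absent. */
static int
index_of(const ToyList *names, const char *target)
{
    int         found = -1;

    assert(target != NULL);
    toy_aforeach(names)
    {
        if (strcmp((const char *) toy_aforeach_current(), target) == 0)
        {
            found = toy_aforeach_current_index();
            break;
        }
    }
    return found;
}
```

Note that x has to be fetched before the inner loop starts; once inside
it, toy_aforeach_current() can only see the inner list.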
These names are a bit long, and maybe we should try to make them
shorter, but more shortness might also mean less clarity.
BTW, I think we could make equivalent macros in the old regime,
which would be a good thing because then it would be possible to
back-patch code using this notation.
Thoughts?
            regards, tom lane
			
		I wrote:
> BTW, I think we could make equivalent macros in the old regime,
> which would be a good thing because then it would be possible to
> back-patch code using this notation.
Oh, wait-a-second.  I was envisioning that
    for (ListCell *anonymous__lc = ...)
would work for that, but of course that requires C99, so we could
only put it into v12.
But that might still be worth doing.  It'd mean that the backpatchability
of this notation is the same as that of "for (int x = ...)", which
seems worth something.
            regards, tom lane
			
		Hi,
On 2019-07-31 19:40:09 -0400, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > Unfortunately foreach(ListCell *lc, ...) doesn't work with the current
> > definition. Which I think isn't great, because the large scopes for loop
> > iteration variables imo makes the code harder to reason about.
>
> Yeah, I tried to make that possible when I redid those macros, but
> couldn't find a way :-(.  Even granting that we're willing to have
> a different macro for this use-case, it doesn't seem easy, because
> you can only put one <declaration> into the first element of a
> for (;;).
I remember hitting that at one point and being annoyed/confused as to
where that restriction came from. Probably some grammar difficulties.
But still, odd.
> That makes the other idea (of a foreach-ish macro declaring the
> listcell value variable) problematic, too :-(.
Hm. One way partially around that would be using an anonymous struct
inside the for(). Something like
#define foreach_node(membertype, name, lst)    \
for (struct {membertype *node; ListCell *lc; const List *l; int i;} name = {...}; \
     ...)
which then would allow code like
foreach_node(OpExpr, cur, list)
{
    do_something_with_node(cur.node);
    foreach_delete_current(cur);
}
That's quite similar to your:
> One idea is that we could do something like
>
>     foreach_variant(identifier, list_value)
>     {
>        type *v = (type *) lfirst_variant(identifier);
>        ...
>     }
>
> where the "identifier" isn't actually a variable name but just something
> we use to construct the ForEachState variable's name.  (The only reason
> we need it is to avoid confusion in cases with nested foreach's.)  The
> lfirst_variant macro would fetch the correct value just by looking
> at the ForEachState, so there's no separate ListCell* variable at all.
but would still allow avoiding the separate variable.
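A self-contained sketch of the anonymous-struct trick, with the elided
"..." parts filled in by guesswork rather than taken from any patch:
ToyList, Item, toy_foreach_node() and sum_items() are all hypothetical
names.  Whether declaring a struct type inside the for-initializer is
strictly conforming C99 seems arguable, but as far as I know GCC and
Clang accept it.

```c
#include <assert.h>
#include <stddef.h>

/* Toy array-based list of pointers; a stand-in for List. */
typedef struct ToyList
{
    int         length;
    void      **elements;
} ToyList;

/*
 * One declaration in the for-initializer, but several variables: they
 * are packed into an anonymous struct named by the caller.  The loop
 * condition both tests for the end of the list and loads the current
 * element into name.node.
 */
#define toy_foreach_node(membertype, name, lst) \
    for (struct { membertype *node; const ToyList *l; int i; } \
             name = { NULL, (lst), 0 }; \
         name.l != NULL && name.i < name.l->length && \
             ((name.node = (membertype *) name.l->elements[name.i]), 1); \
         name.i++)

typedef struct Item
{
    int         value;
} Item;

/* Sum the values of all items; no ListCell-style variable is needed. */
static int
sum_items(const ToyList *list)
{
    int         total = 0;

    toy_foreach_node(Item, cur, list)
    {
        assert(cur.node != NULL);
        total += cur.node->value;
    }
    return total;
}
```

The cost is visible at the use site: the element is reached as cur.node
rather than through a plain pointer variable.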
Greetings,
Andres Freund
			
		On Thu, 1 Aug 2019 at 07:40, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Andres Freund <andres@anarazel.de> writes:
> > Unfortunately foreach(ListCell *lc, ...) doesn't work with the current
> > definition. Which I think isn't great, because the large scopes for loop
> > iteration variables imo makes the code harder to reason about.
Totally agree.
> you can only put one <declaration> into the first element of a
> for (;;).
Use an anonymous block outer scope? Or if not permitted even by C99 (which
I think it is), a do {...} while (0);  hack?
			
		Hi,
On 2019-08-08 11:36:44 +0800, Craig Ringer wrote:
> > you can only put one <declaration> into the first element of a
> > for (;;).
> >
> 
> Use an anonymous block outer scope? Or if not permitted even by C99 (which
> I think it is), a do {...} while (0);  hack?
You can't easily - the problem is that there's no real way to add the
closing }, because that's after the macro.
Greetings,
Andres Freund
			
		On Thu, 8 Aug 2019 at 12:18, Andres Freund <andres@anarazel.de> wrote:
> Hi,
>
> On 2019-08-08 11:36:44 +0800, Craig Ringer wrote:
> > > you can only put one <declaration> into the first element of a
> > > for (;;).
> > >
> >
> > Use an anonymous block outer scope? Or if not permitted even by C99 (which
> > I think it is), a do {...} while (0); hack?
>
> You can't easily - the problem is that there's no real way to add the
> closing }, because that's after the macro.
Ah, right. Hence our 
PG_TRY();
{
}
PG_CATCH();
{
}
PG_END_TRY();
construct in all its beauty. 
I should've seen that.
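The closing-brace problem can be shown with a small pair of macros in
the PG_TRY style.  This is purely illustrative: WITH_TWO_BEGIN,
WITH_TWO_END and mean_of() are made-up names.  An outer block can hold
any number of declarations, but then a second macro is needed to close
it, which is what a single foreach-style macro cannot supply.

```c
#include <assert.h>

/*
 * BEGIN opens a block and declares two variables of different types;
 * END closes that block.  Neither macro is usable without the other.
 */
#define WITH_TWO_BEGIN(type1, v1, init1, type2, v2, init2) \
    do { type1 v1 = (init1); type2 v2 = (init2);
#define WITH_TWO_END() \
    } while (0)

/* Average of vals[0..n-1], or 0.0 for an empty array. */
static double
mean_of(const int *vals, int n)
{
    double      result = 0.0;

    assert(n >= 0);
    WITH_TWO_BEGIN(int, i, 0, long, sum, 0L)
    {
        for (; i < n; i++)
            sum += vals[i];
        if (n > 0)
            result = (double) sum / n;
    }
    WITH_TWO_END();
    return result;
}
```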
			
		[ returning to this topic now that the CF is over ]
I wrote:
> Perhaps there's an argument for doing something to change the behavior
> of list_union and list_difference and friends.  Not sure --- it could
> be a foot-gun for back-patching.  I'm already worried about the risk
> of back-patching code that assumes the new semantics of list_concat.
> (Which might be a good argument for renaming it to something else?
> Just not list_union, please.)
Has anyone got further thoughts about naming around list_concat
and friends?
If not, I'm inclined to go ahead with the concat-improvement patch as
proposed in [1], modulo the one improvement David spotted.
            regards, tom lane
[1] https://www.postgresql.org/message-id/6704.1563739305@sss.pgh.pa.us
			
		Andres Freund <andres@anarazel.de> writes:
> On 2019-07-31 19:40:09 -0400, Tom Lane wrote:
>> That makes the other idea (of a foreach-ish macro declaring the
>> listcell value variable) problematic, too :-(.
> Hm. One way partially around that would be using an anonymous struct
> inside the for(). Something like
> #define foreach_node(membertype, name, lst)    \
> for (struct {membertype *node; ListCell *lc; const List *l; int i;} name = {...}; \
>      ...)
> which then would allow code like
> foreach_node(OpExpr, cur, list)
> {
>     do_something_with_node(cur.node);
>     foreach_delete_current(cur);
> }
I'm hesitant to change the look of our loops quite that much, mainly
because it'll be a pain for back-patching.  If you write some code
for HEAD like this, and then have to back-patch it, you'll need to
insert/change significantly more code than if it's just a matter
of whether there's a ListCell variable or not.
I experimented with the "aforeach" idea I suggested upthread,
to the extent of writing the macros and then converting
parse_clause.c (a file chosen more or less at random) to use
aforeach instead of foreach.  I was somewhat surprised to find
that every single foreach() did convert pleasantly.  (There are
several forboth's that I didn't try to do anything with, though.)
If we do go in this direction, I wouldn't suggest trying to
actually do wholesale conversion of existing code like this;
that seems more likely to create back-patching land mines than
do anything helpful.  I am slightly tempted to try to convert
everyplace using foreach_delete_current, though, since those
loops are different from v12 already.
Thoughts?
            regards, tom lane
diff --git a/src/backend/parser/parse_clause.c b/src/backend/parser/parse_clause.c
index 2a6b2ff..39d8d8e 100644
--- a/src/backend/parser/parse_clause.c
+++ b/src/backend/parser/parse_clause.c
@@ -117,8 +117,6 @@ static Node *transformFrameOffset(ParseState *pstate, int frameOptions,
 void
 transformFromClause(ParseState *pstate, List *frmList)
 {
-    ListCell   *fl;
-
     /*
      * The grammar will have produced a list of RangeVars, RangeSubselects,
      * RangeFunctions, and/or JoinExprs. Transform each one (possibly adding
@@ -128,9 +126,9 @@ transformFromClause(ParseState *pstate, List *frmList)
      * Note we must process the items left-to-right for proper handling of
      * LATERAL references.
      */
-    foreach(fl, frmList)
+    aforeach(frmList)
     {
-        Node       *n = lfirst(fl);
+        Node       *n = (Node *) aforeach_current();
         RangeTblEntry *rte;
         int            rtindex;
         List       *namespace;
@@ -267,11 +265,10 @@ extractRemainingColumns(List *common_colnames,
     {
         char       *colname = strVal(lfirst(lnames));
         bool        match = false;
-        ListCell   *cnames;
-        foreach(cnames, common_colnames)
+        aforeach(common_colnames)
         {
-            char       *ccolname = strVal(lfirst(cnames));
+            char       *ccolname = strVal(aforeach_current());
             if (strcmp(colname, ccolname) == 0)
             {
@@ -475,7 +472,6 @@ transformRangeFunction(ParseState *pstate, RangeFunction *r)
     List       *coldeflists = NIL;
     bool        is_lateral;
     RangeTblEntry *rte;
-    ListCell   *lc;
     /*
      * We make lateral_only names of this level visible, whether or not the
@@ -505,9 +501,9 @@ transformRangeFunction(ParseState *pstate, RangeFunction *r)
      * Likewise, collect column definition lists if there were any.  But
      * complain if we find one here and the RangeFunction has one too.
      */
-    foreach(lc, r->functions)
+    aforeach(r->functions)
     {
-        List       *pair = (List *) lfirst(lc);
+        List       *pair = aforeach_current_node(List);
         Node       *fexpr;
         List       *coldeflist;
         Node       *newfexpr;
@@ -551,11 +547,9 @@ transformRangeFunction(ParseState *pstate, RangeFunction *r)
                 fc->over == NULL &&
                 coldeflist == NIL)
             {
-                ListCell   *lc;
-
-                foreach(lc, fc->args)
+                aforeach(fc->args)
                 {
-                    Node       *arg = (Node *) lfirst(lc);
+                    Node       *arg = (Node *) aforeach_current();
                     FuncCall   *newfc;
                     last_srf = pstate->p_last_srf;
@@ -700,7 +694,6 @@ transformRangeTableFunc(ParseState *pstate, RangeTableFunc *rtf)
     Oid            docType;
     RangeTblEntry *rte;
     bool        is_lateral;
-    ListCell   *col;
     char      **names;
     int            colno;
@@ -743,9 +736,9 @@ transformRangeTableFunc(ParseState *pstate, RangeTableFunc *rtf)
     names = palloc(sizeof(char *) * list_length(rtf->columns));
     colno = 0;
-    foreach(col, rtf->columns)
+    aforeach(rtf->columns)
     {
-        RangeTableFuncCol *rawc = (RangeTableFuncCol *) lfirst(col);
+        RangeTableFuncCol *rawc = aforeach_current_node(RangeTableFuncCol);
         Oid            typid;
         int32        typmod;
         Node       *colexpr;
@@ -837,15 +830,13 @@ transformRangeTableFunc(ParseState *pstate, RangeTableFunc *rtf)
     /* Namespaces, if any, also need to be transformed */
     if (rtf->namespaces != NIL)
     {
-        ListCell   *ns;
-        ListCell   *lc2;
         List       *ns_uris = NIL;
         List       *ns_names = NIL;
         bool        default_ns_seen = false;
-        foreach(ns, rtf->namespaces)
+        aforeach(rtf->namespaces)
         {
-            ResTarget  *r = (ResTarget *) lfirst(ns);
+            ResTarget  *r = aforeach_current_node(ResTarget);
             Node       *ns_uri;
             Assert(IsA(r, ResTarget));
@@ -858,9 +849,9 @@ transformRangeTableFunc(ParseState *pstate, RangeTableFunc *rtf)
             /* Verify consistency of name list: no dupes, only one DEFAULT */
             if (r->name != NULL)
             {
-                foreach(lc2, ns_names)
+                aforeach(ns_names)
                 {
-                    Value       *ns_node = (Value *) lfirst(lc2);
+                    Value       *ns_node = (Value *) aforeach_current();
                     if (ns_node == NULL)
                         continue;
@@ -1268,19 +1259,17 @@ transformFromClauseItem(ParseState *pstate, Node *n,
         if (j->isNatural)
         {
             List       *rlist = NIL;
-            ListCell   *lx,
-                       *rx;
             Assert(j->usingClause == NIL);    /* shouldn't have USING() too */
-            foreach(lx, l_colnames)
+            aforeach(l_colnames)
             {
-                char       *l_colname = strVal(lfirst(lx));
+                char       *l_colname = strVal(aforeach_current());
                 Value       *m_name = NULL;
-                foreach(rx, r_colnames)
+                aforeach(r_colnames)
                 {
-                    char       *r_colname = strVal(lfirst(rx));
+                    char       *r_colname = strVal(aforeach_current());
                     if (strcmp(l_colname, r_colname) == 0)
                     {
@@ -1313,24 +1302,21 @@ transformFromClauseItem(ParseState *pstate, Node *n,
             List       *ucols = j->usingClause;
             List       *l_usingvars = NIL;
             List       *r_usingvars = NIL;
-            ListCell   *ucol;
             Assert(j->quals == NULL);    /* shouldn't have ON() too */
-            foreach(ucol, ucols)
+            aforeach(ucols)
             {
-                char       *u_colname = strVal(lfirst(ucol));
-                ListCell   *col;
-                int            ndx;
+                char       *u_colname = strVal(aforeach_current());
                 int            l_index = -1;
                 int            r_index = -1;
                 Var           *l_colvar,
                            *r_colvar;
                 /* Check for USING(foo,foo) */
-                foreach(col, res_colnames)
+                aforeach(res_colnames)
                 {
-                    char       *res_colname = strVal(lfirst(col));
+                    char       *res_colname = strVal(aforeach_current());
                     if (strcmp(res_colname, u_colname) == 0)
                         ereport(ERROR,
@@ -1340,10 +1326,9 @@ transformFromClauseItem(ParseState *pstate, Node *n,
                 }
                 /* Find it in left input */
-                ndx = 0;
-                foreach(col, l_colnames)
+                aforeach(l_colnames)
                 {
-                    char       *l_colname = strVal(lfirst(col));
+                    char       *l_colname = strVal(aforeach_current());
                     if (strcmp(l_colname, u_colname) == 0)
                     {
@@ -1352,9 +1337,8 @@ transformFromClauseItem(ParseState *pstate, Node *n,
                                     (errcode(ERRCODE_AMBIGUOUS_COLUMN),
                                      errmsg("common column name \"%s\" appears more than once in left table",
                                             u_colname)));
-                        l_index = ndx;
+                        l_index = aforeach_current_index();
                     }
-                    ndx++;
                 }
                 if (l_index < 0)
                     ereport(ERROR,
@@ -1363,10 +1347,9 @@ transformFromClauseItem(ParseState *pstate, Node *n,
                                     u_colname)));
                 /* Find it in right input */
-                ndx = 0;
-                foreach(col, r_colnames)
+                aforeach(r_colnames)
                 {
-                    char       *r_colname = strVal(lfirst(col));
+                    char       *r_colname = strVal(aforeach_current());
                     if (strcmp(r_colname, u_colname) == 0)
                     {
@@ -1375,9 +1358,8 @@ transformFromClauseItem(ParseState *pstate, Node *n,
                                     (errcode(ERRCODE_AMBIGUOUS_COLUMN),
                                      errmsg("common column name \"%s\" appears more than once in right table",
                                             u_colname)));
-                        r_index = ndx;
+                        r_index = aforeach_current_index();
                     }
-                    ndx++;
                 }
                 if (r_index < 0)
                     ereport(ERROR,
@@ -1390,7 +1372,7 @@ transformFromClauseItem(ParseState *pstate, Node *n,
                 r_colvar = list_nth(r_colvars, r_index);
                 r_usingvars = lappend(r_usingvars, r_colvar);
-                res_colnames = lappend(res_colnames, lfirst(ucol));
+                res_colnames = lappend(res_colnames, aforeach_current());
                 res_colvars = lappend(res_colvars,
                                       buildMergedJoinVar(pstate,
                                                          j->jointype,
@@ -1643,11 +1625,9 @@ makeNamespaceItem(RangeTblEntry *rte, bool rel_visible, bool cols_visible,
 static void
 setNamespaceColumnVisibility(List *namespace, bool cols_visible)
 {
-    ListCell   *lc;
-
-    foreach(lc, namespace)
+    aforeach(namespace)
     {
-        ParseNamespaceItem *nsitem = (ParseNamespaceItem *) lfirst(lc);
+        ParseNamespaceItem *nsitem = (ParseNamespaceItem *) aforeach_current();
         nsitem->p_cols_visible = cols_visible;
     }
@@ -1660,11 +1640,9 @@ setNamespaceColumnVisibility(List *namespace, bool cols_visible)
 static void
 setNamespaceLateralState(List *namespace, bool lateral_only, bool lateral_ok)
 {
-    ListCell   *lc;
-
-    foreach(lc, namespace)
+    aforeach(namespace)
     {
-        ParseNamespaceItem *nsitem = (ParseNamespaceItem *) lfirst(lc);
+        ParseNamespaceItem *nsitem = (ParseNamespaceItem *) aforeach_current();
         nsitem->p_lateral_only = lateral_only;
         nsitem->p_lateral_ok = lateral_ok;
@@ -1822,8 +1800,6 @@ static TargetEntry *
 findTargetlistEntrySQL92(ParseState *pstate, Node *node, List **tlist,
                          ParseExprKind exprKind)
 {
-    ListCell   *tl;
-
     /*----------
      * Handle two special cases as mandated by the SQL92 spec:
      *
@@ -1895,9 +1871,9 @@ findTargetlistEntrySQL92(ParseState *pstate, Node *node, List **tlist,
         {
             TargetEntry *target_result = NULL;
-            foreach(tl, *tlist)
+            aforeach(*tlist)
             {
-                TargetEntry *tle = (TargetEntry *) lfirst(tl);
+                TargetEntry *tle = aforeach_current_node(TargetEntry);
                 if (!tle->resjunk &&
                     strcmp(tle->resname, name) == 0)
@@ -1944,9 +1920,9 @@ findTargetlistEntrySQL92(ParseState *pstate, Node *node, List **tlist,
                      parser_errposition(pstate, location)));
         target_pos = intVal(val);
-        foreach(tl, *tlist)
+        aforeach(*tlist)
         {
-            TargetEntry *tle = (TargetEntry *) lfirst(tl);
+            TargetEntry *tle = aforeach_current_node(TargetEntry);
             if (!tle->resjunk)
             {
@@ -1990,7 +1966,6 @@ findTargetlistEntrySQL99(ParseState *pstate, Node *node, List **tlist,
                          ParseExprKind exprKind)
 {
     TargetEntry *target_result;
-    ListCell   *tl;
     Node       *expr;
     /*
@@ -2002,9 +1977,9 @@ findTargetlistEntrySQL99(ParseState *pstate, Node *node, List **tlist,
      */
     expr = transformExpr(pstate, node, exprKind);
-    foreach(tl, *tlist)
+    aforeach(*tlist)
     {
-        TargetEntry *tle = (TargetEntry *) lfirst(tl);
+        TargetEntry *tle = aforeach_current_node(TargetEntry);
         Node       *texpr;
         /*
@@ -2094,7 +2069,6 @@ flatten_grouping_sets(Node *expr, bool toplevel, bool *hasGroupingSets)
         case T_GroupingSet:
             {
                 GroupingSet *gset = (GroupingSet *) expr;
-                ListCell   *l2;
                 List       *result_set = NIL;
                 if (hasGroupingSets)
@@ -2109,9 +2083,9 @@ flatten_grouping_sets(Node *expr, bool toplevel, bool *hasGroupingSets)
                 if (toplevel && gset->kind == GROUPING_SET_EMPTY)
                     return (Node *) NIL;
-                foreach(l2, gset->content)
+                aforeach(gset->content)
                 {
-                    Node       *n1 = lfirst(l2);
+                    Node       *n1 = (Node *) aforeach_current();
                     Node       *n2 = flatten_grouping_sets(n1, false, NULL);
                     if (IsA(n1, GroupingSet) &&
@@ -2139,13 +2113,13 @@ flatten_grouping_sets(Node *expr, bool toplevel, bool *hasGroupingSets)
         case T_List:
             {
                 List       *result = NIL;
-                ListCell   *l;
-                foreach(l, (List *) expr)
+                aforeach((List *) expr)
                 {
-                    Node       *n = flatten_grouping_sets(lfirst(l), toplevel, hasGroupingSets);
+                    Node       *n = (Node *) aforeach_current();
-                    if (n != (Node *) NIL)
+                    n = flatten_grouping_sets(n, toplevel, hasGroupingSets);
+                    if (n)
                     {
                         if (IsA(n, List))
                             result = list_concat(result, (List *) n);
@@ -2200,8 +2174,6 @@ transformGroupClauseExpr(List **flatresult, Bitmapset *seen_local,
     if (tle->ressortgroupref > 0)
     {
-        ListCell   *sl;
-
         /*
          * Eliminate duplicates (GROUP BY x, x) but only at local level.
          * (Duplicates in grouping sets can affect the number of returned
@@ -2239,10 +2211,9 @@ transformGroupClauseExpr(List **flatresult, Bitmapset *seen_local,
          * another sort step is going to be inevitable, but that's the
          * planner's problem.
          */
-
-        foreach(sl, sortClause)
+        aforeach(sortClause)
         {
-            SortGroupClause *sc = (SortGroupClause *) lfirst(sl);
+            SortGroupClause *sc = aforeach_current_node(SortGroupClause);
             if (sc->tleSortGroupRef == tle->ressortgroupref)
             {
@@ -2298,11 +2269,10 @@ transformGroupClauseList(List **flatresult,
 {
     Bitmapset  *seen_local = NULL;
     List       *result = NIL;
-    ListCell   *gl;
-    foreach(gl, list)
+    aforeach(list)
     {
-        Node       *gexpr = (Node *) lfirst(gl);
+        Node       *gexpr = (Node *) aforeach_current();
         Index        ref = transformGroupClauseExpr(flatresult,
                                                    seen_local,
@@ -2349,14 +2319,13 @@ transformGroupingSet(List **flatresult,
                      List **targetlist, List *sortClause,
                      ParseExprKind exprKind, bool useSQL99, bool toplevel)
 {
-    ListCell   *gl;
     List       *content = NIL;
     Assert(toplevel || gset->kind != GROUPING_SET_SETS);
-    foreach(gl, gset->content)
+    aforeach(gset->content)
     {
-        Node       *n = lfirst(gl);
+        Node       *n = (Node *) aforeach_current();
         if (IsA(n, List))
         {
@@ -2371,7 +2340,7 @@ transformGroupingSet(List **flatresult,
         }
         else if (IsA(n, GroupingSet))
         {
-            GroupingSet *gset2 = (GroupingSet *) lfirst(gl);
+            GroupingSet *gset2 = (GroupingSet *) n;
             content = lappend(content, transformGroupingSet(flatresult,
                                                             pstate, gset2,
@@ -2455,7 +2424,6 @@ transformGroupClause(ParseState *pstate, List *grouplist, List **groupingSets,
     List       *result = NIL;
     List       *flat_grouplist;
     List       *gsets = NIL;
-    ListCell   *gl;
     bool        hasGroupingSets = false;
     Bitmapset  *seen_local = NULL;
@@ -2481,9 +2449,9 @@ transformGroupClause(ParseState *pstate, List *grouplist, List **groupingSets,
                                                     exprLocation((Node *) grouplist)));
     }
-    foreach(gl, flat_grouplist)
+    aforeach(flat_grouplist)
     {
-        Node       *gexpr = (Node *) lfirst(gl);
+        Node       *gexpr = (Node *) aforeach_current();
         if (IsA(gexpr, GroupingSet))
         {
@@ -2555,11 +2523,10 @@ transformSortClause(ParseState *pstate,
                     bool useSQL99)
 {
     List       *sortlist = NIL;
-    ListCell   *olitem;
-    foreach(olitem, orderlist)
+    aforeach(orderlist)
     {
-        SortBy       *sortby = (SortBy *) lfirst(olitem);
+        SortBy       *sortby = aforeach_current_node(SortBy);
         TargetEntry *tle;
         if (useSQL99)
@@ -2587,11 +2554,10 @@ transformWindowDefinitions(ParseState *pstate,
 {
     List       *result = NIL;
     Index        winref = 0;
-    ListCell   *lc;
-    foreach(lc, windowdefs)
+    aforeach(windowdefs)
     {
-        WindowDef  *windef = (WindowDef *) lfirst(lc);
+        WindowDef  *windef = aforeach_current_node(WindowDef);
         WindowClause *refwc = NULL;
         List       *partitionClause;
         List       *orderClause;
@@ -2805,8 +2771,6 @@ transformDistinctClause(ParseState *pstate,
                         List **targetlist, List *sortClause, bool is_agg)
 {
     List       *result = NIL;
-    ListCell   *slitem;
-    ListCell   *tlitem;
     /*
      * The distinctClause should consist of all ORDER BY items followed by all
@@ -2823,9 +2787,9 @@ transformDistinctClause(ParseState *pstate,
      * effect will be that the TLE value will be made unique according to both
      * sortops.
      */
-    foreach(slitem, sortClause)
+    aforeach(sortClause)
     {
-        SortGroupClause *scl = (SortGroupClause *) lfirst(slitem);
+        SortGroupClause *scl = aforeach_current_node(SortGroupClause);
         TargetEntry *tle = get_sortgroupclause_tle(scl, *targetlist);
         if (tle->resjunk)
@@ -2843,9 +2807,9 @@ transformDistinctClause(ParseState *pstate,
      * Now add any remaining non-resjunk tlist items, using default sort/group
      * semantics for their data types.
      */
-    foreach(tlitem, *targetlist)
+    aforeach(*targetlist)
     {
-        TargetEntry *tle = (TargetEntry *) lfirst(tlitem);
+        TargetEntry *tle = aforeach_current_node(TargetEntry);
         if (tle->resjunk)
             continue;            /* ignore junk */
@@ -2902,9 +2866,9 @@ transformDistinctOnClause(ParseState *pstate, List *distinctlist,
      * Also notice that we could have duplicate DISTINCT ON expressions and
      * hence duplicate entries in sortgrouprefs.)
      */
-    foreach(lc, distinctlist)
+    aforeach(distinctlist)
     {
-        Node       *dexpr = (Node *) lfirst(lc);
+        Node       *dexpr = (Node *) aforeach_current();
         int            sortgroupref;
         TargetEntry *tle;
@@ -2923,9 +2887,9 @@ transformDistinctOnClause(ParseState *pstate, List *distinctlist,
      * in DISTINCT ON.
      */
     skipped_sortitem = false;
-    foreach(lc, sortClause)
+    aforeach(sortClause)
     {
-        SortGroupClause *scl = (SortGroupClause *) lfirst(lc);
+        SortGroupClause *scl = aforeach_current_node(SortGroupClause);
         if (list_member_int(sortgrouprefs, scl->tleSortGroupRef))
         {
@@ -3021,11 +2985,10 @@ resolve_unique_index_expr(ParseState *pstate, InferClause *infer,
                           Relation heapRel)
 {
     List       *result = NIL;
-    ListCell   *l;
-    foreach(l, infer->indexElems)
+    aforeach(infer->indexElems)
     {
-        IndexElem  *ielem = (IndexElem *) lfirst(l);
+        IndexElem  *ielem = aforeach_current_node(IndexElem);
         InferenceElem *pInfer = makeNode(InferenceElem);
         Node       *parse;
@@ -3422,16 +3385,15 @@ Index
 assignSortGroupRef(TargetEntry *tle, List *tlist)
 {
     Index        maxRef;
-    ListCell   *l;
     if (tle->ressortgroupref)    /* already has one? */
         return tle->ressortgroupref;
     /* easiest way to pick an unused refnumber: max used + 1 */
     maxRef = 0;
-    foreach(l, tlist)
+    aforeach(tlist)
     {
-        Index        ref = ((TargetEntry *) lfirst(l))->ressortgroupref;
+        Index        ref = aforeach_current_node(TargetEntry)->ressortgroupref;
         if (ref > maxRef)
             maxRef = ref;
@@ -3463,15 +3425,14 @@ bool
 targetIsInSortList(TargetEntry *tle, Oid sortop, List *sortList)
 {
     Index        ref = tle->ressortgroupref;
-    ListCell   *l;
     /* no need to scan list if tle has no marker */
     if (ref == 0)
         return false;
-    foreach(l, sortList)
+    aforeach(sortList)
     {
-        SortGroupClause *scl = (SortGroupClause *) lfirst(l);
+        SortGroupClause *scl = aforeach_current_node(SortGroupClause);
         if (scl->tleSortGroupRef == ref &&
             (sortop == InvalidOid ||
@@ -3489,11 +3450,9 @@ targetIsInSortList(TargetEntry *tle, Oid sortop, List *sortList)
 static WindowClause *
 findWindowClause(List *wclist, const char *name)
 {
-    ListCell   *l;
-
-    foreach(l, wclist)
+    aforeach(wclist)
     {
-        WindowClause *wc = (WindowClause *) lfirst(l);
+        WindowClause *wc = aforeach_current_node(WindowClause);
         if (wc->name && strcmp(wc->name, name) == 0)
             return wc;
diff --git a/src/include/nodes/pg_list.h b/src/include/nodes/pg_list.h
index 1463408..21d0a67 100644
--- a/src/include/nodes/pg_list.h
+++ b/src/include/nodes/pg_list.h
@@ -381,6 +381,59 @@ lnext(const List *l, const ListCell *c)
 #define foreach_current_index(cell)  (cell##__state.i)
 /*
+ * aforeach (anonymous foreach) -
+ *      another convenience macro for looping through a list
+ *
+ * This is the same as foreach() except that no ListCell variable is needed.
+ * Instead, reference the current list element with the appropriate variant
+ * of aforeach_current().
+ */
+#define aforeach(lst)    \
+    for (ForEachState aforeach__state = {(lst), 0}; \
+         aforeach__state.l != NIL && \
+         aforeach__state.i < aforeach__state.l->length; \
+         aforeach__state.i++)
+
+/*
+ * aforeach_current... -
+ *
+ * Access the current element of the most closely nested aforeach() loop.
+ * These exist in the same variants that lfirst...() has.
+ */
+#define aforeach_current()    \
+    (aforeach__state.l->elements[aforeach__state.i].ptr_value)
+#define aforeach_current_int()    \
+    (aforeach__state.l->elements[aforeach__state.i].int_value)
+#define aforeach_current_oid()    \
+    (aforeach__state.l->elements[aforeach__state.i].oid_value)
+#define aforeach_current_node(type)    \
+    castNode(type, aforeach_current())
+
+/*
+ * aforeach_delete_current -
+ *      delete the current list element from the List associated with the most
+ *      closely nested aforeach() loop, returning the new List pointer.
+ *
+ * This is equivalent to list_delete_nth_cell(), but it also adjusts the
+ * aforeach loop's state so that no list elements will be missed.  Do not
+ * delete elements from an active aforeach loop's list in any other way!
+ */
+#define aforeach_delete_current()    \
+    ((List *) (aforeach__state.l = list_delete_nth_cell(aforeach__state.l, \
+                                                        aforeach__state.i--)))
+
+/*
+ * aforeach_current_index -
+ *      get the zero-based list index of the most closely nested aforeach()
+ *      loop's current element.
+ *
+ * Beware of using this after aforeach_delete_current(); the value will be
+ * out of sync for the rest of the current loop iteration.  Anyway, since
+ * you just deleted the current element, the value is pretty meaningless.
+ */
+#define aforeach_current_index()  (aforeach__state.i)
+
+/*
  * for_each_cell -
  *      a convenience macro which loops through a list starting from a
  *      specified cell
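The aforeach() macros added to pg_list.h above keep their loop state in a hidden variable with a fixed name, so a nested aforeach() shadows the state of the enclosing one. Here is a minimal standalone sketch of that behavior. It assumes hypothetical MiniList, MiniCell and MiniForEachState stand-ins rather than the real pg_list.h types, and count_matches() and index_of() are invented purely for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the List types; NOT the real pg_list.h. */
typedef union MiniCell { void *ptr_value; int int_value; } MiniCell;
typedef struct MiniList { int length; MiniCell *elements; } MiniList;
typedef struct MiniForEachState { const MiniList *l; int i; } MiniForEachState;

/* Same shape as the patch's macro: the state variable has a fixed name. */
#define aforeach(lst) \
    for (MiniForEachState aforeach__state = {(lst), 0}; \
         aforeach__state.l != NULL && \
         aforeach__state.i < aforeach__state.l->length; \
         aforeach__state.i++)
#define aforeach_current_int() \
    (aforeach__state.l->elements[aforeach__state.i].int_value)
#define aforeach_current_index()  (aforeach__state.i)

/* Count pairs (a, b), a taken from outer and b from inner, with a == b. */
static int
count_matches(const MiniList *outer, const MiniList *inner)
{
    int         matches = 0;

    aforeach(outer)
    {
        /* Copy first: inside the inner loop the outer state is shadowed. */
        int         outer_val = aforeach_current_int();

        aforeach(inner)
        {
            if (aforeach_current_int() == outer_val)
                matches++;
        }
    }
    return matches;
}

/* Return the zero-based index of the first element equal to target, or -1. */
static int
index_of(const MiniList *list, int target)
{
    aforeach(list)
    {
        if (aforeach_current_int() == target)
            return aforeach_current_index();
    }
    return -1;
}
```

Because the inner loop's state hides the outer one, anything needed from the outer element has to be copied into a local before the inner loop starts, which is what the patched transformFromClauseItem does with l_colname. One further consequence of the fixed name, going by C's scoping rules: an expression such as aforeach(aforeach_current()) would refer to the new, not yet initialized state variable in its own initializer, so the inner list would also need to be fetched into a local first.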
			
I wrote:
> BTW, further on the subject of performance --- I'm aware of at least
> these topics for follow-on patches:
> * Fix places that are maintaining arrays parallel to Lists for
> access-speed reasons (at least simple_rte_array, append_rel_array,
> es_range_table_array).
Attached is a patch that removes simple_rte_array in favor of just
accessing the query's rtable directly.  I concluded that there was
not much point in messing with simple_rel_array or append_rel_array,
because they are not in fact just mirrors of some List.  There's no
List at all of baserel RelOptInfos, and while we do have a list of
AppendRelInfos, it's a compact, random-order list, not one indexable
by child relid.
Having done this, though, I'm a bit discouraged about whether to commit
it.  In light testing, it's not any faster than HEAD and in complex
queries seems to actually be a bit slower.  I suspect the reason is
that we've effectively replaced
    root->simple_rte_array[i]
with
    root->parse->rtable->elements[i-1]
and the two extra levels of indirection are taking their toll.
It'd be possible to get rid of one of those indirections by maintaining a
copy of root->parse->rtable directly in PlannerInfo; but that throws away
most of the intellectual appeal of not having two sources of truth to
maintain, and it won't completely close the performance gap.
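To make the indirection count concrete, here is a rough standalone sketch of the three access paths under discussion. The Mini* structs are hypothetical, heavily trimmed stand-ins for PlannerInfo, Query and List, and the rtable_copy field is an invented name for the "copy of root->parse->rtable directly in PlannerInfo" idea; none of this is the real planner code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, trimmed stand-ins for the planner structs. */
typedef struct MiniRTE { int relid; } MiniRTE;
typedef union MiniCell { void *ptr_value; } MiniCell;
typedef struct MiniList { int length; MiniCell *elements; } MiniList;
typedef struct MiniQuery { MiniList *rtable; } MiniQuery;
typedef struct MiniPlannerInfo
{
    MiniQuery  *parse;
    MiniRTE   **simple_rte_array;   /* slot 0 unused; indexed by RT index */
    MiniList   *rtable_copy;        /* invented: cached parse->rtable */
} MiniPlannerInfo;

/* Old path: load root->simple_rte_array, then the slot (2 loads). */
static MiniRTE *
fetch_via_array(const MiniPlannerInfo *root, int rti)
{
    return root->simple_rte_array[rti];
}

/* New path: load parse, rtable, elements, then the slot (4 loads). */
static MiniRTE *
fetch_via_list(const MiniPlannerInfo *root, int rti)
{
    const MiniList *rtable = root->parse->rtable;

    assert(rti >= 1 && rti <= rtable->length);
    return (MiniRTE *) rtable->elements[rti - 1].ptr_value;
}

/* Cached-pointer variant: load rtable_copy, elements, the slot (3 loads). */
static MiniRTE *
fetch_via_cached(const MiniPlannerInfo *root, int rti)
{
    const MiniList *rtable = root->rtable_copy;

    assert(rti >= 1 && rti <= rtable->length);
    return (MiniRTE *) rtable->elements[rti - 1].ptr_value;
}
```

All three return the same RTE; they differ only in how many dependent pointer loads precede the final one. The cached variant saves one load relative to going through root->parse, but it still has one more than the plain array, which is consistent with the remark that it would not completely close the gap.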
Other minor objections include:
* Many RTE accesses now look randomly different from adjacent
RelOptInfo accesses.
* The inheritance-expansion code is a bit sloppy about how much it will
expand these arrays, which means it's possible in corner cases for
list_length(parse->rtable) to be less than root->simple_rel_array_size-1.
This resulted in a crash in add_other_rels_to_query, which was assuming
it could fetch a possibly-null RTE pointer from indexes up to
simple_rel_array_size-1.  While that wasn't hard to fix, I wonder
whether any third-party code has similar assumptions.
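The corner case in the last item can be illustrated with a small standalone sketch. It assumes hypothetical MiniRoot, MiniRel and MiniRTE structs and a mini_rt_fetch() helper standing in for planner_rt_fetch(); the point is only the ordering, which mirrors what the add_other_rels_to_query hunk in the attached patch does by fetching the RTE after the empty-slot check.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; field names are invented for illustration. */
typedef struct MiniRTE { int inh; } MiniRTE;
typedef struct MiniRel { int relid; } MiniRel;
typedef struct MiniRoot
{
    int         rel_array_size;     /* may exceed rtable_len + 1 */
    MiniRel   **rel_array;          /* NULL in slots with no base rel */
    int         rtable_len;
    MiniRTE   **rtable_elems;       /* zero-based, rtable_len entries */
} MiniRoot;

/* Fetch by one-based RT index; valid only for 1..rtable_len. */
static MiniRTE *
mini_rt_fetch(const MiniRoot *root, int rti)
{
    assert(rti >= 1 && rti <= root->rtable_len);
    return root->rtable_elems[rti - 1];
}

/* Count inheritable base rels without reading past the end of the rtable. */
static int
count_inheritable(const MiniRoot *root)
{
    int         count = 0;

    for (int rti = 1; rti < root->rel_array_size; rti++)
    {
        /* Check the rel slot first: an empty slot may have no RTE at all. */
        if (root->rel_array[rti] == NULL)
            continue;
        if (mini_rt_fetch(root, rti)->inh)
            count++;
    }
    return count;
}
```

With a separate simple_rte_array, reading a slot beyond the rtable just yielded a NULL pointer; with direct list access the same index is out of range, so the fetch has to wait until the slot is known to be occupied.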
So on the whole, I'm inclined not to do this.  There are some cosmetic
bits of this patch that I do want to commit though: I found some
out-of-date comments, and I realized that it's pretty dumb not to
just merge setup_append_rel_array into setup_simple_rel_arrays.
The division of labor there is inconsistent with the fact that
there's no such division in expand_planner_arrays.
I still have hopes for getting rid of es_range_table_array though,
and will look at that tomorrow or so.
            regards, tom lane
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index 06a2058..8bd1c47 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -2198,7 +2198,7 @@ postgresPlanDirectModify(PlannerInfo *root,
     }
     else
         foreignrel = root->simple_rel_array[resultRelation];
-    rte = root->simple_rte_array[resultRelation];
+    rte = planner_rt_fetch(resultRelation, root);
     fpinfo = (PgFdwRelationInfo *) foreignrel->fdw_private;
     /*
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index e9ee32b..fe7f8b1 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -307,7 +307,7 @@ set_base_rel_sizes(PlannerInfo *root)
         if (rel->reloptkind != RELOPT_BASEREL)
             continue;
-        rte = root->simple_rte_array[rti];
+        rte = planner_rt_fetch(rti, root);
         /*
          * If parallelism is allowable for this query in general, see whether
@@ -349,7 +349,7 @@ set_base_rel_pathlists(PlannerInfo *root)
         if (rel->reloptkind != RELOPT_BASEREL)
             continue;
-        set_rel_pathlist(root, rel, rti, root->simple_rte_array[rti]);
+        set_rel_pathlist(root, rel, rti, planner_rt_fetch(rti, root));
     }
 }
@@ -1008,7 +1008,7 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
             continue;
         childRTindex = appinfo->child_relid;
-        childRTE = root->simple_rte_array[childRTindex];
+        childRTE = planner_rt_fetch(childRTindex, root);
         /*
          * The child rel's RelOptInfo was already created during
@@ -1239,7 +1239,7 @@ set_append_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
         /* Re-locate the child RTE and RelOptInfo */
         childRTindex = appinfo->child_relid;
-        childRTE = root->simple_rte_array[childRTindex];
+        childRTE = planner_rt_fetch(childRTindex, root);
         childrel = root->simple_rel_array[childRTindex];
         /*
@@ -3742,9 +3742,8 @@ print_relids(PlannerInfo *root, Relids relids)
     {
         if (!first)
             printf(" ");
-        if (x < root->simple_rel_array_size &&
-            root->simple_rte_array[x])
-            printf("%s", root->simple_rte_array[x]->eref->aliasname);
+        if (x <= list_length(root->parse->rtable))
+            printf("%s", planner_rt_fetch(x, root)->eref->aliasname);
         else
             printf("%d", x);
         first = false;
diff --git a/src/backend/optimizer/plan/analyzejoins.c b/src/backend/optimizer/plan/analyzejoins.c
index d19ff41..2576439 100644
--- a/src/backend/optimizer/plan/analyzejoins.c
+++ b/src/backend/optimizer/plan/analyzejoins.c
@@ -609,7 +609,7 @@ rel_supports_distinctness(PlannerInfo *root, RelOptInfo *rel)
     }
     else if (rel->rtekind == RTE_SUBQUERY)
     {
-        Query       *subquery = root->simple_rte_array[rel->relid]->subquery;
+        Query       *subquery = planner_rt_fetch(rel->relid, root)->subquery;
         /* Check if the subquery has any qualities that support distinctness */
         if (query_supports_distinctness(subquery))
@@ -660,7 +660,7 @@ rel_is_distinct_for(PlannerInfo *root, RelOptInfo *rel, List *clause_list)
     else if (rel->rtekind == RTE_SUBQUERY)
     {
         Index        relid = rel->relid;
-        Query       *subquery = root->simple_rte_array[relid]->subquery;
+        Query       *subquery = planner_rt_fetch(relid, root)->subquery;
         List       *colnos = NIL;
         List       *opids = NIL;
         ListCell   *l;
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index f232569..5689330 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -4472,7 +4472,7 @@ create_hashjoin_plan(PlannerInfo *root,
             Var           *var = (Var *) node;
             RangeTblEntry *rte;
-            rte = root->simple_rte_array[var->varno];
+            rte = planner_rt_fetch(var->varno, root);
             if (rte->rtekind == RTE_RELATION)
             {
                 skewTable = rte->relid;
diff --git a/src/backend/optimizer/plan/initsplan.c b/src/backend/optimizer/plan/initsplan.c
index 73da0c2..4a2aaa2 100644
--- a/src/backend/optimizer/plan/initsplan.c
+++ b/src/backend/optimizer/plan/initsplan.c
@@ -148,7 +148,7 @@ add_other_rels_to_query(PlannerInfo *root)
     for (rti = 1; rti < root->simple_rel_array_size; rti++)
     {
         RelOptInfo *rel = root->simple_rel_array[rti];
-        RangeTblEntry *rte = root->simple_rte_array[rti];
+        RangeTblEntry *rte;
         /* there may be empty slots corresponding to non-baserel RTEs */
         if (rel == NULL)
@@ -159,6 +159,7 @@ add_other_rels_to_query(PlannerInfo *root)
             continue;
         /* If it's marked as inheritable, look for children. */
+        rte = planner_rt_fetch(rti, root);
         if (rte->inh)
             expand_inherited_rtentry(root, rel, rte, rti);
     }
@@ -351,7 +352,7 @@ find_lateral_references(PlannerInfo *root)
 static void
 extract_lateral_references(PlannerInfo *root, RelOptInfo *brel, Index rtindex)
 {
-    RangeTblEntry *rte = root->simple_rte_array[rtindex];
+    RangeTblEntry *rte = planner_rt_fetch(rtindex, root);
     List       *vars;
     List       *newvars;
     Relids        where_needed;
@@ -1086,7 +1087,7 @@ process_security_barrier_quals(PlannerInfo *root,
                                int rti, Relids qualscope,
                                bool below_outer_join)
 {
-    RangeTblEntry *rte = root->simple_rte_array[rti];
+    RangeTblEntry *rte = planner_rt_fetch(rti, root);
     Index        security_level = 0;
     ListCell   *lc;
diff --git a/src/backend/optimizer/plan/planmain.c b/src/backend/optimizer/plan/planmain.c
index df3f8c2..3398bde 100644
--- a/src/backend/optimizer/plan/planmain.c
+++ b/src/backend/optimizer/plan/planmain.c
@@ -79,9 +79,7 @@ query_planner(PlannerInfo *root,
     root->initial_rels = NIL;
     /*
-     * Make a flattened version of the rangetable for faster access (this is
-     * OK because the rangetable won't change any more), and set up an empty
-     * array for indexing base relations.
+     * Set up arrays for accessing base relations and AppendRelInfos.
      */
     setup_simple_rel_arrays(root);
@@ -99,7 +97,7 @@ query_planner(PlannerInfo *root,
         if (IsA(jtnode, RangeTblRef))
         {
             int            varno = ((RangeTblRef *) jtnode)->rtindex;
-            RangeTblEntry *rte = root->simple_rte_array[varno];
+            RangeTblEntry *rte = planner_rt_fetch(varno, root);
             Assert(rte != NULL);
             if (rte->rtekind == RTE_RESULT)
@@ -157,12 +155,6 @@ query_planner(PlannerInfo *root,
     }
     /*
-     * Populate append_rel_array with each AppendRelInfo to allow direct
-     * lookups by child relid.
-     */
-    setup_append_rel_array(root);
-
-    /*
      * Construct RelOptInfo nodes for all base relations used in the query.
      * Appendrel member relations ("other rels") will be added later.
      *
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 0f918dd..7a38955 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -1754,17 +1754,6 @@ inheritance_planner(PlannerInfo *root)
         root->simple_rel_array = save_rel_array;
         root->append_rel_array = save_append_rel_array;
-        /* Must reconstruct master's simple_rte_array, too */
-        root->simple_rte_array = (RangeTblEntry **)
-            palloc0((list_length(final_rtable) + 1) * sizeof(RangeTblEntry *));
-        rti = 1;
-        foreach(lc, final_rtable)
-        {
-            RangeTblEntry *rte = lfirst_node(RangeTblEntry, lc);
-
-            root->simple_rte_array[rti++] = rte;
-        }
-
         /* Put back adjusted rowmarks, too */
         root->rowMarks = final_rowmarks;
     }
diff --git a/src/backend/optimizer/prep/prepunion.c b/src/backend/optimizer/prep/prepunion.c
index 5a11c12..a05ed10 100644
--- a/src/backend/optimizer/prep/prepunion.c
+++ b/src/backend/optimizer/prep/prepunion.c
@@ -132,17 +132,11 @@ plan_set_operations(PlannerInfo *root)
     /*
      * We'll need to build RelOptInfos for each of the leaf subqueries, which
      * are RTE_SUBQUERY rangetable entries in this Query.  Prepare the index
-     * arrays for that.
+     * arrays for those, and for AppendRelInfos in case they're needed.
      */
     setup_simple_rel_arrays(root);
     /*
-     * Populate append_rel_array with each AppendRelInfo to allow direct
-     * lookups by child relid.
-     */
-    setup_append_rel_array(root);
-
-    /*
      * Find the leftmost component Query.  We need to use its column names for
      * all generated tlists (else SELECT INTO won't work right).
      */
@@ -150,7 +144,7 @@ plan_set_operations(PlannerInfo *root)
     while (node && IsA(node, SetOperationStmt))
         node = ((SetOperationStmt *) node)->larg;
     Assert(node && IsA(node, RangeTblRef));
-    leftmostRTE = root->simple_rte_array[((RangeTblRef *) node)->rtindex];
+    leftmostRTE = planner_rt_fetch(((RangeTblRef *) node)->rtindex, root);
     leftmostQuery = leftmostRTE->subquery;
     Assert(leftmostQuery != NULL);
@@ -225,7 +219,7 @@ recurse_set_operations(Node *setOp, PlannerInfo *root,
     if (IsA(setOp, RangeTblRef))
     {
         RangeTblRef *rtr = (RangeTblRef *) setOp;
-        RangeTblEntry *rte = root->simple_rte_array[rtr->rtindex];
+        RangeTblEntry *rte = planner_rt_fetch(rtr->rtindex, root);
         Query       *subquery = rte->subquery;
         PlannerInfo *subroot;
         RelOptInfo *final_rel;
diff --git a/src/backend/optimizer/util/inherit.c b/src/backend/optimizer/util/inherit.c
index 38bc61e..5ed7737 100644
--- a/src/backend/optimizer/util/inherit.c
+++ b/src/backend/optimizer/util/inherit.c
@@ -481,12 +481,10 @@ expand_single_inheritance_child(PlannerInfo *root, RangeTblEntry *parentrte,
     }
     /*
-     * Store the RTE and appinfo in the respective PlannerInfo arrays, which
-     * the caller must already have allocated space for.
+     * Store the appinfo in the associated PlannerInfo array, which the caller
+     * must already have allocated space for.
      */
     Assert(childRTindex < root->simple_rel_array_size);
-    Assert(root->simple_rte_array[childRTindex] == NULL);
-    root->simple_rte_array[childRTindex] = childrte;
     Assert(root->append_rel_array[childRTindex] == NULL);
     root->append_rel_array[childRTindex] = appinfo;
@@ -601,7 +599,7 @@ expand_appendrel_subquery(PlannerInfo *root, RelOptInfo *rel,
         /* find the child RTE, which should already exist */
         Assert(childRTindex < root->simple_rel_array_size);
-        childrte = root->simple_rte_array[childRTindex];
+        childrte = planner_rt_fetch(childRTindex, root);
         Assert(childrte != NULL);
         /* Build the child RelOptInfo. */
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index 98e9948..848ca1a 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -177,7 +177,7 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent,
          * index while we hold lock on the parent rel, and no lock type used
          * for queries blocks any other kind of index operation.
          */
-        lmode = root->simple_rte_array[varno]->rellockmode;
+        lmode = planner_rt_fetch(varno, root)->rellockmode;
         foreach(l, indexoidlist)
         {
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index 37d228c..1f9f037 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -67,46 +67,26 @@ static void build_child_join_reltarget(PlannerInfo *root,
 /*
  * setup_simple_rel_arrays
- *      Prepare the arrays we use for quickly accessing base relations.
+ *      Prepare the arrays we use for quickly accessing base relations
+ *      and AppendRelInfos.
  */
 void
 setup_simple_rel_arrays(PlannerInfo *root)
 {
-    Index        rti;
+    /* Arrays are accessed using RT indexes (1..N) */
+    int            size = list_length(root->parse->rtable) + 1;
     ListCell   *lc;
-    /* Arrays are accessed using RT indexes (1..N) */
-    root->simple_rel_array_size = list_length(root->parse->rtable) + 1;
+    root->simple_rel_array_size = size;
-    /* simple_rel_array is initialized to all NULLs */
+    /*
+     * simple_rel_array is initialized to all NULLs, since no RelOptInfos
+     * exist yet.  It'll be filled by later calls to build_simple_rel().
+     */
     root->simple_rel_array = (RelOptInfo **)
-        palloc0(root->simple_rel_array_size * sizeof(RelOptInfo *));
-
-    /* simple_rte_array is an array equivalent of the rtable list */
-    root->simple_rte_array = (RangeTblEntry **)
-        palloc0(root->simple_rel_array_size * sizeof(RangeTblEntry *));
-    rti = 1;
-    foreach(lc, root->parse->rtable)
-    {
-        RangeTblEntry *rte = (RangeTblEntry *) lfirst(lc);
-
-        root->simple_rte_array[rti++] = rte;
-    }
-}
-
-/*
- * setup_append_rel_array
- *        Populate the append_rel_array to allow direct lookups of
- *        AppendRelInfos by child relid.
- *
- * The array remains unallocated if there are no AppendRelInfos.
- */
-void
-setup_append_rel_array(PlannerInfo *root)
-{
-    ListCell   *lc;
-    int            size = list_length(root->parse->rtable) + 1;
+        palloc0(size * sizeof(RelOptInfo *));
+    /* append_rel_array is not needed if there are no AppendRelInfos */
     if (root->append_rel_list == NIL)
     {
         root->append_rel_array = NULL;
@@ -116,6 +96,12 @@ setup_append_rel_array(PlannerInfo *root)
     root->append_rel_array = (AppendRelInfo **)
         palloc0(size * sizeof(AppendRelInfo *));
+    /*
+     * append_rel_array is filled with any already-existing AppendRelInfos,
+     * which currently could only come from UNION ALL flattening.  We might
+     * add more later during inheritance expansion, but it's the
+     * responsibility of the expansion code to update the array properly.
+     */
     foreach(lc, root->append_rel_list)
     {
         AppendRelInfo *appinfo = lfirst_node(AppendRelInfo, lc);
@@ -133,8 +119,12 @@ setup_append_rel_array(PlannerInfo *root)
 /*
  * expand_planner_arrays
- *        Expand the PlannerInfo's per-RTE arrays by add_size members
+ *        Expand the PlannerInfo's per-baserel arrays by add_size members
  *        and initialize the newly added entries to NULLs
+ *
+ * Note: this causes the append_rel_array to become allocated even if
+ * it was not before.  This is okay for current uses, because we only call
+ * this when adding child relations, which always have AppendRelInfos.
  */
 void
 expand_planner_arrays(PlannerInfo *root, int add_size)
@@ -145,12 +135,6 @@ expand_planner_arrays(PlannerInfo *root, int add_size)
     new_size = root->simple_rel_array_size + add_size;
-    root->simple_rte_array = (RangeTblEntry **)
-        repalloc(root->simple_rte_array,
-                 sizeof(RangeTblEntry *) * new_size);
-    MemSet(root->simple_rte_array + root->simple_rel_array_size,
-           0, sizeof(RangeTblEntry *) * add_size);
-
     root->simple_rel_array = (RelOptInfo **)
         repalloc(root->simple_rel_array,
                  sizeof(RelOptInfo *) * new_size);
@@ -190,7 +174,7 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptInfo *parent)
         elog(ERROR, "rel %d already exists", relid);
     /* Fetch RTE for relation */
-    rte = root->simple_rte_array[relid];
+    rte = planner_rt_fetch(relid, root);
     Assert(rte != NULL);
     rel = makeNode(RelOptInfo);
diff --git a/src/backend/statistics/extended_stats.c b/src/backend/statistics/extended_stats.c
index 23c74f7..05c18a4 100644
--- a/src/backend/statistics/extended_stats.c
+++ b/src/backend/statistics/extended_stats.c
@@ -793,7 +793,7 @@ statext_is_compatible_clause_internal(PlannerInfo *root, Node *clause,
     /* (Var op Const) or (Const op Var) */
     if (is_opclause(clause))
     {
-        RangeTblEntry *rte = root->simple_rte_array[relid];
+        RangeTblEntry *rte = planner_rt_fetch(relid, root);
         OpExpr       *expr = (OpExpr *) clause;
         Var           *var;
@@ -922,7 +922,7 @@ static bool
 statext_is_compatible_clause(PlannerInfo *root, Node *clause, Index relid,
                              Bitmapset **attnums)
 {
-    RangeTblEntry *rte = root->simple_rte_array[relid];
+    RangeTblEntry *rte = planner_rt_fetch(relid, root);
     RestrictInfo *rinfo = (RestrictInfo *) clause;
     Oid            userid;
diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index 7eba59e..1c18499 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -4648,7 +4648,7 @@ static void
 examine_simple_variable(PlannerInfo *root, Var *var,
                         VariableStatData *vardata)
 {
-    RangeTblEntry *rte = root->simple_rte_array[var->varno];
+    RangeTblEntry *rte = planner_rt_fetch(var->varno, root);
     Assert(IsA(rte, RangeTblEntry));
@@ -5130,7 +5130,7 @@ get_actual_variable_range(PlannerInfo *root, VariableStatData *vardata,
     if (rel == NULL || rel->indexlist == NIL)
         return false;
     /* If it has indexes it must be a plain relation */
-    rte = root->simple_rte_array[rel->relid];
+    rte = planner_rt_fetch(rel->relid, root);
     Assert(rte->rtekind == RTE_RELATION);
     /* Search through the indexes to see if any match our problem */
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index e3c579e..6f3115a 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -203,15 +203,7 @@ struct PlannerInfo
     int            simple_rel_array_size;    /* allocated size of array */
     /*
-     * simple_rte_array is the same length as simple_rel_array and holds
-     * pointers to the associated rangetable entries.  This lets us avoid
-     * rt_fetch(), which can be a bit slow once large inheritance sets have
-     * been expanded.
-     */
-    RangeTblEntry **simple_rte_array;    /* rangetable as an array */
-
-    /*
-     * append_rel_array is the same length as the above arrays, and holds
+     * append_rel_array is the same length as simple_rel_array, and holds
      * pointers to the corresponding AppendRelInfo entry indexed by
      * child_relid, or NULL if none.  The array itself is not allocated if
      * append_rel_list is empty.
@@ -365,14 +357,9 @@ struct PlannerInfo
 };
-/*
- * In places where it's known that simple_rte_array[] must have been prepared
- * already, we just index into it to fetch RTEs.  In code that might be
- * executed before or after entering query_planner(), use this macro.
- */
+/* Handy macro for getting the RTE with rangetable index rti */
 #define planner_rt_fetch(rti, root) \
-    ((root)->simple_rte_array ? (root)->simple_rte_array[rti] : \
-     rt_fetch(rti, (root)->parse->rtable))
+    ((RangeTblEntry *) list_nth((root)->parse->rtable, (rti)-1))
 /*
  * If multiple relations are partitioned the same way, all such partitions
diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h
index 182ffee..a12af54 100644
--- a/src/include/optimizer/pathnode.h
+++ b/src/include/optimizer/pathnode.h
@@ -277,7 +277,6 @@ extern Path *reparameterize_path_by_child(PlannerInfo *root, Path *path,
  * prototypes for relnode.c
  */
 extern void setup_simple_rel_arrays(PlannerInfo *root);
-extern void setup_append_rel_array(PlannerInfo *root);
 extern void expand_planner_arrays(PlannerInfo *root, int add_size);
 extern RelOptInfo *build_simple_rel(PlannerInfo *root, int relid,
                                     RelOptInfo *parent);
			
On Fri, 9 Aug 2019 at 04:24, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I wrote:
> > Perhaps there's an argument for doing something to change the behavior
> > of list_union and list_difference and friends.  Not sure --- it could
> > be a foot-gun for back-patching.  I'm already worried about the risk
> > of back-patching code that assumes the new semantics of list_concat.
> > (Which might be a good argument for renaming it to something else?
> > Just not list_union, please.)
>
> Has anyone got further thoughts about naming around list_concat
> and friends?
>
> If not, I'm inclined to go ahead with the concat-improvement patch as
> proposed in [1], modulo the one improvement David spotted.
>
>             regards, tom lane
>
> [1] https://www.postgresql.org/message-id/6704.1563739305@sss.pgh.pa.us
I'm okay with the patch once that one improvement is done.
I think if we want to think about freeing the 2nd input List then we
can do that in another commit. Removing the redundant list_copy()
calls seems quite separate from that.
--
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
On Fri, 9 Aug 2019 at 09:55, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Attached is a patch that removes simple_rte_array in favor of just
> accessing the query's rtable directly.  I concluded that there was
> not much point in messing with simple_rel_array or append_rel_array,
> because they are not in fact just mirrors of some List.  There's no
> List at all of baserel RelOptInfos, and while we do have a list of
> AppendRelInfos, it's a compact, random-order list not one indexable
> by child relid.
>
> Having done this, though, I'm a bit discouraged about whether to commit
> it.  In light testing, it's not any faster than HEAD and in complex
> queries seems to actually be a bit slower.  I suspect the reason is
> that we've effectively replaced
>     root->simple_rte_array[i]
> with
>     root->parse->rtable->elements[i-1]
> and the two extra levels of indirection are taking their toll.
If there are no performance gains from this then -1 from me. We're all
pretty used to it the way it is.
> I realized that it's pretty dumb not to
> just merge setup_append_rel_array into setup_simple_rel_arrays.
> The division of labor there is inconsistent with the fact that
> there's no such division in expand_planner_arrays.
ha, yeah I'd vote for merging those. It was coded that way originally
until someone objected! :)
> I still have hopes for getting rid of es_range_table_array though,
> and will look at that tomorrow or so.
Yes, please. I've measured that to be quite an overhead with large
partitioning setups. However, that was with some additional code which
didn't lock partitions until it was ... well .... too late... as it
turned out. But it seems pretty good to remove code that could be a
future bottleneck if we ever manage to do something else with the
locking of all partitions during UPDATE/DELETE.
--
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
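The indirection argument in the quoted text can be made concrete with a small sketch. The types MockList, MockQuery and MockPlannerInfo below are simplified stand-ins invented for illustration (the real List stores ListCell unions rather than bare pointers, and the real structs carry many more fields); only the shape of the two access paths is meant to match the discussion.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the real planner structures. */
typedef struct MockList
{
    int     length;         /* number of elements in use */
    void  **elements;       /* contiguous array of element pointers */
} MockList;

typedef struct MockQuery
{
    MockList *rtable;       /* range table as an array-backed list */
} MockQuery;

typedef struct MockPlannerInfo
{
    MockQuery *parse;
    void     **simple_rte_array;    /* mirror of rtable, indexed 1..N */
} MockPlannerInfo;

/* Old path: load the array pointer from root, then index it. */
static void *
fetch_via_array(MockPlannerInfo *root, int rti)
{
    return root->simple_rte_array[rti];
}

/*
 * New path: two additional pointer hops (root->parse, parse->rtable)
 * before the elements array is reached and indexed with rti - 1.
 */
static void *
fetch_via_list(MockPlannerInfo *root, int rti)
{
    MockList *rtable = root->parse->rtable;

    assert(rti >= 1 && rti <= rtable->length);
    return rtable->elements[rti - 1];
}
```

Both functions return the same entry for a given rti; the difference is only in how many dependent loads precede the final indexed fetch, which is the cost the quoted text suspects.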
David Rowley <david.rowley@2ndquadrant.com> writes:
> On Fri, 9 Aug 2019 at 09:55, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> I still have hopes for getting rid of es_range_table_array though,
>> and will look at that tomorrow or so.
> Yes, please. I've measured that to be quite an overhead with large
> partitioning setups. However, that was with some additional code which
> didn't lock partitions until it was ... well .... too late... as it
> turned out. But it seems pretty good to remove code that could be a
> future bottleneck if we ever manage to do something else with the
> locking of all partitions during UPDATE/DELETE.
I poked at this, and attached is a patch, but again I'm not seeing
that there's any real performance-based argument for it.  So far
as I can tell, if we've got a lot of RTEs in an executable plan,
the bulk of the startup time is going into lock (re) acquisition in
AcquirePlannerLocks, and/or permissions scanning in ExecCheckRTPerms;
both of those have to do work for every RTE including ones that
run-time pruning drops later on.  ExecInitRangeTable just isn't on
the radar.
If we wanted to try to improve things further, it seems like we'd
have to find a way to not lock unreferenced partitions at all,
as you suggest above.  But combining that with run-time pruning seems
like it'd be pretty horrid from a system structural standpoint: if we
acquire locks only during execution, what happens if we find we must
invalidate the query plan?
Anyway, the attached might be worth committing just on cleanliness
grounds, to avoid two-sources-of-truth issues in the executor.
But it seems like there's no additional performance win here
after all ... unless you've got a test case that shows differently?
            regards, tom lane
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index dbd7dd9..7f494ab 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -2790,7 +2790,6 @@ EvalPlanQualStart(EPQState *epqstate, EState *parentestate, Plan *planTree)
     estate->es_snapshot = parentestate->es_snapshot;
     estate->es_crosscheck_snapshot = parentestate->es_crosscheck_snapshot;
     estate->es_range_table = parentestate->es_range_table;
-    estate->es_range_table_array = parentestate->es_range_table_array;
     estate->es_range_table_size = parentestate->es_range_table_size;
     estate->es_relations = parentestate->es_relations;
     estate->es_queryEnv = parentestate->es_queryEnv;
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index c1fc0d5..afd9beb 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -113,7 +113,6 @@ CreateExecutorState(void)
     estate->es_snapshot = InvalidSnapshot;    /* caller must initialize this */
     estate->es_crosscheck_snapshot = InvalidSnapshot;    /* no crosscheck */
     estate->es_range_table = NIL;
-    estate->es_range_table_array = NULL;
     estate->es_range_table_size = 0;
     estate->es_relations = NULL;
     estate->es_rowmarks = NULL;
@@ -720,29 +719,17 @@ ExecOpenScanRelation(EState *estate, Index scanrelid, int eflags)
  * ExecInitRangeTable
  *        Set up executor's range-table-related data
  *
- * We build an array from the range table list to allow faster lookup by RTI.
- * (The es_range_table field is now somewhat redundant, but we keep it to
- * avoid breaking external code unnecessarily.)
- * This is also a convenient place to set up the parallel es_relations array.
+ * In addition to the range table proper, initialize arrays that are
+ * indexed by rangetable index.
  */
 void
 ExecInitRangeTable(EState *estate, List *rangeTable)
 {
-    Index        rti;
-    ListCell   *lc;
-
     /* Remember the range table List as-is */
     estate->es_range_table = rangeTable;
-    /* Set up the equivalent array representation */
+    /* Set size of associated arrays */
     estate->es_range_table_size = list_length(rangeTable);
-    estate->es_range_table_array = (RangeTblEntry **)
-        palloc(estate->es_range_table_size * sizeof(RangeTblEntry *));
-    rti = 0;
-    foreach(lc, rangeTable)
-    {
-        estate->es_range_table_array[rti++] = lfirst_node(RangeTblEntry, lc);
-    }
     /*
      * Allocate an array to store an open Relation corresponding to each
@@ -753,8 +740,8 @@ ExecInitRangeTable(EState *estate, List *rangeTable)
         palloc0(estate->es_range_table_size * sizeof(Relation));
     /*
-     * es_rowmarks is also parallel to the es_range_table_array, but it's
-     * allocated only if needed.
+     * es_rowmarks is also parallel to the es_range_table, but it's allocated
+     * only if needed.
      */
     estate->es_rowmarks = NULL;
 }
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index 1fb28b4..39c8b3b 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -535,8 +535,7 @@ extern void ExecInitRangeTable(EState *estate, List *rangeTable);
 static inline RangeTblEntry *
 exec_rt_fetch(Index rti, EState *estate)
 {
-    Assert(rti > 0 && rti <= estate->es_range_table_size);
-    return estate->es_range_table_array[rti - 1];
+    return (RangeTblEntry *) list_nth(estate->es_range_table, rti - 1);
 }
 extern Relation ExecGetRangeTableRelation(EState *estate, Index rti);
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 4ec7849..063b490 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -502,7 +502,6 @@ typedef struct EState
     Snapshot    es_snapshot;    /* time qual to use */
     Snapshot    es_crosscheck_snapshot; /* crosscheck time qual for RI */
     List       *es_range_table; /* List of RangeTblEntry */
-    struct RangeTblEntry **es_range_table_array;    /* equivalent array */
     Index        es_range_table_size;    /* size of the range table arrays */
     Relation   *es_relations;    /* Array of per-range-table-entry Relation
                                  * pointers, or NULL if not yet opened */
			
On 2019-Aug-09, Tom Lane wrote:
> I poked at this, and attached is a patch, but again I'm not seeing
> that there's any real performance-based argument for it.  So far
> as I can tell, if we've got a lot of RTEs in an executable plan,
> the bulk of the startup time is going into lock (re) acquisition in
> AcquirePlannerLocks, and/or permissions scanning in ExecCheckRTPerms;
> both of those have to do work for every RTE including ones that
> run-time pruning drops later on.  ExecInitRangeTable just isn't on
> the radar.
I'm confused.  I thought that the point of doing this wasn't that we
wanted to improve performance, but rather that we're now able to remove
the array without *losing* performance.  I mean, those arrays were there
to improve performance for code that wanted fast access to specific list
items, but now we have fast access to the list items without it.  So a
measurement that finds no performance difference is good news, and we
can get rid of the now-pointless optimization ...
--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Sat, 10 Aug 2019 at 09:03, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> David Rowley <david.rowley@2ndquadrant.com> writes:
> > On Fri, 9 Aug 2019 at 09:55, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >> I still have hopes for getting rid of es_range_table_array though,
> >> and will look at that tomorrow or so.
> >
> > Yes, please. I've measured that to be quite an overhead with large
> > partitioning setups. However, that was with some additional code which
> > didn't lock partitions until it was ... well .... too late... as it
> > turned out. But it seems pretty good to remove code that could be a
> > future bottleneck if we ever manage to do something else with the
> > locking of all partitions during UPDATE/DELETE.
>
> I poked at this, and attached is a patch, but again I'm not seeing
> that there's any real performance-based argument for it.  So far
> as I can tell, if we've got a lot of RTEs in an executable plan,
> the bulk of the startup time is going into lock (re) acquisition in
> AcquirePlannerLocks, and/or permissions scanning in ExecCheckRTPerms;
> both of those have to do work for every RTE including ones that
> run-time pruning drops later on.  ExecInitRangeTable just isn't on
> the radar.
In the code I tested with locally I ended up with a Bitmapset that
marked which RTEs required permission checks so that
ExecCheckRTPerms() could quickly skip RTEs with requiredPerms == 0.
The Bitmapset was set in the planner. Note:
expand_single_inheritance_child sets childrte->requiredPerms = 0, so
there's nothing to do there for partitions, which is the most likely
reason that the rtable list would be big. Sadly the locking is still
a big overhead even with that fixed. Robert threw around some ideas in
[1], but that seems like a pretty big project.
I don't think removing future bottlenecks is such a bad idea if it can
be done in such a way that the code remains clean. It may serve to
increase our motivation later to solve the remaining issues. We tend
to go to greater lengths when there are more gains, and more gains are
more easily visible by removing more bottlenecks.
Another reason to remove the es_range_table_array is that the reason
it was added in the first place is no longer valid. We'd never have
added it if we had array-based lists back then. (Reading below, it
looks like Alvaro agrees with this too)
[1] https://www.postgresql.org/message-id/CA%2BTgmoYbtm1uuDne3rRp_uNA2RFiBwXX1ngj3RSLxOfc3oS7cQ%40mail.gmail.com
--
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
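The Bitmapset idea described in this message was not posted as code, so the following is only a rough sketch of the general technique: record at plan time which range table positions carry nonzero requiredPerms, then let the check loop skip everything else. MockRTE, PermCheckSet and both functions are hypothetical names for illustration, and a fixed-size bitmap stands in for the real variable-length Bitmapset.

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>

#define MAX_RTES   128                  /* arbitrary limit for the sketch */
#define WORD_BITS  64
#define NWORDS     (MAX_RTES / WORD_BITS)

/* Hypothetical stand-in for a range table entry. */
typedef struct MockRTE
{
    unsigned requiredPerms;             /* 0 means nothing to check */
} MockRTE;

/* Hypothetical fixed-size bitmap; the real Bitmapset is variable-length. */
typedef struct PermCheckSet
{
    uint64_t words[NWORDS];
} PermCheckSet;

/* "Planner" side: mark the 0-based positions that need a check. */
static void
mark_perm_checks(const MockRTE *rtes, int nrtes, PermCheckSet *set)
{
    for (int w = 0; w < NWORDS; w++)
        set->words[w] = 0;
    for (int i = 0; i < nrtes && i < MAX_RTES; i++)
    {
        if (rtes[i].requiredPerms != 0)
            set->words[i / WORD_BITS] |= UINT64_C(1) << (i % WORD_BITS);
    }
}

/*
 * "Executor" side: visit only marked entries.  Returns how many entries
 * were visited, standing in for the real per-RTE permission check.
 */
static int
check_marked_rtes(const MockRTE *rtes, int nrtes, const PermCheckSet *set)
{
    int visited = 0;

    for (int w = 0; w < NWORDS; w++)
    {
        if (set->words[w] == 0)
            continue;                   /* 64 entries skipped with one test */
        for (int b = 0; b < WORD_BITS; b++)
        {
            int i = w * WORD_BITS + b;

            if (i >= nrtes)
                break;
            if (set->words[w] & (UINT64_C(1) << b))
            {
                assert(rtes[i].requiredPerms != 0);
                visited++;
            }
        }
    }
    return visited;
}
```

With a large partitioned table, where child entries have requiredPerms == 0, almost every word of the bitmap is zero and the loop touches very few entries. As the message notes, this does nothing for the lock acquisition cost.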
David Rowley <david.rowley@2ndquadrant.com> writes:
> On Fri, 9 Aug 2019 at 04:24, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Has anyone got further thoughts about naming around list_concat
>> and friends?
>> If not, I'm inclined to go ahead with the concat-improvement patch as
>> proposed in [1], modulo the one improvement David spotted.
>> [1] https://www.postgresql.org/message-id/6704.1563739305@sss.pgh.pa.us
> I'm okay with the patch once that one improvement is done.
Pushed with that fix.
> I think if we want to think about freeing the 2nd input List then we
> can do that in another commit. Removing the redundant list_copy()
> calls seems quite separate from that.
The reason I was holding off is that this patch obscures the distinction
between places that needed to preserve the second input (which were doing
list_copy on it) and those that didn't (and weren't).  If somebody wants
to rethink the free-second-input business they'll now have to do a bit of
software archaeology to determine which calls to change.  But I don't
think we're going to bother.
            regards, tom lane
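For readers wondering why the list_copy() calls became redundant, here is a minimal sketch of concatenation on an array-backed list. It assumes, as the discussion implies, that the new list_concat copies the second input's elements into the first list's array instead of sharing cells. PtrList and the ptr_list_* functions are hypothetical names for illustration and are not the pg_list.h implementation.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical array-backed pointer list; not the real List struct. */
typedef struct PtrList
{
    int     length;         /* elements in use */
    int     capacity;       /* allocated slots */
    void  **elements;
} PtrList;

/* Make sure the list has room for at least mincap elements. */
static void
ptr_list_reserve(PtrList *list, int mincap)
{
    if (mincap > list->capacity)
    {
        void **grown = realloc(list->elements,
                               (size_t) mincap * sizeof(void *));

        if (grown == NULL)
            abort();
        list->elements = grown;
        list->capacity = mincap;
    }
}

/* Create an empty list; storage is allocated on first append. */
static PtrList *
ptr_list_make(void)
{
    PtrList *list = calloc(1, sizeof(PtrList));

    if (list == NULL)
        abort();
    return list;
}

static void
ptr_list_append(PtrList *list, void *datum)
{
    if (list->length == list->capacity)
        ptr_list_reserve(list, list->capacity ? 2 * list->capacity : 4);
    list->elements[list->length++] = datum;
}

/*
 * Append b's elements to a.  b's array is only read: nothing is shared
 * between the two lists afterwards, so b stays valid without a prior copy.
 */
static PtrList *
ptr_list_concat(PtrList *a, const PtrList *b)
{
    int blen = b->length;

    ptr_list_reserve(a, a->length + blen);
    if (blen > 0)
        memcpy(a->elements + a->length, b->elements,
               (size_t) blen * sizeof(void *));
    a->length += blen;
    return a;
}
```

Under the old linked-cell design the result could end up sharing the second input's cells, which is why callers that needed the second list preserved copied it first. In this sketch the copy happens inside the concatenation itself, so an extra copy beforehand would only waste memory.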
Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> On 2019-Aug-09, Tom Lane wrote:
>> I poked at this, and attached is a patch, but again I'm not seeing
>> that there's any real performance-based argument for it.
> I'm confused.  I thought that the point of doing this wasn't that we
> wanted to improve performance, but rather that we're now able to remove
> the array without *losing* performance.  I mean, those arrays were there
> to improve performance for code that wanted fast access to specific list
> items, but now we have fast access to the list items without it.  So a
> measurement that finds no performance difference is good news, and we
> can get rid of the now-pointless optimization ...
Yeah, fair enough, so pushed.
In principle, this change adds an extra indirection in exec_rt_fetch,
so I went looking to see if there were any such calls in arguably
performance-critical paths.  Unsurprisingly, most calls are in executor
initialization, and they tend to be adjacent to table_open() or other
expensive operations, so it's pretty hard to claim that there could
be any measurable hit.  However, I did notice that trigger.c uses
ExecUpdateLockMode() and GetAllUpdatedColumns() in ExecBRUpdateTriggers
which executes per-row, and so might be worth trying to optimize.
exec_rt_fetch itself is not the main cost in either of those, but I wonder
why we are doing those calculations over again for each row in the first
place.  I'm not excited enough about the issue to do anything right now,
but the next time somebody whines about trigger-firing overhead, there
might be an easy win available there.
            regards, tom lane
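The "easy win" hinted at above would amount to computing a value once per statement and reusing it for every row. The sketch below shows that caching pattern only. It assumes the calculation really is row-invariant, which the message suggests but does not establish, and MockUpdateState, compute_lockmode and get_lockmode are hypothetical names that do not exist in trigger.c.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-statement state; not an actual executor struct. */
typedef struct MockUpdateState
{
    bool    lockmode_valid;     /* has cached_lockmode been computed? */
    int     cached_lockmode;
    int     compute_calls;      /* instrumentation for this example only */
} MockUpdateState;

/* Stand-in for a calculation assumed to give the same result for every row. */
static int
compute_lockmode(MockUpdateState *state)
{
    state->compute_calls++;
    return 7;                   /* arbitrary placeholder result */
}

/* Per-row entry point: compute on first use, then serve the cached value. */
static int
get_lockmode(MockUpdateState *state)
{
    if (!state->lockmode_valid)
    {
        state->cached_lockmode = compute_lockmode(state);
        state->lockmode_valid = true;
    }
    return state->cached_lockmode;
}
```

Calling get_lockmode() once per row triggers a single underlying computation per statement, however many rows the UPDATE touches.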