Обсуждение: [HACKERS] RTE_NAMEDTUPLESTORE, enrtuples and comments

Поиск
Список
Период
Сортировка

[HACKERS] RTE_NAMEDTUPLESTORE, enrtuples and comments

От
Noah Misch
Дата:
While completing my annual src/backend/nodes/*funcs.c audit, I noticed defects
in commit 18ce3a4 changes to RangeTblEntry:

1. Field relid is under a comment saying it is valid for RTE_RELATION only.  Fields coltypes, coltypmods and
colcollationsare under a comment saying  they are valid for RTE_VALUES and RTE_CTE only.  But _outRangeTblEntry()
treatsall of the above as valid for RTE_NAMEDTUPLESTORE.  Which is right?
 

2. New fields enrname and enrtuples are set only for RTE_NAMEDTUPLESTORE, yet  they're under the comment for RTE_VALUES
andRTE_CTE.  This pair needs its  own comment.
 

3. Each of _{copy,equal,out,read}RangeTblEntry() silently ignores enrtuples.  _equalRangeTblEntry() ignores enrname,
too. In each case, the function  should either use the field or have a comment to note that skipping the  field is
intentional. Which should it be?
 

This fourth point is not necessarily a defect: I wonder if RangeTblEntry is
the right place for enrtuples.  It's a concept regularly seen in planner data
structures but not otherwise seen at parse tree level.

nm



Re: [HACKERS] RTE_NAMEDTUPLESTORE, enrtuples and comments

От
Thomas Munro
Дата:
On Sun, Jun 11, 2017 at 6:25 PM, Noah Misch <noah@leadboat.com> wrote:
> While completing my annual src/backend/nodes/*funcs.c audit, I noticed defects
> in commit 18ce3a4 changes to RangeTblEntry:
>
> 1. Field relid is under a comment saying it is valid for RTE_RELATION only.

The comment is out of date.  Here's a fix for that.

>    Fields coltypes, coltypmods and colcollations are under a comment saying
>    they are valid for RTE_VALUES and RTE_CTE only.  But _outRangeTblEntry()
>    treats all of the above as valid for RTE_NAMEDTUPLESTORE.  Which is right?

The comment is wrong.  In passing I also noticed that RTE_TABLEFUNC
also uses coltypes et al and is not mentioned in that comment.  Here's
a fix for both omissions.

> 2. New fields enrname and enrtuples are set only for RTE_NAMEDTUPLESTORE, yet
>    they're under the comment for RTE_VALUES and RTE_CTE.  This pair needs its
>    own comment.

Right.  The attached patch fixes that too.

> 3. Each of _{copy,equal,out,read}RangeTblEntry() silently ignores enrtuples.
>    _equalRangeTblEntry() ignores enrname, too.  In each case, the function
>    should either use the field or have a comment to note that skipping the
>    field is intentional.  Which should it be?

Ignoring enrname in _equalRangeTblEntry is certainly a bug, and the
attached adds it.  I also noticed that _copyRangeTleEntry had enrname
but not in the same order as the struct's members, which seems to be
an accidental deviation from the convention, so I moved it in the
attached.

Ignoring enrtuples is probably also wrong, but ...

> This fourth point is not necessarily a defect: I wonder if RangeTblEntry is
> the right place for enrtuples.  It's a concept regularly seen in planner data
> structures but not otherwise seen at parse tree level.

I agree that this is strange.  Perhaps
set_namedtuplestore_size_estimates should instead look up the
EphemeralNamedRelation by rte->enrname to find its way to
enr->md.enrtuples, but I'm not sure off the top of my head how it
should get its hands on the QueryEnvironment required to do that.  I
will look into this on Monday, but other ideas/clues welcome...

-- 
Thomas Munro
http://www.enterprisedb.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Вложения

Re: [HACKERS] RTE_NAMEDTUPLESTORE, enrtuples and comments

От
Thomas Munro
Дата:
On Sun, Jun 11, 2017 at 11:11 PM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:
> On Sun, Jun 11, 2017 at 6:25 PM, Noah Misch <noah@leadboat.com> wrote:
>> This fourth point is not necessarily a defect: I wonder if RangeTblEntry is
>> the right place for enrtuples.  It's a concept regularly seen in planner data
>> structures but not otherwise seen at parse tree level.
>
> I agree that this is strange.  Perhaps
> set_namedtuplestore_size_estimates should instead look up the
> EphemeralNamedRelation by rte->enrname to find its way to
> enr->md.enrtuples, but I'm not sure off the top of my head how it
> should get its hands on the QueryEnvironment required to do that.  I
> will look into this on Monday, but other ideas/clues welcome...

Here's some background:  If you look at the interface changes
introduced by 18ce3a4 you will see that it is now possible for a
QueryEnvironment object to be injected into the parser and executor.
Currently the only use for it is to inject named tuplestores into the
system via SPI_register_relation or SPI_register_trigger_data.  That's
to support SQL standard transition tables, but anyone can now use SPI
to expose data to SQL via an ephemeral named relation in this way.
(In future we could imagine other kinds of objects like table
variables, anonymous functions or streams).

The QueryEnvironment is used by the parser to resolve names and build
RangeTblEntry objects.  The planner doesn't currently need it, because
all the information it needs is in the RangeTblEntry, including the
offending row estimate.  Then the executor needs it to get its hands
on the tuplestores.  So the question is: how can we get it into
costsize.c, so that it can look up the EphermalNamedRelationMetaData
object by name, instead of trafficking statistics through parser data
structures?

Here are a couple of ways forward that I can see:

1.  Figure out how to get the QueryEnvironment through more of these
stack frames (possibly inside other objects), so that
set_namedtuplestore_size_estimates can look up enrtuples by enrname:

  set_namedtuplestore_size_estimates <-- would need QueryEnvironment
  set_namedtuplestore_pathlist
  set_rel_size
  set_base_rel_sizes
  make_one_rel
  query_planner
  grouping_planner
  subquery_planner
  standard_planner
  planner
  pg_plan_query
  pg_plan_queries <-- doesn't receive QueryEnvironment
  BuildCachedPlan <-- receives QueryEnvironment
  GetCachedPlan
  _SPI_execute_plan
  SPI_execute_plan_with_paramlist

2.  Rip the row estimation out for now, use a bogus hard coded
estimate like we do in some other cases, and revisit later.  See
attached (including changes from my previous message).
Unsurprisingly, a query plan changes.

Thoughts?

-- 
Thomas Munro
http://www.enterprisedb.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Вложения

Re: [HACKERS] RTE_NAMEDTUPLESTORE, enrtuples and comments

От
Noah Misch
Дата:
On Mon, Jun 12, 2017 at 04:04:23PM +1200, Thomas Munro wrote:
> On Sun, Jun 11, 2017 at 11:11 PM, Thomas Munro
> <thomas.munro@enterprisedb.com> wrote:
> > On Sun, Jun 11, 2017 at 6:25 PM, Noah Misch <noah@leadboat.com> wrote:
> >> This fourth point is not necessarily a defect: I wonder if RangeTblEntry is
> >> the right place for enrtuples.  It's a concept regularly seen in planner data
> >> structures but not otherwise seen at parse tree level.
> >
> > I agree that this is strange.  Perhaps
> > set_namedtuplestore_size_estimates should instead look up the
> > EphemeralNamedRelation by rte->enrname to find its way to
> > enr->md.enrtuples, but I'm not sure off the top of my head how it
> > should get its hands on the QueryEnvironment required to do that.  I
> > will look into this on Monday, but other ideas/clues welcome...
> 
> Here's some background:  If you look at the interface changes
> introduced by 18ce3a4 you will see that it is now possible for a
> QueryEnvironment object to be injected into the parser and executor.
> Currently the only use for it is to inject named tuplestores into the
> system via SPI_register_relation or SPI_register_trigger_data.  That's
> to support SQL standard transition tables, but anyone can now use SPI
> to expose data to SQL via an ephemeral named relation in this way.
> (In future we could imagine other kinds of objects like table
> variables, anonymous functions or streams).
> 
> The QueryEnvironment is used by the parser to resolve names and build
> RangeTblEntry objects.  The planner doesn't currently need it, because
> all the information it needs is in the RangeTblEntry, including the
> offending row estimate.  Then the executor needs it to get its hands
> on the tuplestores.  So the question is: how can we get it into
> costsize.c, so that it can look up the EphermalNamedRelationMetaData
> object by name, instead of trafficking statistics through parser data
> structures?
> 
> Here are a couple of ways forward that I can see:
> 
> 1.  Figure out how to get the QueryEnvironment through more of these
> stack frames (possibly inside other objects), so that
> set_namedtuplestore_size_estimates can look up enrtuples by enrname:
> 
>   set_namedtuplestore_size_estimates <-- would need QueryEnvironment
>   set_namedtuplestore_pathlist
>   set_rel_size
>   set_base_rel_sizes
>   make_one_rel
>   query_planner
>   grouping_planner
>   subquery_planner
>   standard_planner
>   planner
>   pg_plan_query
>   pg_plan_queries <-- doesn't receive QueryEnvironment
>   BuildCachedPlan <-- receives QueryEnvironment
>   GetCachedPlan
>   _SPI_execute_plan
>   SPI_execute_plan_with_paramlist
> 
> 2.  Rip the row estimation out for now, use a bogus hard coded
> estimate like we do in some other cases, and revisit later.  See
> attached (including changes from my previous message).
> Unsurprisingly, a query plan changes.
> 
> Thoughts?

I'm not sufficiently familiar with the relevant code to judge this one.  Let's
see if planner experts voice an opinion.  Absent more opinions, the current
design stands.



Re: [HACKERS] RTE_NAMEDTUPLESTORE, enrtuples and comments

От
Thomas Munro
Дата:
On Tue, Jun 13, 2017 at 6:40 PM, Noah Misch <noah@leadboat.com> wrote:
> On Mon, Jun 12, 2017 at 04:04:23PM +1200, Thomas Munro wrote:
>> On Sun, Jun 11, 2017 at 11:11 PM, Thomas Munro
>> <thomas.munro@enterprisedb.com> wrote:
>> > On Sun, Jun 11, 2017 at 6:25 PM, Noah Misch <noah@leadboat.com> wrote:
>> >> This fourth point is not necessarily a defect: I wonder if RangeTblEntry is
>> >> the right place for enrtuples.  It's a concept regularly seen in planner data
>> >> structures but not otherwise seen at parse tree level.
>> >
>> > I agree that this is strange.  Perhaps
>> > set_namedtuplestore_size_estimates should instead look up the
>> > EphemeralNamedRelation by rte->enrname to find its way to
>> > enr->md.enrtuples, but I'm not sure off the top of my head how it
>> > should get its hands on the QueryEnvironment required to do that.  I
>> > will look into this on Monday, but other ideas/clues welcome...
>>
>> Here's some background:  If you look at the interface changes
>> introduced by 18ce3a4 you will see that it is now possible for a
>> QueryEnvironment object to be injected into the parser and executor.
>> Currently the only use for it is to inject named tuplestores into the
>> system via SPI_register_relation or SPI_register_trigger_data.  That's
>> to support SQL standard transition tables, but anyone can now use SPI
>> to expose data to SQL via an ephemeral named relation in this way.
>> (In future we could imagine other kinds of objects like table
>> variables, anonymous functions or streams).
>>
>> The QueryEnvironment is used by the parser to resolve names and build
>> RangeTblEntry objects.  The planner doesn't currently need it, because
>> all the information it needs is in the RangeTblEntry, including the
>> offending row estimate.  Then the executor needs it to get its hands
>> on the tuplestores.  So the question is: how can we get it into
>> costsize.c, so that it can look up the EphermalNamedRelationMetaData
>> object by name, instead of trafficking statistics through parser data
>> structures?
>>
>> Here are a couple of ways forward that I can see:
>>
>> 1.  Figure out how to get the QueryEnvironment through more of these
>> stack frames (possibly inside other objects), so that
>> set_namedtuplestore_size_estimates can look up enrtuples by enrname:
>>
>>   set_namedtuplestore_size_estimates <-- would need QueryEnvironment
>>   set_namedtuplestore_pathlist
>>   set_rel_size
>>   set_base_rel_sizes
>>   make_one_rel
>>   query_planner
>>   grouping_planner
>>   subquery_planner
>>   standard_planner
>>   planner
>>   pg_plan_query
>>   pg_plan_queries <-- doesn't receive QueryEnvironment
>>   BuildCachedPlan <-- receives QueryEnvironment
>>   GetCachedPlan
>>   _SPI_execute_plan
>>   SPI_execute_plan_with_paramlist
>>
>> 2.  Rip the row estimation out for now, use a bogus hard coded
>> estimate like we do in some other cases, and revisit later.  See
>> attached (including changes from my previous message).
>> Unsurprisingly, a query plan changes.
>>
>> Thoughts?
>
> I'm not sufficiently familiar with the relevant code to judge this one.  Let's
> see if planner experts voice an opinion.  Absent more opinions, the current
> design stands.

Here's another thought I had about this during review[1] and didn't
follow up:  Query plans are cached by plpgsql and reused.  If you're
unlucky, on the first firing of a given trigger a transition
tuplestore could have 0 or 1 rows and on the second firing it could
have a zillion rows, I guess potentially resulting in pathologically
bad performance.  I do think it's a good idea for SPI client code to
be able to provide an estimate via SPI_register_relation (if the
coding passes muster or can be suitably refactored), but maybe "oh,
about 1,000" is actually better for transition tables anyway in the
absence of some kind of adaptive replanning or user declarations.

[1] https://www.postgresql.org/message-id/CAEepm%3D02A5LTfGtHo8xWpEmW4AYY%2BES-uPr02bWb0OzGF4Ej-Q%40mail.gmail.com

-- 
Thomas Munro
http://www.enterprisedb.com



Re: [HACKERS] RTE_NAMEDTUPLESTORE, enrtuples and comments

От
Robert Haas
Дата:
On Mon, Jun 12, 2017 at 12:04 AM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:
> Here are a couple of ways forward that I can see:
>
> 1.  Figure out how to get the QueryEnvironment through more of these
> stack frames (possibly inside other objects), so that
> set_namedtuplestore_size_estimates can look up enrtuples by enrname:

If you were going to do this, the thing to do would be to get it
hooked up to the PlannerInfo, possibly via an intermediate hop in the
Query.

> 2.  Rip the row estimation out for now, use a bogus hard coded
> estimate like we do in some other cases, and revisit later.  See
> attached (including changes from my previous message).
> Unsurprisingly, a query plan changes.

Perhaps this is a silly question, but I don't particularly see what's
wrong with:

3. Do nothing.

If I understand correctly, for the current use of named tuplestores,
the existing code produces correct estimates.  If we chose option #1,
that would still be true, but we'd have to change a bunch of code to
get there.  If we chose option #2, it would cease to be true.  Why not
just leave it alone?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] RTE_NAMEDTUPLESTORE, enrtuples and comments

От
Tom Lane
Дата:
Robert Haas <robertmhaas@gmail.com> writes:
> Perhaps this is a silly question, but I don't particularly see what's
> wrong with:

> 3. Do nothing.

Well, the fundamental problem is that the RTE is a lousy place to keep
rowcount estimates.  That breaks assorted desirable properties like
querytrees being readonly to planning/execution (not that we don't
end up copying them anyway, but up to now that's been because of bad
implementation not because the representation was broken by design).

I agree that it's probably not so badly broken that we have to fix it
in time for v10, but this is not where we want to be in the long run.
        regards, tom lane



Re: [HACKERS] RTE_NAMEDTUPLESTORE, enrtuples and comments

От
Robert Haas
Дата:
On Tue, Jun 13, 2017 at 10:05 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> Perhaps this is a silly question, but I don't particularly see what's
>> wrong with:
>
>> 3. Do nothing.
>
> Well, the fundamental problem is that the RTE is a lousy place to keep
> rowcount estimates.  That breaks assorted desirable properties like
> querytrees being readonly to planning/execution (not that we don't
> end up copying them anyway, but up to now that's been because of bad
> implementation not because the representation was broken by design).

How does it break those properties?  I don't think enrtuples is being
modified by planning or execution as things stand.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] RTE_NAMEDTUPLESTORE, enrtuples and comments

От
Tom Lane
Дата:
Robert Haas <robertmhaas@gmail.com> writes:
> On Tue, Jun 13, 2017 at 10:05 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Well, the fundamental problem is that the RTE is a lousy place to keep
>> rowcount estimates.  That breaks assorted desirable properties like
>> querytrees being readonly to planning/execution (not that we don't
>> end up copying them anyway, but up to now that's been because of bad
>> implementation not because the representation was broken by design).

> How does it break those properties?  I don't think enrtuples is being
> modified by planning or execution as things stand.

But it needs to be changeable, unless you like the proposition that we
can never replan a query inside a trigger on the basis of new information
about how big the transition table is.  Even if you're okay with that
particular restriction, the NamedTupleStore stuff is supposed to be
flexible enough to accommodate other use-cases, and some of them will
surely not be okay with an immutable estimate for the store's size.

Thomas noted upthread that there wasn't any API for injecting a rowcount
estimate from an SPI caller, which is wrong actually: there's already an
enrtuples field in EphemeralNamedRelationMetadata.  So another way to
explain why this is broken is that having a field in the RTE is redundant
with that.
        regards, tom lane



Re: [HACKERS] RTE_NAMEDTUPLESTORE, enrtuples and comments

От
Robert Haas
Дата:
On Tue, Jun 13, 2017 at 11:53 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> How does it break those properties?  I don't think enrtuples is being
>> modified by planning or execution as things stand.
>
> But it needs to be changeable, unless you like the proposition that we
> can never replan a query inside a trigger on the basis of new information
> about how big the transition table is.  Even if you're okay with that
> particular restriction, the NamedTupleStore stuff is supposed to be
> flexible enough to accommodate other use-cases, and some of them will
> surely not be okay with an immutable estimate for the store's size.

Hmm, true.  But even if we extracted enrtuples from the
RangeTbleEntry, there wouldn't be any mechanism to actually trigger
such a replan, would there?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] RTE_NAMEDTUPLESTORE, enrtuples and comments

От
Tom Lane
Дата:
Robert Haas <robertmhaas@gmail.com> writes:
> On Tue, Jun 13, 2017 at 11:53 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> But it needs to be changeable, unless you like the proposition that we
>> can never replan a query inside a trigger on the basis of new information
>> about how big the transition table is.  Even if you're okay with that
>> particular restriction, the NamedTupleStore stuff is supposed to be
>> flexible enough to accommodate other use-cases, and some of them will
>> surely not be okay with an immutable estimate for the store's size.

> Hmm, true.  But even if we extracted enrtuples from the
> RangeTbleEntry, there wouldn't be any mechanism to actually trigger
> such a replan, would there?

You're just pointing out that there's a lot of unfinished work around
this mechanism.  I don't think anybody has claimed otherwise.
        regards, tom lane



Re: [HACKERS] RTE_NAMEDTUPLESTORE, enrtuples and comments

От
Robert Haas
Дата:
On Tue, Jun 13, 2017 at 12:04 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> On Tue, Jun 13, 2017 at 11:53 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> But it needs to be changeable, unless you like the proposition that we
>>> can never replan a query inside a trigger on the basis of new information
>>> about how big the transition table is.  Even if you're okay with that
>>> particular restriction, the NamedTupleStore stuff is supposed to be
>>> flexible enough to accommodate other use-cases, and some of them will
>>> surely not be okay with an immutable estimate for the store's size.
>
>> Hmm, true.  But even if we extracted enrtuples from the
>> RangeTbleEntry, there wouldn't be any mechanism to actually trigger
>> such a replan, would there?
>
> You're just pointing out that there's a lot of unfinished work around
> this mechanism.  I don't think anybody has claimed otherwise.

I'm just trying to understand your comments so that I can have an
intelligent opinion about what we should do from here.  Given that the
replan wouldn't happen anyway, there seems to be no reason to tinker
with the location of enrtuples for v10 -- which is exactly what both
of us already said, but I was confused about your comments about
enrtuples getting changed.  I think that I am no longer confused.

We still need to review and commit the minimal cleanup patch Thomas
posted upthread (v1, not v2).  I think I might go do that unless
somebody else is feeling the urge to whack it around before commit.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] RTE_NAMEDTUPLESTORE, enrtuples and comments

От
Thomas Munro
Дата:
On Wed, Jun 14, 2017 at 4:22 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> I'm just trying to understand your comments so that I can have an
> intelligent opinion about what we should do from here.  Given that the
> replan wouldn't happen anyway, there seems to be no reason to tinker
> with the location of enrtuples for v10 -- which is exactly what both
> of us already said, but I was confused about your comments about
> enrtuples getting changed.  I think that I am no longer confused.
>
> We still need to review and commit the minimal cleanup patch Thomas
> posted upthread (v1, not v2).  I think I might go do that unless
> somebody else is feeling the urge to whack it around before commit.

OK, if we're keeping enrtuples in RangeTblEntry for v10 then we'd
better address Noah's original complaint that it's missing from
_{copy,equal,out,read}RangeTblEntry() .  Here's a patch for that, to
apply on top of the -v1 patch above.

-- 
Thomas Munro
http://www.enterprisedb.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Вложения

Re: [HACKERS] RTE_NAMEDTUPLESTORE, enrtuples and comments

От
Robert Haas
Дата:
On Tue, Jun 13, 2017 at 4:56 PM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:
> On Wed, Jun 14, 2017 at 4:22 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>> I'm just trying to understand your comments so that I can have an
>> intelligent opinion about what we should do from here.  Given that the
>> replan wouldn't happen anyway, there seems to be no reason to tinker
>> with the location of enrtuples for v10 -- which is exactly what both
>> of us already said, but I was confused about your comments about
>> enrtuples getting changed.  I think that I am no longer confused.
>>
>> We still need to review and commit the minimal cleanup patch Thomas
>> posted upthread (v1, not v2).  I think I might go do that unless
>> somebody else is feeling the urge to whack it around before commit.
>
> OK, if we're keeping enrtuples in RangeTblEntry for v10 then we'd
> better address Noah's original complaint that it's missing from
> _{copy,equal,out,read}RangeTblEntry() .  Here's a patch for that, to
> apply on top of the -v1 patch above.

Committed both of those together.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company