Обсуждение: Re: [SQL] 7.4 - FK constraint performance

Поиск
Список
Период
Сортировка

Re: [SQL] 7.4 - FK constraint performance

От
Stephan Szabo
Дата:
On Fri, 13 Feb 2004, Stephan Szabo wrote:

>
> On Fri, 13 Feb 2004, Tom Lane wrote:
>
> > Stephan Szabo <sszabo@megazone.bigpanda.com> writes:
> > > On Fri, 13 Feb 2004, Tom Lane wrote:
> > >> I was looking at that last night.  It seems like we could add a LIMIT at
> > >> least in some contexts.  In the case at hand, we're just going to error
> > >> out immediately if we find a matching row, and so there's no need for
> > >> FOR UPDATE, is there?
> >
> > > I think there still is, because a not yet committed transaction could have
> > > deleted them all in which case I think the correct behavior is to wait and
> > > if that transaction commits allow the action and if it rolls back to
> > > error.
> >
> > Good point.  Okay, we can't put in a LIMIT.  But we could still hack the
> > planner to prefer a fast-start plan by passing an out-of-band tuple
> > fraction, for those RI plans where it's appropriate.  That would not
> > affect correctness.
>
> Right, I can try to look through the stuff you pointed at in the previous
> message over the weekend.

It looks to me that we could make this available to SPI fairly simply by
taking the current version of the following four routines: planner,
pg_plan_query, _SPI_execute and SPI_prepare, renaming them and giving them
a planning tuple fraction as a parameter, change references to the other
routines to the new names and then making four new functions with the
current names that call the renamed versions. In all the cases other than
planner I think we can have the new version pass 0.0 and in the case of
planner either 0.1 or 0.0 based on the isCursor parameter.


Re: [SQL] 7.4 - FK constraint performance

От
Stephan Szabo
Дата:
On Sun, 15 Feb 2004, Stephan Szabo wrote:

> On Fri, 13 Feb 2004, Stephan Szabo wrote:
>
> >
> > On Fri, 13 Feb 2004, Tom Lane wrote:
> >
> > > Stephan Szabo <sszabo@megazone.bigpanda.com> writes:
> > > > On Fri, 13 Feb 2004, Tom Lane wrote:
> > > >> I was looking at that last night.  It seems like we could add a LIMIT at
> > > >> least in some contexts.  In the case at hand, we're just going to error
> > > >> out immediately if we find a matching row, and so there's no need for
> > > >> FOR UPDATE, is there?
> > >
> > > > I think there still is, because a not yet committed transaction could have
> > > > deleted them all in which case I think the correct behavior is to wait and
> > > > if that transaction commits allow the action and if it rolls back to
> > > > error.
> > >
> > > Good point.  Okay, we can't put in a LIMIT.  But we could still hack the
> > > planner to prefer a fast-start plan by passing an out-of-band tuple
> > > fraction, for those RI plans where it's appropriate.  That would not
> > > affect correctness.
> >
> > Right, I can try to look through the stuff you pointed at in the previous
> > message over the weekend.
>
> It looks to me that we could make this available to SPI fairly simply by
> taking the current version of the following four routines: planner,
> pg_plan_query, _SPI_execute and SPI_prepare, renaming them and giving them
> a planning tuple fraction as a parameter, change references to the other
> routines to the new names and then making four new functions with the
> current names that call the renamed versions. In all the cases other than
> planner I think we can have the new version pass 0.0 and in the case of
> planner either 0.1 or 0.0 based on the isCursor parameter.

I did this, and changed the foreign keys to use it, but I haven't managed
to build a fk case where I could actually detect a change in the plan
chosen.  Since the queries are only a simple scan on the one table I'm
wondering if it's basically just modifying both costs by the same value
which means there's no real effect at all.


Re: [SQL] 7.4 - FK constraint performance

От
Tom Lane
Дата:
Stephan Szabo <sszabo@megazone.bigpanda.com> writes:
> I did this, and changed the foreign keys to use it, but I haven't managed
> to build a fk case where I could actually detect a change in the plan
> chosen.  Since the queries are only a simple scan on the one table I'm
> wondering if it's basically just modifying both costs by the same value
> which means there's no real effect at all.

After fooling with this, I think you are right.  The planner is modeling
both cases as a linear slope from zero to estimated-total-cost, and
since the total number of tuples to be returned is the same, taking a
percentage won't change the outcome.

The only way we could improve the situation would be to somehow instruct
the planner that even though we don't know the parameter value to be
used when we are planning, it should expect that that value is *not* in
the table, rather than expecting that it has a distribution similar to
what is in the table.  Seems pretty messy.

I have occasionally speculated about postponing planning of
parameterized queries until they are first executed, and then using the
actual parameter values supplied in that first execution for purposes of
estimating costs and selectivity.  That would work pretty nicely in this
particular case, but in the real world I think it'd be mighty dangerous;
you could end up optimizing for an outlier case that isn't
representative of the queries you'll see later.

Another interesting line of thought is to let the user supply
representative values to be used for planning purposes.  In PREPARE,
you could imagine saying something like
PREPARE myquery (int = 42, text = 'http://www...') AS    SELECT ... where url like $2 ...

and then using the sample value 'http://www...' for purposes of
estimating the LIKE result.  Then it'd be on the user's head to pick
good representatives --- but he could make sure that they really were
representative, and not have to worry about luck of the draw from the
first live query.

I'm not sure if we could use such a feature automatically to bias FK
queries in the right direction, but it's something to think about.
        regards, tom lane