Re: pgsql: When estimating the selectivity of an inequality "column >

Поиск
Список
Период
Сортировка
От Greg Stark
Тема Re: pgsql: When estimating the selectivity of an inequality "column >
Дата
Msg-id 407d949e1001040529k9ec48a8vd0b6d54c33001f06@mail.gmail.com
обсуждение исходный текст
Ответы Re: pgsql: When estimating the selectivity of an inequality "column >  (Alvaro Herrera <alvherre@commandprompt.com>)
Список pgsql-hackers
On Mon, Jan 4, 2010 at 2:44 AM, Tom Lane <tgl@postgresql.org> wrote:
> Log Message:
> -----------
> When estimating the selectivity of an inequality "column > constant" or
> "column < constant", and the comparison value is in the first or last
> histogram bin or outside the histogram entirely, try to fetch the actual
> column min or max value using an index scan (if there is an index on the
> column).  If successful, replace the lower or upper histogram bound with
> that value before carrying on with the estimate.  This limits the
> estimation error caused by moving min/max values when the comparison
> value is close to the min or max.  Per a complaint from Josh Berkus.

So.... This is pretty cool but it's worth discussing the downsides.
This is the second bit of live information the planning process is
peeking at to supplement the statistics (the first is the physical
size of the table). There are two use cases that I see being
negatively impacted by these features, neither of which we support
currently but I expect we'll eventually support and when we do we'll
have to work around these problems. The two problematic use cases I
see are: plan stability and exporting statistics to a another machine
to reproduce behaviour on a test machine.

I'm not sure what approach we'll have to take to work around these
problems. Perhaps we just need a way to disable checking these bits of
information and have a fallback strategy? perhaps we need a hook for
each of these fetches so you can store the size and upper bound of the
table on the production system you're trying to replicate the
behaviour of and a hook which substitutes these values for the dynamic
value? Perhaps the hook could even use a dbilink connection to get the
live production values.

I also wonder if we want to run these lookups on every single planner
invocation. If the table is being rarely updated analyze will never
fire and the stats value will never be updated. I don't think we want
the planner doing write operations to the stats either though. It
would be nice if there was a low-cost mode of operation for ANALYZE
which simulates precisely this update, the planner could send a stats
message saying that autovacuum should schedule one of these operations
sometime even if it doesn't see a need for a full analyze run.




--
greg


В списке pgsql-hackers по дате отправления:

Предыдущее
От: Pavel Stehule
Дата:
Сообщение: Re: quoting psql varible as identifier
Следующее
От: Mark Cave-Ayland
Дата:
Сообщение: Re: Add subdirectory support for DATA/DOCS with PGXS