Re: Row estimates for empty tables

Поиск
Список
Период
Сортировка
От David Rowley
Тема Re: Row estimates for empty tables
Дата
Msg-id CAApHDvrJEUpnDDJ-MAqJRFg1twdriT2y_ANRMDUitg+2d4CcaA@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Row estimates for empty tables  (Tom Lane <tgl@sss.pgh.pa.us>)
Список pgsql-hackers
On Sat, 25 Jul 2020 at 10:34, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I wrote:
> > 1. Arrange to send out a relcache inval when adding the first page to
> > a table, and then remove the planner hack for disbelieving relpages = 0.
> > I fear this'd be a mess from a system structural standpoint, but it might
> > work fairly transparently.
>
> I experimented with doing this.  It's not hard to code, if you don't mind
> having RelationGetBufferForTuple calling CacheInvalidateRelcache.  I'm not
> sure whether that code path might cause any long-term problems, but it
> seems to work OK right now.  However, this solution causes massive
> "failures" in the regression tests as a result of plans changing.  I'm
> sure that's partly because we use so many small tables in the tests.
> Nonetheless, it's not promising from the standpoint of not causing
> unexpected problems in the real world.

I guess all these changes would be the planner moving towards a plan
that suits having fewer rows for the given table better.  If so, that
does seem quite scary as we already have enough problems from the
planner choosing poor plans when it thinks there are fewer rows than
there actually are.  Don't we need to keep something like the 10-page
estimate there so safer plans are produced before auto-vacuum gets in
and gathers some proper stats?

I think if anything we'd want to move in the direction of producing
more cautious plans when the estimated number of rows is low. Perhaps
especially so for when the planner opts to do things like perform a
non-parameterized nested loop join when it thinks the RelOptInfo with,
say 3, unbeknown-to-the-planner, correlated, base restrict quals that
are thought to produce just 1 row, but actually produce many more.

> > 2. Establish the convention that vacuuming or analyzing an empty table
> > is what you do to tell the system that this state is going to persist.
> > That's more or less what the existing comments in plancat.c envision,
> > but we never made a definition for how the occurrence of that event
> > would be recorded in the catalogs, other than setting relpages > 0.
> > Rather than adding another pg_class column, I'm tempted to say that
> > vacuum/analyze should set relpages to a minimum of 1, even if the
> > relation has zero pages.
>
> I also tried this, and it seems a lot more promising: no existing
> regression test cases change.  So perhaps we should do the attached
> or something like it.

This sounds like a more plausible solution. At least this way there's
an escape hatch for people who suffer due to this.

David



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Andres Freund
Дата:
Сообщение: Re: Improving connection scalability: GetSnapshotData()
Следующее
От: vignesh C
Дата:
Сообщение: Re: Include access method in listTables output