Re: Unhappy about API changes in the no-fsm-for-small-rels patch

From: Robert Haas
Subject: Re: Unhappy about API changes in the no-fsm-for-small-rels patch
Date:
Msg-id: CA+TgmobgKtO8MwdwY5tp_Sqr8OZ_s+DX1OhMXFM+eyz77mCDKg@mail.gmail.com
In response to: Re: Unhappy about API changes in the no-fsm-for-small-rels patch  (Andres Freund <andres@anarazel.de>)
Responses: Re: Unhappy about API changes in the no-fsm-for-small-rels patch  (Tom Lane <tgl@sss.pgh.pa.us>)
           Re: Unhappy about API changes in the no-fsm-for-small-rels patch  (Andres Freund <andres@anarazel.de>)
List: pgsql-hackers
On Mon, May 6, 2019 at 11:27 AM Andres Freund <andres@anarazel.de> wrote:
> > I think it's legitimate to question whether sending additional
> > invalidation messages as part of the design of this feature is a good
> > idea.  If it happens frequently, it could trigger expensive sinval
> > resets more often.  I don't understand the various proposals well
> > enough to know whether that's really a problem, but if you've got a
> > lot of relations for which this optimization is in use, I'm not sure I
> > see why it couldn't be.
>
> I don't think it's an actual problem. We'd only do so when creating an
> FSM, or when freeing up additional space that'd otherwise not be visible
> to other backends. The alternative to sinval would thus be a) not
> discovering there's free space and extending the relation b) checking
> disk state for a new FSM all the time. Which are much more expensive.

None of that addresses the question of the distributed cost of sending
more sinval messages.  If you have a million little tiny relations and
VACUUM goes through and clears one tuple out of each one, it will be
spewing sinval messages really, really fast.  How can that fail to
threaten extra sinval resets?
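
To make the concern concrete, the kind of call I'm picturing the patch
adding is roughly the following (an untested sketch of my own; the
function name and call site are invented, not taken from the patch):

#include "postgres.h"
#include "utils/inval.h"
#include "utils/rel.h"

/*
 * Hypothetical sketch: tell other backends that the free-space state of
 * a small, FSM-less relation has changed.
 */
static void
small_rel_free_space_changed(Relation rel)
{
    /*
     * Queues one shared-invalidation (sinval) message for this relation.
     * With a million tiny relations vacuumed in one pass, that's a
     * million messages competing for the shared sinval queue.
     */
    CacheInvalidateRelcache(rel);
}

Each such call is cheap on its own; it's the aggregate queue traffic
across many relations that I'm worried about.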

> > I think at some point it was proposed that, since an FSM access
> > involves touching 3 blocks, it ought to be fine for any relation of 4
> > or fewer blocks to just check all the pages.  I don't really
> > understand why we drifted off that design principle, because it seems
> > like a reasonable theory.  Such an approach doesn't require anything
> > in the relcache, any global variables, or an every-other-page
> > algorithm.
>
> It's not that cheap to touch three heap blocks every time a new target
> page is needed. Requires determining at least the target relation size
> or the existence of the FSM fork.
>
> We'll also commonly *not* end up touching 3 blocks in the FSM -
> especially when there's actually no free space. And the FSM contents are
> much less contended than the heap pages - the hot paths don't update the
> FSM, and when they do, the exclusive locks are held for a very short time only.

Well, that seems like an argument that we just shouldn't do this at
all.  If the FSM is worthless for small relations, then eliding it
makes sense.  But if having it is valuable even when the relation is
tiny, then eliding it is the wrong thing to do, isn't it?  The
underlying concerns that prompted this patch probably have to do with
either [1] not wanting to have so many FSM forks on disk or [2] not
wanting to consume 24kB of space to track free space for a relation
that may be only 8kB.  I think those goals are valid, but if we accept
your argument then this is the wrong way to achieve them.

I do find it a bit surprising that touching heap pages would be all
that much more expensive than touching FSM pages, but that doesn't
mean that it isn't the case.  I would also note that this algorithm
ought to beat the FSM algorithm in many cases where there IS space
available, because you'll often find some usable free space on the
very first page you try, which will never happen with the FSM.  The
case where the pages are all full doesn't seem very important, because
I don't see how you can stay in that situation for all that long.
Each time it happens, the relation grows by a block immediately
afterwards, and once it hits 5 blocks, it never happens again.  I
guess you could incur the overhead repeatedly if the relation starts
out at 1 block, grows to 4, is vacuumed back down to 1, lather, rinse,
repeat, but is that actually realistic?  It requires all the live
tuples to live in block 0 at the beginning of each vacuum cycle, which
seems like a fringe outcome.
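
For clarity, the loop I have in mind is roughly this (again an untested
sketch with an invented function name; the locking protocol and the
interaction with RelationGetBufferForTuple are glossed over):

#include "postgres.h"
#include "storage/bufmgr.h"
#include "storage/bufpage.h"
#include "utils/rel.h"

#define SMALL_REL_THRESHOLD 4       /* blocks; hypothetical cutoff */

/*
 * Untested sketch: probe every existing page of a small relation for
 * enough free space before deciding to extend it.  Returns a pinned,
 * unlocked buffer, or InvalidBuffer if no page has room.
 */
static Buffer
small_rel_find_free_page(Relation rel, Size needed)
{
    BlockNumber nblocks = RelationGetNumberOfBlocks(rel);
    BlockNumber blkno;

    Assert(nblocks <= SMALL_REL_THRESHOLD);

    for (blkno = 0; blkno < nblocks; blkno++)
    {
        Buffer      buf = ReadBuffer(rel, blkno);

        LockBuffer(buf, BUFFER_LOCK_SHARE);
        if (PageGetHeapFreeSpace(BufferGetPage(buf)) >= needed)
        {
            /* caller re-acquires an exclusive lock before inserting */
            LockBuffer(buf, BUFFER_LOCK_UNLOCK);
            return buf;
        }
        UnlockReleaseBuffer(buf);
    }

    return InvalidBuffer;       /* all full: caller extends the relation */
}

The point being that in the common case the very first iteration finds
enough space, which is cheaper than any FSM lookup.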

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


