Re: Unhappy about API changes in the no-fsm-for-small-rels patch

From: Andres Freund
Subject: Re: Unhappy about API changes in the no-fsm-for-small-rels patch
Date:
Msg-id: 20190506161818.lncinmpwp4ycqn7r@alap3.anarazel.de
In response to: Re: Unhappy about API changes in the no-fsm-for-small-rels patch  (Robert Haas <robertmhaas@gmail.com>)
Responses: Re: Unhappy about API changes in the no-fsm-for-small-rels patch  (Robert Haas <robertmhaas@gmail.com>)
List: pgsql-hackers
Hi,

On 2019-05-06 11:52:12 -0400, Robert Haas wrote:
> On Mon, May 6, 2019 at 11:27 AM Andres Freund <andres@anarazel.de> wrote:
> > > I think it's legitimate to question whether sending additional
> > > invalidation messages as part of the design of this feature is a good
> > > idea.  If it happens frequently, it could trigger expensive sinval
> > > resets more often.  I don't understand the various proposals well
> > > enough to know whether that's really a problem, but if you've got a
> > > lot of relations for which this optimization is in use, I'm not sure I
> > > see why it couldn't be.
> >
> > I don't think it's an actual problem. We'd only do so when creating an
> > FSM, or when freeing up additional space that'd otherwise not be visible
> > to other backends. The alternative to sinval would thus be a) not
> > discovering there's free space and extending the relation b) checking
> > disk state for a new FSM all the time. Which are much more expensive.
> 
> None of that addresses the question of the distributed cost of sending
> more sinval messages.  If you have a million little tiny relations and
> VACUUM goes through and clears one tuple out of each one, it will be
> spewing sinval messages really, really fast.  How can that fail to
> threaten extra sinval resets?

Vacuum already triggers sinval messages (via the pg_class update); it
shouldn't be too hard to ensure there are no duplicates in this case.


> > > I think at some point it was proposed that, since an FSM access
> > > involves touching 3 blocks, it ought to be fine for any relation of 4
> > > or fewer blocks to just check all the others.  I don't really
> > > understand why we drifted off that design principle, because it seems
> > > like a reasonable theory.  Such an approach doesn't require anything
> > > in the relcache, any global variables, or an every-other-page
> > > algorithm.
> >
> > It's not that cheap to touch three heap blocks every time a new target
> > page is needed. Requires determining at least the target relation size
> > or the existence of the FSM fork.
> >
> > We'll also commonly *not* end up touching 3 blocks in the FSM -
> > especially when there's actually no free space. And the FSM contents are
> > much less contended than the heap pages - the hot paths don't update the
> > FSM, and when they do, the exclusive locks are held only very briefly.
> 
> Well, that seems like an argument that we just shouldn't do this at
> all.  If the FSM is worthless for small relations, then eliding it
> makes sense.  But if having it is valuable even when the relation is
> tiny, then eliding it is the wrong thing to do, isn't it?

Why? The problem with the entirely stateless proposal is just that we'd
do that every single time we need new space. If we amortize that cost
across multiple insertions, I don't think there's a problem?


> I do find it a bit surprising that touching heap pages would be all
> that much more expensive than touching FSM pages, but that doesn't
> mean that it isn't the case.  I would also note that this algorithm
> ought to beat the FSM algorithm in many cases where there IS space
> available, because you'll often find some usable free space on the
> very first page you try, which will never happen with the FSM.

Note that without additional state we do not *know* that the heap is 5
pages long; we have to do an smgrnblocks() - which is fairly
expensive. That's precisely why I want to keep state about a
non-existent FSM in the relcache, and why we'd need sinval messages to
clear that, so we don't incur unnecessary syscalls when there's free
space.

I completely agree that avoiding the FSM for the small-rels case has the
potential to be faster, if we're not too naive about it. I think that
means

1) no checking of on-disk state for relation fork existence/sizes every
   time we look up a page with free space
2) not re-scanning pages when we should know they're full (because we
   scanned them for the last target page, in a previous insert)
3) ability to recognize concurrently freed space


> The case where the pages are all full doesn't seem very important,
> because I don't see how you can stay in that situation for all that
> long. Each time it happens, the relation grows by a block immediately
> afterwards, and once it hits 5 blocks, it never happens again.

> I guess you could incur the overhead repeatedly if the relation starts
> out at 1 block, grows to 4, is vacuumed back down to 1, lather, rinse,
> repeat, but is that actually realistic?  It requires all the live
> tuples to live in block 0 at the beginning of each vacuum cycle, which
> seems like a fringe outcome.

I think it's much more likely to be encountered when there's a lot of
churn on a small table, but HOT pruning removes just about all the
superfluous space on a regular basis. Then the relation might actually
never grow beyond 4 blocks.

Greetings,

Andres Freund


