Re: Eager page freeze criteria clarification

From: Andres Freund
Subject: Re: Eager page freeze criteria clarification
Msg-id: 20230927170121.j3klc3xi4yonle5y@alap3.anarazel.de
In reply to: Re: Eager page freeze criteria clarification  (Peter Geoghegan <pg@bowt.ie>)
Responses: Re: Eager page freeze criteria clarification  (Peter Geoghegan <pg@bowt.ie>)
List: pgsql-hackers

Hi,

On 2023-09-26 09:07:13 -0700, Peter Geoghegan wrote:
> On Tue, Sep 26, 2023 at 8:19 AM Andres Freund <andres@anarazel.de> wrote:
> > However, I'm not at all convinced doing this on a system wide level is a good
> > idea. Databases do often contain multiple types of workloads at the same
> > time. E.g., we want to freeze aggressively in a database that has the bulk of
> > its size in archival partitions but has lots of unfrozen data in an active
> > partition. And databases have often loads of data that's going to change
> > frequently / isn't long lived, and we don't want to super aggressively freeze
> > that, just because it's a large portion of the data.
> 
> I didn't say that we should always have most of the data in the
> database frozen, though. Just that we can reasonably be more lazy
> about freezing the remainder of pages if we observe that most pages
> are already frozen. How they got that way is another discussion.
> 
> I also think that the absolute amount of debt (measured in physical
> units such as unfrozen pages) should be kept under control. But that
> isn't something that can ever be expected to work on the basis of a
> simple threshold -- if only because autovacuum scheduling just doesn't
> work that way, and can't really be adapted to work that way.

I don't think doing this on a system wide basis with a metric like #unfrozen
pages is a good idea. It's quite common to have short lived data in some
tables while also having long-lived data in other tables. Making opportunistic
freezing more aggressive in that situation will just hurt, without a benefit
(potentially even slowing down the freezing of older data!). And even within a
single table, it doesn't make sense to freeze more aggressively just because a
decent-sized part of the table is updated regularly and thus not frozen.

If we want to take global freeze debt into account, which I think is a good
idea, we'll need a smarter way to represent the debt than just the number of
unfrozen pages.  I think we would need to track the age of unfrozen pages in
some way. If there are a lot of unfrozen pages with a recent xid, then it's
fine, but if they are older and getting older, it's a problem and we need to
be more aggressive.  The problem I see is how to track the age of unfrozen data -
it'd be easy enough to track the mean(oldest-64bit-xid-on-page), but then we
again have the issue of rare outliers moving the mean too much...
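
To make the outlier concern concrete, here is a minimal sketch, not PostgreSQL
code. It assumes we somehow had, for each unfrozen page, the age (in xids) of
the oldest xid on that page; the function name and the sample numbers are
hypothetical. It compares the mean against order statistics such as the median
and a nearest-rank 99th percentile.

```python
from statistics import mean, median


def page_age_summary(page_ages):
    """Summarize hypothetical per-page ages (in xids) of the oldest unfrozen xid."""
    ordered = sorted(page_ages)
    n = len(ordered)
    # Nearest-rank 99th percentile: rank = ceil(0.99 * n), done in integer math.
    p99_rank = (99 * n + 99) // 100
    return {
        "mean": mean(ordered),
        "median": median(ordered),
        "p99": ordered[p99_rank - 1],
        "max": ordered[-1],
    }


# Hypothetical table: 999 recently modified pages plus one very old page.
ages = [1000] * 999 + [1_500_000_000]
summary = page_age_summary(ages)
print(summary)
```

With input like this, the single old page pulls the mean far away from the
typical page, while the median and 99th percentile stay at the age of the
recently modified pages. Whether an order statistic is cheap enough to maintain
incrementally is a separate question that this sketch does not address.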

Greetings,

Andres Freund


