Re: Higher level questions around shared memory stats

Поиск
Список
Период
Сортировка
От Andres Freund
Тема Re: Higher level questions around shared memory stats
Дата
Msg-id 20220330210841.2dymkb2ydbdonp5f@alap3.anarazel.de
обсуждение исходный текст
Ответ на Re: Higher level questions around shared memory stats  (Robert Haas <robertmhaas@gmail.com>)
Ответы Re: Higher level questions around shared memory stats
Список pgsql-hackers
Hi,

On 2022-03-30 14:42:23 -0400, Robert Haas wrote:
> On Tue, Mar 29, 2022 at 5:01 PM Andres Freund <andres@anarazel.de> wrote:
> > I think it's reasonably rare because in cases there'd be corruption, we'd
> > typically not even have written them out / throw them away explicitly - we
> > only read stats when starting without crash recovery.
> >
> > So the "expected" case of corruption afaicts solely is a OS crash just after
> > the shutdown checkpoint completed?
> 
> Can we prevent that case from occurring, so that there are no expected cases?

We likely can, at least for the causes of corruption I know of. We already
write the statsfile into a temporary filename and then rename into place. I
think all we'd need to do is to use durable_rename() to make sure it's durable
once renamed into place.

It's really unrelated to the shared memory stats patch though, so I'd prefer
not to tie it to that.


> > I can think of these different times:
> >
> > - Last time stats were removed due to starting up in crash recovery
> > - Last time stats were created from scratch, because no stats data file was
> >   present at startup
> > - Last time stats were thrown away due to corruption
> > - Last time a subset of stats were reset using one of the pg_reset* functions
> >
> > Makes sense?
> 
> Yes. Possibly that last could be broken in to two: when all stats were
> last reset, when some stats were last reset.

Believe it or not, we don't currently have a function to reset all stats. We
should definitely add that though, because the invocation to reset all stats
gets more ridiculous^Wcomplicated with each release.

I think the minimal invocation currently is something like:

-- reset all stats shared between databases
SELECT pg_stat_reset_shared('archiver');
SELECT pg_stat_reset_shared('bgwriter');
SELECT pg_stat_reset_shared('wal');
SELECT pg_stat_reset_replication_slot(NULL);
SELECT pg_stat_reset_slru(NULL);
SELECT pg_stat_reset_subscription_stats(NULL);

-- connect to each database and reset the stats in that database
SELECT pg_stat_reset();


I've protested against replication slot, slru, subscription stats not being
resettable via pg_stat_reset_shared(), nobody else seemed to care.


> > > Does redo update the stats?
> >
> > With "update" do you mean generate new stats? In the shared memory stats patch
> > it triggers stats to be dropped, on HEAD it just resets all stats at startup.
> >
> > Redo itself doesn't generate stats, but bgwriter, checkpointer, backends do.
> 
> Well, I guess what I'm trying to figure out is what happens if we run
> in recovery for a long time -- say, a year -- and then get promoted.
> Do we have reasons to expect that the stats will be accurate enough to
> use at that point, or will they be way off?

What do you mean with 'accurate enough'?

With or without shared memory stats pg_stat_all_tables.{n_mod_since_analyze,
n_ins_since_vacuum, n_live_tup, n_dead_tup ...} will be be zero. The replay
process doesn't update them.

In contrast to that, things like pg_stat_all_tables.{seq_scan, seq_tup_read,
idx_tup_fetch, ...} will be accurate, with one exception below.

pg_stat_bgwriter, pg_stat_wal, etc will always be accurate.


On HEAD, there may be a lot of dead stats for dropped databases / tables /
functions that have been dropped since the start of the cluster. They will
eventually get removed, once autovacuum starts running in the respective
database (i.e. pgstat_vacuum_stat() gets run).

The exception noted above is that because pg_stat_all_tables contents are
never removed during recovery, it becomes a lot more plausible for oid
conflicts to occur. So the stats for two different tables might get added up
accidentally - but that'll just affect the non-zero columns, of course.


With the shared memory stats patch, stats for dropped objects (i.e. databases,
tables, ... ) are removed shortly after they have been dropped, so that
conflict risk doesn't exist anymore.


So I don't think increasing inaccuracy is a reason to throw away stats on
replica startup. Particularly because we already don't throw them away when
promoting the replica, just when having started it last.



> I don't have a great understanding of how this all works, but if
> running recovery for a long time is going to lead to a situation where
> the stats progressively diverge from reality, then preserving them
> doesn't seem as valuable as if they're going to be more or less
> accurate.

Minus the oid wraparound risk on HEAD, the only way they increasingly diverge
is that the '0' in a bunch of pg_stat_all_tables columns might get less and
less accurate. But that's not the type of divergence you're talking about, I
think.


Greetings,

Andres Freund



В списке pgsql-hackers по дате отправления:

Предыдущее
От: "David G. Johnston"
Дата:
Сообщение: Re: pg_stat_reset_single_*_counters vs pg_stat_database.stats_reset
Следующее
От: Nathan Bossart
Дата:
Сообщение: Re: [Proposal] vacuumdb --schema only