Re: Higher level questions around shared memory stats
От | Andres Freund |
---|---|
Тема | Re: Higher level questions around shared memory stats |
Дата | |
Msg-id | 20220330210841.2dymkb2ydbdonp5f@alap3.anarazel.de обсуждение исходный текст |
Ответ на | Re: Higher level questions around shared memory stats (Robert Haas <robertmhaas@gmail.com>) |
Ответы |
Re: Higher level questions around shared memory stats
|
Список | pgsql-hackers |
Hi, On 2022-03-30 14:42:23 -0400, Robert Haas wrote: > On Tue, Mar 29, 2022 at 5:01 PM Andres Freund <andres@anarazel.de> wrote: > > I think it's reasonably rare because in cases there'd be corruption, we'd > > typically not even have written them out / throw them away explicitly - we > > only read stats when starting without crash recovery. > > > > So the "expected" case of corruption afaicts solely is a OS crash just after > > the shutdown checkpoint completed? > > Can we prevent that case from occurring, so that there are no expected cases? We likely can, at least for the causes of corruption I know of. We already write the statsfile into a temporary filename and then rename into place. I think all we'd need to do is to use durable_rename() to make sure it's durable once renamed into place. It's really unrelated to the shared memory stats patch though, so I'd prefer not to tie it to that. > > I can think of these different times: > > > > - Last time stats were removed due to starting up in crash recovery > > - Last time stats were created from scratch, because no stats data file was > > present at startup > > - Last time stats were thrown away due to corruption > > - Last time a subset of stats were reset using one of the pg_reset* functions > > > > Makes sense? > > Yes. Possibly that last could be broken in to two: when all stats were > last reset, when some stats were last reset. Believe it or not, we don't currently have a function to reset all stats. We should definitely add that though, because the invocation to reset all stats gets more ridiculous^Wcomplicated with each release. I think the minimal invocation currently is something like: -- reset all stats shared between databases SELECT pg_stat_reset_shared('archiver'); SELECT pg_stat_reset_shared('bgwriter'); SELECT pg_stat_reset_shared('wal'); SELECT pg_stat_reset_replication_slot(NULL); SELECT pg_stat_reset_slru(NULL); SELECT pg_stat_reset_subscription_stats(NULL); -- connect to each database and reset the stats in that database SELECT pg_stat_reset(); I've protested against replication slot, slru, subscription stats not being resettable via pg_stat_reset_shared(), nobody else seemed to care. > > > Does redo update the stats? > > > > With "update" do you mean generate new stats? In the shared memory stats patch > > it triggers stats to be dropped, on HEAD it just resets all stats at startup. > > > > Redo itself doesn't generate stats, but bgwriter, checkpointer, backends do. > > Well, I guess what I'm trying to figure out is what happens if we run > in recovery for a long time -- say, a year -- and then get promoted. > Do we have reasons to expect that the stats will be accurate enough to > use at that point, or will they be way off? What do you mean with 'accurate enough'? With or without shared memory stats pg_stat_all_tables.{n_mod_since_analyze, n_ins_since_vacuum, n_live_tup, n_dead_tup ...} will be be zero. The replay process doesn't update them. In contrast to that, things like pg_stat_all_tables.{seq_scan, seq_tup_read, idx_tup_fetch, ...} will be accurate, with one exception below. pg_stat_bgwriter, pg_stat_wal, etc will always be accurate. On HEAD, there may be a lot of dead stats for dropped databases / tables / functions that have been dropped since the start of the cluster. They will eventually get removed, once autovacuum starts running in the respective database (i.e. pgstat_vacuum_stat() gets run). The exception noted above is that because pg_stat_all_tables contents are never removed during recovery, it becomes a lot more plausible for oid conflicts to occur. So the stats for two different tables might get added up accidentally - but that'll just affect the non-zero columns, of course. With the shared memory stats patch, stats for dropped objects (i.e. databases, tables, ... ) are removed shortly after they have been dropped, so that conflict risk doesn't exist anymore. So I don't think increasing inaccuracy is a reason to throw away stats on replica startup. Particularly because we already don't throw them away when promoting the replica, just when having started it last. > I don't have a great understanding of how this all works, but if > running recovery for a long time is going to lead to a situation where > the stats progressively diverge from reality, then preserving them > doesn't seem as valuable as if they're going to be more or less > accurate. Minus the oid wraparound risk on HEAD, the only way they increasingly diverge is that the '0' in a bunch of pg_stat_all_tables columns might get less and less accurate. But that's not the type of divergence you're talking about, I think. Greetings, Andres Freund
В списке pgsql-hackers по дате отправления:
Предыдущее
От: "David G. Johnston"Дата:
Сообщение: Re: pg_stat_reset_single_*_counters vs pg_stat_database.stats_reset