Discussion: Flush pgstats file during checkpoints


Flush pgstats file during checkpoints

From:
Michael Paquier
Date:
Hi all,

On HEAD, xlog.c has the following comment, which has been on my own
TODO list for a couple of weeks now:

     * TODO: With a bit of extra work we could just start with a pgstat file
     * associated with the checkpoint redo location we're starting from.

Please find a patch series to implement that, making it possible to
keep statistics after a crash rather than discarding them.  I have
been looking at the code for a while before settling on:
- Forcing the flush of the pgstats file to happen during non-shutdown
checkpoints and restart points, after updating the control file's redo
LSN and past the critical sections in the area.
- Leaving the current before_shmem_exit() callback around, as a matter
of delaying the flush of the stats for as long as possible in a
shutdown sequence.  This also makes the single-user mode shutdown
simpler.
- Adding the redo LSN to the pgstats file, with a bump of
PGSTAT_FILE_FORMAT_ID, so that it can be cross-checked against the
control file's redo LSN at load time.  This change is independently
useful when loading stats after a clean startup as well.
- The crash recovery case is simplified, as there is no more need for
the "discard" code path.
- Not using a scheme where an LSN would be embedded in the stats file
name, which would imply a lookup at the contents of pg_stat/ to clean
up past files at crash recovery.  Such lookups should not be costly,
but I'd rather not add more of them.

7ff23c6d277d, which removed the last call of CreateCheckPoint() from
the startup process, is older than 5891c7a8ed8f, yet it seems to me
that pgstats still relies on some areas of the code that no longer
make sense on HEAD (see the locking mentioned at the top of the write
routine, for example).  The logic becomes straightforward to reason
about, as restart points and checkpoints always run from the
checkpointer, meaning that pgstat_write_statsfile() is already called
only from the postmaster in single-user mode or from the checkpointer
itself at shutdown.

Attached is a patch set, with the last patch being the actual feature
and some preparatory work before it:
- 0001 adds the redo LSN to the flushed pgstats file.
- 0002 adds an assertion in pgstat_write_statsfile(), to check from
where it is called.
- 0003 adds more debugging.
- 0004 is the meat of the thread.

I am adding that to the next CF.  Thoughts and comments are welcome.
Thanks,
--
Michael

Attachments

Re: Flush pgstats file during checkpoints

From:
Konstantin Knizhnik
Date:

Hi Michael.

I am working on mostly the same problem - persisting pgstat state in
Neon (because of the separation of storage and compute, it has no
local files).  I have two questions concerning this patch and the
whole strategy for saving pgstat state between sessions.

1. Right now pgstat.stat is discarded after abnormal Postgres 
termination. And in your patch we are storing the LSN in pgstat.stat 
to check that it matches the checkpoint redo LSN. I wonder whether 
having an outdated version of pgstat.stat is really worse than not 
having it at all? The comment in xlog.c says: "When starting with 
crash recovery, reset pgstat data - it might not be valid." But why 
might it be invalid? We are writing it first to a temp file and then 
renaming it. Maybe an fsync() should be added here and 
durable_rename() should be used instead of rename(). But that seems 
better than losing the statistics, and it should not add any 
significant overhead (because it happens only at shutdown). In your 
case we are checking the LSN of the pgstat.stat file. But once again 
- why is it better to discard the file than to load the version from 
the previous checkpoint?

2. The second question is related to the first. Say we saved 
pgstat.stat at a checkpoint, then did some updates, and then Postgres 
crashed. As far as I understand, with your patch we will successfully 
restore the pgstat.stat file. But it is not actually up-to-date: it 
doesn't reflect the recent updates. If it is so critical to keep 
pgstat.stat up-to-date, then why do we allow restoring the state from 
the most recent checkpoint?

Thanks,
Konstantin

Re: Flush pgstats file during checkpoints

From:
Tomas Vondra
Date:

On 6/28/24 09:32, Konstantin Knizhnik wrote:
> 
> Hi Michael.
> 
> I am working on mostly the same problem - persisting pgstat state in
> Neon (because of the separation of storage and compute, it has no
> local files).  I have two questions concerning this patch and the
> whole strategy for saving pgstat state between sessions.
> 
> 1. Right now pgstat.stat is discarded after abnormal Postgres
> termination. And in your patch we are storing the LSN in pgstat.stat
> to check that it matches the checkpoint redo LSN. I wonder whether
> having an outdated version of pgstat.stat is really worse than not
> having it at all? The comment in xlog.c says: "When starting with
> crash recovery, reset pgstat data - it might not be valid." But why
> might it be invalid? We are writing it first to a temp file and then
> renaming it. Maybe an fsync() should be added here and
> durable_rename() should be used instead of rename(). But that seems
> better than losing the statistics, and it should not add any
> significant overhead (because it happens only at shutdown). In your
> case we are checking the LSN of the pgstat.stat file. But once again
> - why is it better to discard the file than to load the version from
> the previous checkpoint?
> 

I think those are two independent issues - knowing that the snapshot is
from the last checkpoint, and knowing that it's correct (not corrupted).
And yeah, we should be careful about fsync/durable_rename.

> 2. The second question is related to the first. Say we saved
> pgstat.stat at a checkpoint, then did some updates, and then Postgres
> crashed. As far as I understand, with your patch we will successfully
> restore the pgstat.stat file. But it is not actually up-to-date: it
> doesn't reflect the recent updates. If it is so critical to keep
> pgstat.stat up-to-date, then why do we allow restoring the state from
> the most recent checkpoint?
> 

Yeah, I was wondering about the same thing - can't this mean we fail to
start autovacuum? Let's say we delete a significant fraction of a huge
table, but then we crash before the next checkpoint. With this patch we
restore the last stats snapshot, which can easily have

n_dead_tup          | 0
n_mod_since_analyze | 0

for the table. And thus we fail to notice the table needs autovacuum.
AFAIK we run autovacuum on all tables with missing stats (or am I
wrong?). That's what's happening on replicas after switchover/failover
too, right?

It wouldn't be such an issue if we updated stats during recovery, but
I don't think we're doing that. Perhaps we should, which might also
help on replicas - no idea if it's feasible, though.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company