Re: Making pg_rewind faster

From Srinath Reddy Sadipiralla
Subject Re: Making pg_rewind faster
Date
Msg-id CAFC+b6qGJXRffhRxL3xOyd1ceBQTXb2iVGOLsEMpOF8rYxseZA@mail.gmail.com
In response to Re: Making pg_rewind faster  (John H <johnhyvr@gmail.com>)
Responses Re: Making pg_rewind faster
List pgsql-hackers
Hi,

On Wed, Oct 15, 2025 at 4:49 AM John H <johnhyvr@gmail.com> wrote:
Hi,

On Fri, Oct 10, 2025 at 12:45 AM Srinath Reddy Sadipiralla
<srinath2133@gmail.com> wrote:
>
> ...
> XLogFilePath, then validate this new path with the given path;
> this helps to catch invalid xlog files like pg_wal/00000001FFFFFFFFFFFFFF10.
> ...

Are you concerned that these files, which are named like XLog files
but actually aren't, are somehow created, and that we should therefore
sync them just in case?
I'm trying to understand how these files would be generated in the first place.

The problem is not when the server generates them, because the
filename is properly calculated from the XLogRecPtr in
XLogWrite->XLogFileInit->XLogFileInitInternal->XLogFilePath.
The main problem is when someone manually places an invalid WAL file
in pg_wal, like 00000001FFFFFFFFFFFFFF10; IsXLogFileName will
consider it valid, so with the approach I mentioned earlier we can
catch such cases.
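
For illustration, here is a minimal sketch of that validation idea (a
hypothetical helper, not part of the patch); it round-trips the segment
number with XLogFromFileName/XLogFileName rather than building the full
path with XLogFilePath, but the comparison principle is the same:

#include "access/xlog_internal.h"

/*
 * Hypothetical helper: a name such as 00000001FFFFFFFFFFFFFF10 passes
 * IsXLogFileName(), but its "seg" part exceeds the number of segments
 * per xlogid, so regenerating the name from the parsed segment number
 * and comparing it with the original exposes the mismatch.
 */
static bool
wal_filename_round_trips(const char *fname, int wal_segsz_bytes)
{
	TimeLineID	tli;
	XLogSegNo	segno;
	char		expected[MAXFNAMELEN];

	if (!IsXLogFileName(fname))
		return false;

	XLogFromFileName(fname, &tli, &segno, wal_segsz_bytes);
	XLogFileName(expected, tli, segno, wal_segsz_bytes);

	return strcmp(fname, expected) == 0;
}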
 

>
> instead of passing last_common_segno down the call stack directly,
> we can have a struct maybe called "rewindState" which will have the common
> information related to both clusters involved in rewind, so that in the future
> if we want to pass more such information down we won't be increasing
> the number of arguments that need to be passed down to all functions in stack.
>

I don't feel too strongly about this. Not sure how we view
future-proofing things, as in do we proactively wrap things in structs
or wait until there's an actual need.

This is the first time it's been touched since it was introduced in 14 for what
it's worth.

TBH, at first it seemed to me like good coding practice for future-proofing,
so I checked whether there's any place in Postgres where we do
something similar, and found one in src/include/access/gist.h:

typedef struct GISTDeletedPageContents
{
	/* last xid which could see the page in a scan */
	FullTransactionId deleteXid;
} GISTDeletedPageContents;
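
To make the proposal concrete, here is a minimal sketch of what such a
struct could look like (the name RewindState and its single field are my
assumption for illustration, not something in the patch):

/*
 * Hypothetical container for state shared by both clusters during a
 * rewind, so that adding a field later doesn't mean threading another
 * argument through every function in the call stack.
 */
typedef struct RewindState
{
	/* last WAL segment common to source and target */
	XLogSegNo	last_common_segno;
} RewindState;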
 

>
> we can improve the TAP test file header comment as
> # Test the situation where we skip copying the same WAL files from source to target
> # except if WAL file size differs.
>
> let me put it this way
> 1) WALs are written to 000000010000000000000002 in primary.
> ...
> 7) so the last common WAL segment was 000000010000000000000003.

Updated comments.

>
> ...
> where entry->target_size != entry->source_size and we do copy,
> like if stat->mtime is changed (which means file has been copied)
> and target_stat->size != source_stat->size; here no error is expected.
>

Updated the patch so that one additional segment has a byte appended to it
on target but not source.

Thanks for updating. I have attached a diff patch on top of the
v9-0001 patch, where I tried to add more tests to improve the
validation that we copy the WAL file even when it exists on both
source and target but the size differs. I also tried to rename some
variables for readability. While testing this out I found that
usleep(1000); is not enough to show that a copy changed the file's
stat->mtime, because I think the copy was very fast (less than 1
millisecond), so the mtime of the file after pg_rewind's copy didn't
change, or the filesystem precision didn't catch the change. The copy
of the file is confirmed instead by the fact that the same file has a
different size before the rewind and the same size after it, which
these tests prove:

ok($corrupt_wal_seg_stat_before_rewind->size != $corrupt_wal_seg_source_stat->size,"WAL segment $corrupt_wal_seg has different size in source vs target before rewind");
ok 11 - WAL segment 000000010000000000000004 has different size in source vs target before rewind

ok($corrupt_wal_seg_stat_after_rewind->size == $corrupt_wal_seg_source_stat->size, "WAL segment $corrupt_wal_seg was copied: file sizes are same between target and source after rewind");
ok 12 - WAL segment 000000010000000000000004 was copied: file sizes are same between target and source after rewind

Moreover, pg_rewind confirmed this with a new debug message which I
added in this diff patch:

diff --git a/src/bin/pg_rewind/filemap.c b/src/bin/pg_rewind/filemap.c
index 69d7728ce44..743484f4833 100644
--- a/src/bin/pg_rewind/filemap.c
+++ b/src/bin/pg_rewind/filemap.c
@@ -732,11 +732,11 @@ decide_wal_file_action(const char *fname, XLogSegNo last_common_segno,
  */
  if (file_segno < last_common_segno && source_size == target_size)
  {
  pg_log_debug("WAL segment \"%s\" not copied to target", fname);
  return FILE_ACTION_NONE;
  }
 
+ pg_log_debug("WAL segment \"%s\" is copied to target", fname);
  return FILE_ACTION_COPY;
 }
ok 6 - run pg_rewind stderr /(?^:WAL segment \"000000010000000000000004\" is copied to target)/

I also did this instead of using catfile:
-my $wal_skipped_path = File::Spec->catfile($node_primary->data_dir, 'pg_wal', $wal_seg_skipped);
+my $wal_skipped_path = $node_primary->data_dir . '/pg_wal/' . $wal_seg_skipped;
which seems consistent with other TAP tests.

--
Thanks,
Srinath Reddy Sadipiralla
EDB: https://www.enterprisedb.com/