Re: refactoring basebackup.c

Поиск
Список
Период
Сортировка
От tushar
Тема Re: refactoring basebackup.c
Дата
Msg-id 6c3f1558-1e56-9946-78a2-c59340da1dbf@enterprisedb.com
обсуждение исходный текст
Ответ на Re: refactoring basebackup.c  (Jeevan Ladhe <jeevan.ladhe@enterprisedb.com>)
Список pgsql-hackers
On 1/18/22 8:12 PM, Jeevan Ladhe wrote:
> Similar to LZ4 server-side compression, I have also tried to add a ZSTD
> server-side compression in the attached patch.
Thanks Jeevan. while testing found one scenario where the server is 
getting crash while performing pg_basebackup
against server-compression=zstd for a huge data second time

Steps to reproduce
--PG sources ( apply v11-0001,v11-0001,v9-0001,v9-0002 , configure 
--with-lz4,--with-zstd, make/install, initdb, start server)
--insert huge data (./pgbench -i -s 2000 postgres)
--restart the server (./pg_ctl -D data restart)
--pg_basebackup ( ./pg_basebackup  -t server:/tmp/yc1 
--server-compression=zstd -R  -Xnone -n -N -l 'ccc' --no-estimate-size -v)
--insert huge data (./pgbench -i -s 1000 postgres)
--restart the server (./pg_ctl -D data restart)
--run pg_basebackup again (./pg_basebackup  -t server:/tmp/yc11 
--server-compression=zstd -v  -Xnone )

[edb@centos7tushar bin]$ ./pg_basebackup  -t server:/tmp/yc11 
--server-compression=zstd -v  -Xnone
pg_basebackup: initiating base backup, waiting for checkpoint to complete
2022-01-19 21:23:26.508 IST [30219] LOG:  checkpoint starting: force wait
2022-01-19 21:23:26.608 IST [30219] LOG:  checkpoint complete: wrote 0 
buffers (0.0%); 0 WAL file(s) added, 1 removed, 0 recycled; write=0.001 
s, sync=0.001 s, total=0.101 s; sync files=0, longest=0.000 s, 
average=0.000 s; distance=16369 kB, estimate=16369 kB
pg_basebackup: checkpoint completed
TRAP: FailedAssertion("len > 0 && len <= sink->bbs_buffer_length", File: 
"../../../src/include/replication/basebackup_sink.h", Line: 208, PID: 30226)
postgres: walsender edb [local] sending backup "pg_basebackup base 
backup"(ExceptionalCondition+0x7a)[0x94ceca]
postgres: walsender edb [local] sending backup "pg_basebackup base 
backup"[0x7b9a08]
postgres: walsender edb [local] sending backup "pg_basebackup base 
backup"[0x7b9be2]
postgres: walsender edb [local] sending backup "pg_basebackup base 
backup"[0x7b5b30]
postgres: walsender edb [local] sending backup "pg_basebackup base 
backup"(SendBaseBackup+0x563)[0x7b7053]
postgres: walsender edb [local] sending backup "pg_basebackup base 
backup"(exec_replication_command+0x961)[0x7c9a41]
postgres: walsender edb [local] sending backup "pg_basebackup base 
backup"(PostgresMain+0x92f)[0x81ca3f]
postgres: walsender edb [local] sending backup "pg_basebackup base 
backup"[0x48e430]
postgres: walsender edb [local] sending backup "pg_basebackup base 
backup"(PostmasterMain+0xfd2)[0x785702]
postgres: walsender edb [local] sending backup "pg_basebackup base 
backup"(main+0x1c6)[0x48fb96]
/lib64/libc.so.6(__libc_start_main+0xf5)[0x7f63642c8555]
postgres: walsender edb [local] sending backup "pg_basebackup base 
backup"[0x48feb5]
pg_basebackup: error: could not read COPY data: server closed the 
connection unexpectedly
     This probably means the server terminated abnormally
     before or while processing the request.
2022-01-19 21:25:34.485 IST [30205] LOG:  server process (PID 30226) was 
terminated by signal 6: Aborted
2022-01-19 21:25:34.485 IST [30205] DETAIL:  Failed process was running: 
BASE_BACKUP ( LABEL 'pg_basebackup base backup', PROGRESS,  MANIFEST 
'yes',  TABLESPACE_MAP,  TARGET 'server', TARGET_DETAIL '/tmp/yc11',  
COMPRESSION 'zstd')
2022-01-19 21:25:34.485 IST [30205] LOG:  terminating any other active 
server processes
[edb@centos7tushar bin]$ 2022-01-19 21:25:34.489 IST [30205] LOG: all 
server processes terminated; reinitializing
2022-01-19 21:25:34.536 IST [30228] LOG:  database system was 
interrupted; last known up at 2022-01-19 21:23:26 IST
2022-01-19 21:25:34.669 IST [30228] LOG:  database system was not 
properly shut down; automatic recovery in progress
2022-01-19 21:25:34.671 IST [30228] LOG:  redo starts at 9/7000028
2022-01-19 21:25:34.671 IST [30228] LOG:  invalid record length at 
9/7000148: wanted 24, got 0
2022-01-19 21:25:34.671 IST [30228] LOG:  redo done at 9/7000110 system 
usage: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s
2022-01-19 21:25:34.673 IST [30229] LOG:  checkpoint starting: 
end-of-recovery immediate wait
2022-01-19 21:25:34.713 IST [30229] LOG:  checkpoint complete: wrote 3 
buffers (0.0%); 0 WAL file(s) added, 0 removed, 0 recycled; write=0.003 
s, sync=0.001 s, total=0.041 s; sync files=2, longest=0.001 s, 
average=0.001 s; distance=0 kB, estimate=0 kB
2022-01-19 21:25:34.718 IST [30205] LOG:  database system is ready to 
accept connections

Observation -

if we change server-compression method to lz4 from zstd then it is NOT 
happening.

[edb@centos7tushar bin]$ ./pg_basebackup  -t server:/tmp/ycc1 
--server-compression=lz4 -v  -Xnone
pg_basebackup: initiating base backup, waiting for checkpoint to complete
2022-01-19 21:27:51.642 IST [30229] LOG:  checkpoint starting: force wait
2022-01-19 21:27:51.687 IST [30229] LOG:  checkpoint complete: wrote 0 
buffers (0.0%); 0 WAL file(s) added, 1 removed, 0 recycled; write=0.001 
s, sync=0.001 s, total=0.046 s; sync files=0, longest=0.000 s, 
average=0.000 s; distance=16383 kB, estimate=16383 kB
pg_basebackup: checkpoint completed

NOTICE:  WAL archiving is not enabled; you must ensure that all required 
WAL segments are copied through other means to complete the backup
pg_basebackup: base backup completed
[edb@centos7tushar bin]$

-- 
regards,tushar
EnterpriseDB  https://www.enterprisedb.com/
The Enterprise PostgreSQL Company




В списке pgsql-hackers по дате отправления:

Предыдущее
От: Tom Lane
Дата:
Сообщение: Re: Replace uses of deprecated Python module distutils.sysconfig
Следующее
От: Tom Lane
Дата:
Сообщение: Re: Remove redundant MemoryContextSwith in BeginCopyFrom