Thread: Server crash with Master-Slave configuration.

Server crash with Master-Slave configuration.

From

Prabhat Sahu

Date:

24 December 2019, 14:59:25

Hi,

While performing the operations below with a Master-Slave configuration, the Slave crashed.
Below are the steps to reproduce:

-- create a Slave using pg_basebackup and start:
./pg_basebackup -v -R -D d2 -p 55510
mkdir /home/centos/ts1

-- Session 1(Master):
./psql postgres -p 55510

CREATE TABLESPACE ts1 location '/home/centos/ts1';
CREATE TABLE tab1 (c1 INTEGER, c2 TEXT, c3 point) tablespace ts1;
insert into tab1 (select x, x||'_c2',point (x,x) from generate_series(1,100000) x);

-- Cancel the below update query in middle and then vacuum:
update tab1 set c1=c1+2 , c3=point(10,10) where c1 <=90000;
vacuum(analyze) tab1(c3, c2);

postgres=# update tab1 set c1=c1+2 , c3=point(10,10) where c1 <=90000;
^CCancel request sent
ERROR: canceling statement due to user request

postgres=# vacuum(analyze) tab1(c3, c2);
VACUUM

OR

postgres=# vacuum(analyze) tab1(c3, c2);
ERROR: index "pg_toast_16385_index" contains unexpected zero page at block 0
HINT: Please REINDEX it.

-- session 2: (slave)
./psql postgres -p 55520

-- The select query below crashes the server:
select count(*) from tab1_2;

postgres=# select count(*) from tab1_2;
WARNING: terminating connection because of crash of another server process
DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
HINT: In a moment you should be able to reconnect to the database and repeat your command.
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.

-- Below is the stack trace:
[centos@parallel-vacuum-testing bin]$ gdb -q -c d2/core.20509 postgres
Reading symbols from /home/centos/PGsrc/postgresql/inst/bin/postgres...done.
[New LWP 20509]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `postgres: startup recovering 000000010000000000000006 '.
Program terminated with signal 6, Aborted.
#0 0x00007f42d2565337 in raise () from /lib64/libc.so.6
Missing separate debuginfos, use: debuginfo-install glibc-2.17-292.el7.x86_64 keyutils-libs-1.5.8-3.el7.x86_64 krb5-libs-1.15.1-37.el7_7.2.x86_64 libcom_err-1.42.9-16.el7.x86_64 libselinux-2.5-14.1.el7.x86_64 openssl-libs-1.0.2k-19.el7.x86_64 pcre-8.32-17.el7.x86_64 zlib-1.2.7-18.el7.x86_64
(gdb) bt
#0 0x00007f42d2565337 in raise () from /lib64/libc.so.6
#1 0x00007f42d2566a28 in abort () from /lib64/libc.so.6
#2 0x0000000000a94c55 in errfinish (dummy=0) at elog.c:590
#3 0x0000000000a9729a in elog_finish (elevel=22, fmt=0xb30a10 "WAL contains references to invalid pages") at elog.c:1465
#4 0x000000000057cb10 in log_invalid_page (node=..., forkno=MAIN_FORKNUM, blkno=470, present=false) at xlogutils.c:96
#5 0x000000000057d64e in XLogReadBufferExtended (rnode=..., forknum=MAIN_FORKNUM, blkno=470, mode=RBM_NORMAL) at xlogutils.c:472
#6 0x000000000057d386 in XLogReadBufferForRedoExtended (record=0x1b4a9c8, block_id=0 '\000', mode=RBM_NORMAL, get_cleanup_lock=true, buf=0x7ffda55b39d4) at xlogutils.c:390
#7 0x00000000004f12b5 in heap_xlog_clean (record=0x1b4a9c8) at heapam.c:7744
#8 0x00000000004f4ebe in heap2_redo (record=0x1b4a9c8) at heapam.c:8891
#9 0x000000000056cceb in StartupXLOG () at xlog.c:7202
#10 0x000000000086cb0c in StartupProcessMain () at startup.c:170
#11 0x0000000000582150 in AuxiliaryProcessMain (argc=2, argv=0x7ffda55b4600) at bootstrap.c:451
#12 0x000000000086ba0f in StartChildProcess (type=StartupProcess) at postmaster.c:5461
#13 0x000000000086685d in PostmasterMain (argc=5, argv=0x1b49d50) at postmaster.c:1392
#14 0x0000000000775bb1 in main (argc=5, argv=0x1b49d50) at main.c:210
(gdb)

With Regards,

Prabhat Kumar Sahu
Skype ID: prabhat.sahu1984
EnterpriseDB Software India Pvt. Ltd.

The Postgres Database Company

Re: Server crash with Master-Slave configuration.

From

Michael Paquier

Date:

25 December 2019, 05:31:27

On Tue, Dec 24, 2019 at 05:29:25PM +0530, Prabhat Sahu wrote:
> While performing the operations below with a Master-Slave configuration, the
> Slave crashed.
> Below are the steps to reproduce:
>
> -- create a Slave using pg_basebackup and start:
> ./pg_basebackup -v -R -D d2 -p 55510
> mkdir /home/centos/ts1
>
> -- Session 1(Master):
> ./psql postgres -p 55510
>
> CREATE TABLESPACE ts1 location '/home/centos/ts1';

Your mistake is here.  Both primary and standby are on the same host,
so CREATE TABLESPACE would point to a path that overlaps for both
clusters, as the tablespace path is registered in the WAL being
replayed, leading to various weird behaviors.  What you need to do
instead is to create the tablespace before taking the base backup, and
then take the base backup using pg_basebackup's --tablespace-mapping.
--
Michael

Attachments

signature.asc
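
The order of operations suggested in the reply above could be sketched roughly as follows. This is only an illustration, not a tested recipe: the ports, the d2 data directory and /home/centos/ts1 are taken from the original report, while the standby tablespace path /home/centos/ts1_standby, the `run` helper and the DRY_RUN switch are assumptions introduced here. By default the script only prints the commands it would run.

```shell
#!/bin/sh
# Sketch only. Ports and /home/centos/ts1 come from the thread;
# TS_STANDBY and the DRY_RUN switch are assumptions made for illustration.
PRIMARY_PORT=55510
STANDBY_DIR=d2
TS_PRIMARY=/home/centos/ts1
TS_STANDBY=/home/centos/ts1_standby   # hypothetical; must differ from TS_PRIMARY

# Print each command by default; set DRY_RUN=0 to actually execute it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# 1. Create the tablespace on the primary BEFORE taking the base backup.
run mkdir -p "$TS_PRIMARY"
run ./psql postgres -p "$PRIMARY_PORT" \
    -c "CREATE TABLESPACE ts1 LOCATION '$TS_PRIMARY'"

# 2. Take the base backup, relocating the tablespace for the standby.
#    Both sides of the mapping must be absolute paths.
run ./pg_basebackup -v -R -D "$STANDBY_DIR" -p "$PRIMARY_PORT" \
    --tablespace-mapping="$TS_PRIMARY=$TS_STANDBY"
```

With the mapping in place, the standby's copy of the tablespace lives in its own directory, so WAL replay on the standby no longer touches the files the primary is writing to.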

Re: Server crash with Master-Slave configuration.

From

Prabhat Sahu

Date:

25 December 2019, 09:28:45

On Wed, Dec 25, 2019 at 8:01 AM Michael Paquier <michael@paquier.xyz> wrote:
> On Tue, Dec 24, 2019 at 05:29:25PM +0530, Prabhat Sahu wrote:
> > While performing the operations below with a Master-Slave configuration, the
> > Slave crashed.
> > Below are the steps to reproduce:
> >
> > -- create a Slave using pg_basebackup and start:
> > ./pg_basebackup -v -R -D d2 -p 55510
> > mkdir /home/centos/ts1
> >
> > -- Session 1(Master):
> > ./psql postgres -p 55510
> >
> > CREATE TABLESPACE ts1 location '/home/centos/ts1';
>
> Your mistake is here.  Both primary and standby are on the same host,
> so CREATE TABLESPACE would point to a path that overlaps for both
> clusters, as the tablespace path is registered in the WAL being
> replayed, leading to various weird behaviors.  What you need to do
> instead is to create the tablespace before taking the base backup, and
> then take the base backup using pg_basebackup's --tablespace-mapping.

Thanks, Michael, for pointing it out. I have re-tested the scenario with the "--tablespace-mapping=OLDDIR=NEWDIR" option of pg_basebackup, and now it's working fine.

But I think a proper error message would be better than a crash.

> --
> Michael

With Regards,

Prabhat Kumar Sahu
Skype ID: prabhat.sahu1984
EnterpriseDB Software India Pvt. Ltd.

The Postgres Database Company

Re: Server crash with Master-Slave configuration.

From

Robert Haas

Date:

28 December 2019, 05:46:09

On Wed, Dec 25, 2019 at 1:29 AM Prabhat Sahu <prabhat.sahu@enterprisedb.com> wrote:
> Thanks Michael for pointing it out, I have re-tested the scenario
> with "--tablespace-mapping=OLDDIR=NEWDIR" option of pg_basebackup, and now it's working fine.
> But I think, instead of the crash, a proper error message would be better.

It appears from the stack trace you sent that it emits a PANIC, which seems like a proper error message to me.

Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
