Обсуждение: [PATCH] Fix Proposal - Deadlock Issue in Single User Mode When IOFailure Occurs

Поиск
Список
Период
Сортировка

[PATCH] Fix Proposal - Deadlock Issue in Single User Mode When IOFailure Occurs

От
Chengchao Yu
Дата:

Greetings,

 

Recently, we hit a few occurrences of deadlock when IO failure (including disk full, random remote disk IO failures) happens in single user mode. We found the issue exists on both Linux and Windows in multiple postgres versions.

 

Here are the steps to repro on Linux (as Windows repro is similar):

 

1.       Get latest PostgreSQL code, build and install the executables.

 

$ git clone https://git.postgresql.org/git/postgresql.git

$ cd postgresql

$ PGROOT=$(pwd)

$ git checkout REL_11_STABLE

$ mkdir build

$ cd build

$ ../configure --prefix=/path/to/postgres

$ make && make install

 

2.       Run initdb to initialize a PG database folder.

 

$ /path/to/postgres/bin/initdb -D /path/to/data

 

3.       Because the unable to write relation data scenario is difficult to hit naturally even reserved space is turned off, I have prepared a small patch (see attachment “emulate-error.patch”) to force an error when PG tries to write data to relation files. We can just apply the patch and there is no need to put efforts flooding data to disk any more.

 

$ cd $PGROOT

$ git apply /path/to/emulate-error.patch

$ cd build

$ make && make install

 

4.       Connect to the newly initialized database cluster with single user mode, create a table, and insert some data to the table, do a checkpoint or directly give EOF. Then we hit the deadlock issue and the process will not exit until we kill it.

 

Do a checkpoint explicitly:

 

$ /path/to/postgres/bin/postgres --single -D /path/to/data/ postgres -c exit_on_error=true <<EOF

> create table t1(a int);

> insert into t1 values (1), (2), (3);

> checkpoint;

> EOF

 

PostgreSQL stand-alone backend 11.1

backend> backend> backend> 2018-11-29 02:45:27.891 UTC [18806] FATAL:  Emulate exception in mdwrite() when writing to disk

2018-11-29 02:55:27.891 UTC [18806] CONTEXT:  writing block 8 of relation base/12368/1247

2018-11-29 02:55:27.891 UTC [18806] STATEMENT:  checkpoint;

 

2018-11-29 02:55:27.900 UTC [18806] FATAL:  Emulate exception in mdwrite() when writing to disk

2018-11-29 02:55:27.900 UTC [18806] CONTEXT:  writing block 8 of relation base/12368/1247

 

Or directly give an EOF:

 

$ /path/to/postgres/bin/postgres --single -D /path/to/data/ postgres -c exit_on_error=true <<EOF

> create table t1(a int);

> insert into t1 values (1), (2), (3);

> EOF

 

PostgreSQL stand-alone backend 11.1

backend> backend> backend> 2018-11-29 02:55:24.438 UTC [18149] FATAL:  Emulate exception in mdwrite() when writing to disk

2018-11-29 02:45:24.438 UTC [18149] CONTEXT:  writing block 8 of relation base/12368/1247

 

5.       Moreover, when we try to recover the database with single user mode, we hit the issue again, and the process does not bring up the database nor exit.

 

$ /path/to/postgres/bin/postgres --single -D /path/to/data/ postgres -c exit_on_error=true

2018-11-29 02:59:33.257 UTC [19058] LOG:  database system shutdown was interrupted; last known up at 2018-11-29 02:58:49 UTC

2018-11-29 02:59:33.485 UTC [19058] LOG:  database system was not properly shut down; automatic recovery in progress

2018-11-29 02:59:33.500 UTC [19058] LOG:  redo starts at 0/1672E40

2018-11-29 02:59:33.500 UTC [19058] LOG:  invalid record length at 0/1684B90: wanted 24, got 0

2018-11-29 02:59:33.500 UTC [19058] LOG:  redo done at 0/1684B68

2018-11-29 02:59:33.500 UTC [19058] LOG:  last completed transaction was at log time 2018-11-29 02:58:49.856663+00

2018-11-29 02:59:33.547 UTC [19058] FATAL:  Emulate exception in mdwrite() when writing to disk

2018-11-29 02:59:33.547 UTC [19058] CONTEXT:  writing block 8 of relation base/12368/1247

 

Analyses:

 

So, what happened? Actually, there are 2 types of the deadlock due to the same root cause. Let’s first take a look at the scenario in step #5. In this scenario, the deadlock happens when disk IO failure occurs inside StartupXLOG(). If we attach debugger to PG process, we will see the process is stuck acquiring the buffer’s lw-lock in AbortBufferIO().

 

void

AbortBufferIO(void)

{

    BufferDesc *buf = InProgressBuf;

 

    if (buf)

    {

        uint32      buf_state;

 

        /*

         * Since LWLockReleaseAll has already been called, we're not holding

         * the buffer's io_in_progress_lock. We have to re-acquire it so that

         * we can use TerminateBufferIO. Anyone who's executing WaitIO on the

         * buffer will be in a busy spin until we succeed in doing this.

         */

        LWLockAcquire(BufferDescriptorGetIOLock(buf), LW_EXCLUSIVE);

 

This is because the same lock has been acquired before buffer manager attempts to flush the buffer page, which happens in StartBufferIO().

 

static bool

StartBufferIO(BufferDesc *buf, bool forInput)

{

    uint32      buf_state;

 

    Assert(!InProgressBuf);

 

    for (;;)

    {

        /*

         * Grab the io_in_progress lock so that other processes can wait for

         * me to finish the I/O.

         */

        LWLockAcquire(BufferDescriptorGetIOLock(buf), LW_EXCLUSIVE);

 

        buf_state = LockBufHdr(buf);

 

        if (!(buf_state & BM_IO_IN_PROGRESS))

            break;

 

After reading the code, AtProcExit_Buffers() assumes all the lw-locks are released. However, in single user mode, at the time StartupXLOG() is being executed, there is no before_shmem_exit/on_shmem_exit callback registered to release the lw-locks.

And, given lw-lock is non-reentrant, so the process gets stuck re-acquiring the same lock.

 

Here is the call stack:

 

(gdb) bt

#0  0x00007f0fdb7cb6d6 in futex_abstimed_wait_cancelable (private=128, abstime=0x0, expected=0, futex_word=0x7f0fd14c81b8) at ../sysdeps/unix/sysv/linux/futex-internal.h:205

#1  do_futex_wait (sem=sem@entry=0x7f0fd14c81b8, abstime=0x0) at sem_waitcommon.c:111

#2  0x00007f0fdb7cb7c8 in __new_sem_wait_slow (sem=0x7f0fd14c81b8, abstime=0x0) at sem_waitcommon.c:181

#3  0x00005630d475658a in PGSemaphoreLock (sema=0x7f0fd14c81b8) at pg_sema.c:316

#4  0x00005630d47f689e in LWLockAcquire (lock=0x7f0fd9ae9c00, mode=LW_EXCLUSIVE) at /path/to/postgres/source/build/../src/backend/storage/lmgr/lwlock.c:1243

#5  0x00005630d47cd087 in AbortBufferIO () at /path/to/postgres/source/build/../src/backend/storage/buffer/bufmgr.c:3988

#6  0x00005630d47cb3f9 in AtProcExit_Buffers (code=1, arg=0) at /path/to/postgres/source/build/../src/backend/storage/buffer/bufmgr.c:2473

#7  0x00005630d47dbc32 in shmem_exit (code=1) at /path/to/postgres/source/build/../src/backend/storage/ipc/ipc.c:272

#8  0x00005630d47dba5e in proc_exit_prepare (code=1) at /path/to/postgres/source/build/../src/backend/storage/ipc/ipc.c:194

#9  0x00005630d47db9c6 in proc_exit (code=1) at /path/to/postgres/source/build/../src/backend/storage/ipc/ipc.c:107

#10 0x00005630d49811bc in errfinish (dummy=0) at /path/to/postgres/source/build/../src/backend/utils/error/elog.c:541

#11 0x00005630d4801f1f in mdwrite (reln=0x5630d6588c68, forknum=MAIN_FORKNUM, blocknum=8, buffer=0x7f0fd1ae9c00 "", skipFsync=false) at /path/to/postgres/source/build/../src/backend/storage/smgr/md.c:843

#12 0x00005630d4804716 in smgrwrite (reln=0x5630d6588c68, forknum=MAIN_FORKNUM, blocknum=8, buffer=0x7f0fd1ae9c00 "", skipFsync=false) at /path/to/postgres/source/build/../src/backend/storage/smgr/smgr.c:650

#13 0x00005630d47cb824 in FlushBuffer (buf=0x7f0fd19e9c00, reln=0x5630d6588c68) at /path/to/postgres/source/build/../src/backend/storage/buffer/bufmgr.c:2751

#14 0x00005630d47cb219 in SyncOneBuffer (buf_id=0, skip_recently_used=false, wb_context=0x7ffccc371a70) at /path/to/postgres/source/build/../src/backend/storage/buffer/bufmgr.c:2394

#15 0x00005630d47cab00 in BufferSync (flags=6) at /path/to/postgres/source/build/../src/backend/storage/buffer/bufmgr.c:1984

#16 0x00005630d47cb57f in CheckPointBuffers (flags=6) at /path/to/postgres/source/build/../src/backend/storage/buffer/bufmgr.c:2578

#17 0x00005630d44a685b in CheckPointGuts (checkPointRedo=23612304, flags=6) at /path/to/postgres/source/build/../src/backend/access/transam/xlog.c:9149

#18 0x00005630d44a62cf in CreateCheckPoint (flags=6) at /path/to/postgres/source/build/../src/backend/access/transam/xlog.c:8937

#19 0x00005630d44a45e3 in StartupXLOG () at /path/to/postgres/source/build/../src/backend/access/transam/xlog.c:7723

#20 0x00005630d4995f88 in InitPostgres (in_dbname=0x5630d65582b0 "postgres", dboid=0, username=0x5630d653d7d0 "chengyu", useroid=0, out_dbname=0x0, override_allow_connections=false)

    at /path/to/postgres/source/build/../src/backend/utils/init/postinit.c:636

#21 0x00005630d480b68b in PostgresMain (argc=7, argv=0x5630d6534d20, dbname=0x5630d65582b0 "postgres", username=0x5630d653d7d0 "chengyu") at /path/to/postgres/source/build/../src/backend/tcop/postgres.c:3810

#22 0x00005630d4695e8b in main (argc=7, argv=0x5630d6534d20) at /path/to/postgres/source/build/../src/backend/main/main.c:224

 

(gdb) p on_shmem_exit_list

$1 = {{function = 0x55801cc68f5f <AnonymousShmemDetach>, arg = 0},

{function = 0x55801cc68a4f <IpcMemoryDelete>, arg = 2490396},

{function = 0x55801cc689f8 <IpcMemoryDetach>, arg = 140602018975744},

{function = 0x55801cc6842e <ReleaseSemaphores>, arg = 0},

{function = 0x55801ccec48a <dsm_postmaster_shutdown>, arg = 140602018975744},

{function = 0x55801cd04053 <ProcKill>, arg = 0},

{function = 0x55801cd0402d <RemoveProcFromArray>, arg = 0},

{function = 0x55801ccf74e8 <CleanupInvalidationState>, arg = 140601991817088},

{function = 0x55801ccf446f <CleanupProcSignalState>, arg = 1},

{function = 0x55801ccdd3e5 <AtProcExit_Buffers>, arg = 0}, {function = 0x0, arg = 0}, {function = 0x0, arg = 0}, {function = 0x0, arg = 0}, {function = 0x0, arg = 0}, {function = 0x0, arg = 0}, {function = 0x0, arg = 0}, {function = 0x0, arg = 0}, {function = 0x0, arg = 0}, {function = 0x0, arg = 0}, {function = 0x0, arg = 0}}

(gdb) p before_shmem_exit_list

$2 = {{function = 0x0, arg = 0} <repeats 20 times>}

 

The second type is in Step #4. At the time when “checkpoint” SQL command is being executed, PG has already set up the before_shmem_exit callback ShutdownPostgres(), which releases all lw-locks given transaction or sub-transaction is on-going. So after the first IO error, the buffer page’s lw-lock gets released successfully. However, later ShutdownXLOG() is invoked, and PG tries to flush buffer pages again, which results in the second IO error. Different from the first time, this time, all the previous executed before/on_shmem_exit callbacks are not invoked again due to the decrease of the indexes. So lw-locks for buffer pages are not released when PG tries to get the same buffer lock in AbortBufferIO(), and then PG process gets stuck.

 

Here is the call stack:

 

(gdb) bt

#0  0x00007ff0c0c036d6 in futex_abstimed_wait_cancelable (private=128, abstime=0x0, expected=0, futex_word=0x7ff0b69001b8) at ../sysdeps/unix/sysv/linux/futex-internal.h:205

#1  do_futex_wait (sem=sem@entry=0x7ff0b69001b8, abstime=0x0) at sem_waitcommon.c:111

#2  0x00007ff0c0c037c8 in __new_sem_wait_slow (sem=0x7ff0b69001b8, abstime=0x0) at sem_waitcommon.c:181

#3  0x0000562077cc258a in PGSemaphoreLock (sema=0x7ff0b69001b8) at pg_sema.c:316

#4  0x0000562077d6289e in LWLockAcquire (lock=0x7ff0bef225c0, mode=LW_EXCLUSIVE) at /path/to/postgres/source/build/../src/backend/storage/lmgr/lwlock.c:1243

#5  0x0000562077d39087 in AbortBufferIO () at /path/to/postgres/source/build/../src/backend/storage/buffer/bufmgr.c:3988

#6  0x0000562077d373f9 in AtProcExit_Buffers (code=1, arg=0) at /path/to/postgres/source/build/../src/backend/storage/buffer/bufmgr.c:2473

#7  0x0000562077d47c32 in shmem_exit (code=1) at /path/to/postgres/source/build/../src/backend/storage/ipc/ipc.c:272

#8  0x0000562077d47a5e in proc_exit_prepare (code=1) at /path/to/postgres/source/build/../src/backend/storage/ipc/ipc.c:194

#9  0x0000562077d479c6 in proc_exit (code=1) at /path/to/postgres/source/build/../src/backend/storage/ipc/ipc.c:107

#10 0x0000562077eed1bc in errfinish (dummy=0) at /path/to/postgres/source/build/../src/backend/utils/error/elog.c:541

#11 0x0000562077d6df1f in mdwrite (reln=0x562078a12a18, forknum=MAIN_FORKNUM, blocknum=8, buffer=0x7ff0b6fbdc00 "", skipFsync=false) at /path/to/postgres/source/build/../src/backend/storage/smgr/md.c:843

#12 0x0000562077d70716 in smgrwrite (reln=0x562078a12a18, forknum=MAIN_FORKNUM, blocknum=8, buffer=0x7ff0b6fbdc00 "", skipFsync=false) at /path/to/postgres/source/build/../src/backend/storage/smgr/smgr.c:650

#13 0x0000562077d37824 in FlushBuffer (buf=0x7ff0b6e22f80, reln=0x562078a12a18) at /path/to/postgres/source/build/../src/backend/storage/buffer/bufmgr.c:2751

#14 0x0000562077d37219 in SyncOneBuffer (buf_id=78, skip_recently_used=false, wb_context=0x7fffb0e3e230) at /path/to/postgres/source/build/../src/backend/storage/buffer/bufmgr.c:2394

#15 0x0000562077d36b00 in BufferSync (flags=5) at /path/to/postgres/source/build/../src/backend/storage/buffer/bufmgr.c:1984

#16 0x0000562077d3757f in CheckPointBuffers (flags=5) at /path/to/postgres/source/build/../src/backend/storage/buffer/bufmgr.c:2578

#17 0x0000562077a1285b in CheckPointGuts (checkPointRedo=24049152, flags=5) at /path/to/postgres/source/build/../src/backend/access/transam/xlog.c:9149

#18 0x0000562077a122cf in CreateCheckPoint (flags=5) at /path/to/postgres/source/build/../src/backend/access/transam/xlog.c:8937

#19 0x0000562077a1164f in ShutdownXLOG (code=1, arg=0) at /path/to/postgres/source/build/../src/backend/access/transam/xlog.c:8485

#20 0x0000562077d47c32 in shmem_exit (code=1) at /path/to/postgres/source/build/../src/backend/storage/ipc/ipc.c:272

#21 0x0000562077d47a5e in proc_exit_prepare (code=1) at /path/to/postgres/source/build/../src/backend/storage/ipc/ipc.c:194

#22 0x0000562077d479c6 in proc_exit (code=1) at /path/to/postgres/source/build/../src/backend/storage/ipc/ipc.c:107

#23 0x0000562077eed1bc in errfinish (dummy=0) at /path/to/postgres/source/build/../src/backend/utils/error/elog.c:541

#24 0x0000562077d6df1f in mdwrite (reln=0x562078a12a18, forknum=MAIN_FORKNUM, blocknum=8, buffer=0x7ff0b6fbdc00 "", skipFsync=false) at /path/to/postgres/source/build/../src/backend/storage/smgr/md.c:843

#25 0x0000562077d70716 in smgrwrite (reln=0x562078a12a18, forknum=MAIN_FORKNUM, blocknum=8, buffer=0x7ff0b6fbdc00 "", skipFsync=false) at /path/to/postgres/source/build/../src/backend/storage/smgr/smgr.c:650

#26 0x0000562077d37824 in FlushBuffer (buf=0x7ff0b6e22f80, reln=0x562078a12a18) at /path/to/postgres/source/build/../src/backend/storage/buffer/bufmgr.c:2751

#27 0x0000562077d37219 in SyncOneBuffer (buf_id=78, skip_recently_used=false, wb_context=0x7fffb0e3fb10) at /path/to/postgres/source/build/../src/backend/storage/buffer/bufmgr.c:2394

#28 0x0000562077d36b00 in BufferSync (flags=44) at /path/to/postgres/source/build/../src/backend/storage/buffer/bufmgr.c:1984

#29 0x0000562077d3757f in CheckPointBuffers (flags=44) at /path/to/postgres/source/build/../src/backend/storage/buffer/bufmgr.c:2578

#30 0x0000562077a1285b in CheckPointGuts (checkPointRedo=24049152, flags=44) at /path/to/postgres/source/build/../src/backend/access/transam/xlog.c:9149

#31 0x0000562077a122cf in CreateCheckPoint (flags=44) at /path/to/postgres/source/build/../src/backend/access/transam/xlog.c:8937

#32 0x0000562077cca792 in RequestCheckpoint (flags=44) at /path/to/postgres/source/build/../src/backend/postmaster/checkpointer.c:976

#33 0x0000562077d7bce4 in standard_ProcessUtility (pstmt=0x562078a00b50, queryString=0x562078a00100 "checkpoint;\n", context=PROCESS_UTILITY_TOPLEVEL, params=0x0, queryEnv=0x0,

    dest=0x5620783ac5e0 <debugtupDR>, completionTag=0x7fffb0e41520 "") at /path/to/postgres/source/build/../src/backend/tcop/utility.c:769

#34 0x0000562077d7b204 in ProcessUtility (pstmt=0x562078a00b50, queryString=0x562078a00100 "checkpoint;\n", context=PROCESS_UTILITY_TOPLEVEL, params=0x0, queryEnv=0x0, dest=0x5620783ac5e0 <debugtupDR>,

    completionTag=0x7fffb0e41520 "") at /path/to/postgres/source/build/../src/backend/tcop/utility.c:360

#35 0x0000562077d7a347 in PortalRunUtility (portal=0x5620789f20c0, pstmt=0x562078a00b50, isTopLevel=true, setHoldSnapshot=false, dest=0x5620783ac5e0 <debugtupDR>, completionTag=0x7fffb0e41520 "")

    at /path/to/postgres/source/build/../src/backend/tcop/pquery.c:1178

#36 0x0000562077d7a534 in PortalRunMulti (portal=0x5620789f20c0, isTopLevel=true, setHoldSnapshot=false, dest=0x5620783ac5e0 <debugtupDR>, altdest=0x5620783ac5e0 <debugtupDR>,

    completionTag=0x7fffb0e41520 "") at /path/to/postgres/source/build/../src/backend/tcop/pquery.c:1324

#37 0x0000562077d79a61 in PortalRun (portal=0x5620789f20c0, count=9223372036854775807, isTopLevel=true, run_once=true, dest=0x5620783ac5e0 <debugtupDR>, altdest=0x5620783ac5e0 <debugtupDR>,

    completionTag=0x7fffb0e41520 "") at /path/to/postgres/source/build/../src/backend/tcop/pquery.c:799

#38 0x0000562077d734c5 in exec_simple_query (query_string=0x562078a00100 "checkpoint;\n") at /path/to/postgres/source/build/../src/backend/tcop/postgres.c:1145

#39 0x0000562077d77bd5 in PostgresMain (argc=7, argv=0x562078980d20, dbname=0x5620789a42b0 "postgres", username=0x5620789897d0 "chengyu") at /path/to/postgres/source/build/../src/backend/tcop/postgres.c:4182

#40 0x0000562077c01e8b in main (argc=7, argv=0x562078980d20) at /path/to/postgres/source/build/../src/backend/main/main.c:224

 

(gdb) p on_shmem_exit_list

$9 = {{function = 0x562077cc2f5f <AnonymousShmemDetach>, arg = 0},

{function = 0x562077cc2a4f <IpcMemoryDelete>, arg = 2457627},

{function = 0x562077cc29f8 <IpcMemoryDetach>, arg = 140672005165056},

{function = 0x562077cc242e <ReleaseSemaphores>, arg = 0},

{function = 0x562077d4648a <dsm_postmaster_shutdown>, arg = 140672005165056},

{function = 0x562077d5e053 <ProcKill>, arg = 0},

{function = 0x562077d5e02d <RemoveProcFromArray>, arg = 0},

{function = 0x562077d514e8 <CleanupInvalidationState>, arg = 140671978006400},

{function = 0x562077d4e46f <CleanupProcSignalState>, arg = 1},

{function = 0x562077d373e5 <AtProcExit_Buffers>, arg = 0},

{function = 0x562077a1159d <ShutdownXLOG>, arg = 0},

{function = 0x562077cd0637 <pgstat_beshutdown_hook>, arg = 0}, {function = 0x0, arg = 0}, {function = 0x0, arg = 0}, {function = 0x0, arg = 0}, {function = 0x0, arg = 0}, {function = 0x0, arg = 0}, {function = 0x0, arg = 0}, {function = 0x0, arg = 0}, {function = 0x0, arg = 0}}

(gdb) p before_shmem_exit_list

$10 = {{function = 0x562077f02caa <ShutdownPostgres>, arg = 0}, {function = 0x0, arg = 0} <repeats 19 times>}

 

OK, now we understand the deadlock issue for single user mode. However, will this issue affect multi-user mode (i.e. under postmaster process)? We can have 3 cases for discussion:

 

  1. Startup process: at the time StartupXLOG() is invoked, ShutdownAuxiliaryProcess(), which will release all the lw-locks, has been already registered in before_shmem_exit_list[]. So this case is safe.
  2. Checkpointer process: ShutdownXLOG() is not registered as a before/on_shmem_exit callback, instead, it’s only invoked in the main loop. So there is no chance to hit IO error for second time during shared memory exit callbacks. Also, Same as startup process, ShutdownAuxiliaryProcess() has been registered. So this case is also safe.
  3. Other backend/background processes: these processes do not handle XLOG startup or shutdown, and are protected by ShutdownAuxiliaryProcess(). So they are safe to exit too.

 

In addition, we have done multiple experiments to confirm these cases.

 

Affected versions: we found this issue in 9.5, 9.6, 10, 11 and 12devel.

 

Fix proposal:

 

According to the affected 2 types of deadlock in single user mode discussed above, there might be multiple ways to fix this issue. In the fix proposal we would like to present, we register a new callback to release all the lw-locks (just like what ShutdownAuxiliaryProcess()does) in an order after AtProcExit_Buffers() and before ShutdownXLOG(). Also, it is registered before PG enters StartupXLOG(), so it can cover the case when ShutdownPostgres() has not been registered. Here is the content of the proposal:

 

diff --git a/src/backend/utils/init/postinit.c b/src/backend/utils/init/postinit.c

index 62baaf0ab3..d74e8aa1d5 100644

--- a/src/backend/utils/init/postinit.c

+++ b/src/backend/utils/init/postinit.c

@@ -71,6 +71,7 @@ static HeapTuple GetDatabaseTupleByOid(Oid dboid);

static void PerformAuthentication(Port *port);

static void CheckMyDatabase(const char *name, bool am_superuser, bool override_allow_connections);

static void InitCommunication(void);

+static void ReleaseLWLocks(int code, Datum arg);

static void ShutdownPostgres(int code, Datum arg);

static void StatementTimeoutHandler(void);

static void LockTimeoutHandler(void);

@@ -653,6 +654,7 @@ InitPostgres(const char *in_dbname, Oid dboid, const char *username,

                 * way, start up the XLOG machinery, and register to have it closed

                 * down at exit.

                 */

+                on_shmem_exit(ReleaseLWLocks, 0);

                StartupXLOG();

                on_shmem_exit(ShutdownXLOG, 0);

        }

@@ -1214,6 +1216,23 @@ process_settings(Oid databaseid, Oid roleid)

        heap_close(relsetting, AccessShareLock);

}

+/*

+ * There are 2 types of buffer locks on-holding when AtProcExit_Buffers() is

+ * invoked in a bootstrap process or a standalone backend:

+ *  (1) Exceptions thrown during StartupXLOG()

+ *  (2) Exceptions thrown during exception-handling in ShutdownXLOG()

+ * So we need this on_shmem_exit callback for single user mode.

+ * For processes under postmaster, ShutdownAuxiliaryProcess() will release

+ * the lw-locks and ShutdownXLOG() is not registered as a callback, so there

+ * is no such issue. Also, please note this callback should be registered in

+ * the order after AtProcExit_buffers() and before ShutdownXLOG().

+ */

+static void

+ReleaseLWLocks(int code, Datum arg)

+{

+        LWLockReleaseAll();

+}

+

/*

  * Backend-shutdown callback.  Do cleanup that we want to be sure happens

  * before all the supporting modules begin to nail their doors shut via

 

The fix proposal is also attached to this email in file “fix-deadlock.patch”.

 

Please let us know should you have suggestions on this issue and the fix.

 

Thank you!

 

Best regards,

--

Chengchao Yu

Software Engineer | Microsoft | Azure Database for PostgreSQL

https://azure.microsoft.com/en-us/services/postgresql/

 

Вложения

Re: [PATCH] Fix Proposal - Deadlock Issue in Single User Mode When IOFailure Occurs

От
Jesse Zhang
Дата:
Hey Chengyu,
How did you set up your GDB to get "p" to pretty-print a Postgres list?

Cheers,
Jesse

On Fri, Nov 30, 2018 at 1:00 PM Chengchao Yu <chengyu@microsoft.com> wrote:

Greetings,

 

Recently, we hit a few occurrences of deadlock when IO failure (including disk full, random remote disk IO failures) happens in single user mode. We found the issue exists on both Linux and Windows in multiple postgres versions.

 

Here are the steps to repro on Linux (as Windows repro is similar):

 

1.       Get latest PostgreSQL code, build and install the executables.

 

$ git clone https://git.postgresql.org/git/postgresql.git

$ cd postgresql

$ PGROOT=$(pwd)

$ git checkout REL_11_STABLE

$ mkdir build

$ cd build

$ ../configure --prefix=/path/to/postgres

$ make && make install

 

2.       Run initdb to initialize a PG database folder.

 

$ /path/to/postgres/bin/initdb -D /path/to/data

 

3.       Because the unable to write relation data scenario is difficult to hit naturally even reserved space is turned off, I have prepared a small patch (see attachment “emulate-error.patch”) to force an error when PG tries to write data to relation files. We can just apply the patch and there is no need to put efforts flooding data to disk any more.

 

$ cd $PGROOT

$ git apply /path/to/emulate-error.patch

$ cd build

$ make && make install

 

4.       Connect to the newly initialized database cluster with single user mode, create a table, and insert some data to the table, do a checkpoint or directly give EOF. Then we hit the deadlock issue and the process will not exit until we kill it.

 

Do a checkpoint explicitly:

 

$ /path/to/postgres/bin/postgres --single -D /path/to/data/ postgres -c exit_on_error=true <<EOF

> create table t1(a int);

> insert into t1 values (1), (2), (3);

> checkpoint;

> EOF

 

PostgreSQL stand-alone backend 11.1

backend> backend> backend> 2018-11-29 02:45:27.891 UTC [18806] FATAL:  Emulate exception in mdwrite() when writing to disk

2018-11-29 02:55:27.891 UTC [18806] CONTEXT:  writing block 8 of relation base/12368/1247

2018-11-29 02:55:27.891 UTC [18806] STATEMENT:  checkpoint;

 

2018-11-29 02:55:27.900 UTC [18806] FATAL:  Emulate exception in mdwrite() when writing to disk

2018-11-29 02:55:27.900 UTC [18806] CONTEXT:  writing block 8 of relation base/12368/1247

 

Or directly give an EOF:

 

$ /path/to/postgres/bin/postgres --single -D /path/to/data/ postgres -c exit_on_error=true <<EOF

> create table t1(a int);

> insert into t1 values (1), (2), (3);

> EOF

 

PostgreSQL stand-alone backend 11.1

backend> backend> backend> 2018-11-29 02:55:24.438 UTC [18149] FATAL:  Emulate exception in mdwrite() when writing to disk

2018-11-29 02:45:24.438 UTC [18149] CONTEXT:  writing block 8 of relation base/12368/1247

 

5.       Moreover, when we try to recover the database with single user mode, we hit the issue again, and the process does not bring up the database nor exit.

 

$ /path/to/postgres/bin/postgres --single -D /path/to/data/ postgres -c exit_on_error=true

2018-11-29 02:59:33.257 UTC [19058] LOG:  database system shutdown was interrupted; last known up at 2018-11-29 02:58:49 UTC

2018-11-29 02:59:33.485 UTC [19058] LOG:  database system was not properly shut down; automatic recovery in progress

2018-11-29 02:59:33.500 UTC [19058] LOG:  redo starts at 0/1672E40

2018-11-29 02:59:33.500 UTC [19058] LOG:  invalid record length at 0/1684B90: wanted 24, got 0

2018-11-29 02:59:33.500 UTC [19058] LOG:  redo done at 0/1684B68

2018-11-29 02:59:33.500 UTC [19058] LOG:  last completed transaction was at log time 2018-11-29 02:58:49.856663+00

2018-11-29 02:59:33.547 UTC [19058] FATAL:  Emulate exception in mdwrite() when writing to disk

2018-11-29 02:59:33.547 UTC [19058] CONTEXT:  writing block 8 of relation base/12368/1247

 

Analyses:

 

So, what happened? Actually, there are 2 types of the deadlock due to the same root cause. Let’s first take a look at the scenario in step #5. In this scenario, the deadlock happens when disk IO failure occurs inside StartupXLOG(). If we attach debugger to PG process, we will see the process is stuck acquiring the buffer’s lw-lock in AbortBufferIO().

 

void

AbortBufferIO(void)

{

    BufferDesc *buf = InProgressBuf;

 

    if (buf)

    {

        uint32      buf_state;

 

        /*

         * Since LWLockReleaseAll has already been called, we're not holding

         * the buffer's io_in_progress_lock. We have to re-acquire it so that

         * we can use TerminateBufferIO. Anyone who's executing WaitIO on the

         * buffer will be in a busy spin until we succeed in doing this.

         */

        LWLockAcquire(BufferDescriptorGetIOLock(buf), LW_EXCLUSIVE);

 

This is because the same lock has been acquired before buffer manager attempts to flush the buffer page, which happens in StartBufferIO().

 

static bool

StartBufferIO(BufferDesc *buf, bool forInput)

{

    uint32      buf_state;

 

    Assert(!InProgressBuf);

 

    for (;;)

    {

        /*

         * Grab the io_in_progress lock so that other processes can wait for

         * me to finish the I/O.

         */

        LWLockAcquire(BufferDescriptorGetIOLock(buf), LW_EXCLUSIVE);

 

        buf_state = LockBufHdr(buf);

 

        if (!(buf_state & BM_IO_IN_PROGRESS))

            break;

 

After reading the code, AtProcExit_Buffers() assumes all the lw-locks are released. However, in single user mode, at the time StartupXLOG() is being executed, there is no before_shmem_exit/on_shmem_exit callback registered to release the lw-locks.

And, given lw-lock is non-reentrant, so the process gets stuck re-acquiring the same lock.

 

Here is the call stack:

 

(gdb) bt

#0  0x00007f0fdb7cb6d6 in futex_abstimed_wait_cancelable (private=128, abstime=0x0, expected=0, futex_word=0x7f0fd14c81b8) at ../sysdeps/unix/sysv/linux/futex-internal.h:205

#1  do_futex_wait (sem=sem@entry=0x7f0fd14c81b8, abstime=0x0) at sem_waitcommon.c:111

#2  0x00007f0fdb7cb7c8 in __new_sem_wait_slow (sem=0x7f0fd14c81b8, abstime=0x0) at sem_waitcommon.c:181

#3  0x00005630d475658a in PGSemaphoreLock (sema=0x7f0fd14c81b8) at pg_sema.c:316

#4  0x00005630d47f689e in LWLockAcquire (lock=0x7f0fd9ae9c00, mode=LW_EXCLUSIVE) at /path/to/postgres/source/build/../src/backend/storage/lmgr/lwlock.c:1243

#5  0x00005630d47cd087 in AbortBufferIO () at /path/to/postgres/source/build/../src/backend/storage/buffer/bufmgr.c:3988

#6  0x00005630d47cb3f9 in AtProcExit_Buffers (code=1, arg=0) at /path/to/postgres/source/build/../src/backend/storage/buffer/bufmgr.c:2473

#7  0x00005630d47dbc32 in shmem_exit (code=1) at /path/to/postgres/source/build/../src/backend/storage/ipc/ipc.c:272

#8  0x00005630d47dba5e in proc_exit_prepare (code=1) at /path/to/postgres/source/build/../src/backend/storage/ipc/ipc.c:194

#9  0x00005630d47db9c6 in proc_exit (code=1) at /path/to/postgres/source/build/../src/backend/storage/ipc/ipc.c:107

#10 0x00005630d49811bc in errfinish (dummy=0) at /path/to/postgres/source/build/../src/backend/utils/error/elog.c:541

#11 0x00005630d4801f1f in mdwrite (reln=0x5630d6588c68, forknum=MAIN_FORKNUM, blocknum=8, buffer=0x7f0fd1ae9c00 "", skipFsync=false) at /path/to/postgres/source/build/../src/backend/storage/smgr/md.c:843

#12 0x00005630d4804716 in smgrwrite (reln=0x5630d6588c68, forknum=MAIN_FORKNUM, blocknum=8, buffer=0x7f0fd1ae9c00 "", skipFsync=false) at /path/to/postgres/source/build/../src/backend/storage/smgr/smgr.c:650

#13 0x00005630d47cb824 in FlushBuffer (buf=0x7f0fd19e9c00, reln=0x5630d6588c68) at /path/to/postgres/source/build/../src/backend/storage/buffer/bufmgr.c:2751

#14 0x00005630d47cb219 in SyncOneBuffer (buf_id=0, skip_recently_used=false, wb_context=0x7ffccc371a70) at /path/to/postgres/source/build/../src/backend/storage/buffer/bufmgr.c:2394

#15 0x00005630d47cab00 in BufferSync (flags=6) at /path/to/postgres/source/build/../src/backend/storage/buffer/bufmgr.c:1984

#16 0x00005630d47cb57f in CheckPointBuffers (flags=6) at /path/to/postgres/source/build/../src/backend/storage/buffer/bufmgr.c:2578

#17 0x00005630d44a685b in CheckPointGuts (checkPointRedo=23612304, flags=6) at /path/to/postgres/source/build/../src/backend/access/transam/xlog.c:9149

#18 0x00005630d44a62cf in CreateCheckPoint (flags=6) at /path/to/postgres/source/build/../src/backend/access/transam/xlog.c:8937

#19 0x00005630d44a45e3 in StartupXLOG () at /path/to/postgres/source/build/../src/backend/access/transam/xlog.c:7723

#20 0x00005630d4995f88 in InitPostgres (in_dbname=0x5630d65582b0 "postgres", dboid=0, username=0x5630d653d7d0 "chengyu", useroid=0, out_dbname=0x0, override_allow_connections=false)

    at /path/to/postgres/source/build/../src/backend/utils/init/postinit.c:636

#21 0x00005630d480b68b in PostgresMain (argc=7, argv=0x5630d6534d20, dbname=0x5630d65582b0 "postgres", username=0x5630d653d7d0 "chengyu") at /path/to/postgres/source/build/../src/backend/tcop/postgres.c:3810

#22 0x00005630d4695e8b in main (argc=7, argv=0x5630d6534d20) at /path/to/postgres/source/build/../src/backend/main/main.c:224

 

(gdb) p on_shmem_exit_list

$1 = {{function = 0x55801cc68f5f <AnonymousShmemDetach>, arg = 0},

{function = 0x55801cc68a4f <IpcMemoryDelete>, arg = 2490396},

{function = 0x55801cc689f8 <IpcMemoryDetach>, arg = 140602018975744},

{function = 0x55801cc6842e <ReleaseSemaphores>, arg = 0},

{function = 0x55801ccec48a <dsm_postmaster_shutdown>, arg = 140602018975744},

{function = 0x55801cd04053 <ProcKill>, arg = 0},

{function = 0x55801cd0402d <RemoveProcFromArray>, arg = 0},

{function = 0x55801ccf74e8 <CleanupInvalidationState>, arg = 140601991817088},

{function = 0x55801ccf446f <CleanupProcSignalState>, arg = 1},

{function = 0x55801ccdd3e5 <AtProcExit_Buffers>, arg = 0}, {function = 0x0, arg = 0}, {function = 0x0, arg = 0}, {function = 0x0, arg = 0}, {function = 0x0, arg = 0}, {function = 0x0, arg = 0}, {function = 0x0, arg = 0}, {function = 0x0, arg = 0}, {function = 0x0, arg = 0}, {function = 0x0, arg = 0}, {function = 0x0, arg = 0}}

(gdb) p before_shmem_exit_list

$2 = {{function = 0x0, arg = 0} <repeats 20 times>}

 

The second type is in Step #4. At the time when “checkpoint” SQL command is being executed, PG has already set up the before_shmem_exit callback ShutdownPostgres(), which releases all lw-locks given transaction or sub-transaction is on-going. So after the first IO error, the buffer page’s lw-lock gets released successfully. However, later ShutdownXLOG() is invoked, and PG tries to flush buffer pages again, which results in the second IO error. Different from the first time, this time, all the previous executed before/on_shmem_exit callbacks are not invoked again due to the decrease of the indexes. So lw-locks for buffer pages are not released when PG tries to get the same buffer lock in AbortBufferIO(), and then PG process gets stuck.

 

Here is the call stack:

 

(gdb) bt

#0  0x00007ff0c0c036d6 in futex_abstimed_wait_cancelable (private=128, abstime=0x0, expected=0, futex_word=0x7ff0b69001b8) at ../sysdeps/unix/sysv/linux/futex-internal.h:205

#1  do_futex_wait (sem=sem@entry=0x7ff0b69001b8, abstime=0x0) at sem_waitcommon.c:111

#2  0x00007ff0c0c037c8 in __new_sem_wait_slow (sem=0x7ff0b69001b8, abstime=0x0) at sem_waitcommon.c:181

#3  0x0000562077cc258a in PGSemaphoreLock (sema=0x7ff0b69001b8) at pg_sema.c:316

#4  0x0000562077d6289e in LWLockAcquire (lock=0x7ff0bef225c0, mode=LW_EXCLUSIVE) at /path/to/postgres/source/build/../src/backend/storage/lmgr/lwlock.c:1243

#5  0x0000562077d39087 in AbortBufferIO () at /path/to/postgres/source/build/../src/backend/storage/buffer/bufmgr.c:3988

#6  0x0000562077d373f9 in AtProcExit_Buffers (code=1, arg=0) at /path/to/postgres/source/build/../src/backend/storage/buffer/bufmgr.c:2473

#7  0x0000562077d47c32 in shmem_exit (code=1) at /path/to/postgres/source/build/../src/backend/storage/ipc/ipc.c:272

#8  0x0000562077d47a5e in proc_exit_prepare (code=1) at /path/to/postgres/source/build/../src/backend/storage/ipc/ipc.c:194

#9  0x0000562077d479c6 in proc_exit (code=1) at /path/to/postgres/source/build/../src/backend/storage/ipc/ipc.c:107

#10 0x0000562077eed1bc in errfinish (dummy=0) at /path/to/postgres/source/build/../src/backend/utils/error/elog.c:541

#11 0x0000562077d6df1f in mdwrite (reln=0x562078a12a18, forknum=MAIN_FORKNUM, blocknum=8, buffer=0x7ff0b6fbdc00 "", skipFsync=false) at /path/to/postgres/source/build/../src/backend/storage/smgr/md.c:843

#12 0x0000562077d70716 in smgrwrite (reln=0x562078a12a18, forknum=MAIN_FORKNUM, blocknum=8, buffer=0x7ff0b6fbdc00 "", skipFsync=false) at /path/to/postgres/source/build/../src/backend/storage/smgr/smgr.c:650

#13 0x0000562077d37824 in FlushBuffer (buf=0x7ff0b6e22f80, reln=0x562078a12a18) at /path/to/postgres/source/build/../src/backend/storage/buffer/bufmgr.c:2751

#14 0x0000562077d37219 in SyncOneBuffer (buf_id=78, skip_recently_used=false, wb_context=0x7fffb0e3e230) at /path/to/postgres/source/build/../src/backend/storage/buffer/bufmgr.c:2394

#15 0x0000562077d36b00 in BufferSync (flags=5) at /path/to/postgres/source/build/../src/backend/storage/buffer/bufmgr.c:1984

#16 0x0000562077d3757f in CheckPointBuffers (flags=5) at /path/to/postgres/source/build/../src/backend/storage/buffer/bufmgr.c:2578

#17 0x0000562077a1285b in CheckPointGuts (checkPointRedo=24049152, flags=5) at /path/to/postgres/source/build/../src/backend/access/transam/xlog.c:9149

#18 0x0000562077a122cf in CreateCheckPoint (flags=5) at /path/to/postgres/source/build/../src/backend/access/transam/xlog.c:8937

#19 0x0000562077a1164f in ShutdownXLOG (code=1, arg=0) at /path/to/postgres/source/build/../src/backend/access/transam/xlog.c:8485

#20 0x0000562077d47c32 in shmem_exit (code=1) at /path/to/postgres/source/build/../src/backend/storage/ipc/ipc.c:272

#21 0x0000562077d47a5e in proc_exit_prepare (code=1) at /path/to/postgres/source/build/../src/backend/storage/ipc/ipc.c:194

#22 0x0000562077d479c6 in proc_exit (code=1) at /path/to/postgres/source/build/../src/backend/storage/ipc/ipc.c:107

#23 0x0000562077eed1bc in errfinish (dummy=0) at /path/to/postgres/source/build/../src/backend/utils/error/elog.c:541

#24 0x0000562077d6df1f in mdwrite (reln=0x562078a12a18, forknum=MAIN_FORKNUM, blocknum=8, buffer=0x7ff0b6fbdc00 "", skipFsync=false) at /path/to/postgres/source/build/../src/backend/storage/smgr/md.c:843

#25 0x0000562077d70716 in smgrwrite (reln=0x562078a12a18, forknum=MAIN_FORKNUM, blocknum=8, buffer=0x7ff0b6fbdc00 "", skipFsync=false) at /path/to/postgres/source/build/../src/backend/storage/smgr/smgr.c:650

#26 0x0000562077d37824 in FlushBuffer (buf=0x7ff0b6e22f80, reln=0x562078a12a18) at /path/to/postgres/source/build/../src/backend/storage/buffer/bufmgr.c:2751

#27 0x0000562077d37219 in SyncOneBuffer (buf_id=78, skip_recently_used=false, wb_context=0x7fffb0e3fb10) at /path/to/postgres/source/build/../src/backend/storage/buffer/bufmgr.c:2394

#28 0x0000562077d36b00 in BufferSync (flags=44) at /path/to/postgres/source/build/../src/backend/storage/buffer/bufmgr.c:1984

#29 0x0000562077d3757f in CheckPointBuffers (flags=44) at /path/to/postgres/source/build/../src/backend/storage/buffer/bufmgr.c:2578

#30 0x0000562077a1285b in CheckPointGuts (checkPointRedo=24049152, flags=44) at /path/to/postgres/source/build/../src/backend/access/transam/xlog.c:9149

#31 0x0000562077a122cf in CreateCheckPoint (flags=44) at /path/to/postgres/source/build/../src/backend/access/transam/xlog.c:8937

#32 0x0000562077cca792 in RequestCheckpoint (flags=44) at /path/to/postgres/source/build/../src/backend/postmaster/checkpointer.c:976

#33 0x0000562077d7bce4 in standard_ProcessUtility (pstmt=0x562078a00b50, queryString=0x562078a00100 "checkpoint;\n", context=PROCESS_UTILITY_TOPLEVEL, params=0x0, queryEnv=0x0,

    dest=0x5620783ac5e0 <debugtupDR>, completionTag=0x7fffb0e41520 "") at /path/to/postgres/source/build/../src/backend/tcop/utility.c:769

#34 0x0000562077d7b204 in ProcessUtility (pstmt=0x562078a00b50, queryString=0x562078a00100 "checkpoint;\n", context=PROCESS_UTILITY_TOPLEVEL, params=0x0, queryEnv=0x0, dest=0x5620783ac5e0 <debugtupDR>,

    completionTag=0x7fffb0e41520 "") at /path/to/postgres/source/build/../src/backend/tcop/utility.c:360

#35 0x0000562077d7a347 in PortalRunUtility (portal=0x5620789f20c0, pstmt=0x562078a00b50, isTopLevel=true, setHoldSnapshot=false, dest=0x5620783ac5e0 <debugtupDR>, completionTag=0x7fffb0e41520 "")

    at /path/to/postgres/source/build/../src/backend/tcop/pquery.c:1178

#36 0x0000562077d7a534 in PortalRunMulti (portal=0x5620789f20c0, isTopLevel=true, setHoldSnapshot=false, dest=0x5620783ac5e0 <debugtupDR>, altdest=0x5620783ac5e0 <debugtupDR>,

    completionTag=0x7fffb0e41520 "") at /path/to/postgres/source/build/../src/backend/tcop/pquery.c:1324

#37 0x0000562077d79a61 in PortalRun (portal=0x5620789f20c0, count=9223372036854775807, isTopLevel=true, run_once=true, dest=0x5620783ac5e0 <debugtupDR>, altdest=0x5620783ac5e0 <debugtupDR>,

    completionTag=0x7fffb0e41520 "") at /path/to/postgres/source/build/../src/backend/tcop/pquery.c:799

#38 0x0000562077d734c5 in exec_simple_query (query_string=0x562078a00100 "checkpoint;\n") at /path/to/postgres/source/build/../src/backend/tcop/postgres.c:1145

#39 0x0000562077d77bd5 in PostgresMain (argc=7, argv=0x562078980d20, dbname=0x5620789a42b0 "postgres", username=0x5620789897d0 "chengyu") at /path/to/postgres/source/build/../src/backend/tcop/postgres.c:4182

#40 0x0000562077c01e8b in main (argc=7, argv=0x562078980d20) at /path/to/postgres/source/build/../src/backend/main/main.c:224

 

(gdb) p on_shmem_exit_list

$9 = {{function = 0x562077cc2f5f <AnonymousShmemDetach>, arg = 0},

{function = 0x562077cc2a4f <IpcMemoryDelete>, arg = 2457627},

{function = 0x562077cc29f8 <IpcMemoryDetach>, arg = 140672005165056},

{function = 0x562077cc242e <ReleaseSemaphores>, arg = 0},

{function = 0x562077d4648a <dsm_postmaster_shutdown>, arg = 140672005165056},

{function = 0x562077d5e053 <ProcKill>, arg = 0},

{function = 0x562077d5e02d <RemoveProcFromArray>, arg = 0},

{function = 0x562077d514e8 <CleanupInvalidationState>, arg = 140671978006400},

{function = 0x562077d4e46f <CleanupProcSignalState>, arg = 1},

{function = 0x562077d373e5 <AtProcExit_Buffers>, arg = 0},

{function = 0x562077a1159d <ShutdownXLOG>, arg = 0},

{function = 0x562077cd0637 <pgstat_beshutdown_hook>, arg = 0}, {function = 0x0, arg = 0}, {function = 0x0, arg = 0}, {function = 0x0, arg = 0}, {function = 0x0, arg = 0}, {function = 0x0, arg = 0}, {function = 0x0, arg = 0}, {function = 0x0, arg = 0}, {function = 0x0, arg = 0}}

(gdb) p before_shmem_exit_list

$10 = {{function = 0x562077f02caa <ShutdownPostgres>, arg = 0}, {function = 0x0, arg = 0} <repeats 19 times>}

 

OK, now we understand the deadlock issue for single user mode. However, will this issue affect multi-user mode (i.e. under postmaster process)? We can have 3 cases for discussion:

 

  1. Startup process: at the time StartupXLOG() is invoked, ShutdownAuxiliaryProcess(), which will release all the lw-locks, has been already registered in before_shmem_exit_list[]. So this case is safe.
  2. Checkpointer process: ShutdownXLOG() is not registered as a before/on_shmem_exit callback, instead, it’s only invoked in the main loop. So there is no chance to hit IO error for second time during shared memory exit callbacks. Also, Same as startup process, ShutdownAuxiliaryProcess() has been registered. So this case is also safe.
  3. Other backend/background processes: these processes do not handle XLOG startup or shutdown, and are protected by ShutdownAuxiliaryProcess(). So they are safe to exit too.

 

In addition, we have done multiple experiments to confirm these cases.

 

Affected versions: we found this issue in 9.5, 9.6, 10, 11 and 12devel.

 

Fix proposal:

 

According to the affected 2 types of deadlock in single user mode discussed above, there might be multiple ways to fix this issue. In the fix proposal we would like to present, we register a new callback to release all the lw-locks (just like what ShutdownAuxiliaryProcess()does) in an order after AtProcExit_Buffers() and before ShutdownXLOG(). Also, it is registered before PG enters StartupXLOG(), so it can cover the case when ShutdownPostgres() has not been registered. Here is the content of the proposal:

 

diff --git a/src/backend/utils/init/postinit.c b/src/backend/utils/init/postinit.c

index 62baaf0ab3..d74e8aa1d5 100644

--- a/src/backend/utils/init/postinit.c

+++ b/src/backend/utils/init/postinit.c

@@ -71,6 +71,7 @@ static HeapTuple GetDatabaseTupleByOid(Oid dboid);

static void PerformAuthentication(Port *port);

static void CheckMyDatabase(const char *name, bool am_superuser, bool override_allow_connections);

static void InitCommunication(void);

+static void ReleaseLWLocks(int code, Datum arg);

static void ShutdownPostgres(int code, Datum arg);

static void StatementTimeoutHandler(void);

static void LockTimeoutHandler(void);

@@ -653,6 +654,7 @@ InitPostgres(const char *in_dbname, Oid dboid, const char *username,

                 * way, start up the XLOG machinery, and register to have it closed

                 * down at exit.

                 */

+                on_shmem_exit(ReleaseLWLocks, 0);

                StartupXLOG();

                on_shmem_exit(ShutdownXLOG, 0);

        }

@@ -1214,6 +1216,23 @@ process_settings(Oid databaseid, Oid roleid)

        heap_close(relsetting, AccessShareLock);

}

+/*

+ * There are 2 types of buffer locks on-holding when AtProcExit_Buffers() is

+ * invoked in a bootstrap process or a standalone backend:

+ *  (1) Exceptions thrown during StartupXLOG()

+ *  (2) Exceptions thrown during exception-handling in ShutdownXLOG()

+ * So we need this on_shmem_exit callback for single user mode.

+ * For processes under postmaster, ShutdownAuxiliaryProcess() will release

+ * the lw-locks and ShutdownXLOG() is not registered as a callback, so there

+ * is no such issue. Also, please note this callback should be registered in

+ * the order after AtProcExit_buffers() and before ShutdownXLOG().

+ */

+static void

+ReleaseLWLocks(int code, Datum arg)

+{

+        LWLockReleaseAll();

+}

+

/*

  * Backend-shutdown callback.  Do cleanup that we want to be sure happens

  * before all the supporting modules begin to nail their doors shut via

 

The fix proposal is also attached to this email in file “fix-deadlock.patch”.

 

Please let us know should you have suggestions on this issue and the fix.

 

Thank you!

 

Best regards,

--

Chengchao Yu

Software Engineer | Microsoft | Azure Database for PostgreSQL

https://azure.microsoft.com/en-us/services/postgresql/

 

RE: [PATCH] Fix Proposal - Deadlock Issue in Single User Mode When IOFailure Occurs

От
Chengchao Yu
Дата:

Hi Jesse,

 

Actually, before/on_shmem_exit_list are both arrays, they are of type “struct ONEXIT [20]”.

For the pretty-printing, I just edited it manually to make it easier for the reader to look up.

But since you have questioned, there does exit a GDB command of “set print array on”. After that “p” command will print array entry one per line.

 

Thanks,

Chengchao

 

From: Jesse Zhang <sbjesse@gmail.com>
Sent: Saturday, December 1, 2018 9:55 AM
To: Chengchao Yu <chengyu@microsoft.com>
Cc: pgsql-hackers@postgresql.org
Subject: Re: [PATCH] Fix Proposal - Deadlock Issue in Single User Mode When IO Failure Occurs

 

Hey Chengyu,

How did you set up your GDB to get "p" to pretty-print a Postgres list?

 

Cheers,

Jesse

 

On Fri, Nov 30, 2018 at 1:00 PM Chengchao Yu <chengyu@microsoft.com> wrote:

Greetings,

 

Recently, we hit a few occurrences of deadlock when IO failure (including disk full, random remote disk IO failures) happens in single user mode. We found the issue exists on both Linux and Windows in multiple postgres versions.

 

Here are the steps to repro on Linux (as Windows repro is similar):

 

1.       Get latest PostgreSQL code, build and install the executables.

 

$ git clone https://git.postgresql.org/git/postgresql.git

$ cd postgresql

$ PGROOT=$(pwd)

$ git checkout REL_11_STABLE

$ mkdir build

$ cd build

$ ../configure --prefix=/path/to/postgres

$ make && make install

 

2.       Run initdb to initialize a PG database folder.

 

$ /path/to/postgres/bin/initdb -D /path/to/data

 

3.       Because the unable to write relation data scenario is difficult to hit naturally even reserved space is turned off, I have prepared a small patch (see attachment “emulate-error.patch”) to force an error when PG tries to write data to relation files. We can just apply the patch and there is no need to put efforts flooding data to disk any more.

 

$ cd $PGROOT

$ git apply /path/to/emulate-error.patch

$ cd build

$ make && make install

 

4.       Connect to the newly initialized database cluster with single user mode, create a table, and insert some data to the table, do a checkpoint or directly give EOF. Then we hit the deadlock issue and the process will not exit until we kill it.

 

Do a checkpoint explicitly:

 

$ /path/to/postgres/bin/postgres --single -D /path/to/data/ postgres -c exit_on_error=true <<EOF

> create table t1(a int);

> insert into t1 values (1), (2), (3);

> checkpoint;

> EOF

 

PostgreSQL stand-alone backend 11.1

backend> backend> backend> 2018-11-29 02:45:27.891 UTC [18806] FATAL:  Emulate exception in mdwrite() when writing to disk

2018-11-29 02:55:27.891 UTC [18806] CONTEXT:  writing block 8 of relation base/12368/1247

2018-11-29 02:55:27.891 UTC [18806] STATEMENT:  checkpoint;

 

2018-11-29 02:55:27.900 UTC [18806] FATAL:  Emulate exception in mdwrite() when writing to disk

2018-11-29 02:55:27.900 UTC [18806] CONTEXT:  writing block 8 of relation base/12368/1247

 

Or directly give an EOF:

 

$ /path/to/postgres/bin/postgres --single -D /path/to/data/ postgres -c exit_on_error=true <<EOF

> create table t1(a int);

> insert into t1 values (1), (2), (3);

> EOF

 

PostgreSQL stand-alone backend 11.1

backend> backend> backend> 2018-11-29 02:55:24.438 UTC [18149] FATAL:  Emulate exception in mdwrite() when writing to disk

2018-11-29 02:45:24.438 UTC [18149] CONTEXT:  writing block 8 of relation base/12368/1247

 

5.       Moreover, when we try to recover the database with single user mode, we hit the issue again, and the process does not bring up the database nor exit.

 

$ /path/to/postgres/bin/postgres --single -D /path/to/data/ postgres -c exit_on_error=true

2018-11-29 02:59:33.257 UTC [19058] LOG:  database system shutdown was interrupted; last known up at 2018-11-29 02:58:49 UTC

2018-11-29 02:59:33.485 UTC [19058] LOG:  database system was not properly shut down; automatic recovery in progress

2018-11-29 02:59:33.500 UTC [19058] LOG:  redo starts at 0/1672E40

2018-11-29 02:59:33.500 UTC [19058] LOG:  invalid record length at 0/1684B90: wanted 24, got 0

2018-11-29 02:59:33.500 UTC [19058] LOG:  redo done at 0/1684B68

2018-11-29 02:59:33.500 UTC [19058] LOG:  last completed transaction was at log time 2018-11-29 02:58:49.856663+00

2018-11-29 02:59:33.547 UTC [19058] FATAL:  Emulate exception in mdwrite() when writing to disk

2018-11-29 02:59:33.547 UTC [19058] CONTEXT:  writing block 8 of relation base/12368/1247

 

Analyses:

 

So, what happened? Actually, there are 2 types of the deadlock due to the same root cause. Let’s first take a look at the scenario in step #5. In this scenario, the deadlock happens when disk IO failure occurs inside StartupXLOG(). If we attach debugger to PG process, we will see the process is stuck acquiring the buffer’s lw-lock in AbortBufferIO().

 

void

AbortBufferIO(void)

{

    BufferDesc *buf = InProgressBuf;

 

    if (buf)

    {

        uint32      buf_state;

 

        /*

         * Since LWLockReleaseAll has already been called, we're not holding

         * the buffer's io_in_progress_lock. We have to re-acquire it so that

         * we can use TerminateBufferIO. Anyone who's executing WaitIO on the

         * buffer will be in a busy spin until we succeed in doing this.

         */

        LWLockAcquire(BufferDescriptorGetIOLock(buf), LW_EXCLUSIVE);

 

This is because the same lock has been acquired before buffer manager attempts to flush the buffer page, which happens in StartBufferIO().

 

static bool

StartBufferIO(BufferDesc *buf, bool forInput)

{

    uint32      buf_state;

 

    Assert(!InProgressBuf);

 

    for (;;)

    {

        /*

         * Grab the io_in_progress lock so that other processes can wait for

         * me to finish the I/O.

         */

        LWLockAcquire(BufferDescriptorGetIOLock(buf), LW_EXCLUSIVE);

 

        buf_state = LockBufHdr(buf);

 

        if (!(buf_state & BM_IO_IN_PROGRESS))

            break;

 

After reading the code, AtProcExit_Buffers() assumes all the lw-locks are released. However, in single user mode, at the time StartupXLOG() is being executed, there is no before_shmem_exit/on_shmem_exit callback registered to release the lw-locks.

And, given lw-lock is non-reentrant, so the process gets stuck re-acquiring the same lock.

 

Here is the call stack:

 

(gdb) bt

#0  0x00007f0fdb7cb6d6 in futex_abstimed_wait_cancelable (private=128, abstime=0x0, expected=0, futex_word=0x7f0fd14c81b8) at ../sysdeps/unix/sysv/linux/futex-internal.h:205

#1  do_futex_wait (sem=sem@entry=0x7f0fd14c81b8, abstime=0x0) at sem_waitcommon.c:111

#2  0x00007f0fdb7cb7c8 in __new_sem_wait_slow (sem=0x7f0fd14c81b8, abstime=0x0) at sem_waitcommon.c:181

#3  0x00005630d475658a in PGSemaphoreLock (sema=0x7f0fd14c81b8) at pg_sema.c:316

#4  0x00005630d47f689e in LWLockAcquire (lock=0x7f0fd9ae9c00, mode=LW_EXCLUSIVE) at /path/to/postgres/source/build/../src/backend/storage/lmgr/lwlock.c:1243

#5  0x00005630d47cd087 in AbortBufferIO () at /path/to/postgres/source/build/../src/backend/storage/buffer/bufmgr.c:3988

#6  0x00005630d47cb3f9 in AtProcExit_Buffers (code=1, arg=0) at /path/to/postgres/source/build/../src/backend/storage/buffer/bufmgr.c:2473

#7  0x00005630d47dbc32 in shmem_exit (code=1) at /path/to/postgres/source/build/../src/backend/storage/ipc/ipc.c:272

#8  0x00005630d47dba5e in proc_exit_prepare (code=1) at /path/to/postgres/source/build/../src/backend/storage/ipc/ipc.c:194

#9  0x00005630d47db9c6 in proc_exit (code=1) at /path/to/postgres/source/build/../src/backend/storage/ipc/ipc.c:107

#10 0x00005630d49811bc in errfinish (dummy=0) at /path/to/postgres/source/build/../src/backend/utils/error/elog.c:541

#11 0x00005630d4801f1f in mdwrite (reln=0x5630d6588c68, forknum=MAIN_FORKNUM, blocknum=8, buffer=0x7f0fd1ae9c00 "", skipFsync=false) at /path/to/postgres/source/build/../src/backend/storage/smgr/md.c:843

#12 0x00005630d4804716 in smgrwrite (reln=0x5630d6588c68, forknum=MAIN_FORKNUM, blocknum=8, buffer=0x7f0fd1ae9c00 "", skipFsync=false) at /path/to/postgres/source/build/../src/backend/storage/smgr/smgr.c:650

#13 0x00005630d47cb824 in FlushBuffer (buf=0x7f0fd19e9c00, reln=0x5630d6588c68) at /path/to/postgres/source/build/../src/backend/storage/buffer/bufmgr.c:2751

#14 0x00005630d47cb219 in SyncOneBuffer (buf_id=0, skip_recently_used=false, wb_context=0x7ffccc371a70) at /path/to/postgres/source/build/../src/backend/storage/buffer/bufmgr.c:2394

#15 0x00005630d47cab00 in BufferSync (flags=6) at /path/to/postgres/source/build/../src/backend/storage/buffer/bufmgr.c:1984

#16 0x00005630d47cb57f in CheckPointBuffers (flags=6) at /path/to/postgres/source/build/../src/backend/storage/buffer/bufmgr.c:2578

#17 0x00005630d44a685b in CheckPointGuts (checkPointRedo=23612304, flags=6) at /path/to/postgres/source/build/../src/backend/access/transam/xlog.c:9149

#18 0x00005630d44a62cf in CreateCheckPoint (flags=6) at /path/to/postgres/source/build/../src/backend/access/transam/xlog.c:8937

#19 0x00005630d44a45e3 in StartupXLOG () at /path/to/postgres/source/build/../src/backend/access/transam/xlog.c:7723

#20 0x00005630d4995f88 in InitPostgres (in_dbname=0x5630d65582b0 "postgres", dboid=0, username=0x5630d653d7d0 "chengyu", useroid=0, out_dbname=0x0, override_allow_connections=false)

    at /path/to/postgres/source/build/../src/backend/utils/init/postinit.c:636

#21 0x00005630d480b68b in PostgresMain (argc=7, argv=0x5630d6534d20, dbname=0x5630d65582b0 "postgres", username=0x5630d653d7d0 "chengyu") at /path/to/postgres/source/build/../src/backend/tcop/postgres.c:3810

#22 0x00005630d4695e8b in main (argc=7, argv=0x5630d6534d20) at /path/to/postgres/source/build/../src/backend/main/main.c:224

 

(gdb) p on_shmem_exit_list

$1 = {{function = 0x55801cc68f5f <AnonymousShmemDetach>, arg = 0},

{function = 0x55801cc68a4f <IpcMemoryDelete>, arg = 2490396},

{function = 0x55801cc689f8 <IpcMemoryDetach>, arg = 140602018975744},

{function = 0x55801cc6842e <ReleaseSemaphores>, arg = 0},

{function = 0x55801ccec48a <dsm_postmaster_shutdown>, arg = 140602018975744},

{function = 0x55801cd04053 <ProcKill>, arg = 0},

{function = 0x55801cd0402d <RemoveProcFromArray>, arg = 0},

{function = 0x55801ccf74e8 <CleanupInvalidationState>, arg = 140601991817088},

{function = 0x55801ccf446f <CleanupProcSignalState>, arg = 1},

{function = 0x55801ccdd3e5 <AtProcExit_Buffers>, arg = 0}, {function = 0x0, arg = 0}, {function = 0x0, arg = 0}, {function = 0x0, arg = 0}, {function = 0x0, arg = 0}, {function = 0x0, arg = 0}, {function = 0x0, arg = 0}, {function = 0x0, arg = 0}, {function = 0x0, arg = 0}, {function = 0x0, arg = 0}, {function = 0x0, arg = 0}}

(gdb) p before_shmem_exit_list

$2 = {{function = 0x0, arg = 0} <repeats 20 times>}

 

The second type is in Step #4. At the time when “checkpoint” SQL command is being executed, PG has already set up the before_shmem_exit callback ShutdownPostgres(), which releases all lw-locks given transaction or sub-transaction is on-going. So after the first IO error, the buffer page’s lw-lock gets released successfully. However, later ShutdownXLOG() is invoked, and PG tries to flush buffer pages again, which results in the second IO error. Different from the first time, this time, all the previous executed before/on_shmem_exit callbacks are not invoked again due to the decrease of the indexes. So lw-locks for buffer pages are not released when PG tries to get the same buffer lock in AbortBufferIO(), and then PG process gets stuck.

 

Here is the call stack:

 

(gdb) bt

#0  0x00007ff0c0c036d6 in futex_abstimed_wait_cancelable (private=128, abstime=0x0, expected=0, futex_word=0x7ff0b69001b8) at ../sysdeps/unix/sysv/linux/futex-internal.h:205

#1  do_futex_wait (sem=sem@entry=0x7ff0b69001b8, abstime=0x0) at sem_waitcommon.c:111

#2  0x00007ff0c0c037c8 in __new_sem_wait_slow (sem=0x7ff0b69001b8, abstime=0x0) at sem_waitcommon.c:181

#3  0x0000562077cc258a in PGSemaphoreLock (sema=0x7ff0b69001b8) at pg_sema.c:316

#4  0x0000562077d6289e in LWLockAcquire (lock=0x7ff0bef225c0, mode=LW_EXCLUSIVE) at /path/to/postgres/source/build/../src/backend/storage/lmgr/lwlock.c:1243

#5  0x0000562077d39087 in AbortBufferIO () at /path/to/postgres/source/build/../src/backend/storage/buffer/bufmgr.c:3988

#6  0x0000562077d373f9 in AtProcExit_Buffers (code=1, arg=0) at /path/to/postgres/source/build/../src/backend/storage/buffer/bufmgr.c:2473

#7  0x0000562077d47c32 in shmem_exit (code=1) at /path/to/postgres/source/build/../src/backend/storage/ipc/ipc.c:272

#8  0x0000562077d47a5e in proc_exit_prepare (code=1) at /path/to/postgres/source/build/../src/backend/storage/ipc/ipc.c:194

#9  0x0000562077d479c6 in proc_exit (code=1) at /path/to/postgres/source/build/../src/backend/storage/ipc/ipc.c:107

#10 0x0000562077eed1bc in errfinish (dummy=0) at /path/to/postgres/source/build/../src/backend/utils/error/elog.c:541

#11 0x0000562077d6df1f in mdwrite (reln=0x562078a12a18, forknum=MAIN_FORKNUM, blocknum=8, buffer=0x7ff0b6fbdc00 "", skipFsync=false) at /path/to/postgres/source/build/../src/backend/storage/smgr/md.c:843

#12 0x0000562077d70716 in smgrwrite (reln=0x562078a12a18, forknum=MAIN_FORKNUM, blocknum=8, buffer=0x7ff0b6fbdc00 "", skipFsync=false) at /path/to/postgres/source/build/../src/backend/storage/smgr/smgr.c:650

#13 0x0000562077d37824 in FlushBuffer (buf=0x7ff0b6e22f80, reln=0x562078a12a18) at /path/to/postgres/source/build/../src/backend/storage/buffer/bufmgr.c:2751

#14 0x0000562077d37219 in SyncOneBuffer (buf_id=78, skip_recently_used=false, wb_context=0x7fffb0e3e230) at /path/to/postgres/source/build/../src/backend/storage/buffer/bufmgr.c:2394

#15 0x0000562077d36b00 in BufferSync (flags=5) at /path/to/postgres/source/build/../src/backend/storage/buffer/bufmgr.c:1984

#16 0x0000562077d3757f in CheckPointBuffers (flags=5) at /path/to/postgres/source/build/../src/backend/storage/buffer/bufmgr.c:2578

#17 0x0000562077a1285b in CheckPointGuts (checkPointRedo=24049152, flags=5) at /path/to/postgres/source/build/../src/backend/access/transam/xlog.c:9149

#18 0x0000562077a122cf in CreateCheckPoint (flags=5) at /path/to/postgres/source/build/../src/backend/access/transam/xlog.c:8937

#19 0x0000562077a1164f in ShutdownXLOG (code=1, arg=0) at /path/to/postgres/source/build/../src/backend/access/transam/xlog.c:8485

#20 0x0000562077d47c32 in shmem_exit (code=1) at /path/to/postgres/source/build/../src/backend/storage/ipc/ipc.c:272

#21 0x0000562077d47a5e in proc_exit_prepare (code=1) at /path/to/postgres/source/build/../src/backend/storage/ipc/ipc.c:194

#22 0x0000562077d479c6 in proc_exit (code=1) at /path/to/postgres/source/build/../src/backend/storage/ipc/ipc.c:107

#23 0x0000562077eed1bc in errfinish (dummy=0) at /path/to/postgres/source/build/../src/backend/utils/error/elog.c:541

#24 0x0000562077d6df1f in mdwrite (reln=0x562078a12a18, forknum=MAIN_FORKNUM, blocknum=8, buffer=0x7ff0b6fbdc00 "", skipFsync=false) at /path/to/postgres/source/build/../src/backend/storage/smgr/md.c:843

#25 0x0000562077d70716 in smgrwrite (reln=0x562078a12a18, forknum=MAIN_FORKNUM, blocknum=8, buffer=0x7ff0b6fbdc00 "", skipFsync=false) at /path/to/postgres/source/build/../src/backend/storage/smgr/smgr.c:650

#26 0x0000562077d37824 in FlushBuffer (buf=0x7ff0b6e22f80, reln=0x562078a12a18) at /path/to/postgres/source/build/../src/backend/storage/buffer/bufmgr.c:2751

#27 0x0000562077d37219 in SyncOneBuffer (buf_id=78, skip_recently_used=false, wb_context=0x7fffb0e3fb10) at /path/to/postgres/source/build/../src/backend/storage/buffer/bufmgr.c:2394

#28 0x0000562077d36b00 in BufferSync (flags=44) at /path/to/postgres/source/build/../src/backend/storage/buffer/bufmgr.c:1984

#29 0x0000562077d3757f in CheckPointBuffers (flags=44) at /path/to/postgres/source/build/../src/backend/storage/buffer/bufmgr.c:2578

#30 0x0000562077a1285b in CheckPointGuts (checkPointRedo=24049152, flags=44) at /path/to/postgres/source/build/../src/backend/access/transam/xlog.c:9149

#31 0x0000562077a122cf in CreateCheckPoint (flags=44) at /path/to/postgres/source/build/../src/backend/access/transam/xlog.c:8937

#32 0x0000562077cca792 in RequestCheckpoint (flags=44) at /path/to/postgres/source/build/../src/backend/postmaster/checkpointer.c:976

#33 0x0000562077d7bce4 in standard_ProcessUtility (pstmt=0x562078a00b50, queryString=0x562078a00100 "checkpoint;\n", context=PROCESS_UTILITY_TOPLEVEL, params=0x0, queryEnv=0x0,

    dest=0x5620783ac5e0 <debugtupDR>, completionTag=0x7fffb0e41520 "") at /path/to/postgres/source/build/../src/backend/tcop/utility.c:769

#34 0x0000562077d7b204 in ProcessUtility (pstmt=0x562078a00b50, queryString=0x562078a00100 "checkpoint;\n", context=PROCESS_UTILITY_TOPLEVEL, params=0x0, queryEnv=0x0, dest=0x5620783ac5e0 <debugtupDR>,

    completionTag=0x7fffb0e41520 "") at /path/to/postgres/source/build/../src/backend/tcop/utility.c:360

#35 0x0000562077d7a347 in PortalRunUtility (portal=0x5620789f20c0, pstmt=0x562078a00b50, isTopLevel=true, setHoldSnapshot=false, dest=0x5620783ac5e0 <debugtupDR>, completionTag=0x7fffb0e41520 "")

    at /path/to/postgres/source/build/../src/backend/tcop/pquery.c:1178

#36 0x0000562077d7a534 in PortalRunMulti (portal=0x5620789f20c0, isTopLevel=true, setHoldSnapshot=false, dest=0x5620783ac5e0 <debugtupDR>, altdest=0x5620783ac5e0 <debugtupDR>,

    completionTag=0x7fffb0e41520 "") at /path/to/postgres/source/build/../src/backend/tcop/pquery.c:1324

#37 0x0000562077d79a61 in PortalRun (portal=0x5620789f20c0, count=9223372036854775807, isTopLevel=true, run_once=true, dest=0x5620783ac5e0 <debugtupDR>, altdest=0x5620783ac5e0 <debugtupDR>,

    completionTag=0x7fffb0e41520 "") at /path/to/postgres/source/build/../src/backend/tcop/pquery.c:799

#38 0x0000562077d734c5 in exec_simple_query (query_string=0x562078a00100 "checkpoint;\n") at /path/to/postgres/source/build/../src/backend/tcop/postgres.c:1145

#39 0x0000562077d77bd5 in PostgresMain (argc=7, argv=0x562078980d20, dbname=0x5620789a42b0 "postgres", username=0x5620789897d0 "chengyu") at /path/to/postgres/source/build/../src/backend/tcop/postgres.c:4182

#40 0x0000562077c01e8b in main (argc=7, argv=0x562078980d20) at /path/to/postgres/source/build/../src/backend/main/main.c:224

 

(gdb) p on_shmem_exit_list

$9 = {{function = 0x562077cc2f5f <AnonymousShmemDetach>, arg = 0},

{function = 0x562077cc2a4f <IpcMemoryDelete>, arg = 2457627},

{function = 0x562077cc29f8 <IpcMemoryDetach>, arg = 140672005165056},

{function = 0x562077cc242e <ReleaseSemaphores>, arg = 0},

{function = 0x562077d4648a <dsm_postmaster_shutdown>, arg = 140672005165056},

{function = 0x562077d5e053 <ProcKill>, arg = 0},

{function = 0x562077d5e02d <RemoveProcFromArray>, arg = 0},

{function = 0x562077d514e8 <CleanupInvalidationState>, arg = 140671978006400},

{function = 0x562077d4e46f <CleanupProcSignalState>, arg = 1},

{function = 0x562077d373e5 <AtProcExit_Buffers>, arg = 0},

{function = 0x562077a1159d <ShutdownXLOG>, arg = 0},

{function = 0x562077cd0637 <pgstat_beshutdown_hook>, arg = 0}, {function = 0x0, arg = 0}, {function = 0x0, arg = 0}, {function = 0x0, arg = 0}, {function = 0x0, arg = 0}, {function = 0x0, arg = 0}, {function = 0x0, arg = 0}, {function = 0x0, arg = 0}, {function = 0x0, arg = 0}}

(gdb) p before_shmem_exit_list

$10 = {{function = 0x562077f02caa <ShutdownPostgres>, arg = 0}, {function = 0x0, arg = 0} <repeats 19 times>}

 

OK, now we understand the deadlock issue for single user mode. However, will this issue affect multi-user mode (i.e. under postmaster process)? We can have 3 cases for discussion:

 

a.       Startup process: at the time StartupXLOG() is invoked, ShutdownAuxiliaryProcess(), which will release all the lw-locks, has been already registered in before_shmem_exit_list[]. So this case is safe.

b.       Checkpointer process: ShutdownXLOG() is not registered as a before/on_shmem_exit callback, instead, it’s only invoked in the main loop. So there is no chance to hit IO error for second time during shared memory exit callbacks. Also, Same as startup process, ShutdownAuxiliaryProcess() has been registered. So this case is also safe.

c.       Other backend/background processes: these processes do not handle XLOG startup or shutdown, and are protected by ShutdownAuxiliaryProcess(). So they are safe to exit too.

 

In addition, we have done multiple experiments to confirm these cases.

 

Affected versions: we found this issue in 9.5, 9.6, 10, 11 and 12devel.

 

Fix proposal:

 

According to the affected 2 types of deadlock in single user mode discussed above, there might be multiple ways to fix this issue. In the fix proposal we would like to present, we register a new callback to release all the lw-locks (just like what ShutdownAuxiliaryProcess()does) in an order after AtProcExit_Buffers() and before ShutdownXLOG(). Also, it is registered before PG enters StartupXLOG(), so it can cover the case when ShutdownPostgres() has not been registered. Here is the content of the proposal:

 

diff --git a/src/backend/utils/init/postinit.c b/src/backend/utils/init/postinit.c

index 62baaf0ab3..d74e8aa1d5 100644

--- a/src/backend/utils/init/postinit.c

+++ b/src/backend/utils/init/postinit.c

@@ -71,6 +71,7 @@ static HeapTuple GetDatabaseTupleByOid(Oid dboid);

static void PerformAuthentication(Port *port);

static void CheckMyDatabase(const char *name, bool am_superuser, bool override_allow_connections);

static void InitCommunication(void);

+static void ReleaseLWLocks(int code, Datum arg);

static void ShutdownPostgres(int code, Datum arg);

static void StatementTimeoutHandler(void);

static void LockTimeoutHandler(void);

@@ -653,6 +654,7 @@ InitPostgres(const char *in_dbname, Oid dboid, const char *username,

                 * way, start up the XLOG machinery, and register to have it closed

                 * down at exit.

                 */

+                on_shmem_exit(ReleaseLWLocks, 0);

                StartupXLOG();

                on_shmem_exit(ShutdownXLOG, 0);

        }

@@ -1214,6 +1216,23 @@ process_settings(Oid databaseid, Oid roleid)

        heap_close(relsetting, AccessShareLock);

}

+/*

+ * There are 2 types of buffer locks on-holding when AtProcExit_Buffers() is

+ * invoked in a bootstrap process or a standalone backend:

+ *  (1) Exceptions thrown during StartupXLOG()

+ *  (2) Exceptions thrown during exception-handling in ShutdownXLOG()

+ * So we need this on_shmem_exit callback for single user mode.

+ * For processes under postmaster, ShutdownAuxiliaryProcess() will release

+ * the lw-locks and ShutdownXLOG() is not registered as a callback, so there

+ * is no such issue. Also, please note this callback should be registered in

+ * the order after AtProcExit_buffers() and before ShutdownXLOG().

+ */

+static void

+ReleaseLWLocks(int code, Datum arg)

+{

+        LWLockReleaseAll();

+}

+

/*

  * Backend-shutdown callback.  Do cleanup that we want to be sure happens

  * before all the supporting modules begin to nail their doors shut via

 

The fix proposal is also attached to this email in file “fix-deadlock.patch”.

 

Please let us know should you have suggestions on this issue and the fix.

 

Thank you!

 

Best regards,

--

Chengchao Yu

Software Engineer | Microsoft | Azure Database for PostgreSQL

https://azure.microsoft.com/en-us/services/postgresql/

 

RE: [PATCH] Fix Proposal - Deadlock Issue in Single User Mode When IOFailure Occurs

От
Chengchao Yu
Дата:

Greetings,

 

Just would like to follow up this issue and fix proposal. We really would like to have this issue fixed in PG. Could someone give some suggestions to the fix proposal? Or other ideas to fix this issue?

 

Looking forward for your feedbacks!

 

Best regards,

--

Chengchao Yu

Software Engineer | Microsoft | Azure Database for PostgreSQL

https://azure.microsoft.com/en-us/services/postgresql/

 

From: Chengchao Yu
Sent: Friday, November 30, 2018 1:00 PM
To: 'Pg Hackers' <pgsql-hackers@postgresql.org>
Cc: Prabhat Tripathi <ptrip@microsoft.com>; Sunil Kamath <Sunil.Kamath@microsoft.com>; Michal Primke <mprimke@microsoft.com>
Subject: [PATCH] Fix Proposal - Deadlock Issue in Single User Mode When IO Failure Occurs

 

Greetings,

 

Recently, we hit a few occurrences of deadlock when IO failure (including disk full, random remote disk IO failures) happens in single user mode. We found the issue exists on both Linux and Windows in multiple postgres versions.

 

Here are the steps to repro on Linux (as Windows repro is similar):

 

1.       Get latest PostgreSQL code, build and install the executables.

 

$ git clone https://git.postgresql.org/git/postgresql.git

$ cd postgresql

$ PGROOT=$(pwd)

$ git checkout REL_11_STABLE

$ mkdir build

$ cd build

$ ../configure --prefix=/path/to/postgres

$ make && make install

 

2.       Run initdb to initialize a PG database folder.

 

$ /path/to/postgres/bin/initdb -D /path/to/data

 

3.       Because the unable to write relation data scenario is difficult to hit naturally even reserved space is turned off, I have prepared a small patch (see attachment “emulate-error.patch”) to force an error when PG tries to write data to relation files. We can just apply the patch and there is no need to put efforts flooding data to disk any more.

 

$ cd $PGROOT

$ git apply /path/to/emulate-error.patch

$ cd build

$ make && make install

 

4.       Connect to the newly initialized database cluster with single user mode, create a table, and insert some data to the table, do a checkpoint or directly give EOF. Then we hit the deadlock issue and the process will not exit until we kill it.

 

Do a checkpoint explicitly:

 

$ /path/to/postgres/bin/postgres --single -D /path/to/data/ postgres -c exit_on_error=true <<EOF

> create table t1(a int);

> insert into t1 values (1), (2), (3);

> checkpoint;

> EOF

 

PostgreSQL stand-alone backend 11.1

backend> backend> backend> 2018-11-29 02:45:27.891 UTC [18806] FATAL:  Emulate exception in mdwrite() when writing to disk

2018-11-29 02:55:27.891 UTC [18806] CONTEXT:  writing block 8 of relation base/12368/1247

2018-11-29 02:55:27.891 UTC [18806] STATEMENT:  checkpoint;

 

2018-11-29 02:55:27.900 UTC [18806] FATAL:  Emulate exception in mdwrite() when writing to disk

2018-11-29 02:55:27.900 UTC [18806] CONTEXT:  writing block 8 of relation base/12368/1247

 

Or directly give an EOF:

 

$ /path/to/postgres/bin/postgres --single -D /path/to/data/ postgres -c exit_on_error=true <<EOF

> create table t1(a int);

> insert into t1 values (1), (2), (3);

> EOF

 

PostgreSQL stand-alone backend 11.1

backend> backend> backend> 2018-11-29 02:55:24.438 UTC [18149] FATAL:  Emulate exception in mdwrite() when writing to disk

2018-11-29 02:45:24.438 UTC [18149] CONTEXT:  writing block 8 of relation base/12368/1247

 

5.       Moreover, when we try to recover the database with single user mode, we hit the issue again, and the process does not bring up the database nor exit.

 

$ /path/to/postgres/bin/postgres --single -D /path/to/data/ postgres -c exit_on_error=true

2018-11-29 02:59:33.257 UTC [19058] LOG:  database system shutdown was interrupted; last known up at 2018-11-29 02:58:49 UTC

2018-11-29 02:59:33.485 UTC [19058] LOG:  database system was not properly shut down; automatic recovery in progress

2018-11-29 02:59:33.500 UTC [19058] LOG:  redo starts at 0/1672E40

2018-11-29 02:59:33.500 UTC [19058] LOG:  invalid record length at 0/1684B90: wanted 24, got 0

2018-11-29 02:59:33.500 UTC [19058] LOG:  redo done at 0/1684B68

2018-11-29 02:59:33.500 UTC [19058] LOG:  last completed transaction was at log time 2018-11-29 02:58:49.856663+00

2018-11-29 02:59:33.547 UTC [19058] FATAL:  Emulate exception in mdwrite() when writing to disk

2018-11-29 02:59:33.547 UTC [19058] CONTEXT:  writing block 8 of relation base/12368/1247

 

Analyses:

 

So, what happened? Actually, there are 2 types of the deadlock due to the same root cause. Let’s first take a look at the scenario in step #5. In this scenario, the deadlock happens when disk IO failure occurs inside StartupXLOG(). If we attach debugger to PG process, we will see the process is stuck acquiring the buffer’s lw-lock in AbortBufferIO().

 

void

AbortBufferIO(void)

{

    BufferDesc *buf = InProgressBuf;

 

    if (buf)

    {

        uint32      buf_state;

 

        /*

         * Since LWLockReleaseAll has already been called, we're not holding

         * the buffer's io_in_progress_lock. We have to re-acquire it so that

         * we can use TerminateBufferIO. Anyone who's executing WaitIO on the

         * buffer will be in a busy spin until we succeed in doing this.

         */

        LWLockAcquire(BufferDescriptorGetIOLock(buf), LW_EXCLUSIVE);

 

This is because the same lock has been acquired before buffer manager attempts to flush the buffer page, which happens in StartBufferIO().

 

static bool

StartBufferIO(BufferDesc *buf, bool forInput)

{

    uint32      buf_state;

 

    Assert(!InProgressBuf);

 

    for (;;)

    {

        /*

         * Grab the io_in_progress lock so that other processes can wait for

         * me to finish the I/O.

         */

        LWLockAcquire(BufferDescriptorGetIOLock(buf), LW_EXCLUSIVE);

 

        buf_state = LockBufHdr(buf);

 

        if (!(buf_state & BM_IO_IN_PROGRESS))

            break;

 

After reading the code, AtProcExit_Buffers() assumes all the lw-locks are released. However, in single user mode, at the time StartupXLOG() is being executed, there is no before_shmem_exit/on_shmem_exit callback registered to release the lw-locks.

And, given lw-lock is non-reentrant, so the process gets stuck re-acquiring the same lock.

 

Here is the call stack:

 

(gdb) bt

#0  0x00007f0fdb7cb6d6 in futex_abstimed_wait_cancelable (private=128, abstime=0x0, expected=0, futex_word=0x7f0fd14c81b8) at ../sysdeps/unix/sysv/linux/futex-internal.h:205

#1  do_futex_wait (sem=sem@entry=0x7f0fd14c81b8, abstime=0x0) at sem_waitcommon.c:111

#2  0x00007f0fdb7cb7c8 in __new_sem_wait_slow (sem=0x7f0fd14c81b8, abstime=0x0) at sem_waitcommon.c:181

#3  0x00005630d475658a in PGSemaphoreLock (sema=0x7f0fd14c81b8) at pg_sema.c:316

#4  0x00005630d47f689e in LWLockAcquire (lock=0x7f0fd9ae9c00, mode=LW_EXCLUSIVE) at /path/to/postgres/source/build/../src/backend/storage/lmgr/lwlock.c:1243

#5  0x00005630d47cd087 in AbortBufferIO () at /path/to/postgres/source/build/../src/backend/storage/buffer/bufmgr.c:3988

#6  0x00005630d47cb3f9 in AtProcExit_Buffers (code=1, arg=0) at /path/to/postgres/source/build/../src/backend/storage/buffer/bufmgr.c:2473

#7  0x00005630d47dbc32 in shmem_exit (code=1) at /path/to/postgres/source/build/../src/backend/storage/ipc/ipc.c:272

#8  0x00005630d47dba5e in proc_exit_prepare (code=1) at /path/to/postgres/source/build/../src/backend/storage/ipc/ipc.c:194

#9  0x00005630d47db9c6 in proc_exit (code=1) at /path/to/postgres/source/build/../src/backend/storage/ipc/ipc.c:107

#10 0x00005630d49811bc in errfinish (dummy=0) at /path/to/postgres/source/build/../src/backend/utils/error/elog.c:541

#11 0x00005630d4801f1f in mdwrite (reln=0x5630d6588c68, forknum=MAIN_FORKNUM, blocknum=8, buffer=0x7f0fd1ae9c00 "", skipFsync=false) at /path/to/postgres/source/build/../src/backend/storage/smgr/md.c:843

#12 0x00005630d4804716 in smgrwrite (reln=0x5630d6588c68, forknum=MAIN_FORKNUM, blocknum=8, buffer=0x7f0fd1ae9c00 "", skipFsync=false) at /path/to/postgres/source/build/../src/backend/storage/smgr/smgr.c:650

#13 0x00005630d47cb824 in FlushBuffer (buf=0x7f0fd19e9c00, reln=0x5630d6588c68) at /path/to/postgres/source/build/../src/backend/storage/buffer/bufmgr.c:2751

#14 0x00005630d47cb219 in SyncOneBuffer (buf_id=0, skip_recently_used=false, wb_context=0x7ffccc371a70) at /path/to/postgres/source/build/../src/backend/storage/buffer/bufmgr.c:2394

#15 0x00005630d47cab00 in BufferSync (flags=6) at /path/to/postgres/source/build/../src/backend/storage/buffer/bufmgr.c:1984

#16 0x00005630d47cb57f in CheckPointBuffers (flags=6) at /path/to/postgres/source/build/../src/backend/storage/buffer/bufmgr.c:2578

#17 0x00005630d44a685b in CheckPointGuts (checkPointRedo=23612304, flags=6) at /path/to/postgres/source/build/../src/backend/access/transam/xlog.c:9149

#18 0x00005630d44a62cf in CreateCheckPoint (flags=6) at /path/to/postgres/source/build/../src/backend/access/transam/xlog.c:8937

#19 0x00005630d44a45e3 in StartupXLOG () at /path/to/postgres/source/build/../src/backend/access/transam/xlog.c:7723

#20 0x00005630d4995f88 in InitPostgres (in_dbname=0x5630d65582b0 "postgres", dboid=0, username=0x5630d653d7d0 "chengyu", useroid=0, out_dbname=0x0, override_allow_connections=false)

    at /path/to/postgres/source/build/../src/backend/utils/init/postinit.c:636

#21 0x00005630d480b68b in PostgresMain (argc=7, argv=0x5630d6534d20, dbname=0x5630d65582b0 "postgres", username=0x5630d653d7d0 "chengyu") at /path/to/postgres/source/build/../src/backend/tcop/postgres.c:3810

#22 0x00005630d4695e8b in main (argc=7, argv=0x5630d6534d20) at /path/to/postgres/source/build/../src/backend/main/main.c:224

 

(gdb) p on_shmem_exit_list

$1 = {{function = 0x55801cc68f5f <AnonymousShmemDetach>, arg = 0},

{function = 0x55801cc68a4f <IpcMemoryDelete>, arg = 2490396},

{function = 0x55801cc689f8 <IpcMemoryDetach>, arg = 140602018975744},

{function = 0x55801cc6842e <ReleaseSemaphores>, arg = 0},

{function = 0x55801ccec48a <dsm_postmaster_shutdown>, arg = 140602018975744},

{function = 0x55801cd04053 <ProcKill>, arg = 0},

{function = 0x55801cd0402d <RemoveProcFromArray>, arg = 0},

{function = 0x55801ccf74e8 <CleanupInvalidationState>, arg = 140601991817088},

{function = 0x55801ccf446f <CleanupProcSignalState>, arg = 1},

{function = 0x55801ccdd3e5 <AtProcExit_Buffers>, arg = 0}, {function = 0x0, arg = 0}, {function = 0x0, arg = 0}, {function = 0x0, arg = 0}, {function = 0x0, arg = 0}, {function = 0x0, arg = 0}, {function = 0x0, arg = 0}, {function = 0x0, arg = 0}, {function = 0x0, arg = 0}, {function = 0x0, arg = 0}, {function = 0x0, arg = 0}}

(gdb) p before_shmem_exit_list

$2 = {{function = 0x0, arg = 0} <repeats 20 times>}

 

The second type is in Step #4. At the time when “checkpoint” SQL command is being executed, PG has already set up the before_shmem_exit callback ShutdownPostgres(), which releases all lw-locks given transaction or sub-transaction is on-going. So after the first IO error, the buffer page’s lw-lock gets released successfully. However, later ShutdownXLOG() is invoked, and PG tries to flush buffer pages again, which results in the second IO error. Different from the first time, this time, all the previous executed before/on_shmem_exit callbacks are not invoked again due to the decrease of the indexes. So lw-locks for buffer pages are not released when PG tries to get the same buffer lock in AbortBufferIO(), and then PG process gets stuck.

 

Here is the call stack:

 

(gdb) bt

#0  0x00007ff0c0c036d6 in futex_abstimed_wait_cancelable (private=128, abstime=0x0, expected=0, futex_word=0x7ff0b69001b8) at ../sysdeps/unix/sysv/linux/futex-internal.h:205

#1  do_futex_wait (sem=sem@entry=0x7ff0b69001b8, abstime=0x0) at sem_waitcommon.c:111

#2  0x00007ff0c0c037c8 in __new_sem_wait_slow (sem=0x7ff0b69001b8, abstime=0x0) at sem_waitcommon.c:181

#3  0x0000562077cc258a in PGSemaphoreLock (sema=0x7ff0b69001b8) at pg_sema.c:316

#4  0x0000562077d6289e in LWLockAcquire (lock=0x7ff0bef225c0, mode=LW_EXCLUSIVE) at /path/to/postgres/source/build/../src/backend/storage/lmgr/lwlock.c:1243

#5  0x0000562077d39087 in AbortBufferIO () at /path/to/postgres/source/build/../src/backend/storage/buffer/bufmgr.c:3988

#6  0x0000562077d373f9 in AtProcExit_Buffers (code=1, arg=0) at /path/to/postgres/source/build/../src/backend/storage/buffer/bufmgr.c:2473

#7  0x0000562077d47c32 in shmem_exit (code=1) at /path/to/postgres/source/build/../src/backend/storage/ipc/ipc.c:272

#8  0x0000562077d47a5e in proc_exit_prepare (code=1) at /path/to/postgres/source/build/../src/backend/storage/ipc/ipc.c:194

#9  0x0000562077d479c6 in proc_exit (code=1) at /path/to/postgres/source/build/../src/backend/storage/ipc/ipc.c:107

#10 0x0000562077eed1bc in errfinish (dummy=0) at /path/to/postgres/source/build/../src/backend/utils/error/elog.c:541

#11 0x0000562077d6df1f in mdwrite (reln=0x562078a12a18, forknum=MAIN_FORKNUM, blocknum=8, buffer=0x7ff0b6fbdc00 "", skipFsync=false) at /path/to/postgres/source/build/../src/backend/storage/smgr/md.c:843

#12 0x0000562077d70716 in smgrwrite (reln=0x562078a12a18, forknum=MAIN_FORKNUM, blocknum=8, buffer=0x7ff0b6fbdc00 "", skipFsync=false) at /path/to/postgres/source/build/../src/backend/storage/smgr/smgr.c:650

#13 0x0000562077d37824 in FlushBuffer (buf=0x7ff0b6e22f80, reln=0x562078a12a18) at /path/to/postgres/source/build/../src/backend/storage/buffer/bufmgr.c:2751

#14 0x0000562077d37219 in SyncOneBuffer (buf_id=78, skip_recently_used=false, wb_context=0x7fffb0e3e230) at /path/to/postgres/source/build/../src/backend/storage/buffer/bufmgr.c:2394

#15 0x0000562077d36b00 in BufferSync (flags=5) at /path/to/postgres/source/build/../src/backend/storage/buffer/bufmgr.c:1984

#16 0x0000562077d3757f in CheckPointBuffers (flags=5) at /path/to/postgres/source/build/../src/backend/storage/buffer/bufmgr.c:2578

#17 0x0000562077a1285b in CheckPointGuts (checkPointRedo=24049152, flags=5) at /path/to/postgres/source/build/../src/backend/access/transam/xlog.c:9149

#18 0x0000562077a122cf in CreateCheckPoint (flags=5) at /path/to/postgres/source/build/../src/backend/access/transam/xlog.c:8937

#19 0x0000562077a1164f in ShutdownXLOG (code=1, arg=0) at /path/to/postgres/source/build/../src/backend/access/transam/xlog.c:8485

#20 0x0000562077d47c32 in shmem_exit (code=1) at /path/to/postgres/source/build/../src/backend/storage/ipc/ipc.c:272

#21 0x0000562077d47a5e in proc_exit_prepare (code=1) at /path/to/postgres/source/build/../src/backend/storage/ipc/ipc.c:194

#22 0x0000562077d479c6 in proc_exit (code=1) at /path/to/postgres/source/build/../src/backend/storage/ipc/ipc.c:107

#23 0x0000562077eed1bc in errfinish (dummy=0) at /path/to/postgres/source/build/../src/backend/utils/error/elog.c:541

#24 0x0000562077d6df1f in mdwrite (reln=0x562078a12a18, forknum=MAIN_FORKNUM, blocknum=8, buffer=0x7ff0b6fbdc00 "", skipFsync=false) at /path/to/postgres/source/build/../src/backend/storage/smgr/md.c:843

#25 0x0000562077d70716 in smgrwrite (reln=0x562078a12a18, forknum=MAIN_FORKNUM, blocknum=8, buffer=0x7ff0b6fbdc00 "", skipFsync=false) at /path/to/postgres/source/build/../src/backend/storage/smgr/smgr.c:650

#26 0x0000562077d37824 in FlushBuffer (buf=0x7ff0b6e22f80, reln=0x562078a12a18) at /path/to/postgres/source/build/../src/backend/storage/buffer/bufmgr.c:2751

#27 0x0000562077d37219 in SyncOneBuffer (buf_id=78, skip_recently_used=false, wb_context=0x7fffb0e3fb10) at /path/to/postgres/source/build/../src/backend/storage/buffer/bufmgr.c:2394

#28 0x0000562077d36b00 in BufferSync (flags=44) at /path/to/postgres/source/build/../src/backend/storage/buffer/bufmgr.c:1984

#29 0x0000562077d3757f in CheckPointBuffers (flags=44) at /path/to/postgres/source/build/../src/backend/storage/buffer/bufmgr.c:2578

#30 0x0000562077a1285b in CheckPointGuts (checkPointRedo=24049152, flags=44) at /path/to/postgres/source/build/../src/backend/access/transam/xlog.c:9149

#31 0x0000562077a122cf in CreateCheckPoint (flags=44) at /path/to/postgres/source/build/../src/backend/access/transam/xlog.c:8937

#32 0x0000562077cca792 in RequestCheckpoint (flags=44) at /path/to/postgres/source/build/../src/backend/postmaster/checkpointer.c:976

#33 0x0000562077d7bce4 in standard_ProcessUtility (pstmt=0x562078a00b50, queryString=0x562078a00100 "checkpoint;\n", context=PROCESS_UTILITY_TOPLEVEL, params=0x0, queryEnv=0x0,

    dest=0x5620783ac5e0 <debugtupDR>, completionTag=0x7fffb0e41520 "") at /path/to/postgres/source/build/../src/backend/tcop/utility.c:769

#34 0x0000562077d7b204 in ProcessUtility (pstmt=0x562078a00b50, queryString=0x562078a00100 "checkpoint;\n", context=PROCESS_UTILITY_TOPLEVEL, params=0x0, queryEnv=0x0, dest=0x5620783ac5e0 <debugtupDR>,

    completionTag=0x7fffb0e41520 "") at /path/to/postgres/source/build/../src/backend/tcop/utility.c:360

#35 0x0000562077d7a347 in PortalRunUtility (portal=0x5620789f20c0, pstmt=0x562078a00b50, isTopLevel=true, setHoldSnapshot=false, dest=0x5620783ac5e0 <debugtupDR>, completionTag=0x7fffb0e41520 "")

    at /path/to/postgres/source/build/../src/backend/tcop/pquery.c:1178

#36 0x0000562077d7a534 in PortalRunMulti (portal=0x5620789f20c0, isTopLevel=true, setHoldSnapshot=false, dest=0x5620783ac5e0 <debugtupDR>, altdest=0x5620783ac5e0 <debugtupDR>,

    completionTag=0x7fffb0e41520 "") at /path/to/postgres/source/build/../src/backend/tcop/pquery.c:1324

#37 0x0000562077d79a61 in PortalRun (portal=0x5620789f20c0, count=9223372036854775807, isTopLevel=true, run_once=true, dest=0x5620783ac5e0 <debugtupDR>, altdest=0x5620783ac5e0 <debugtupDR>,

    completionTag=0x7fffb0e41520 "") at /path/to/postgres/source/build/../src/backend/tcop/pquery.c:799

#38 0x0000562077d734c5 in exec_simple_query (query_string=0x562078a00100 "checkpoint;\n") at /path/to/postgres/source/build/../src/backend/tcop/postgres.c:1145

#39 0x0000562077d77bd5 in PostgresMain (argc=7, argv=0x562078980d20, dbname=0x5620789a42b0 "postgres", username=0x5620789897d0 "chengyu") at /path/to/postgres/source/build/../src/backend/tcop/postgres.c:4182

#40 0x0000562077c01e8b in main (argc=7, argv=0x562078980d20) at /path/to/postgres/source/build/../src/backend/main/main.c:224

 

(gdb) p on_shmem_exit_list

$9 = {{function = 0x562077cc2f5f <AnonymousShmemDetach>, arg = 0},

{function = 0x562077cc2a4f <IpcMemoryDelete>, arg = 2457627},

{function = 0x562077cc29f8 <IpcMemoryDetach>, arg = 140672005165056},

{function = 0x562077cc242e <ReleaseSemaphores>, arg = 0},

{function = 0x562077d4648a <dsm_postmaster_shutdown>, arg = 140672005165056},

{function = 0x562077d5e053 <ProcKill>, arg = 0},

{function = 0x562077d5e02d <RemoveProcFromArray>, arg = 0},

{function = 0x562077d514e8 <CleanupInvalidationState>, arg = 140671978006400},

{function = 0x562077d4e46f <CleanupProcSignalState>, arg = 1},

{function = 0x562077d373e5 <AtProcExit_Buffers>, arg = 0},

{function = 0x562077a1159d <ShutdownXLOG>, arg = 0},

{function = 0x562077cd0637 <pgstat_beshutdown_hook>, arg = 0}, {function = 0x0, arg = 0}, {function = 0x0, arg = 0}, {function = 0x0, arg = 0}, {function = 0x0, arg = 0}, {function = 0x0, arg = 0}, {function = 0x0, arg = 0}, {function = 0x0, arg = 0}, {function = 0x0, arg = 0}}

(gdb) p before_shmem_exit_list

$10 = {{function = 0x562077f02caa <ShutdownPostgres>, arg = 0}, {function = 0x0, arg = 0} <repeats 19 times>}

 

OK, now we understand the deadlock issue for single user mode. However, will this issue affect multi-user mode (i.e. under postmaster process)? We can have 3 cases for discussion:

 

  1. Startup process: at the time StartupXLOG() is invoked, ShutdownAuxiliaryProcess(), which will release all the lw-locks, has been already registered in before_shmem_exit_list[]. So this case is safe.
  2. Checkpointer process: ShutdownXLOG() is not registered as a before/on_shmem_exit callback, instead, it’s only invoked in the main loop. So there is no chance to hit IO error for second time during shared memory exit callbacks. Also, Same as startup process, ShutdownAuxiliaryProcess() has been registered. So this case is also safe.
  3. Other backend/background processes: these processes do not handle XLOG startup or shutdown, and are protected by ShutdownAuxiliaryProcess(). So they are safe to exit too.

 

In addition, we have done multiple experiments to confirm these cases.

 

Affected versions: we found this issue in 9.5, 9.6, 10, 11 and 12devel.

 

Fix proposal:

 

According to the affected 2 types of deadlock in single user mode discussed above, there might be multiple ways to fix this issue. In the fix proposal we would like to present, we register a new callback to release all the lw-locks (just like what ShutdownAuxiliaryProcess()does) in an order after AtProcExit_Buffers() and before ShutdownXLOG(). Also, it is registered before PG enters StartupXLOG(), so it can cover the case when ShutdownPostgres() has not been registered. Here is the content of the proposal:

 

diff --git a/src/backend/utils/init/postinit.c b/src/backend/utils/init/postinit.c

index 62baaf0ab3..d74e8aa1d5 100644

--- a/src/backend/utils/init/postinit.c

+++ b/src/backend/utils/init/postinit.c

@@ -71,6 +71,7 @@ static HeapTuple GetDatabaseTupleByOid(Oid dboid);

static void PerformAuthentication(Port *port);

static void CheckMyDatabase(const char *name, bool am_superuser, bool override_allow_connections);

static void InitCommunication(void);

+static void ReleaseLWLocks(int code, Datum arg);

static void ShutdownPostgres(int code, Datum arg);

static void StatementTimeoutHandler(void);

static void LockTimeoutHandler(void);

@@ -653,6 +654,7 @@ InitPostgres(const char *in_dbname, Oid dboid, const char *username,

                 * way, start up the XLOG machinery, and register to have it closed

                 * down at exit.

                 */

+                on_shmem_exit(ReleaseLWLocks, 0);

                StartupXLOG();

                on_shmem_exit(ShutdownXLOG, 0);

        }

@@ -1214,6 +1216,23 @@ process_settings(Oid databaseid, Oid roleid)

        heap_close(relsetting, AccessShareLock);

}

 

+/*

+ * There are 2 types of buffer locks on-holding when AtProcExit_Buffers() is

+ * invoked in a bootstrap process or a standalone backend:

+ *  (1) Exceptions thrown during StartupXLOG()

+ *  (2) Exceptions thrown during exception-handling in ShutdownXLOG()

+ * So we need this on_shmem_exit callback for single user mode.

+ * For processes under postmaster, ShutdownAuxiliaryProcess() will release

+ * the lw-locks and ShutdownXLOG() is not registered as a callback, so there

+ * is no such issue. Also, please note this callback should be registered in

+ * the order after AtProcExit_buffers() and before ShutdownXLOG().

+ */

+static void

+ReleaseLWLocks(int code, Datum arg)

+{

+        LWLockReleaseAll();

+}

+

/*

  * Backend-shutdown callback.  Do cleanup that we want to be sure happens

  * before all the supporting modules begin to nail their doors shut via

 

The fix proposal is also attached to this email in file “fix-deadlock.patch”.

 

Please let us know should you have suggestions on this issue and the fix.

 

Thank you!

 

Best regards,

--

Chengchao Yu

Software Engineer | Microsoft | Azure Database for PostgreSQL

https://azure.microsoft.com/en-us/services/postgresql/

 

Вложения

RE: [PATCH] Fix Proposal - Deadlock Issue in Single User Mode When IOFailure Occurs

От
Chengchao Yu
Дата:

Greetings,

 

Happy new year!

 

We would like to follow up again for this issue and fix proposal. Could someone give some suggestions to the fix proposal? Or other ideas to fix this issue?

 

Looking forward to your feedbacks!

 

Best regards,

--

Chengchao Yu

Software Engineer | Microsoft | Azure Database for PostgreSQL

https://azure.microsoft.com/en-us/services/postgresql/

 

 

From: Chengchao Yu <chengyu@microsoft.com>
Sent: Wednesday, December 19, 2018 2:51 PM
To: pgsql-hackers@postgresql.org
Cc: Prabhat Tripathi <ptrip@microsoft.com>; Sunil Kamath <Sunil.Kamath@microsoft.com>; Michal Primke <mprimke@microsoft.com>; Bhavin Gandhi <bhaving@microsoft.com>
Subject: RE: [PATCH] Fix Proposal - Deadlock Issue in Single User Mode When IO Failure Occurs

 

Greetings,

 

Just would like to follow up this issue and fix proposal. We really would like to have this issue fixed in PG. Could someone give some suggestions to the fix proposal? Or other ideas to fix this issue?

 

Looking forward for your feedbacks!

 

Best regards,

--

Chengchao Yu

Software Engineer | Microsoft | Azure Database for PostgreSQL

https://azure.microsoft.com/en-us/services/postgresql/

 

From: Chengchao Yu
Sent: Friday, November 30, 2018 1:00 PM
To: 'Pg Hackers' <pgsql-hackers@postgresql.org>
Cc: Prabhat Tripathi <ptrip@microsoft.com>; Sunil Kamath <Sunil.Kamath@microsoft.com>; Michal Primke <mprimke@microsoft.com>
Subject: [PATCH] Fix Proposal - Deadlock Issue in Single User Mode When IO Failure Occurs

 

Greetings,

 

Recently, we hit a few occurrences of deadlock when IO failure (including disk full, random remote disk IO failures) happens in single user mode. We found the issue exists on both Linux and Windows in multiple postgres versions.

 

Here are the steps to repro on Linux (as Windows repro is similar):

 

1.       Get latest PostgreSQL code, build and install the executables.

 

$ git clone https://git.postgresql.org/git/postgresql.git

$ cd postgresql

$ PGROOT=$(pwd)

$ git checkout REL_11_STABLE

$ mkdir build

$ cd build

$ ../configure --prefix=/path/to/postgres

$ make && make install

 

2.       Run initdb to initialize a PG database folder.

 

$ /path/to/postgres/bin/initdb -D /path/to/data

 

3.       Because the unable to write relation data scenario is difficult to hit naturally even reserved space is turned off, I have prepared a small patch (see attachment “emulate-error.patch”) to force an error when PG tries to write data to relation files. We can just apply the patch and there is no need to put efforts flooding data to disk any more.

 

$ cd $PGROOT

$ git apply /path/to/emulate-error.patch

$ cd build

$ make && make install

 

4.       Connect to the newly initialized database cluster with single user mode, create a table, and insert some data to the table, do a checkpoint or directly give EOF. Then we hit the deadlock issue and the process will not exit until we kill it.

 

Do a checkpoint explicitly:

 

$ /path/to/postgres/bin/postgres --single -D /path/to/data/ postgres -c exit_on_error=true <<EOF

> create table t1(a int);

> insert into t1 values (1), (2), (3);

> checkpoint;

> EOF

 

PostgreSQL stand-alone backend 11.1

backend> backend> backend> 2018-11-29 02:45:27.891 UTC [18806] FATAL:  Emulate exception in mdwrite() when writing to disk

2018-11-29 02:55:27.891 UTC [18806] CONTEXT:  writing block 8 of relation base/12368/1247

2018-11-29 02:55:27.891 UTC [18806] STATEMENT:  checkpoint;

 

2018-11-29 02:55:27.900 UTC [18806] FATAL:  Emulate exception in mdwrite() when writing to disk

2018-11-29 02:55:27.900 UTC [18806] CONTEXT:  writing block 8 of relation base/12368/1247

 

Or directly give an EOF:

 

$ /path/to/postgres/bin/postgres --single -D /path/to/data/ postgres -c exit_on_error=true <<EOF

> create table t1(a int);

> insert into t1 values (1), (2), (3);

> EOF

 

PostgreSQL stand-alone backend 11.1

backend> backend> backend> 2018-11-29 02:55:24.438 UTC [18149] FATAL:  Emulate exception in mdwrite() when writing to disk

2018-11-29 02:45:24.438 UTC [18149] CONTEXT:  writing block 8 of relation base/12368/1247

 

5.       Moreover, when we try to recover the database with single user mode, we hit the issue again, and the process does not bring up the database nor exit.

 

$ /path/to/postgres/bin/postgres --single -D /path/to/data/ postgres -c exit_on_error=true

2018-11-29 02:59:33.257 UTC [19058] LOG:  database system shutdown was interrupted; last known up at 2018-11-29 02:58:49 UTC

2018-11-29 02:59:33.485 UTC [19058] LOG:  database system was not properly shut down; automatic recovery in progress

2018-11-29 02:59:33.500 UTC [19058] LOG:  redo starts at 0/1672E40

2018-11-29 02:59:33.500 UTC [19058] LOG:  invalid record length at 0/1684B90: wanted 24, got 0

2018-11-29 02:59:33.500 UTC [19058] LOG:  redo done at 0/1684B68

2018-11-29 02:59:33.500 UTC [19058] LOG:  last completed transaction was at log time 2018-11-29 02:58:49.856663+00

2018-11-29 02:59:33.547 UTC [19058] FATAL:  Emulate exception in mdwrite() when writing to disk

2018-11-29 02:59:33.547 UTC [19058] CONTEXT:  writing block 8 of relation base/12368/1247

 

Analyses:

 

So, what happened? Actually, there are 2 types of the deadlock due to the same root cause. Let’s first take a look at the scenario in step #5. In this scenario, the deadlock happens when disk IO failure occurs inside StartupXLOG(). If we attach debugger to PG process, we will see the process is stuck acquiring the buffer’s lw-lock in AbortBufferIO().

 

void

AbortBufferIO(void)

{

    BufferDesc *buf = InProgressBuf;

 

    if (buf)

    {

        uint32      buf_state;

 

        /*

         * Since LWLockReleaseAll has already been called, we're not holding

         * the buffer's io_in_progress_lock. We have to re-acquire it so that

         * we can use TerminateBufferIO. Anyone who's executing WaitIO on the

         * buffer will be in a busy spin until we succeed in doing this.

         */

        LWLockAcquire(BufferDescriptorGetIOLock(buf), LW_EXCLUSIVE);

 

This is because the same lock has been acquired before buffer manager attempts to flush the buffer page, which happens in StartBufferIO().

 

static bool

StartBufferIO(BufferDesc *buf, bool forInput)

{

    uint32      buf_state;

 

    Assert(!InProgressBuf);

 

    for (;;)

    {

        /*

         * Grab the io_in_progress lock so that other processes can wait for

         * me to finish the I/O.

         */

        LWLockAcquire(BufferDescriptorGetIOLock(buf), LW_EXCLUSIVE);

 

        buf_state = LockBufHdr(buf);

 

        if (!(buf_state & BM_IO_IN_PROGRESS))

            break;

 

After reading the code, AtProcExit_Buffers() assumes all the lw-locks are released. However, in single user mode, at the time StartupXLOG() is being executed, there is no before_shmem_exit/on_shmem_exit callback registered to release the lw-locks.

And, given lw-lock is non-reentrant, so the process gets stuck re-acquiring the same lock.

 

Here is the call stack:

 

(gdb) bt

#0  0x00007f0fdb7cb6d6 in futex_abstimed_wait_cancelable (private=128, abstime=0x0, expected=0, futex_word=0x7f0fd14c81b8) at ../sysdeps/unix/sysv/linux/futex-internal.h:205

#1  do_futex_wait (sem=sem@entry=0x7f0fd14c81b8, abstime=0x0) at sem_waitcommon.c:111

#2  0x00007f0fdb7cb7c8 in __new_sem_wait_slow (sem=0x7f0fd14c81b8, abstime=0x0) at sem_waitcommon.c:181

#3  0x00005630d475658a in PGSemaphoreLock (sema=0x7f0fd14c81b8) at pg_sema.c:316

#4  0x00005630d47f689e in LWLockAcquire (lock=0x7f0fd9ae9c00, mode=LW_EXCLUSIVE) at /path/to/postgres/source/build/../src/backend/storage/lmgr/lwlock.c:1243

#5  0x00005630d47cd087 in AbortBufferIO () at /path/to/postgres/source/build/../src/backend/storage/buffer/bufmgr.c:3988

#6  0x00005630d47cb3f9 in AtProcExit_Buffers (code=1, arg=0) at /path/to/postgres/source/build/../src/backend/storage/buffer/bufmgr.c:2473

#7  0x00005630d47dbc32 in shmem_exit (code=1) at /path/to/postgres/source/build/../src/backend/storage/ipc/ipc.c:272

#8  0x00005630d47dba5e in proc_exit_prepare (code=1) at /path/to/postgres/source/build/../src/backend/storage/ipc/ipc.c:194

#9  0x00005630d47db9c6 in proc_exit (code=1) at /path/to/postgres/source/build/../src/backend/storage/ipc/ipc.c:107

#10 0x00005630d49811bc in errfinish (dummy=0) at /path/to/postgres/source/build/../src/backend/utils/error/elog.c:541

#11 0x00005630d4801f1f in mdwrite (reln=0x5630d6588c68, forknum=MAIN_FORKNUM, blocknum=8, buffer=0x7f0fd1ae9c00 "", skipFsync=false) at /path/to/postgres/source/build/../src/backend/storage/smgr/md.c:843

#12 0x00005630d4804716 in smgrwrite (reln=0x5630d6588c68, forknum=MAIN_FORKNUM, blocknum=8, buffer=0x7f0fd1ae9c00 "", skipFsync=false) at /path/to/postgres/source/build/../src/backend/storage/smgr/smgr.c:650

#13 0x00005630d47cb824 in FlushBuffer (buf=0x7f0fd19e9c00, reln=0x5630d6588c68) at /path/to/postgres/source/build/../src/backend/storage/buffer/bufmgr.c:2751

#14 0x00005630d47cb219 in SyncOneBuffer (buf_id=0, skip_recently_used=false, wb_context=0x7ffccc371a70) at /path/to/postgres/source/build/../src/backend/storage/buffer/bufmgr.c:2394

#15 0x00005630d47cab00 in BufferSync (flags=6) at /path/to/postgres/source/build/../src/backend/storage/buffer/bufmgr.c:1984

#16 0x00005630d47cb57f in CheckPointBuffers (flags=6) at /path/to/postgres/source/build/../src/backend/storage/buffer/bufmgr.c:2578

#17 0x00005630d44a685b in CheckPointGuts (checkPointRedo=23612304, flags=6) at /path/to/postgres/source/build/../src/backend/access/transam/xlog.c:9149

#18 0x00005630d44a62cf in CreateCheckPoint (flags=6) at /path/to/postgres/source/build/../src/backend/access/transam/xlog.c:8937

#19 0x00005630d44a45e3 in StartupXLOG () at /path/to/postgres/source/build/../src/backend/access/transam/xlog.c:7723

#20 0x00005630d4995f88 in InitPostgres (in_dbname=0x5630d65582b0 "postgres", dboid=0, username=0x5630d653d7d0 "chengyu", useroid=0, out_dbname=0x0, override_allow_connections=false)

    at /path/to/postgres/source/build/../src/backend/utils/init/postinit.c:636

#21 0x00005630d480b68b in PostgresMain (argc=7, argv=0x5630d6534d20, dbname=0x5630d65582b0 "postgres", username=0x5630d653d7d0 "chengyu") at /path/to/postgres/source/build/../src/backend/tcop/postgres.c:3810

#22 0x00005630d4695e8b in main (argc=7, argv=0x5630d6534d20) at /path/to/postgres/source/build/../src/backend/main/main.c:224

 

(gdb) p on_shmem_exit_list

$1 = {{function = 0x55801cc68f5f <AnonymousShmemDetach>, arg = 0},

{function = 0x55801cc68a4f <IpcMemoryDelete>, arg = 2490396},

{function = 0x55801cc689f8 <IpcMemoryDetach>, arg = 140602018975744},

{function = 0x55801cc6842e <ReleaseSemaphores>, arg = 0},

{function = 0x55801ccec48a <dsm_postmaster_shutdown>, arg = 140602018975744},

{function = 0x55801cd04053 <ProcKill>, arg = 0},

{function = 0x55801cd0402d <RemoveProcFromArray>, arg = 0},

{function = 0x55801ccf74e8 <CleanupInvalidationState>, arg = 140601991817088},

{function = 0x55801ccf446f <CleanupProcSignalState>, arg = 1},

{function = 0x55801ccdd3e5 <AtProcExit_Buffers>, arg = 0}, {function = 0x0, arg = 0}, {function = 0x0, arg = 0}, {function = 0x0, arg = 0}, {function = 0x0, arg = 0}, {function = 0x0, arg = 0}, {function = 0x0, arg = 0}, {function = 0x0, arg = 0}, {function = 0x0, arg = 0}, {function = 0x0, arg = 0}, {function = 0x0, arg = 0}}

(gdb) p before_shmem_exit_list

$2 = {{function = 0x0, arg = 0} <repeats 20 times>}

 

The second type is in Step #4. At the time when “checkpoint” SQL command is being executed, PG has already set up the before_shmem_exit callback ShutdownPostgres(), which releases all lw-locks given transaction or sub-transaction is on-going. So after the first IO error, the buffer page’s lw-lock gets released successfully. However, later ShutdownXLOG() is invoked, and PG tries to flush buffer pages again, which results in the second IO error. Different from the first time, this time, all the previous executed before/on_shmem_exit callbacks are not invoked again due to the decrease of the indexes. So lw-locks for buffer pages are not released when PG tries to get the same buffer lock in AbortBufferIO(), and then PG process gets stuck.

 

Here is the call stack:

 

(gdb) bt

#0  0x00007ff0c0c036d6 in futex_abstimed_wait_cancelable (private=128, abstime=0x0, expected=0, futex_word=0x7ff0b69001b8) at ../sysdeps/unix/sysv/linux/futex-internal.h:205

#1  do_futex_wait (sem=sem@entry=0x7ff0b69001b8, abstime=0x0) at sem_waitcommon.c:111

#2  0x00007ff0c0c037c8 in __new_sem_wait_slow (sem=0x7ff0b69001b8, abstime=0x0) at sem_waitcommon.c:181

#3  0x0000562077cc258a in PGSemaphoreLock (sema=0x7ff0b69001b8) at pg_sema.c:316

#4  0x0000562077d6289e in LWLockAcquire (lock=0x7ff0bef225c0, mode=LW_EXCLUSIVE) at /path/to/postgres/source/build/../src/backend/storage/lmgr/lwlock.c:1243

#5  0x0000562077d39087 in AbortBufferIO () at /path/to/postgres/source/build/../src/backend/storage/buffer/bufmgr.c:3988

#6  0x0000562077d373f9 in AtProcExit_Buffers (code=1, arg=0) at /path/to/postgres/source/build/../src/backend/storage/buffer/bufmgr.c:2473

#7  0x0000562077d47c32 in shmem_exit (code=1) at /path/to/postgres/source/build/../src/backend/storage/ipc/ipc.c:272

#8  0x0000562077d47a5e in proc_exit_prepare (code=1) at /path/to/postgres/source/build/../src/backend/storage/ipc/ipc.c:194

#9  0x0000562077d479c6 in proc_exit (code=1) at /path/to/postgres/source/build/../src/backend/storage/ipc/ipc.c:107

#10 0x0000562077eed1bc in errfinish (dummy=0) at /path/to/postgres/source/build/../src/backend/utils/error/elog.c:541

#11 0x0000562077d6df1f in mdwrite (reln=0x562078a12a18, forknum=MAIN_FORKNUM, blocknum=8, buffer=0x7ff0b6fbdc00 "", skipFsync=false) at /path/to/postgres/source/build/../src/backend/storage/smgr/md.c:843

#12 0x0000562077d70716 in smgrwrite (reln=0x562078a12a18, forknum=MAIN_FORKNUM, blocknum=8, buffer=0x7ff0b6fbdc00 "", skipFsync=false) at /path/to/postgres/source/build/../src/backend/storage/smgr/smgr.c:650

#13 0x0000562077d37824 in FlushBuffer (buf=0x7ff0b6e22f80, reln=0x562078a12a18) at /path/to/postgres/source/build/../src/backend/storage/buffer/bufmgr.c:2751

#14 0x0000562077d37219 in SyncOneBuffer (buf_id=78, skip_recently_used=false, wb_context=0x7fffb0e3e230) at /path/to/postgres/source/build/../src/backend/storage/buffer/bufmgr.c:2394

#15 0x0000562077d36b00 in BufferSync (flags=5) at /path/to/postgres/source/build/../src/backend/storage/buffer/bufmgr.c:1984

#16 0x0000562077d3757f in CheckPointBuffers (flags=5) at /path/to/postgres/source/build/../src/backend/storage/buffer/bufmgr.c:2578

#17 0x0000562077a1285b in CheckPointGuts (checkPointRedo=24049152, flags=5) at /path/to/postgres/source/build/../src/backend/access/transam/xlog.c:9149

#18 0x0000562077a122cf in CreateCheckPoint (flags=5) at /path/to/postgres/source/build/../src/backend/access/transam/xlog.c:8937

#19 0x0000562077a1164f in ShutdownXLOG (code=1, arg=0) at /path/to/postgres/source/build/../src/backend/access/transam/xlog.c:8485

#20 0x0000562077d47c32 in shmem_exit (code=1) at /path/to/postgres/source/build/../src/backend/storage/ipc/ipc.c:272

#21 0x0000562077d47a5e in proc_exit_prepare (code=1) at /path/to/postgres/source/build/../src/backend/storage/ipc/ipc.c:194

#22 0x0000562077d479c6 in proc_exit (code=1) at /path/to/postgres/source/build/../src/backend/storage/ipc/ipc.c:107

#23 0x0000562077eed1bc in errfinish (dummy=0) at /path/to/postgres/source/build/../src/backend/utils/error/elog.c:541

#24 0x0000562077d6df1f in mdwrite (reln=0x562078a12a18, forknum=MAIN_FORKNUM, blocknum=8, buffer=0x7ff0b6fbdc00 "", skipFsync=false) at /path/to/postgres/source/build/../src/backend/storage/smgr/md.c:843

#25 0x0000562077d70716 in smgrwrite (reln=0x562078a12a18, forknum=MAIN_FORKNUM, blocknum=8, buffer=0x7ff0b6fbdc00 "", skipFsync=false) at /path/to/postgres/source/build/../src/backend/storage/smgr/smgr.c:650

#26 0x0000562077d37824 in FlushBuffer (buf=0x7ff0b6e22f80, reln=0x562078a12a18) at /path/to/postgres/source/build/../src/backend/storage/buffer/bufmgr.c:2751

#27 0x0000562077d37219 in SyncOneBuffer (buf_id=78, skip_recently_used=false, wb_context=0x7fffb0e3fb10) at /path/to/postgres/source/build/../src/backend/storage/buffer/bufmgr.c:2394

#28 0x0000562077d36b00 in BufferSync (flags=44) at /path/to/postgres/source/build/../src/backend/storage/buffer/bufmgr.c:1984

#29 0x0000562077d3757f in CheckPointBuffers (flags=44) at /path/to/postgres/source/build/../src/backend/storage/buffer/bufmgr.c:2578

#30 0x0000562077a1285b in CheckPointGuts (checkPointRedo=24049152, flags=44) at /path/to/postgres/source/build/../src/backend/access/transam/xlog.c:9149

#31 0x0000562077a122cf in CreateCheckPoint (flags=44) at /path/to/postgres/source/build/../src/backend/access/transam/xlog.c:8937

#32 0x0000562077cca792 in RequestCheckpoint (flags=44) at /path/to/postgres/source/build/../src/backend/postmaster/checkpointer.c:976

#33 0x0000562077d7bce4 in standard_ProcessUtility (pstmt=0x562078a00b50, queryString=0x562078a00100 "checkpoint;\n", context=PROCESS_UTILITY_TOPLEVEL, params=0x0, queryEnv=0x0,

    dest=0x5620783ac5e0 <debugtupDR>, completionTag=0x7fffb0e41520 "") at /path/to/postgres/source/build/../src/backend/tcop/utility.c:769

#34 0x0000562077d7b204 in ProcessUtility (pstmt=0x562078a00b50, queryString=0x562078a00100 "checkpoint;\n", context=PROCESS_UTILITY_TOPLEVEL, params=0x0, queryEnv=0x0, dest=0x5620783ac5e0 <debugtupDR>,

    completionTag=0x7fffb0e41520 "") at /path/to/postgres/source/build/../src/backend/tcop/utility.c:360

#35 0x0000562077d7a347 in PortalRunUtility (portal=0x5620789f20c0, pstmt=0x562078a00b50, isTopLevel=true, setHoldSnapshot=false, dest=0x5620783ac5e0 <debugtupDR>, completionTag=0x7fffb0e41520 "")

    at /path/to/postgres/source/build/../src/backend/tcop/pquery.c:1178

#36 0x0000562077d7a534 in PortalRunMulti (portal=0x5620789f20c0, isTopLevel=true, setHoldSnapshot=false, dest=0x5620783ac5e0 <debugtupDR>, altdest=0x5620783ac5e0 <debugtupDR>,

    completionTag=0x7fffb0e41520 "") at /path/to/postgres/source/build/../src/backend/tcop/pquery.c:1324

#37 0x0000562077d79a61 in PortalRun (portal=0x5620789f20c0, count=9223372036854775807, isTopLevel=true, run_once=true, dest=0x5620783ac5e0 <debugtupDR>, altdest=0x5620783ac5e0 <debugtupDR>,

    completionTag=0x7fffb0e41520 "") at /path/to/postgres/source/build/../src/backend/tcop/pquery.c:799

#38 0x0000562077d734c5 in exec_simple_query (query_string=0x562078a00100 "checkpoint;\n") at /path/to/postgres/source/build/../src/backend/tcop/postgres.c:1145

#39 0x0000562077d77bd5 in PostgresMain (argc=7, argv=0x562078980d20, dbname=0x5620789a42b0 "postgres", username=0x5620789897d0 "chengyu") at /path/to/postgres/source/build/../src/backend/tcop/postgres.c:4182

#40 0x0000562077c01e8b in main (argc=7, argv=0x562078980d20) at /path/to/postgres/source/build/../src/backend/main/main.c:224

 

(gdb) p on_shmem_exit_list

$9 = {{function = 0x562077cc2f5f <AnonymousShmemDetach>, arg = 0},

{function = 0x562077cc2a4f <IpcMemoryDelete>, arg = 2457627},

{function = 0x562077cc29f8 <IpcMemoryDetach>, arg = 140672005165056},

{function = 0x562077cc242e <ReleaseSemaphores>, arg = 0},

{function = 0x562077d4648a <dsm_postmaster_shutdown>, arg = 140672005165056},

{function = 0x562077d5e053 <ProcKill>, arg = 0},

{function = 0x562077d5e02d <RemoveProcFromArray>, arg = 0},

{function = 0x562077d514e8 <CleanupInvalidationState>, arg = 140671978006400},

{function = 0x562077d4e46f <CleanupProcSignalState>, arg = 1},

{function = 0x562077d373e5 <AtProcExit_Buffers>, arg = 0},

{function = 0x562077a1159d <ShutdownXLOG>, arg = 0},

{function = 0x562077cd0637 <pgstat_beshutdown_hook>, arg = 0}, {function = 0x0, arg = 0}, {function = 0x0, arg = 0}, {function = 0x0, arg = 0}, {function = 0x0, arg = 0}, {function = 0x0, arg = 0}, {function = 0x0, arg = 0}, {function = 0x0, arg = 0}, {function = 0x0, arg = 0}}

(gdb) p before_shmem_exit_list

$10 = {{function = 0x562077f02caa <ShutdownPostgres>, arg = 0}, {function = 0x0, arg = 0} <repeats 19 times>}

 

OK, now we understand the deadlock issue for single user mode. However, will this issue affect multi-user mode (i.e. under postmaster process)? We can have 3 cases for discussion:

 

  1. Startup process: at the time StartupXLOG() is invoked, ShutdownAuxiliaryProcess(), which will release all the lw-locks, has been already registered in before_shmem_exit_list[]. So this case is safe.
  2. Checkpointer process: ShutdownXLOG() is not registered as a before/on_shmem_exit callback, instead, it’s only invoked in the main loop. So there is no chance to hit IO error for second time during shared memory exit callbacks. Also, Same as startup process, ShutdownAuxiliaryProcess() has been registered. So this case is also safe.
  3. Other backend/background processes: these processes do not handle XLOG startup or shutdown, and are protected by ShutdownAuxiliaryProcess(). So they are safe to exit too.

 

In addition, we have done multiple experiments to confirm these cases.

 

Affected versions: we found this issue in 9.5, 9.6, 10, 11 and 12devel.

 

Fix proposal:

 

According to the affected 2 types of deadlock in single user mode discussed above, there might be multiple ways to fix this issue. In the fix proposal we would like to present, we register a new callback to release all the lw-locks (just like what ShutdownAuxiliaryProcess()does) in an order after AtProcExit_Buffers() and before ShutdownXLOG(). Also, it is registered before PG enters StartupXLOG(), so it can cover the case when ShutdownPostgres() has not been registered. Here is the content of the proposal:

 

diff --git a/src/backend/utils/init/postinit.c b/src/backend/utils/init/postinit.c

index 62baaf0ab3..d74e8aa1d5 100644

--- a/src/backend/utils/init/postinit.c

+++ b/src/backend/utils/init/postinit.c

@@ -71,6 +71,7 @@ static HeapTuple GetDatabaseTupleByOid(Oid dboid);

static void PerformAuthentication(Port *port);

static void CheckMyDatabase(const char *name, bool am_superuser, bool override_allow_connections);

static void InitCommunication(void);

+static void ReleaseLWLocks(int code, Datum arg);

static void ShutdownPostgres(int code, Datum arg);

static void StatementTimeoutHandler(void);

static void LockTimeoutHandler(void);

@@ -653,6 +654,7 @@ InitPostgres(const char *in_dbname, Oid dboid, const char *username,

                 * way, start up the XLOG machinery, and register to have it closed

                 * down at exit.

                 */

+                on_shmem_exit(ReleaseLWLocks, 0);

                StartupXLOG();

                on_shmem_exit(ShutdownXLOG, 0);

        }

@@ -1214,6 +1216,23 @@ process_settings(Oid databaseid, Oid roleid)

        heap_close(relsetting, AccessShareLock);

}

 

+/*

+ * There are 2 types of buffer locks on-holding when AtProcExit_Buffers() is

+ * invoked in a bootstrap process or a standalone backend:

+ *  (1) Exceptions thrown during StartupXLOG()

+ *  (2) Exceptions thrown during exception-handling in ShutdownXLOG()

+ * So we need this on_shmem_exit callback for single user mode.

+ * For processes under postmaster, ShutdownAuxiliaryProcess() will release

+ * the lw-locks and ShutdownXLOG() is not registered as a callback, so there

+ * is no such issue. Also, please note this callback should be registered in

+ * the order after AtProcExit_buffers() and before ShutdownXLOG().

+ */

+static void

+ReleaseLWLocks(int code, Datum arg)

+{

+        LWLockReleaseAll();

+}

+

/*

  * Backend-shutdown callback.  Do cleanup that we want to be sure happens

  * before all the supporting modules begin to nail their doors shut via

 

The fix proposal is also attached to this email in file “fix-deadlock.patch”.

 

Please let us know should you have suggestions on this issue and the fix.

 

Thank you!

 

Best regards,

--

Chengchao Yu

Software Engineer | Microsoft | Azure Database for PostgreSQL

https://azure.microsoft.com/en-us/services/postgresql/

 

Вложения

Re: [PATCH] Fix Proposal - Deadlock Issue in Single User Mode When IOFailure Occurs

От
Amit Kapila
Дата:
On Sat, Dec 1, 2018 at 2:30 AM Chengchao Yu <chengyu@microsoft.com> wrote:
>
>
> Recently, we hit a few occurrences of deadlock when IO failure (including disk full, random remote disk IO failures)
happensin single user mode. We found the issue exists on both Linux and Windows in multiple postgres versions. 
>
>
> 3.       Because the unable to write relation data scenario is difficult to hit naturally even reserved space is
turnedoff, I have prepared a small patch (see attachment “emulate-error.patch”) to force an error when PG tries to
writedata to relation files. We can just apply the patch and there is no need to put efforts flooding data to disk any
more.
>
>

I have one question related to the way you have tried to emulate the error.

@@ -840,6 +840,10 @@ mdwrite(SMgrRelation reln, ForkNumber forknum,
BlockNumber blocknum,
nbytes,
BLCKSZ);
+ ereport(ERROR,
+ (errcode(ERRCODE_INTERNAL_ERROR),
+ errmsg("Emulate exception in mdwrite() when writing to disk")));
+

We generally reserve the space in a relation before attempting to
write, so not sure how you are able to hit the disk full situation via
mdwrite.  If you see the description of the function, that also
indicates same.

/*
 * mdwrite() -- Write the supplied block at the appropriate location.
 *
 * This is to be used only for updating already-existing blocks of a
 * relation (ie, those before the current EOF).  To extend a relation,
 * use mdextend().
 */

I am not telling that mdwrite can never lead to error, but just trying
to understand the issue you actually faced.  I haven't read your
proposed solution yet, let's first try to establish the problem you
are facing.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [PATCH] Fix Proposal - Deadlock Issue in Single User Mode When IOFailure Occurs

От
Thomas Munro
Дата:
On Sun, Jan 20, 2019 at 4:45 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> On Sat, Dec 1, 2018 at 2:30 AM Chengchao Yu <chengyu@microsoft.com> wrote:
> > Recently, we hit a few occurrences of deadlock when IO failure (including disk full, random remote disk IO
failures)happens in single user mode. We found the issue exists on both Linux and Windows in multiple postgres
versions.
> >
> > 3.       Because the unable to write relation data scenario is difficult to hit naturally even reserved space is
turnedoff, I have prepared a small patch (see attachment “emulate-error.patch”) to force an error when PG tries to
writedata to relation files. We can just apply the patch and there is no need to put efforts flooding data to disk any
more.
>
> I have one question related to the way you have tried to emulate the error.
>
> @@ -840,6 +840,10 @@ mdwrite(SMgrRelation reln, ForkNumber forknum,
> BlockNumber blocknum,
> nbytes,
> BLCKSZ);
> + ereport(ERROR,
> + (errcode(ERRCODE_INTERNAL_ERROR),
> + errmsg("Emulate exception in mdwrite() when writing to disk")));
> +
>
> We generally reserve the space in a relation before attempting to
> write, so not sure how you are able to hit the disk full situation via
> mdwrite.  If you see the description of the function, that also
> indicates same.

Presumably ZFS or BTRFS or something more exotic could still get
ENOSPC here, and of course any filesystem could give us EIO here
(because the disk is on fire or the remote NFS server is rebooting due
to an automatic Windows update).

--
Thomas Munro
http://www.enterprisedb.com


RE: [PATCH] Fix Proposal - Deadlock Issue in Single User Mode When IOFailure Occurs

От
Chengchao Yu
Дата:
Hi Amit, Thomas,

Thank you very much for your feedbacks! Apologizes but I just saw both messages.

> We generally reserve the space in a relation before attempting to write, so not sure how you are able to hit the disk
fullsituation via mdwrite.  If you see the description of the function, that also indicates same.
 

Absolutely agree, this isn’t a PG issue. Issue manifest for us at Microsoft due to our own storage layer which treat
mdextend()actions as setting length of the file only. We have a workaround, and any change isn’t needed for Postgres.
 

> I am not telling that mdwrite can never lead to error, but just trying to understand the issue you actually faced.  I
haven'tread your proposed solution yet, let's first try to establish the problem you are facing.
 

We see transient IO errors reading a block where PG instance gets dead-lock in single user mode until we kill the
instance.The stack trace below shows the behavior which is when mdread() failed with buffer holding its lw-lock. This
happensbecause in single user mode there is no call back to release the lock and when AbortBufferIO() tries to acquire
thesame lock again, it will wait for the lock indefinitely.
 

Here is the stack trace:

0a 00000004`8080cc30 00000004`80dcf917 postgres!PGSemaphoreLock+0x65
[d:\orcasqlagsea10\14\s\src\backend\port\win32_sema.c@ 158] 
 
0b 00000004`8080cc90 00000004`80db025c postgres!LWLockAcquire+0x137
[d:\orcasqlagsea10\14\s\src\backend\storage\lmgr\lwlock.c@ 1234] 
 
0c 00000004`8080ccd0 00000004`80db25db postgres!AbortBufferIO+0x2c
[d:\orcasqlagsea10\14\s\src\backend\storage\buffer\bufmgr.c@ 3995] 
 
0d 00000004`8080cd20 00000004`80dbce36 postgres!AtProcExit_Buffers+0xb
[d:\orcasqlagsea10\14\s\src\backend\storage\buffer\bufmgr.c@ 2479] 
 
0e 00000004`8080cd50 00000004`80dbd1bd postgres!shmem_exit+0xf6 [d:\orcasqlagsea10\14\s\src\backend\storage\ipc\ipc.c @
262]
 
0f 00000004`8080cd80 00000004`80dbccfd postgres!proc_exit_prepare+0x4d
[d:\orcasqlagsea10\14\s\src\backend\storage\ipc\ipc.c@ 188] 
 
10 00000004`8080cdb0 00000004`80ef9e74 postgres!proc_exit+0xd [d:\orcasqlagsea10\14\s\src\backend\storage\ipc\ipc.c @
141]
 
11 00000004`8080cde0 00000004`80ddb6ef postgres!errfinish+0x204 [d:\orcasqlagsea10\14\s\src\backend\utils\error\elog.c
@624] 
 
12 00000004`8080ce50 00000004`80db0f59 postgres!mdread+0x12f [d:\orcasqlagsea10\14\s\src\backend\storage\smgr\md.c @
806]
 
13 00000004`8080cea0 00000004`80daeb70 postgres!ReadBuffer_common+0x2c9
[d:\orcasqlagsea10\14\s\src\backend\storage\buffer\bufmgr.c@ 897] 
 
14 00000004`8080cf30 00000004`80b81322 postgres!ReadBufferWithoutRelcache+0x60
[d:\orcasqlagsea10\14\s\src\backend\storage\buffer\bufmgr.c@ 694] 
 
15 00000004`8080cf90 00000004`80db9cbb postgres!XLogReadBufferExtended+0x142
[d:\orcasqlagsea10\14\s\src\backend\access\transam\xlogutils.c@ 513] 
 
16 00000004`8080cff0 00000004`80b2f53a postgres!XLogRecordPageWithFreeSpace+0xbb
[d:\orcasqlagsea10\14\s\src\backend\storage\freespace\freespace.c@ 254] 
 
17 00000004`8080d030 00000004`80b6eb94 postgres!heap_xlog_insert+0x36a
[d:\orcasqlagsea10\14\s\src\backend\access\heap\heapam.c@ 8491] 
 
18 00000004`8080f0d0 00000004`80f0a13f postgres!StartupXLOG+0x1f84
[d:\orcasqlagsea10\14\s\src\backend\access\transam\xlog.c@ 7480] 
 
19 00000004`8080fbf0 00000004`80de121e postgres!InitPostgres+0x12f
[d:\orcasqlagsea10\14\s\src\backend\utils\init\postinit.c@ 656] 
 
1a 00000004`8080fcd0 00000004`80c92c31 postgres!PostgresMain+0x25e [d:\orcasqlagsea10\14\s\src\backend\tcop\postgres.c
@3881] 
 
1b 00000004`8080fed0 00000004`80f51df3 postgres!main+0x491 [d:\orcasqlagsea10\14\s\src\backend\main\main.c @ 235] 

Please let us know should you have more feedbacks. Thank you!
 
Best regards,
--
Chengchao Yu
Software Engineer | Microsoft | Azure Database for PostgreSQL
https://azure.microsoft.com/en-us/services/postgresql/


-----Original Message-----
From: Thomas Munro <thomas.munro@enterprisedb.com> 
Sent: Thursday, January 24, 2019 2:32 PM
To: Amit Kapila <amit.kapila16@gmail.com>
Cc: Chengchao Yu <chengyu@microsoft.com>; Pg Hackers <pgsql-hackers@postgresql.org>; Prabhat Tripathi
<ptrip@microsoft.com>;Sunil Kamath <Sunil.Kamath@microsoft.com>; Michal Primke <mprimke@microsoft.com>
 
Subject: Re: [PATCH] Fix Proposal - Deadlock Issue in Single User Mode When IO Failure Occurs

On Sun, Jan 20, 2019 at 4:45 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> On Sat, Dec 1, 2018 at 2:30 AM Chengchao Yu <chengyu@microsoft.com> wrote:
> > Recently, we hit a few occurrences of deadlock when IO failure (including disk full, random remote disk IO
failures)happens in single user mode. We found the issue exists on both Linux and Windows in multiple postgres
versions.
> >
> > 3.       Because the unable to write relation data scenario is difficult to hit naturally even reserved space is
turnedoff, I have prepared a small patch (see attachment “emulate-error.patch”) to force an error when PG tries to
writedata to relation files. We can just apply the patch and there is no need to put efforts flooding data to disk any
more.
>
> I have one question related to the way you have tried to emulate the error.
>
> @@ -840,6 +840,10 @@ mdwrite(SMgrRelation reln, ForkNumber forknum, 
> BlockNumber blocknum, nbytes, BLCKSZ);
> + ereport(ERROR,
> + (errcode(ERRCODE_INTERNAL_ERROR),
> + errmsg("Emulate exception in mdwrite() when writing to disk")));
> +
>
> We generally reserve the space in a relation before attempting to 
> write, so not sure how you are able to hit the disk full situation via 
> mdwrite.  If you see the description of the function, that also 
> indicates same.

Presumably ZFS or BTRFS or something more exotic could still get ENOSPC here, and of course any filesystem could give
usEIO here (because the disk is on fire or the remote NFS server is rebooting due to an automatic Windows update).
 

--
Thomas Munro

https://nam06.safelinks.protection.outlook.com/?url=http%3A%2F%2Fwww.enterprisedb.com&data=02%7C01%7Cchengyu%40microsoft.com%7C58d8ab2c88044c1a78ef08d6824bcfdd%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C636839659503024691&sdata=tZCFSt7zMWO%2BbIITkggTuTsxu7JpKTYG1UYIwZE8XEc%3D&reserved=0

Re: [PATCH] Fix Proposal - Deadlock Issue in Single User Mode When IOFailure Occurs

От
Amit Kapila
Дата:
On Sat, Feb 2, 2019 at 4:42 AM Chengchao Yu <chengyu@microsoft.com> wrote:
>
> Hi Amit, Thomas,
>
> Thank you very much for your feedbacks! Apologizes but I just saw both messages.
>
> > We generally reserve the space in a relation before attempting to write, so not sure how you are able to hit the
diskfull situation via mdwrite.  If you see the description of the function, that also indicates same. 
>
> Absolutely agree, this isn’t a PG issue. Issue manifest for us at Microsoft due to our own storage layer which treat
mdextend()actions as setting length of the file only. We have a workaround, and any change isn’t needed for Postgres. 
>
> > I am not telling that mdwrite can never lead to error, but just trying to understand the issue you actually faced.
Ihaven't read your proposed solution yet, let's first try to establish the problem you are facing. 
>
> We see transient IO errors reading a block where PG instance gets dead-lock in single user mode until we kill the
instance.The stack trace below shows the behavior which is when mdread() failed with buffer holding its lw-lock. This
happensbecause in single user mode there is no call back to release the lock and when AbortBufferIO() tries to acquire
thesame lock again, it will wait for the lock indefinitely. 
>

I think you can register your patch for next CF [1] so that we don't
forget about it.

[1] - https://commitfest.postgresql.org/22/

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


RE: [PATCH] Fix Proposal - Deadlock Issue in Single User Mode When IOFailure Occurs

От
Chengchao Yu
Дата:
Thank you so much Amit! I have created the patch below:
https://commitfest.postgresql.org/22/2003/

Please let me know should you have more suggestions. Thank you!

Best regards,
--
Chengchao Yu
Software Engineer | Microsoft | Azure Database for PostgreSQL
https://azure.microsoft.com/en-us/services/postgresql/


-----Original Message-----
From: Amit Kapila <amit.kapila16@gmail.com> 
Sent: Friday, February 1, 2019 6:58 PM
To: Chengchao Yu <chengyu@microsoft.com>
Cc: Thomas Munro <thomas.munro@enterprisedb.com>; Pg Hackers <pgsql-hackers@postgresql.org>; Prabhat Tripathi
<ptrip@microsoft.com>;Sunil Kamath <Sunil.Kamath@microsoft.com>; Michal Primke <mprimke@microsoft.com>; TEJA Mupparti
<Tejeswar.Mupparti@microsoft.com>
Subject: Re: [PATCH] Fix Proposal - Deadlock Issue in Single User Mode When IO Failure Occurs

On Sat, Feb 2, 2019 at 4:42 AM Chengchao Yu <chengyu@microsoft.com> wrote:
>
> Hi Amit, Thomas,
>
> Thank you very much for your feedbacks! Apologizes but I just saw both messages.
>
> > We generally reserve the space in a relation before attempting to write, so not sure how you are able to hit the
diskfull situation via mdwrite.  If you see the description of the function, that also indicates same.
 
>
> Absolutely agree, this isn’t a PG issue. Issue manifest for us at Microsoft due to our own storage layer which treat
mdextend()actions as setting length of the file only. We have a workaround, and any change isn’t needed for Postgres.
 
>
> > I am not telling that mdwrite can never lead to error, but just trying to understand the issue you actually faced.
Ihaven't read your proposed solution yet, let's first try to establish the problem you are facing.
 
>
> We see transient IO errors reading a block where PG instance gets dead-lock in single user mode until we kill the
instance.The stack trace below shows the behavior which is when mdread() failed with buffer holding its lw-lock. This
happensbecause in single user mode there is no call back to release the lock and when AbortBufferIO() tries to acquire
thesame lock again, it will wait for the lock indefinitely.
 
>

I think you can register your patch for next CF [1] so that we don't forget about it.

[1] -
https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fcommitfest.postgresql.org%2F22%2F&data=02%7C01%7Cchengyu%40microsoft.com%7Cfee132e6ec2843c2838a08d688ba3aef%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C636846730778775307&sdata=lJ2LjRgo%2Bd6ViKqwJ040BPzicOTFtFO8NmmVft00yKY%3D&reserved=0

--
With Regards,
Amit Kapila.
EnterpriseDB:
https://nam06.safelinks.protection.outlook.com/?url=http%3A%2F%2Fwww.enterprisedb.com&data=02%7C01%7Cchengyu%40microsoft.com%7Cfee132e6ec2843c2838a08d688ba3aef%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C636846730778775307&sdata=nXcVn6B1fl6b5iiDKybl3zf0fXo22%2BrZ1Ne7v1%2FM5DE%3D&reserved=0

Вложения

RE: [PATCH] Fix Proposal - Deadlock Issue in Single User Mode When IOFailure Occurs

От
Chengchao Yu
Дата:
Hi Amit,

Greetings! Thank you so much for your previous feedbacks!

It seems the old patch is broken with latest master branch. So, I have rebased the patch, now it can be applied to the
latestmaster without conflicts.
 

Btw, since the commitfest was created: https://commitfest.postgresql.org/22/2003/,
were there some places that could be improved? Could you give some suggestions? Thank you!

Best regards,
--
Chengchao Yu
Software Engineer | Microsoft | Azure Database for PostgreSQL
https://azure.microsoft.com/en-us/services/postgresql/

-----Original Message-----
From: Chengchao Yu 
Sent: Monday, February 18, 2019 6:08 PM
To: Amit Kapila <amit.kapila16@gmail.com>
Cc: Thomas Munro <thomas.munro@enterprisedb.com>; Pg Hackers <pgsql-hackers@postgresql.org>; Prabhat Tripathi
<ptrip@microsoft.com>;Sunil Kamath <Sunil.Kamath@microsoft.com>; Michal Primke <mprimke@microsoft.com>; TEJA Mupparti
<Tejeswar.Mupparti@microsoft.com>
Subject: RE: [PATCH] Fix Proposal - Deadlock Issue in Single User Mode When IO Failure Occurs

Thank you so much Amit! I have created the patch below:
https://commitfest.postgresql.org/22/2003/

Please let me know should you have more suggestions. Thank you!

Best regards,
--
Chengchao Yu
Software Engineer | Microsoft | Azure Database for PostgreSQL https://azure.microsoft.com/en-us/services/postgresql/

Вложения

RE: [PATCH] Fix Proposal - Deadlock Issue in Single User Mode When IOFailure Occurs

От
Kyotaro Horiguchi
Дата:
Sorry in advance for link-breaking message force by gmail..


https://www.postgresql.org/message-id/flat/CY4PR2101MB0804CE9836E582C0702214E8AAD30@CY4PR2101MB0804.namprd21.prod.outlook.com

I assume that we are in a consensus about the problem we are to  fix
here.

> 0a 00000004`8080cc30 00000004`80dcf917 postgres!PGSemaphoreLock+0x65
[d:\orcasqlagsea10\14\s\src\backend\port\win32_sema.c@ 158]
 
> 0b 00000004`8080cc90 00000004`80db025c postgres!LWLockAcquire+0x137
[d:\orcasqlagsea10\14\s\src\backend\storage\lmgr\lwlock.c@ 1234]
 
> 0c 00000004`8080ccd0 00000004`80db25db postgres!AbortBufferIO+0x2c
[d:\orcasqlagsea10\14\s\src\backend\storage\buffer\bufmgr.c@ 3995]
 
> 0d 00000004`8080cd20 00000004`80dbce36 postgres!AtProcExit_Buffers+0xb
[d:\orcasqlagsea10\14\s\src\backend\storage\buffer\bufmgr.c@ 2479]
 
> 0e 00000004`8080cd50 00000004`80dbd1bd postgres!shmem_exit+0xf6 [d:\orcasqlagsea10\14\s\src\backend\storage\ipc\ipc.c
@262]
 
> 0f 00000004`8080cd80 00000004`80dbccfd postgres!proc_exit_prepare+0x4d
[d:\orcasqlagsea10\14\s\src\backend\storage\ipc\ipc.c@ 188]
 
> 10 00000004`8080cdb0 00000004`80ef9e74 postgres!proc_exit+0xd [d:\orcasqlagsea10\14\s\src\backend\storage\ipc\ipc.c @
141]
> 11 00000004`8080cde0 00000004`80ddb6ef postgres!errfinish+0x204
[d:\orcasqlagsea10\14\s\src\backend\utils\error\elog.c@ 624]
 
> 12 00000004`8080ce50 00000004`80db0f59 postgres!mdread+0x12f [d:\orcasqlagsea10\14\s\src\backend\storage\smgr\md.c @
806]

Ok, we are fixing this. The proposed patch lets LWLockReleaseAll()
called before InitBufferPoolBackend() by registering the former after
the latter into on_shmem_exit list. Even if it works, I think it's
neither clean nor safe to register multiple order-sensitive callbacks.

AtProcExit_Buffers has the following comment:

> * During backend exit, ensure that we released all shared-buffer locks and
> * assert that we have no remaining pins.

And the only caller of it is shmem_exit. More of that, all other
caller sites calls LWLockReleaseAll() just before calling it. If
that's the case, why don't we just release all LWLocks in shmem_exit
or in AtProcExit_Buffers before calling AbortBufferIO()? I think it's
sufficient that AtProcExit_Buffers calls it at the beginning. (The
comment for the funcgtion needs editing).

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



RE: [PATCH] Fix Proposal - Deadlock Issue in Single User Mode When IOFailure Occurs

От
Chengchao Yu
Дата:
Hi Kyotaro,

Thank you so much for your valued feedback and suggestions!

> I assume that we are in a consensus about the problem we are to  fix here.
> 
> > 0a 00000004`8080cc30 00000004`80dcf917 postgres!PGSemaphoreLock+0x65 
> > [d:\orcasqlagsea10\14\s\src\backend\port\win32_sema.c @ 158] 0b 
> > 00000004`8080cc90 00000004`80db025c postgres!LWLockAcquire+0x137 
> > [d:\orcasqlagsea10\14\s\src\backend\storage\lmgr\lwlock.c @ 1234] 0c 
> > 00000004`8080ccd0 00000004`80db25db postgres!AbortBufferIO+0x2c 
> > [d:\orcasqlagsea10\14\s\src\backend\storage\buffer\bufmgr.c @ 3995] 0d 
> > 00000004`8080cd20 00000004`80dbce36 postgres!AtProcExit_Buffers+0xb 
> > [d:\orcasqlagsea10\14\s\src\backend\storage\buffer\bufmgr.c @ 2479] 0e 
> > 00000004`8080cd50 00000004`80dbd1bd postgres!shmem_exit+0xf6 
> > [d:\orcasqlagsea10\14\s\src\backend\storage\ipc\ipc.c @ 262] 0f 
> > 00000004`8080cd80 00000004`80dbccfd postgres!proc_exit_prepare+0x4d 
> > [d:\orcasqlagsea10\14\s\src\backend\storage\ipc\ipc.c @ 188]
> > 10 00000004`8080cdb0 00000004`80ef9e74 postgres!proc_exit+0xd 
> > [d:\orcasqlagsea10\14\s\src\backend\storage\ipc\ipc.c @ 141]
> > 11 00000004`8080cde0 00000004`80ddb6ef postgres!errfinish+0x204 
> > [d:\orcasqlagsea10\14\s\src\backend\utils\error\elog.c @ 624]
> > 12 00000004`8080ce50 00000004`80db0f59 postgres!mdread+0x12f 
> > [d:\orcasqlagsea10\14\s\src\backend\storage\smgr\md.c @ 806]

Yes, this is one of the two situations we want to fix. The other situation is a cascade exception case like following.

  #0  0x00007f0fdb7cb6d6 in futex_abstimed_wait_cancelable (private=128, abstime=0x0, expected=0,
futex_word=0x7f0fd14c81b8)at ../sysdeps/unix/sysv/linux/futex-internal.h:205
 
  #1  do_futex_wait (sem=sem(at)entry=0x7f0fd14c81b8, abstime=0x0) at sem_waitcommon.c:111
  #2  0x00007f0fdb7cb7c8 in __new_sem_wait_slow (sem=0x7f0fd14c81b8, abstime=0x0) at sem_waitcommon.c:181
  #3  0x00005630d475658a in PGSemaphoreLock (sema=0x7f0fd14c81b8) at pg_sema.c:316
  #4  0x00005630d47f689e in LWLockAcquire (lock=0x7f0fd9ae9c00, mode=LW_EXCLUSIVE) at
/path/to/postgres/source/build/../src/backend/storage/lmgr/lwlock.c:1243
  #5  0x00005630d47cd087 in AbortBufferIO () at
/path/to/postgres/source/build/../src/backend/storage/buffer/bufmgr.c:3988
  #6  0x00005630d47cb3f9 in AtProcExit_Buffers (code=1, arg=0) at
/path/to/postgres/source/build/../src/backend/storage/buffer/bufmgr.c:2473
  #7  0x00005630d47dbc32 in shmem_exit (code=1) at /path/to/postgres/source/build/../src/backend/storage/ipc/ipc.c:272
  #8  0x00005630d47dba5e in proc_exit_prepare (code=1) at
/path/to/postgres/source/build/../src/backend/storage/ipc/ipc.c:194
  #9  0x00005630d47db9c6 in proc_exit (code=1) at /path/to/postgres/source/build/../src/backend/storage/ipc/ipc.c:107
  #10 0x00005630d49811bc in errfinish (dummy=0) at
/path/to/postgres/source/build/../src/backend/utils/error/elog.c:541
  #11 0x00005630d4801f1f in mdwrite (reln=0x5630d6588c68, forknum=MAIN_FORKNUM, blocknum=8, buffer=0x7f0fd1ae9c00 "",
skipFsync=false)at /path/to/postgres/source/build/../src/backend/storage/smgr/md.c:843
 
  #12 0x00005630d4804716 in smgrwrite (reln=0x5630d6588c68, forknum=MAIN_FORKNUM, blocknum=8, buffer=0x7f0fd1ae9c00 "",
skipFsync=false)at /path/to/postgres/source/build/../src/backend/storage/smgr/smgr.c:650
 
  #13 0x00005630d47cb824 in FlushBuffer (buf=0x7f0fd19e9c00, reln=0x5630d6588c68) at
/path/to/postgres/source/build/../src/backend/storage/buffer/bufmgr.c:2751
  #14 0x00005630d47cb219 in SyncOneBuffer (buf_id=0, skip_recently_used=false, wb_context=0x7ffccc371a70) at
/path/to/postgres/source/build/../src/backend/storage/buffer/bufmgr.c:2394
  #15 0x00005630d47cab00 in BufferSync (flags=6) at
/path/to/postgres/source/build/../src/backend/storage/buffer/bufmgr.c:1984
  #16 0x00005630d47cb57f in CheckPointBuffers (flags=6) at
/path/to/postgres/source/build/../src/backend/storage/buffer/bufmgr.c:2578
  #17 0x00005630d44a685b in CheckPointGuts (checkPointRedo=23612304, flags=6) at
/path/to/postgres/source/build/../src/backend/access/transam/xlog.c:9149
  #18 0x00005630d44a62cf in CreateCheckPoint (flags=6) at
/path/to/postgres/source/build/../src/backend/access/transam/xlog.c:8937
  #19 0x00005630d44a45e3 in StartupXLOG () at /path/to/postgres/source/build/../src/backend/access/transam/xlog.c:7723
  #20 0x00005630d4995f88 in InitPostgres (in_dbname=0x5630d65582b0 "postgres", dboid=0, username=0x5630d653d7d0
"chengyu",useroid=0, out_dbname=0x0, override_allow_connections=false)
 
      at /path/to/postgres/source/build/../src/backend/utils/init/postinit.c:636
  #21 0x00005630d480b68b in PostgresMain (argc=7, argv=0x5630d6534d20, dbname=0x5630d65582b0 "postgres",
username=0x5630d653d7d0"chengyu") at /path/to/postgres/source/build/../src/backend/tcop/postgres.c:3810
 
  #22 0x00005630d4695e8b in main (argc=7, argv=0x5630d6534d20) at
/path/to/postgres/source/build/../src/backend/main/main.c:224
Though ENOSPC is avoided by reservation in PG, the other error code could be returned from OS to form this stack.

> Ok, we are fixing this. The proposed patch lets LWLockReleaseAll() called before
> InitBufferPoolBackend() by registering the former after the latter into on_shmem_exit
> list. Even if it works, I think it's neither clean nor safe to register multiple
> order-sensitive callbacks.

Actually I think the order of callbacks retains the order of how components got initialized. In the patch v2, the
specificlocation requirement was for the cascade exception to work as well.
 
However, I think we can discuss about this as we just would like to ensure lw-locks are released before
AbortBufferIO().

> And the only caller of it is shmem_exit. More of that, all other caller sites calls
> LWLockReleaseAll() just before calling it. If that's the case, why don't we just release
> all LWLocks in shmem_exit or in AtProcExit_Buffers before calling AbortBufferIO()? I think
> it's sufficient that AtProcExit_Buffers calls it at the beginning. (The comment for the
> funcgtion needs editing).

Putting LWLockReleaseAll() in AtProcExit_Buffers() is OK, however, it does not work for the cascade exception case if
puttingin shmem_exit().
 
Indeed putting LWLockReleaseAll() in AtProcExit_Buffers() was considered firstly, but as the other part of PG code base
prefersputting in other callbacks (e.g. ShutdownAuxiliaryProcess() callback when UnderPostmaster is true), I just
followedthe same style in patch v2.
 
But after revisited the decision, I think I agree with you, because:
    1. Yes, it looks cleaner in the code.
    2. We can avoid the pain if people forgot or wrongly registered the additional callback.
    3. Calling LWLockReleaseAll() for the second time is quite fast, it will not bring overburden to
AtProcExit_Buffers()

Thus, I have updated the patch v3 according to your suggestions. Could you help to review again?
Please let me know should you have more suggestions or feedbacks.

Thank you!

Best regards,
--
Chengchao Yu
Software Engineer | Microsoft | Azure Database for PostgreSQL
https://azure.microsoft.com/en-us/services/postgresql/


Вложения

Re: [PATCH] Fix Proposal - Deadlock Issue in Single User Mode When IOFailure Occurs

От
Amit Kapila
Дата:
On Sat, Jul 27, 2019 at 6:22 AM Chengchao Yu <chengyu@microsoft.com> wrote:
>
> Thus, I have updated the patch v3 according to your suggestions. Could you help to review again?
> Please let me know should you have more suggestions or feedbacks.
>

I have tried to look into this patch and I don't think it fixes the
problem.  Basically, I have tried the commands suggested by you in
single-user mode, create table; insert and then checkpoint.  Now, what
I see is almost the same behavior as explained by you in one of the
above emails with a slight difference which makes me think that the
fix you are proposing is not correct.  Below is what you told:

"The second type is in Step #4. At the time when “checkpoint” SQL
command is being executed, PG has already set up the before_shmem_exit
callbackShutdownPostgres(), which releases all lw-locks given
transaction or sub-transaction is on-going. So after the first IO
error, the buffer page’s lw-lock gets released successfully. However,
later ShutdownXLOG() is invoked, and PG tries to flush buffer pages
again, which results in the second IO error. Different from the first
time, this time, all the previous executed before/on_shmem_exit
callbacks are not invoked again due to the decrease of the indexes. So
lw-locks for buffer pages are not released when PG tries to get the
same buffer lock in AbortBufferIO(), and then PG process gets stuck."

The only difference is in the last line where for me it gives
assertion failure when trying to do ReleaseAuxProcessResources.  Below
is the callstack:

  postgres.exe!ExceptionalCondition(const char *
conditionName=0x00db0c78, const char * errorType=0x00db0c68, const
char * fileName=0x00db0c18, int lineNumber=1722)  Line 55 C
  postgres.exe!UnpinBuffer(BufferDesc * buf=0x052a104c, bool
fixOwner=true)  Line 1722 + 0x2f bytes C
  postgres.exe!ReleaseBuffer(int buffer=96)  Line 3367 + 0x17 bytes C
  postgres.exe!ResourceOwnerReleaseInternal(ResourceOwnerData *
owner=0x0141f6e8, <unnamed-enum-RESOURCE_RELEASE_BEFORE_LOCKS>
phase=RESOURCE_RELEASE_BEFORE_LOCKS, bool isCommit=false, bool
isTopLevel=true)  Line 526 + 0x9 bytes C
  postgres.exe!ResourceOwnerRelease(ResourceOwnerData *
owner=0x0141f6e8, <unnamed-enum-RESOURCE_RELEASE_BEFORE_LOCKS>
phase=RESOURCE_RELEASE_BEFORE_LOCKS, bool isCommit=false, bool
isTopLevel=true)  Line 484 + 0x17 bytes C
  postgres.exe!ReleaseAuxProcessResources(bool isCommit=false)  Line
861 + 0x15 bytes C
> postgres.exe!ReleaseAuxProcessResourcesCallback(int code=1, unsigned int arg=0)  Line 881 + 0xa bytes C
  postgres.exe!shmem_exit(int code=1)  Line 272 + 0x1f bytes C
  postgres.exe!proc_exit_prepare(int code=1)  Line 194 + 0x9 bytes C
  postgres.exe!proc_exit(int code=1)  Line 107 + 0x9 bytes C
  postgres.exe!errfinish(int dummy=0, ...)  Line 538 + 0x7 bytes C
  postgres.exe!mdwrite(SMgrRelationData * reln=0x0147e140, ForkNumber
forknum=MAIN_FORKNUM, unsigned int blocknum=7, char *
buffer=0x0542dd00, bool skipFsync=false)  Line 713 + 0x4c bytes C
  postgres.exe!smgrwrite(SMgrRelationData * reln=0x0147e140,
ForkNumber forknum=MAIN_FORKNUM, unsigned int blocknum=7, char *
buffer=0x0542dd00, bool skipFsync=false)  Line 587 + 0x24 bytes C
  postgres.exe!FlushBuffer(BufferDesc * buf=0x052a104c,
SMgrRelationData * reln=0x0147e140)  Line 2759 + 0x1d bytes C
  postgres.exe!SyncOneBuffer(int buf_id=95, bool
skip_recently_used=false, WritebackContext * wb_context=0x012ccea0)
Line 2402 + 0xb bytes C
  postgres.exe!BufferSync(int flags=5)  Line 1992 + 0x15 bytes C
  postgres.exe!CheckPointBuffers(int flags=5)  Line 2586 + 0x9 bytes C
  postgres.exe!CheckPointGuts(unsigned __int64
checkPointRedo=22933176, int flags=5)  Line 8991 + 0x9 bytes C
  postgres.exe!CreateCheckPoint(int flags=5)  Line 8780 + 0x11 bytes C
  postgres.exe!ShutdownXLOG(int code=1, unsigned int arg=0)  Line 8333
+ 0x7 bytes C
  postgres.exe!shmem_exit(int code=1)  Line 272 + 0x1f bytes C
  postgres.exe!proc_exit_prepare(int code=1)  Line 194 + 0x9 bytes C
  postgres.exe!proc_exit(int code=1)  Line 107 + 0x9 bytes C
  postgres.exe!errfinish(int dummy=0, ...)  Line 538 + 0x7 bytes C
  postgres.exe!mdwrite(SMgrRelationData * reln=0x0147e140, ForkNumber
forknum=MAIN_FORKNUM, unsigned int blocknum=7, char *
buffer=0x0542dd00, bool skipFsync=false)  Line 713 + 0x4c bytes C
  postgres.exe!smgrwrite(SMgrRelationData * reln=0x0147e140,
ForkNumber forknum=MAIN_FORKNUM, unsigned int blocknum=7, char *
buffer=0x0542dd00, bool skipFsync=false)  Line 587 + 0x24 bytes C
  postgres.exe!FlushBuffer(BufferDesc * buf=0x052a104c,
SMgrRelationData * reln=0x0147e140)  Line 2759 + 0x1d bytes C
  postgres.exe!SyncOneBuffer(int buf_id=95, bool
skip_recently_used=false, WritebackContext * wb_context=0x012ce580)
Line 2402 + 0xb bytes C
  postgres.exe!BufferSync(int flags=44)  Line 1992 + 0x15 bytes C
  postgres.exe!CheckPointBuffers(int flags=44)  Line 2586 + 0x9 bytes C
  postgres.exe!CheckPointGuts(unsigned __int64
checkPointRedo=22933176, int flags=44)  Line 8991 + 0x9 bytes C
  postgres.exe!CreateCheckPoint(int flags=44)  Line 8780 + 0x11 bytes C
  postgres.exe!RequestCheckpoint(int flags=44)  Line 967 + 0xc bytes C
  postgres.exe!standard_ProcessUtility(PlannedStmt * pstmt=0x0146b738,
const char * queryString=0x0146ad98,
<unnamed-enum-PROCESS_UTILITY_TOPLEVEL>
context=PROCESS_UTILITY_TOPLEVEL, ParamListInfoData *
params=0x00000000, QueryEnvironment * queryEnv=0x00000000,
_DestReceiver * dest=0x00adc1d8, char * completionTag=0x012cfdbc)
Line 769 + 0x28 bytes C

It seems to me there are other things like
ReleaseAuxProcessResources() before AbortBufferIO() which expects
LWLocks to be released.  I didn't get much time to further debug this,
but I think some more analysis is required for this issue.

I guess you didn't encounter this problem because you are not using
Asserts enabled build, but there could be some other reason as well.

I have marked this CF entry as "Waiting on Author".

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: [PATCH] Fix Proposal - Deadlock Issue in Single User Mode WhenIO Failure Occurs

От
Michael Paquier
Дата:
On Mon, Sep 09, 2019 at 05:34:43PM +0530, Amit Kapila wrote:
> The only difference is in the last line where for me it gives
> assertion failure when trying to do ReleaseAuxProcessResources.  Below
> is the callstack:

No need for Windows on this one and I have reproduced easily the same
trace as Amit.  The patch has been moved to next CF.  Chengchao, could
you provide an update please?
--
Michael

Вложения

Re: [PATCH] Fix Proposal - Deadlock Issue in Single User Mode When IOFailure Occurs

От
Amit Kapila
Дата:
On Mon, Nov 25, 2019 at 1:17 PM Michael Paquier <michael@paquier.xyz> wrote:
>
> On Mon, Sep 09, 2019 at 05:34:43PM +0530, Amit Kapila wrote:
> > The only difference is in the last line where for me it gives
> > assertion failure when trying to do ReleaseAuxProcessResources.  Below
> > is the callstack:
>
> No need for Windows on this one and I have reproduced easily the same
> trace as Amit.  The patch has been moved to next CF.  Chengchao, could
> you provide an update please?
>

I have marked this patch as "Returned with feedback" as it's been long
since the author has responded.  Feel free to provide a new patch for
the next CF.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com