RE: [PATCH] Fix Proposal - Deadlock Issue in Single User Mode When IOFailure Occurs

From: Chengchao Yu
Subject: RE: [PATCH] Fix Proposal - Deadlock Issue in Single User Mode When IOFailure Occurs
Msg-id: CY4PR2101MB0804A6A3AFB928423D04FE1CAA920@CY4PR2101MB0804.namprd21.prod.outlook.com
In response to: Re: [PATCH] Fix Proposal - Deadlock Issue in Single User Mode When IOFailure Occurs  (Thomas Munro <thomas.munro@enterprisedb.com>)
Responses: Re: [PATCH] Fix Proposal - Deadlock Issue in Single User Mode When IOFailure Occurs  (Amit Kapila <amit.kapila16@gmail.com>)
List: pgsql-hackers

Hi Amit, Thomas,

Thank you very much for your feedback! Apologies, I only just saw both messages.

> We generally reserve the space in a relation before attempting to write, so not sure how you are able to hit the disk
> full situation via mdwrite.  If you see the description of the function, that also indicates same.
 

Absolutely agree, this isn’t a PG issue. The issue manifests for us at Microsoft because our own storage layer treats
mdextend() as only setting the length of the file. We have a workaround, and no change is needed in Postgres.
 

> I am not telling that mdwrite can never lead to error, but just trying to understand the issue you actually faced.  I
> haven't read your proposed solution yet, let's first try to establish the problem you are facing.
 

We see transient IO errors when reading a block, after which the PG instance deadlocks in single user mode until we
kill the instance. The stack trace below shows this behavior: mdread() fails while the buffer's LWLock is still held.
This happens because in single user mode there is no callback to release the lock, so when AbortBufferIO() tries to
acquire the same lock again, it waits for the lock indefinitely.
 

Here is the stack trace:

0a 00000004`8080cc30 00000004`80dcf917 postgres!PGSemaphoreLock+0x65 [d:\orcasqlagsea10\14\s\src\backend\port\win32_sema.c @ 158]
0b 00000004`8080cc90 00000004`80db025c postgres!LWLockAcquire+0x137 [d:\orcasqlagsea10\14\s\src\backend\storage\lmgr\lwlock.c @ 1234]
0c 00000004`8080ccd0 00000004`80db25db postgres!AbortBufferIO+0x2c [d:\orcasqlagsea10\14\s\src\backend\storage\buffer\bufmgr.c @ 3995]
0d 00000004`8080cd20 00000004`80dbce36 postgres!AtProcExit_Buffers+0xb [d:\orcasqlagsea10\14\s\src\backend\storage\buffer\bufmgr.c @ 2479]
0e 00000004`8080cd50 00000004`80dbd1bd postgres!shmem_exit+0xf6 [d:\orcasqlagsea10\14\s\src\backend\storage\ipc\ipc.c @ 262]
0f 00000004`8080cd80 00000004`80dbccfd postgres!proc_exit_prepare+0x4d [d:\orcasqlagsea10\14\s\src\backend\storage\ipc\ipc.c @ 188]
10 00000004`8080cdb0 00000004`80ef9e74 postgres!proc_exit+0xd [d:\orcasqlagsea10\14\s\src\backend\storage\ipc\ipc.c @ 141]
11 00000004`8080cde0 00000004`80ddb6ef postgres!errfinish+0x204 [d:\orcasqlagsea10\14\s\src\backend\utils\error\elog.c @ 624]
12 00000004`8080ce50 00000004`80db0f59 postgres!mdread+0x12f [d:\orcasqlagsea10\14\s\src\backend\storage\smgr\md.c @ 806]
13 00000004`8080cea0 00000004`80daeb70 postgres!ReadBuffer_common+0x2c9 [d:\orcasqlagsea10\14\s\src\backend\storage\buffer\bufmgr.c @ 897]
14 00000004`8080cf30 00000004`80b81322 postgres!ReadBufferWithoutRelcache+0x60 [d:\orcasqlagsea10\14\s\src\backend\storage\buffer\bufmgr.c @ 694]
15 00000004`8080cf90 00000004`80db9cbb postgres!XLogReadBufferExtended+0x142 [d:\orcasqlagsea10\14\s\src\backend\access\transam\xlogutils.c @ 513]
16 00000004`8080cff0 00000004`80b2f53a postgres!XLogRecordPageWithFreeSpace+0xbb [d:\orcasqlagsea10\14\s\src\backend\storage\freespace\freespace.c @ 254]
17 00000004`8080d030 00000004`80b6eb94 postgres!heap_xlog_insert+0x36a [d:\orcasqlagsea10\14\s\src\backend\access\heap\heapam.c @ 8491]
18 00000004`8080f0d0 00000004`80f0a13f postgres!StartupXLOG+0x1f84 [d:\orcasqlagsea10\14\s\src\backend\access\transam\xlog.c @ 7480]
19 00000004`8080fbf0 00000004`80de121e postgres!InitPostgres+0x12f [d:\orcasqlagsea10\14\s\src\backend\utils\init\postinit.c @ 656]
1a 00000004`8080fcd0 00000004`80c92c31 postgres!PostgresMain+0x25e [d:\orcasqlagsea10\14\s\src\backend\tcop\postgres.c @ 3881]
1b 00000004`8080fed0 00000004`80f51df3 postgres!main+0x491 [d:\orcasqlagsea10\14\s\src\backend\main\main.c @ 235]
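
As a rough illustration only (this is not PostgreSQL code), the sequence in the trace can be modeled with a toy non-reentrant lock: a read fails while the lock is held, and the exit-time cleanup then needs the same lock. All names below (ToyLock, toy_abort_buffer_io, toy_read_fails) are hypothetical, and a try-acquire stands in for the blocking LWLockAcquire() so that the sketch reports the hang instead of actually hanging.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a non-reentrant lock; not PostgreSQL's LWLock. */
typedef struct ToyLock {
    bool held;
} ToyLock;

/* Returns false in the situation where a blocking acquire would wait. */
bool toy_lock_try_acquire(ToyLock *lock)
{
    if (lock->held)
        return false;
    lock->held = true;
    return true;
}

void toy_lock_release(ToyLock *lock)
{
    lock->held = false;
}

/*
 * Models the exit-time cleanup step, which needs the buffer's I/O lock.
 * Returns true if cleanup could proceed, false if it would block forever.
 */
bool toy_abort_buffer_io(ToyLock *io_lock)
{
    if (!toy_lock_try_acquire(io_lock))
        return false;           /* a blocking acquire would hang here */
    /* ... buffer I/O state would be reset here ... */
    toy_lock_release(io_lock);
    return true;
}

/*
 * Models a read that fails while the I/O lock is held.
 * release_before_cleanup selects whether the error path drops the lock
 * before the cleanup step runs.
 */
bool toy_read_fails(bool release_before_cleanup)
{
    ToyLock io_lock = { false };
    bool acquired = toy_lock_try_acquire(&io_lock);  /* I/O starts */

    assert(acquired);
    (void) acquired;

    /* The read fails at this point and the error path begins. */
    if (release_before_cleanup)
        toy_lock_release(&io_lock);

    return toy_abort_buffer_io(&io_lock);
}
```

In this model, toy_read_fails(false) corresponds to the case described above, where nothing releases the lock before cleanup, and toy_read_fails(true) corresponds to an error path that releases it first.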

Please let us know if you have any further feedback. Thank you!
 
Best regards,
--
Chengchao Yu
Software Engineer | Microsoft | Azure Database for PostgreSQL
https://azure.microsoft.com/en-us/services/postgresql/


-----Original Message-----
From: Thomas Munro <thomas.munro@enterprisedb.com> 
Sent: Thursday, January 24, 2019 2:32 PM
To: Amit Kapila <amit.kapila16@gmail.com>
Cc: Chengchao Yu <chengyu@microsoft.com>; Pg Hackers <pgsql-hackers@postgresql.org>; Prabhat Tripathi
<ptrip@microsoft.com>; Sunil Kamath <Sunil.Kamath@microsoft.com>; Michal Primke <mprimke@microsoft.com>
 
Subject: Re: [PATCH] Fix Proposal - Deadlock Issue in Single User Mode When IO Failure Occurs

On Sun, Jan 20, 2019 at 4:45 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> On Sat, Dec 1, 2018 at 2:30 AM Chengchao Yu <chengyu@microsoft.com> wrote:
> > Recently, we hit a few occurrences of deadlock when IO failure (including disk full, random remote disk IO
> > failures) happens in single user mode. We found the issue exists on both Linux and Windows in multiple postgres
> > versions.
> >
> > 3.       Because the unable to write relation data scenario is difficult to hit naturally even reserved space is
> > turned off, I have prepared a small patch (see attachment “emulate-error.patch”) to force an error when PG tries to
> > write data to relation files. We can just apply the patch and there is no need to put efforts flooding data to disk any
> > more.
>
> I have one question related to the way you have tried to emulate the error.
>
> @@ -840,6 +840,10 @@ mdwrite(SMgrRelation reln, ForkNumber forknum, 
> BlockNumber blocknum, nbytes, BLCKSZ);
> + ereport(ERROR,
> + (errcode(ERRCODE_INTERNAL_ERROR),
> + errmsg("Emulate exception in mdwrite() when writing to disk")));
> +
>
> We generally reserve the space in a relation before attempting to 
> write, so not sure how you are able to hit the disk full situation via 
> mdwrite.  If you see the description of the function, that also 
> indicates same.

Presumably ZFS or BTRFS or something more exotic could still get ENOSPC here, and of course any filesystem could give
us EIO here (because the disk is on fire or the remote NFS server is rebooting due to an automatic Windows update).
 

--
Thomas Munro

http://www.enterprisedb.com
