Discussion: BUG #14973: hung queries
The following bug has been logged on the website:

Bug reference:      14973
Logged by:          Dmitry Shalashov
Email address:      skaurus@gmail.com
PostgreSQL version: 10.1
Operating system:   Debian 9
Description:

We stumbled upon queries running for a day or more. They are simple ones, so that should not be happening. And most of the time it doesn't: only a very small share of these queries ends up like this.

Moreover, these queries cannot be stopped.

pg_stat_activity says that they all have wait_event_type = IPC, wait_event = BtreePage, state = active.

strace tells that they are all inside the epoll_wait syscall.

grep over ps says that they are all "postgres: bgworker: parallel worker for PID ...".

Looks like some bug in parallel seq scan, maybe?

We are going to disable parallel seq scan and restart our server in about 4 hours from now. I can get more debug information if asked before that.
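As a sketch of how such stuck backends can be spotted, a query along these lines against pg_stat_activity (the backend_type column is available from PostgreSQL 10 onward) lists active sessions blocked on an IPC wait, with their run time:

```sql
-- Sketch: list active backends waiting on IPC events such as BtreePage,
-- together with how long their current query has been running.
SELECT pid, backend_type, state, wait_event_type, wait_event,
       now() - query_start AS running_for, left(query, 60) AS query
FROM pg_stat_activity
WHERE state = 'active'
  AND wait_event_type = 'IPC'
ORDER BY query_start;
```

Long-running rows whose backend_type is "parallel worker" would match the symptoms described above.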
On Fri, Dec 15, 2017 at 1:31 AM, <skaurus@gmail.com> wrote:
> The following bug has been logged on the website:
>
> Bug reference: 14973
> Logged by: Dmitry Shalashov
> Email address: skaurus@gmail.com
> PostgreSQL version: 10.1
> Operating system: Debian 9
> Description:
>
> We stumbled upon queries running for a day or more. They are simple ones, so
> that should not be happening. And most of the time it don't - very small
> share of these queries ends up like this.
>
> Moreover, these queries couldn't be stopped.
>
> pg_stat_activity says that they all have wait_event_type = IPC, wait_event =
> BtreePage, state = active
>
> strace tells that they all inside epoll_wait syscall
>
> grep over ps says that they all are "postgres: bgworker: parallel worker for
> PID ..."
>
> Looks like some bug in parallel seq scan maybe?
>
> We are going to disable parallel seq scan and restart our server in like 4
> hours from now. I can get more debug if asked before that.

Hello Dmitry,

Thank you for the report. It sounds like a known bug in 10.0 and 10.1 that was recently fixed:

https://www.postgresql.org/message-id/E1ePESn-0005PV-S9%40gemulon.postgresql.org

The problem is in Parallel Index Scan for btree. The fix will be in 10.2. One workaround in the meantime would be to disable parallelism for that query (SET max_parallel_workers_per_gather = 0).

--
Thomas Munro
http://www.enterprisedb.com
On Tue, Dec 19, 2017 at 6:38 AM, Thomas Munro <thomas.munro@enterprisedb.com> wrote:
> On Fri, Dec 15, 2017 at 1:31 AM, <skaurus@gmail.com> wrote:
>> pg_stat_activity says that they all have wait_event_type = IPC, wait_event =
>> BtreePage, state = active
>
> https://www.postgresql.org/message-id/E1ePESn-0005PV-S9%40gemulon.postgresql.org
>
> The problem is in Parallel Index Scan for btree. The fix will be in
> 10.2. One workaround in the meantime would be to disable parallelism
> for that query (SET max_parallel_workers_per_gather = 0).

On second thoughts, a more targeted workaround to avoid just these buggy parallel index scans without disabling parallelism in general might be:

SET min_parallel_index_scan_size = '5TB';

(Assuming you don't have any indexes that large.)

--
Thomas Munro
http://www.enterprisedb.com
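For reference, both workarounds mentioned in the thread are session-level settings; a sketch (they could also be made sticky for a role or database via ALTER ROLE ... SET / ALTER DATABASE ... SET):

```sql
-- Workaround 1: disable parallel query entirely for this session.
SET max_parallel_workers_per_gather = 0;

-- Workaround 2 (more targeted): raise the index-size threshold so that no
-- real index qualifies for a parallel index scan, while other parallel
-- plan types (e.g. parallel seq scan) remain available.
SET min_parallel_index_scan_size = '5TB';
```

The second setting only suppresses parallel *index* scans, which is where the bug being discussed lives, so it is the less disruptive of the two.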
Hi Thomas,
I'm glad to help. Thanks for the advice!
By the way, there was a mistake in my bug report - wait_event actually was BgWorkerShutdown.
2017-12-18 22:55 GMT+03:00 Thomas Munro <thomas.munro@enterprisedb.com>:
> On Tue, Dec 19, 2017 at 6:38 AM, Thomas Munro
> <thomas.munro@enterprisedb.com> wrote:
>> On Fri, Dec 15, 2017 at 1:31 AM, <skaurus@gmail.com> wrote:
>>> pg_stat_activity says that they all have wait_event_type = IPC, wait_event =
>>> BtreePage, state = active
>>
>> https://www.postgresql.org/message-id/E1ePESn-0005PV-S9%40gemulon.postgresql.org
>>
>> The problem is in Parallel Index Scan for btree. The fix will be in
>> 10.2. One workaround in the meantime would be to disable parallelism
>> for that query (SET max_parallel_workers_per_gather = 0).
>
> On second thoughts, a more targeted workaround to avoid just these
> buggy parallel index scans without disabling parallelism in general
> might be:
>
> SET min_parallel_index_scan_size = '5TB';
>
> (Assuming you don't have any indexes that large.)
On Tue, Dec 19, 2017 at 2:48 AM, Dmitry Shalashov <skaurus@gmail.com> wrote:
> Hi Thomas,
>
> I'm glad to help. Thanks for the advice!
>
> By the way, there was a mistake in my bug report - wait_event actually was
> BgWorkerShutdown.

I think the BgWorkerShutdown type of wait event can occur only for the master backend, not for all the workers. Are there any other wait events? Can we get a stack trace of one or more workers?

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
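For anyone wanting to supply the requested backtraces, one common approach is to attach gdb briefly to a stuck worker (a sketch; it assumes gdb and the PostgreSQL debug symbols are installed on the server, and <worker_pid> is a placeholder to fill in):

```shell
# Find the parallel worker processes by their ps title, as in the report.
ps aux | grep '[p]arallel worker for PID'

# Attach to one worker, print its backtrace, and detach immediately.
# Replace <worker_pid> with an actual PID from the listing above.
gdb -p <worker_pid> --batch -ex 'bt'
```

Attaching pauses the process only for the instant the backtrace is taken, so this is generally safe to run against a production backend that is stuck anyway.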
On Tue, Dec 19, 2017 at 4:02 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> On Tue, Dec 19, 2017 at 2:48 AM, Dmitry Shalashov <skaurus@gmail.com> wrote:
>> Hi Thomas,
>>
>> I'm glad to help. Thanks for the advice!
>>
>> By the way, there was a mistake in my bug report - wait_event actually was
>> BgWorkerShutdown.
>
> I think BgWorkerShutdown type of wait event can be only for the master
> backend not for all the workers.

Yeah, that's what happens when WaitForBackgroundWorkerShutdown() is called, as the primary backend waits for all the workers to stop. You can see this wait event in the logical replication launcher as well, by the way.

--
Michael