Re: strange parallel query behavior after OOM crashes
От | Tomas Vondra |
---|---|
Тема | Re: strange parallel query behavior after OOM crashes |
Дата | |
Msg-id | 73b84052-5dfe-0c40-4212-cfa96a1c6027@2ndquadrant.com обсуждение исходный текст |
Ответ на | Re: strange parallel query behavior after OOM crashes (Kuntal Ghosh <kuntalghosh.2007@gmail.com>) |
Ответы |
Re: strange parallel query behavior after OOM crashes
Re: strange parallel query behavior after OOM crashes |
Список | pgsql-hackers |
On 04/04/2017 06:52 PM, Robert Haas wrote: > On Mon, Apr 3, 2017 at 6:08 AM, Kuntal Ghosh <kuntalghosh.2007@gmail.com> wrote: >> On Fri, Mar 31, 2017 at 6:50 PM, Robert Haas <robertmhaas@gmail.com> wrote: >>> On Thu, Mar 30, 2017 at 4:35 PM, Kuntal Ghosh >>> <kuntalghosh.2007@gmail.com> wrote: >>>> 2. the server restarts automatically, initialize >>>> BackgroundWorkerData->parallel_register_count and >>>> BackgroundWorkerData->parallel_terminate_count in the shared memory. >>>> After that, it calls ForgetBackgroundWorker and it increments >>>> parallel_terminate_count. >>> >>> Hmm. So this seems like the root of the problem. Presumably those >>> things need to be reset AFTER forgetting any background workers from >>> before the crash. >>> >> IMHO, the fix would be not to increase the terminated parallel worker >> count whenever ForgetBackgroundWorker is called due to a bgworker >> crash. I've attached a patch for the same. PFA. > > While I'm not opposed to that approach, I don't think this is a good > way to implement it. If you want to pass an explicit flag to > ForgetBackgroundWorker telling it whether or not it should performing > the increment, fine. But with what you've got here, you're > essentially relying on "spooky action at a distance". It would be > easy for future code changes to break this, not realizing that > somebody's got a hard dependency on 0 having a specific meaning. > I'm probably missing something, but I don't quite understand how these values actually survive the crash. I mean, what I observed is OOM followed by a restart, so shouldn't BackgroundWorkerShmemInit() simply reset the values back to 0? Or do we call ForgetBackgroundWorker() after the crash for some reason? In any case, the comment right before BackgroundWorkerArray says this: * These counters can of course overflow, but it's not important here * since the subtraction will still give the right number. which means that this assert + Assert(BackgroundWorkerData->parallel_register_count >= + BackgroundWorkerData->parallel_terminate_count); is outright broken, just like any other attempts to rely on simple comparisons of these two counters, no? > BTW, if this isn't on the open items list, it should be. It's > presumably the fault of b460f5d6693103076dc554aa7cbb96e1e53074f9. > Unrelated, but perhaps we should also add this to open items: https://www.postgresql.org/message-id/flat/57bbc52c-5d40-bb80-2992-7e1027d0b67c%402ndquadrant.com (although that's more a 9.6 material). regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
В списке pgsql-hackers по дате отправления:
Предыдущее
От: Pavan DeolaseeДата:
Сообщение: Re: Patch: Write Amplification Reduction Method (WARM)