On Thu, Apr 07, 2016 at 05:53:54PM -0700, Jeff Janes wrote:
> On Thu, Apr 7, 2016 at 4:33 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > Jeff Janes <jeff.janes@gmail.com> writes:
> >> To summarize the behavior change:
> >
> >> In the released code, an inserting backend that violates the pending
> >> list limit will try to clean the list, even if it is already being
> >> cleaned. It won't accomplish anything useful, but will go through the
> >> motions until eventually it runs into a page the lead cleaner has
> >> deleted, at which point it realizes there is another cleaner and it
> >> stops. This acts as a natural throttle to how fast insertions can
> >> take place into an over-sized pending list.
> >
> > Right.
> >
> >> The proposed change removes that throttle, so that inserters will
> >> immediately see there is already a cleaner and just go back about
> >> their business. Due to that, unthrottled backends could add to the
> >> pending list faster than the cleaner can clean it, leading to
> >> unbounded growth in the pending list and could cause a user backend to
> >> becoming apparently unresponsive to the user, indefinitely. That is
> >> scary to backpatch.
> >
> > It's scary to put into HEAD, either. What if we simply don't take
> > that specific behavioral change? It doesn't seem like this is an
> > essential part of fixing the bug as you described it. (Though I've
> > not read the patch, so maybe I'm just missing the connection.)
>
> There are only 3 fundamental options I see, the cleaner can wait,
> "help", or move on.
>
> "Helping" is what it does now and is dangerous.
>
> Moving on gives the above-discussed unthrottling problem.
>
> Waiting has two problems. The act of waiting will cause autovacuums
> to be canceled, unless ugly hacks are deployed to prevent that. If
> we deploy those ugly hacks, then we have the problem that a user
> backend will end up waiting on an autovacuum to finish the cleaning,
> and the autovacuum is taking its sweet time due to
> autovacuum_vacuum_cost_delay.
Teodor, this thread has been quiet for four days, and the deadline to fix this
open item expired 23 hours ago. Do you have a new plan for fixing it?