On 2013-06-12 08:48:39 -0400, Stephen Frost wrote:
> * Magnus Hagander (magnus@hagander.net) wrote:
> > On Wed, Jun 12, 2013 at 1:48 PM, Andres Freund <andres@2ndquadrant.com> wrote:
> > > On 2013-06-12 07:53:29 +0900, Fujii Masao wrote:
> > >> The attached patch fixes this problem. It just changes walsender so that it
> > >> waits for all the outstanding WAL records to be replicated to the standby
> > >> before closing the replication connection.
> > >
> > > Imo this is a fix that needs to get backpatched... The code tried to do
> > > this but failed, I don't think it really gives grounds for valid *new*
> > > concerns.
> >
> > +1 (without having looked at the code itself, it's definitely a
> > behaviour that needs to be fixed)
>
> Yea, I was also thinking it would be reasonable to backpatch this; it
> really looks like a bug that we're allowing this to happen today.
>
> So, +1 on a backpatch for me. I've looked at the patch (it's a
> one-liner, plus some additional comments) but havn't looked through the
> overall code surrounding it.
I've read most of the surrounding code and I think the patch is as
sensible as it can be without reworking the whole walsender main loop
which seems like a job for another day.
I'd personally write if (caughtup && !pq_is_send_pending() && sentPtr == MyWalSnd->flush)
as if (caughtup && sentPtr == MyWalSnd->flush && !pq_is_send_pending())
Since pq_is_send_pending() basically can only be false if the flush
comparison is true. There's the tiny chance that we were sending a
message out just before which is why we should include the
!pq_is_send_pending() condition at all in that if().
But that's fairly, fairly minor.
Greetings,
Andres Freund
-- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services