Re: Re: Hot Standby query cancellation and Streaming Replication integration

From: Bruce Momjian
Subject: Re: Re: Hot Standby query cancellation and Streaming Replication integration
Msg-id: 201003022136.o22LaqT29573@momjian.us
In reply to: Re: Hot Standby query cancellation and Streaming Replication integration (Greg Stark <gsstark@mit.edu>)
List: pgsql-hackers
Greg Stark wrote:
> On Mon, Mar 1, 2010 at 5:50 PM, Josh Berkus <josh@agliodbs.com> wrote:
> > I don't think that defer_cleanup_age is a long-term solution.  But we
> > need *a* solution which does not involve delaying 9.0.
>
> So I think the primary solution currently is to raise max_standby_delay.
>
> However, there is a concern with max_standby_delay. Suppose you set it
> to, say, 300s, and then run a 300s query on the slave, which causes the
> slave to fall 299s behind. Now you start a new query on the slave: it
> gets a snapshot based on the point in time that the slave is currently
> at. If it hits a conflict, it will have only 1s to finish before the
> conflict causes the query to be cancelled.
>
> In short, in the current setup I think there is no safe value of
> max_standby_delay short of -1 that will prevent query cancellations. If
> the slave has a constant stream of queries and always has at least one
> concurrent query running, then it's possible that the slave will run
> continuously max_standby_delay-epsilon behind the master and cancel
> queries left and right, regardless of how large max_standby_delay is.
>
> To resolve this I think you would have to give the slave some chance
> to catch up: something like refusing to use a snapshot older than
> max_standby_delay/2 and instead waiting until the existing queries
> finish, so the slave gets a chance to catch up and see a more recent
> snapshot. The problem is that this would result in very unpredictable
> and variable response times from the slave. A single long-lived query
> could cause replay to pause for a big chunk of max_standby_delay and
> prevent any new query from starting.

That is a good point. I have added the attached documentation patch to
mention that max_standby_delay increases the master/slave inconsistency,
and not to use it for xid-keepalive connections.
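A simplified model of the timing problem in the quoted message may make the arithmetic easier to follow. This is a hypothetical sketch, not PostgreSQL code: it assumes replay may fall at most max_standby_delay seconds behind the master (with -1 meaning wait forever) and treats a new query's snapshot age as equal to the current replay lag. The function names remaining_grace and may_take_snapshot are invented for illustration; the second one sketches the proposed max_standby_delay/2 catch-up rule.

```python
# Hypothetical, simplified model of the standby timing problem discussed
# in this thread. Not PostgreSQL source; all names here are invented.


def remaining_grace(max_standby_delay: float, replay_lag: float) -> float:
    """Seconds a newly started standby query can keep running after it
    hits a conflict, assuming replay may fall at most max_standby_delay
    seconds behind the master and the query's snapshot is as old as the
    current replay lag. A negative limit (-1) means wait forever."""
    if max_standby_delay < 0:
        return float("inf")
    return max(0.0, float(max_standby_delay - replay_lag))


def may_take_snapshot(max_standby_delay: float, replay_lag: float) -> bool:
    """Sketch of the proposed catch-up rule: refuse to hand out a snapshot
    older than max_standby_delay / 2, so new queries wait while replay
    catches up instead of starting with almost no grace left."""
    if max_standby_delay < 0:
        return True
    return replay_lag <= max_standby_delay / 2


if __name__ == "__main__":
    print(remaining_grace(300, 299))     # 1.0: the 300s/299s example
    print(may_take_snapshot(300, 299))   # False: wait for replay first
    print(may_take_snapshot(300, 120))   # True
    # Raising the limit does not help if the lag tracks it:
    print(remaining_grace(3600, 3599))   # 1.0
```

The last call shows why raising the limit alone is not enough: if the lag stays at the limit minus one second, a new query gets one second of grace whether the limit is 300s or an hour. The catch-up rule avoids that, at the cost of unpredictable waits before a new query can start.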

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  PG East:  http://www.enterprisedb.com/community/nav-pg-east-2010.do
Index: doc/src/sgml/config.sgml
===================================================================
RCS file: /cvsroot/pgsql/doc/src/sgml/config.sgml,v
retrieving revision 1.256
diff -c -c -r1.256 config.sgml
*** doc/src/sgml/config.sgml    27 Feb 2010 14:46:05 -0000    1.256
--- doc/src/sgml/config.sgml    2 Mar 2010 21:03:14 -0000
***************
*** 1869,1875 ****
          this parameter makes sense only during replication, so when
          performing an archive recovery to recover from data loss a very high
          parameter setting or -1 which means wait forever is recommended.
!         The default is 30 seconds.
          This parameter can only be set in the <filename>postgresql.conf</>
          file or on the server command line.
         </para>
--- 1869,1876 ----
          this parameter makes sense only during replication, so when
          performing an archive recovery to recover from data loss a very high
          parameter setting or -1 which means wait forever is recommended.
!         The default is 30 seconds.  Increasing this parameter can delay
!         master server changes from appearing on the standby.
          This parameter can only be set in the <filename>postgresql.conf</>
          file or on the server command line.
         </para>
Index: doc/src/sgml/high-availability.sgml
===================================================================
RCS file: /cvsroot/pgsql/doc/src/sgml/high-availability.sgml,v
retrieving revision 1.52
diff -c -c -r1.52 high-availability.sgml
*** doc/src/sgml/high-availability.sgml    27 Feb 2010 09:29:20 -0000    1.52
--- doc/src/sgml/high-availability.sgml    2 Mar 2010 21:03:14 -0000
***************
*** 1410,1416 ****
      that the primary and standby nodes are linked via the WAL, so the cleanup
      situation is no different from the case where the query ran on the primary
      node itself.  And you are still getting the benefit of off-loading the
!     execution onto the standby.
     </para>

     <para>
--- 1410,1418 ----
      that the primary and standby nodes are linked via the WAL, so the cleanup
      situation is no different from the case where the query ran on the primary
      node itself.  And you are still getting the benefit of off-loading the
!     execution onto the standby. <varname>max_standby_delay</> should
!     not be used in this case because delayed WAL files might already
!     contain entries that invalidate the current snapshot.
     </para>

     <para>
