Обсуждение: [GENERAL] Confused about max_standby_streaming_delay

Поиск
Список
Период
Сортировка

[GENERAL] Confused about max_standby_streaming_delay

От
Robert Inder
Дата:
A server running PostgreSQL 9.4 on Centos 6.8, with a live server and a hot standby, is supporting about 20 customer organisations, each with their own linux user and its own installation/copy of the system, talking to its own database.

The system has a backup script which is a wrapper round pg_dump. 
For most installations, it takes "a couple of minutes" to run, though others take longer.

On the standby server, we have a directory full of cron jobs -- one for each installation -- to invoke this backup script
Each script is run as soon as the previous one is finished. So when all goes to plan, we get a dump of
     database_a at 5 minutes past the hour,
     database_b at 7 minutes past,
    database_c at 9 minutes past
and so on.

H
​owever, we o​
ccasionally
​ have​
a dump die with a message
​ like​

       Error message from server: ERROR:  canceling statement due to conflict with recovery

I believe this means that an update from the live server wants to change a table that is being dumped, and so the dump is aborted.

And I've read that the answer to this is to set max_standby_streaming_delay in postgresql94.conf.
So I've set it to "600s" -- ten minutes. 

I thought this would mean that when there was a conflict with an update from the live server, Postgres would give the dump 10 minutes "grace" in which to finish before killing it.

Ten minutes may or may not be enough.  But in a case where it isn't enough, and the dump is abandonned, I would expect to see something like

    the dump of database_a finishing at 5 minutes past the hour,
    the dump of database_b
          starting after the dump of database_a,
          having a conflict,
          being given 10 minutes to complete, and then
          being abandonned
    the dump of database_c starting after the dump of database_b and finishing (say) 3 minutes later

So the dump of database_c should finish at around 18 minutes past the hour.

BUT that is not what I am seeing.
On occasions where the dump of database_b is being abandonned, the successful dump of
database_c is timestamped less than 10 minutes after the dump of database_a

Which does not fit with the dump of database_b being given 10 minutes in which to finish

Have I misunderstood something?  Or is Postgres not actually configured the way I think it is?

Robert.

--
Robert Inder,                                    0131 229 1052 / 07808 492 213
Interactive Information Ltd,   3, Lauriston Gardens, Edinburgh EH3 9HH
Registered in Scotland, Company no. SC 150689
                                           Interactions speak louder than words

Re: [GENERAL] Confused about max_standby_streaming_delay

От
Jeff Janes
Дата:
On Wed, Sep 6, 2017 at 9:43 AM, Robert Inder <robert@interactive.co.uk> wrote:

... 
And I've read that the answer to this is to set max_standby_streaming_delay in postgresql94.conf.
So I've set it to "600s" -- ten minutes. 

I thought this would mean that when there was a conflict with an update from the live server, Postgres would give the dump 10 minutes "grace" in which to finish before killing it.

Ten minutes may or may not be enough.  But in a case where it isn't enough, and the dump is abandonned, I would expect to see something like

    the dump of database_a finishing at 5 minutes past the hour,
    the dump of database_b
          starting after the dump of database_a,
          having a conflict,
          being given 10 minutes to complete, and then
          being abandonned
    the dump of database_c starting after the dump of database_b and finishing (say) 3 minutes later

So the dump of database_c should finish at around 18 minutes past the hour.

BUT that is not what I am seeing.
On occasions where the dump of database_b is being abandonned, the successful dump of
database_c is timestamped less than 10 minutes after the dump of database_a

Which does not fit with the dump of database_b being given 10 minutes in which to finish

Have I misunderstood something?  Or is Postgres not actually configured the way I think it is?

The standby will wait for ten minutes to obtain the lock it wishes to obtain.  In 9.4, if something other than dump of database b was already blocking it for 8 minutes before the dump starts, then the dump of database b will only have 2 minutes, not 10, before it gets cancelled.  So, are there any long running jobs in database b other than the pg_dump?
 
Cheers,

Jeff

Re: [GENERAL] Confused about max_standby_streaming_delay

От
Robert Inder
Дата:


On 6 September 2017 at 20:47, Jeff Janes <jeff.janes@gmail.com> wrote:

Have I misunderstood something?  Or is Postgres not actually configured the way I think it is?

The standby will wait for ten minutes to obtain the lock it wishes to obtain.  In 9.4, if something other than dump of database b was already blocking it for 8 minutes before the dump starts, then the dump of database b will only have 2 minutes, not 10, before it gets cancelled. 

Hmmmm...
You're saying that the time for dumping database b may be spent 8 minutes waiting on a lock then 2 minutes actually dumping.

But would it not still be at least 10 minutes elapsed between the finish time of the previous dump and the following dump starting (let alone finishing)?  And that's what I'm not seeing... 

When one dump fails, there is no 10-minute gap in the mod. times of the other successful dump files....

So, are there any long running jobs in database b other than the pg_dump?
​There shouldn't be.


The standby server is (well, should be!) essentially idle:
 we're using it to do the dumps, so that they don't
 get under the feet of live queries, but ​(I think -- I'll have to check!) that is all
 
 
Cheers,

Jeff

​Robert.​

--
Robert Inder,                                    0131 229 1052 / 07808 492 213
Interactive Information Ltd,   3, Lauriston Gardens, Edinburgh EH3 9HH
Registered in Scotland, Company no. SC 150689
                                           Interactions speak louder than words

Re: [GENERAL] Confused about max_standby_streaming_delay

От
Jeff Janes
Дата:
On Thu, Sep 7, 2017 at 1:16 AM, Robert Inder <robert@interactive.co.uk> wrote:


On 6 September 2017 at 20:47, Jeff Janes <jeff.janes@gmail.com> wrote:

Have I misunderstood something?  Or is Postgres not actually configured the way I think it is?

The standby will wait for ten minutes to obtain the lock it wishes to obtain.  In 9.4, if something other than dump of database b was already blocking it for 8 minutes before the dump starts, then the dump of database b will only have 2 minutes, not 10, before it gets cancelled. 

Hmmmm...
You're saying that the time for dumping database b may be spent 8 minutes waiting on a lock then 2 minutes actually dumping.

No, I'm saying that maybe the replay process was already waiting for something else for 8 minutes before the pg_dump of database b even attempted to start.  So it would be cancelled after 2 minutes.

Are these database a, database b, etc. different databases in the same postgres instance (CREATE DATABASE A) or are they entirely different postgres instances (initdb -D /opt/database_a)?  Your original description of different unix users with different installations made me think the 2nd case is the one, but just want to make sure.

Cheers,

Jeff