Обсуждение: hot_standby_feedback parameter doesn't work

Поиск
Список
Период
Сортировка

hot_standby_feedback parameter doesn't work

От
Andrey Zhidenkov
Дата:
We have a few database clusters (1 master, 2 hot standbys) with
hot_standby_feedback setting on. But I am constantly seeing query
conflicts on standbys because dead rows on the masters are deleted by
VACUUM. Usually I notice it shortly after VACUUM ends, because the
replication lag on replicas is starting to increase.

vacuum_defer_cleanup_age = 0. PostgreSQL version is 9.5.13. Cluster
load is about 1500 transactions per second.

Any ideas?

-
With the best regards, Andrey Zhidenkov


Re: hot_standby_feedback parameter doesn't work

От
Rui DeSousa
Дата:

> On Nov 12, 2018, at 10:28 PM, Andrey Zhidenkov <andrey.zhidenkov@gmail.com> wrote:
>
> We have a few database clusters (1 master, 2 hot standbys) with
> hot_standby_feedback setting on. But I am constantly seeing query
> conflicts on standbys because dead rows on the masters are deleted by
> VACUUM. Usually I notice it shortly after VACUUM ends, because the
> replication lag on replicas is starting to increase.
>
> vacuum_defer_cleanup_age = 0. PostgreSQL version is 9.5.13. Cluster
> load is about 1500 transactions per second.
>
> Any ideas?
>
> -
> With the best regards, Andrey Zhidenkov
>

Are these streaming replicas?  The wal_streaming process on the master should show the xmin of the replicas; you can
checkthe value from pg_stat_activity to make sure the feedback is occurring. 

Are the replicas disconnecting from the replication stream? As the would mean the xmin from the replica would no longer
bein play. 

Re: hot_standby_feedback parameter doesn't work

От
Andres Freund
Дата:
Hi,

On 2018-11-13 10:28:20 +0700, Andrey Zhidenkov wrote:
> We have a few database clusters (1 master, 2 hot standbys) with
> hot_standby_feedback setting on. But I am constantly seeing query
> conflicts on standbys because dead rows on the masters are deleted by
> VACUUM. Usually I notice it shortly after VACUUM ends, because the
> replication lag on replicas is starting to increase.
> 
> vacuum_defer_cleanup_age = 0. PostgreSQL version is 9.5.13. Cluster
> load is about 1500 transactions per second.

What's the precise conflict you get? I'd assume it's because of
exclusive locks, not row version conflicts. Is that right?

If so, that'd likely be because vacuum gets a chance to truncate the
relations at the end of vacuum.

Before theorizing further, it'd be good to get confirmation that that's
the right theory.

Greetings,

Andres Freund


Re: hot_standby_feedback parameter doesn't work

От
Andrey Zhidenkov
Дата:
I have increasing replication lag on the hot standby. I grep processes
on the stanby and see that WAL writer is in status "waiting". This
always happens after (auto)vacuuming of table used by problem query.
Once I kill problem query on the standby WAL segments are applied and
the replication lag is no longer exists.

Forgot to mention that max_standby_streaming_delay = 0 and
max_standby_archive_delay = 0.


Re: hot_standby_feedback parameter doesn't work

От
Andrey Zhidenkov
Дата:
On Tue, Nov 13, 2018 at 2:34 PM Rui DeSousa <rui@crazybean.net> wrote:
>
>
>
> > On Nov 12, 2018, at 10:28 PM, Andrey Zhidenkov <andrey.zhidenkov@gmail.com> wrote:
> >
> > We have a few database clusters (1 master, 2 hot standbys) with
> > hot_standby_feedback setting on. But I am constantly seeing query
> > conflicts on standbys because dead rows on the masters are deleted by
> > VACUUM. Usually I notice it shortly after VACUUM ends, because the
> > replication lag on replicas is starting to increase.
> >
> > vacuum_defer_cleanup_age = 0. PostgreSQL version is 9.5.13. Cluster
> > load is about 1500 transactions per second.
> >
> > Any ideas?
> >
> > -
> > With the best regards, Andrey Zhidenkov
> >
>
> Are these streaming replicas?  The wal_streaming process on the master should show the xmin of the replicas; you can
checkthe value from pg_stat_activity to make sure the feedback is occurring.
 
>
Yes, they are streaming replicas. Could you please tell me how to
check xmin of the replica in pg_stat_activity? I didn't get the point.

> Are the replicas disconnecting from the replication stream? As the would mean the xmin from the replica would no
longerbe in play.
 

No, the replicas are not disconnecting.

-- 
-
With best regards, Andrey Zhidenkov


Re: hot_standby_feedback parameter doesn't work

От
Andrew Gierth
Дата:
>>>>> "Andrey" == Andrey Zhidenkov <andrey.zhidenkov@gmail.com> writes:

 Andrey> We have a few database clusters (1 master, 2 hot standbys) with
 Andrey> hot_standby_feedback setting on. But I am constantly seeing
 Andrey> query conflicts on standbys because dead rows on the masters
 Andrey> are deleted by VACUUM. Usually I notice it shortly after VACUUM
 Andrey> ends, because the replication lag on replicas is starting to
 Andrey> increase.

Feedback can only try and avoid one of the approximately five possible
causes of conflicts (albeit the most common one).

Next most likely is a pin conflict, especially if you have any tables
involved which are both small and with a relatively high update
frequency.

(Unfortunately with a max delay of 0 the standby queries won't be getting
cancelled which in turn means that there are no statistics about the
causes of conflicts.)

The most likely scenario for a pin conflict is if you have queries which
are (a) long-running, and (b) contain a sequential scan of a small (one
or a few blocks) table that is _not_ underneath a Hash node or similar;
for example, if the Seq Scan appears as the outer path of a Nestloop
join at or near the top of the plan. In this case, vacuum of the small
table may block because the query is holding pin for an extended period
of time on the block that vacuum wants to clean up. (It can happen with
index scans too, but is less likely.)

-- 
Andrew (irc:RhodiumToad)


Re: hot_standby_feedback parameter doesn't work

От
Andrey Zhidenkov
Дата:
The problem is related to large tables (billions of rows) on the
database with tps about 1500 transactions per second. I will continue
investigate. Thank you for your tips.
On Tue, Nov 13, 2018 at 5:04 PM Andrew Gierth
<andrew@tao11.riddles.org.uk> wrote:
>
> >>>>> "Andrey" == Andrey Zhidenkov <andrey.zhidenkov@gmail.com> writes:
>
>  Andrey> We have a few database clusters (1 master, 2 hot standbys) with
>  Andrey> hot_standby_feedback setting on. But I am constantly seeing
>  Andrey> query conflicts on standbys because dead rows on the masters
>  Andrey> are deleted by VACUUM. Usually I notice it shortly after VACUUM
>  Andrey> ends, because the replication lag on replicas is starting to
>  Andrey> increase.
>
> Feedback can only try and avoid one of the approximately five possible
> causes of conflicts (albeit the most common one).
>
> Next most likely is a pin conflict, especially if you have any tables
> involved which are both small and with a relatively high update
> frequency.
>
> (Unfortunately with a max delay of 0 the standby queries won't be getting
> cancelled which in turn means that there are no statistics about the
> causes of conflicts.)
>
> The most likely scenario for a pin conflict is if you have queries which
> are (a) long-running, and (b) contain a sequential scan of a small (one
> or a few blocks) table that is _not_ underneath a Hash node or similar;
> for example, if the Seq Scan appears as the outer path of a Nestloop
> join at or near the top of the plan. In this case, vacuum of the small
> table may block because the query is holding pin for an extended period
> of time on the block that vacuum wants to clean up. (It can happen with
> index scans too, but is less likely.)
>
> --
> Andrew (irc:RhodiumToad)



-- 
-
With best regards, Andrey Zhidenkov


Re: hot_standby_feedback parameter doesn't work

От
Rui DeSousa
Дата:

> On Nov 13, 2018, at 4:18 AM, Andrey Zhidenkov <andrey.zhidenkov@gmail.com> wrote:
>
> Yes, they are streaming replicas. Could you please tell me how to
> check xmin of the replica in pg_stat_activity? I didn't get the point.
>

select pid
 , usename
 , application_name
 , backend_start
 , backend_xmin
 , state
from pg_stat_replication
;

select pid
 , state
 , backend_xid
 , backend_xmin
 , backend_type
 , backend_start
from pg_stat_activity
where backend_xmin is not null
;

Vacuum will use the xmin to determine if a record is still needed or not thus I just wanted to make sure the replica
aresending that information to the master.   

What is the exact error message?



Re: hot_standby_feedback parameter doesn't work

От
Andrey Zhidenkov
Дата:
It turned out that the problem is not caused by dead rows removing.
The problem is that autovacuum process truncates empty pages at the
end of the relation and takes AccessExclusiveLock on the relation. WAL
receiver process, in turn, tries to take this log while replaying WAL
segment with corresponding standby_redo command and fails because of
the long-running query holding AccessShareLock on standby. Since
max_standby_streaming_delay setting is set to -1 WAL receiver waits
until query is finished and as a result the replication lag is
increasing.

There is a patch proposed in 2018/11 commit fest but it is not even
reviewed yet: https://commitfest.postgresql.org/20/1683/
On Tue, Nov 13, 2018 at 10:53 PM Rui DeSousa <rui@crazybean.net> wrote:
>
>
>
> > On Nov 13, 2018, at 4:18 AM, Andrey Zhidenkov <andrey.zhidenkov@gmail.com> wrote:
> >
> > Yes, they are streaming replicas. Could you please tell me how to
> > check xmin of the replica in pg_stat_activity? I didn't get the point.
> >
>
> select pid
>  , usename
>  , application_name
>  , backend_start
>  , backend_xmin
>  , state
> from pg_stat_replication
> ;
>
> select pid
>  , state
>  , backend_xid
>  , backend_xmin
>  , backend_type
>  , backend_start
> from pg_stat_activity
> where backend_xmin is not null
> ;
>
> Vacuum will use the xmin to determine if a record is still needed or not thus I just wanted to make sure the replica
aresending that information to the master.
 
>
> What is the exact error message?
>


-- 
-
With best regards, Andrey Zhidenkov