Обсуждение: Strange query stalls on replica in 9.3.9

Поиск
Список
Период
Сортировка

Strange query stalls on replica in 9.3.9

От
Josh Berkus
Дата:
Setup:

* PostgreSQL 9.3.9
* 1 master, 1 replica
* Tiny database, under 0.5GB, completely cached in shared_buffers
* 90% read query traffic, which is handled by replica
* Traffic in the 1000's QPS.

The wierdness:

Periodically the master runs an "update all rows" query on the main
table in the database.  When this update hits the replica via
replication stream, *some* (about 5%) of the queries which do seq scans
will stall for 22 to 32 seconds (these queries normally take about
75ms).  Queries which do index scans seem not to be affected.

Thing is, the update all rows only takes 2.5 seconds to execute on the
master. So even if the update is blocking the seq scans on the replica
(and I can't see why it would), it should only block them for < 3 seconds.

Anyone seen anything like this?

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


Re: Strange query stalls on replica in 9.3.9

От
Kevin Grittner
Дата:
Josh Berkus <josh@agliodbs.com> wrote:

> Periodically the master runs an "update all rows" query on the main
> table in the database.  When this update hits the replica via
> replication stream, *some* (about 5%) of the queries which do seq scans
> will stall for 22 to 32 seconds (these queries normally take about
> 75ms).  Queries which do index scans seem not to be affected.
>
> Thing is, the update all rows only takes 2.5 seconds to execute on the
> master. So even if the update is blocking the seq scans on the replica
> (and I can't see why it would), it should only block them for < 3 seconds.

Visibility hinting and/or hot pruning?

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Strange query stalls on replica in 9.3.9

От
Jeff Janes
Дата:
On Thu, Aug 13, 2015 at 10:09 AM, Josh Berkus <josh@agliodbs.com> wrote:
Setup:

* PostgreSQL 9.3.9
* 1 master, 1 replica
* Tiny database, under 0.5GB, completely cached in shared_buffers
* 90% read query traffic, which is handled by replica
* Traffic in the 1000's QPS.

The wierdness:

Periodically the master runs an "update all rows" query on the main
table in the database.  When this update hits the replica via
replication stream, *some* (about 5%) of the queries which do seq scans
will stall for 22 to 32 seconds (these queries normally take about
75ms).  Queries which do index scans seem not to be affected.

Thing is, the update all rows only takes 2.5 seconds to execute on the
master. So even if the update is blocking the seq scans on the replica
(and I can't see why it would), it should only block them for < 3 seconds.

Anyone seen anything like this?

Sounds like another manifestation of this:  "[PERFORM] Planner performance extremely affected by an hanging transaction (20-30 times)?"



Each backend that does a seqscan, for each tuple it scans which is not yet resolved (which near the end of the bulk update is going to be nearly equal to 2*reltuples, as every tuple has both an old and a new version so one xmax from one and one xmin from the other must be checked), it has to lock and scan the proc array lock to see if the tuple-inserting transaction has committed yet. This creates profound contention on the lock.  Every scanning backend is looping over every other backend for every tuple

Once the commit of the whole-table update has replayed, the problem should go way instantly because at that point each backend doing the seqscan will find the the transaction has committed and so will set the hint bit that means all of the other seqscan backends that come after it can skip the proc array scan for that tuple.

So perhaps the commit of the whole-table update is delayed because the startup process as also getting bogged down on the same contended lock?  I don't know how hard WAL replay hits the proc array lock.

Cheers,

Jeff

Re: Strange query stalls on replica in 9.3.9

От
Josh Berkus
Дата:
On 08/13/2015 01:24 PM, Kevin Grittner wrote:
>> Thing is, the update all rows only takes 2.5 seconds to execute on the
>> master. So even if the update is blocking the seq scans on the replica
>> (and I can't see why it would), it should only block them for < 3 seconds.
>
> Visibility hinting and/or hot pruning?

Unlikely; I can VACUUM FULL the entire database in 30 seconds.  This
database is small.  Jeff's answer seems more likely ...

On 08/13/2015 01:59 PM, Jeff Janes wrote: execute on the

> Once the commit of the whole-table update has replayed, the problem
> should go way instantly because at that point each backend doing the
> seqscan will find the the transaction has committed and so will set the
> hint bit that means all of the other seqscan backends that come after it
> can skip the proc array scan for that tuple.

Yes ... and given that the commit on the master took < 3 seconds, it's
not likely to take 30 seconds on the replica.  That aside, the pattern
of behavior does look similar to the planner issue.

> So perhaps the commit of the whole-table update is delayed because the
> startup process as also getting bogged down on the same contended lock?
> I don't know how hard WAL replay hits the proc array lock.

I don't know; we don't have any visibility into the replay process, and
no way to tell if replay is waiting on some kind of lock.  A regular
UPDATE should not block against any select activity on the replay, though.

Also, why would this affect *only* the query which does seq scans?  Is
there some difference between seqscan and index scan here, or is it
simply because they take longer, and since this issue is timing-based,
they're more likely to be hit?  Significantly, the seqscan query is also
the most complex query run against the replica, so maybe the seqscan is
irrelevant and it's being affected by planner issues?

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


Re: Strange query stalls on replica in 9.3.9

От
Jeff Janes
Дата:
On Fri, Aug 14, 2015 at 9:34 AM, Josh Berkus <josh@agliodbs.com> wrote:

On 08/13/2015 01:59 PM, Jeff Janes wrote: execute on the

> Once the commit of the whole-table update has replayed, the problem
> should go way instantly because at that point each backend doing the
> seqscan will find the the transaction has committed and so will set the
> hint bit that means all of the other seqscan backends that come after it
> can skip the proc array scan for that tuple.

Yes ... and given that the commit on the master took < 3 seconds, it's
not likely to take 30 seconds on the replica.  That aside, the pattern
of behavior does look similar to the planner issue.

Another thought.  Who actually sets the hint bits on a replica?  

Do the read-only processes on the replica which discovers a tuple to have been securely committed set the hint bits?

My benchmarking suggests not.

Or does it wait for the hint bits to get set on master, and then for a checkpoint to occur on the master, and then for that page to get changed again and FPW to the log, and then for the log to get replayed?  If so, that explains why the issue doesn't clear up on the replica immediately after the commit gets replayed.

 

> So perhaps the commit of the whole-table update is delayed because the
> startup process as also getting bogged down on the same contended lock?
> I don't know how hard WAL replay hits the proc array lock.

I don't know; we don't have any visibility into the replay process, and
no way to tell if replay is waiting on some kind of lock.  A regular
UPDATE should not block against any select activity on the replay, though.

Also, why would this affect *only* the query which does seq scans?  Is
there some difference between seqscan and index scan here, or is it
simply because they take longer, and since this issue is timing-based,
they're more likely to be hit?

An index scan only has to check the commit status of rows which meet the index quals, which is presumably a small fraction of the rows.

A seq scan checks the visibility of every row first, before checking the where clause.

Cheers,

Jeff

Re: Strange query stalls on replica in 9.3.9

От
Jeff Janes
Дата:

On Fri, Aug 14, 2015 at 9:54 AM, Jeff Janes <jeff.janes@gmail.com> wrote:
On Fri, Aug 14, 2015 at 9:34 AM, Josh Berkus <josh@agliodbs.com> wrote:

On 08/13/2015 01:59 PM, Jeff Janes wrote: execute on the

> Once the commit of the whole-table update has replayed, the problem
> should go way instantly because at that point each backend doing the
> seqscan will find the the transaction has committed and so will set the
> hint bit that means all of the other seqscan backends that come after it
> can skip the proc array scan for that tuple.

Yes ... and given that the commit on the master took < 3 seconds, it's
not likely to take 30 seconds on the replica.  That aside, the pattern
of behavior does look similar to the planner issue.

Another thought.  Who actually sets the hint bits on a replica?  

Do the read-only processes on the replica which discovers a tuple to have been securely committed set the hint bits?

My benchmarking suggests not.

The hint bits only get set if the commit lsn of the transaction of the tuple being hinted (*not* the page lsn) thinks it has already been flushed to WAL. On master the transaction commit record usually would have already flushed its own WAL, or if async then wal writer is going to take care of this fairly soon if nothing else gets to it first.

On the standby, it looks like the only thing that updates the thinks-it-has-been-flushed-to marker (which is stored in the control file, rather than memory) is either the eviction of a dirty buffer, or the completion of a restartpoint.  I could easily be wrong on that, though.  

In any case, you empirically can have committed but unhintable tuples hanging around for prolonged amounts of time on the standby.  Perhaps standbys need a wal writer process.

Cheers,

Jeff