Обсуждение: Lock contention in TransactionIdIsInProgress()

Поиск
Список
Период
Сортировка

Lock contention in TransactionIdIsInProgress()

От
bock@openit.de (Julian v. Bock)
Дата:
Hi

I have been investigating a problem with one of our production databases
where every few weeks the database suddenly came to a grinding
halt. When this happened, queries needed much more time to complete and
just opening a local db connection with psql could take up to 30
seconds.

The PostgreSQL version is 9.2.13.

The database has a central table (about 60GB including indexes) with
records which can be searched via a web interface. The table is a
materialized view that is updated via several custom triggers on other
tables. When data is changed, all relevant record are DELETEd and then
INSERTed from a view.

The table is replicated via slony1 (2.1.4). All search queries are
executed on a slave db. The databases run on Xeon servers with 24 cores
and 96GB of ram.

I managed to reproduce the problem using the following steps:

- Perform an update on the master server which causes about 50.000 rows
  to be repeatedly deleted and inserted in a single transaction. For
  testing I did this 20 times for about a million deletes and inserts.(*)

- Slony starts to replicate the individual deletes and inserts to the
  slave db.

- After this has been running for a few minutes, I run a particular
  query(**) on the slave repeatedly in up to 20 psql connections. For
  each query a new connection is opened.

- After a few seconds the backend processes go up to almost 100% cpu
  usage and take a long time to complete. When i run psql on the console
  it takes anywhere from 0s (almost immediately) to tens of seconds to
  get to the prompt. (The average time gets worse the longer the
  replication is running.)

After doing some debugging it seems that the backend processes are
spending most of their time in spinlocks in TransactionIdIsInProgress()
trying to get a lock on ProcArrayLock. This function is called more
often (when processing searches) the longer the replication transaction
is running.

TransactionIdIsInProgress() conveniently has some debug counters
enabled via #defining XIDCACHE_DEBUG. When the backend processes
processing the searches finish their transaction, the xc_by_main_xid
counter is about the same for each search. The longer the slony
replication transaction is running, the higher the number gets.

Serious slowdowns start at about 100.000 and I had it get up to more
than 2.000.000 (not sure whether that corresponds to the number of
deleted / inserted tuples).


Is this a known problem?

Unfortunately, I did not yet manage to create a self contained
example. Just copying the table to another database on the slave and
deleting / inserting the tuples without slony did not reproduce the
issue. I am not sure whether this is due to reduced bloat in the table
after pg_dump / pg_restore or whether it is relevant that slony deletes
and inserts the rows individually (or something entirely different).


(*) On the production server a poor interaction between triggers and the
way the application updates data caused the rows to be deleted /
inserted many times. Fortunately, we could work around this so there
should be no immediate danger that this comes up on the live db again.

(**) The search query I used to reproduce the problem does not look
particularly nasty and returns about 260 rows in a few hundred
milliseconds under normal circumstances. It does no full table scan but
uses a few indexes.

The returned rows are among the rows that get deleted and inserted
repeatedly.


Regards,
Julian v. Bock

--
Julian v. Bock               Projektleitung Software-Entwicklung
OpenIT GmbH                  Tel +49 211 239 577-0
In der Steele 33a-41         Fax +49 211 239 577-10
D-40599 Düsseldorf           http://www.openit.de
________________________________________________________________
HRB 38815 Amtsgericht Düsseldorf             USt-Id DE 812951861
Geschäftsführer: Oliver Haakert, Maurice Kemmann


Re: Lock contention in TransactionIdIsInProgress()

От
Jeff Janes
Дата:
On Thu, Nov 5, 2015 at 10:38 AM, Julian v. Bock <bock@openit.de> wrote:

> After doing some debugging it seems that the backend processes are
> spending most of their time in spinlocks in TransactionIdIsInProgress()
> trying to get a lock on ProcArrayLock. This function is called more
> often (when processing searches) the longer the replication transaction
> is running.

This is a known issue and is fixed for 9.6 in commit 8a7d0701814a4e.

I don' t know why you can't reproduce it without Slony, though.  I
know that hot standby gets bit by this problem particularly hard
because it sometimes refrains from setting hint bits as aggressively
as it could.  I would not think that Slony would have that problem,
though.

Cheers,

Jeff


Re: Lock contention in TransactionIdIsInProgress()

От
bock@openit.de (Julian v. Bock)
Дата:
Hi

>>>>> "JJ" == Jeff Janes <jeff.janes@gmail.com> writes:

    JJ> This is a known issue and is fixed for 9.6 in commit
    JJ> 8a7d0701814a4e.

thanks for the quick reply. Is there a chance this will get into the 9.5
branch as well?

Regards,
Julian v. Bock

--
Julian v. Bock               Projektleitung Software-Entwicklung
OpenIT GmbH                  Tel +49 211 239 577-0
In der Steele 33a-41         Fax +49 211 239 577-10
D-40599 Düsseldorf           http://www.openit.de
________________________________________________________________
HRB 38815 Amtsgericht Düsseldorf             USt-Id DE 812951861
Geschäftsführer: Oliver Haakert, Maurice Kemmann


Re: Lock contention in TransactionIdIsInProgress()

От
Tomáš Uko
Дата:

Hi Jeff,

 

We have encountered same problem as you (in 9.5.4), it seems that so far it hasn’t been ported back from 9.6, but if you take this commit and apply it to 9.5 source codes, it seems to be working

 

But anyway, does anybody knows plans about backporting this to 9.5 official way?

 

Regards

Tomas

Re: Lock contention in TransactionIdIsInProgress()

От
Merlin Moncure
Дата:
On Wed, Oct 5, 2016 at 5:38 AM, Tomáš Uko <uko@avast.com> wrote:
> Hi Jeff,
>
>
>
> We have encountered same problem as you (in 9.5.4), it seems that so far it
> hasn’t been ported back from 9.6, but if you take this commit and apply it
> to 9.5 source codes, it seems to be working
>
> But anyway, does anybody knows plans about backporting this to 9.5 official
> way?

Won't happen. Only bugs get ported back, and performance enhancements
generally don't qualify (in very exceptional cases they might but this
issue is a routine performance enhancement).  Basically you have to
decide to update or stay with 9.5.

merlin


Re: Lock contention in TransactionIdIsInProgress()

От
Michael Paquier
Дата:
On Fri, Oct 7, 2016 at 10:36 PM, Merlin Moncure <mmoncure@gmail.com> wrote:
> On Wed, Oct 5, 2016 at 5:38 AM, Tomáš Uko <uko@avast.com> wrote:
>> Hi Jeff,
>>
>>
>>
>> We have encountered same problem as you (in 9.5.4), it seems that so far it
>> hasn’t been ported back from 9.6, but if you take this commit and apply it
>> to 9.5 source codes, it seems to be working
>>
>> But anyway, does anybody knows plans about backporting this to 9.5 official
>> way?
>
> Won't happen. Only bugs get ported back, and performance enhancements
> generally don't qualify (in very exceptional cases they might but this
> issue is a routine performance enhancement).  Basically you have to
> decide to update or stay with 9.5.

Another possibility is to patch and build yourself Postgres if you are
still willing to use that with a given version. But as that's actually
forking the upstream version, it won't be supported by community.
--
Michael


Re: Lock contention in TransactionIdIsInProgress()

От
Andres Freund
Дата:
On 2016-10-07 08:36:12 -0500, Merlin Moncure wrote:
> Won't happen. Only bugs get ported back

Hopefully we don't port bugs back all that often.