Discussion: [GENERAL] Multixact members limit exceeded

[GENERAL] Multixact members limit exceeded

From:
Peter Hunčár
Date:
Hello

We have a fairly large PostgreSQL 9.4 database (currently 70 TB and growing to approximately 80 TB) running on CentOS 7.
The hardware is a 48-core Xeon with 180 GB of RAM, with the data on enterprise-grade SAN storage.

We started feeding it several weeks ago and everything went smoothly until we hit this issue:

2017-08-09 05:21:50.946 WIB >DETAIL:  This command would create a multixact with 2 members, but the remaining space is only enough for 0 members.
2017-08-09 05:21:50.946 WIB >HINT:  Execute a database-wide VACUUM in database with OID 20101 with reduced vacuum_multixact_freeze_min_age and vacuum_multixact_freeze_table_age settings.

I did what the hint proposes, but on such a large database a vacuum takes several days.
We are currently in the data-loading phase and are sending only INSERT statements; there should be very few UPDATEs or DELETEs.
Yesterday the message disappeared for a short while, but today it's back (the vacuum is still running).

Is there a way to prevent/fix this so we can finish the loading (97% done)? The performance went down from 100 ops/sec to 15 ops/min.

Most tables have around 150 M rows with TOAST data.
There are several huge tables with TOAST data that are currently being autovacuumed; I guess this is the reason for the performance drop:

 | usename  | application_name | state  | backend_xmin |                                                       query
-+----------+------------------+--------+--------------+----------------------------------------------------------------------
 | postgres |                  | active |   1683428686 | autovacuum: VACUUM pg_toast.pg_toast_14548803 (to prevent wraparound)
 | postgres |                  | active |   1683428693 | autovacuum: VACUUM pg_toast.pg_toast_14548821 (to prevent wraparound)
 | postgres |                  | active |   1683428705 | autovacuum: VACUUM pg_toast.pg_toast_14548828 (to prevent wraparound)
 | postgres |                  | active |   1683428719 | autovacuum: VACUUM pg_toast.pg_toast_14548835 (to prevent wraparound)
 | postgres |                  | active |   1683428732 | autovacuum: VACUUM pg_toast.pg_toast_14549150 (to prevent wraparound)
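
For reference, that output presumably came from a query along these lines against pg_stat_activity -- the exact statement isn't shown above, so this is only a reconstruction:

SELECT usename, application_name, state, backend_xmin, query
FROM pg_stat_activity
WHERE query LIKE 'autovacuum:%'
ORDER BY backend_xmin;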

After changing vacuum_cost_limit to 10000 for one night, I saw 200 MB/s of writes the whole night, but I had to change it back to 2000 because the insert performance went to 0.

The autovacuumed tables are mostly the TOAST tables of these:

SELECT relname, age(relminmxid) as mxid_age, pg_size_pretty(pg_table_size(oid)) as table_size 
FROM pg_class
WHERE relkind = 'r' and pg_table_size(oid) > 100000000
ORDER BY age(relminmxid) DESC LIMIT 20;

           relname            |  mxid_age  | table_size
------------------------------+------------+------------
 t1                           | 1554084722 | 172 GB
 t2                           | 1554084722 | 10 TB
 t3                           | 1554084722 | 21 GB
 t4                           | 1554084722 | 8928 MB
 t5                           | 1554084722 | 43 GB
 t6                           | 1554084722 | 2788 GB
 t7                           | 1554084722 | 290 GB
 t8                           | 1554084722 | 11 GB
 t9                           | 1554084722 | 2795 GB
 t10                          | 1554084722 | 20 GB
 t11                          | 1554084722 | 16 TB

There is a large number of member files in pg_multixact/members

-bash-4.2$ ll ../data/pg_multixact/members/|wc -l
82041
The oldest file dates from the beginning of July.
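
Back-of-envelope (assuming the usual SLRU layout of 32 pages of 8 kB, i.e. 256 kB per segment file): ~82,000 files is roughly 20 GB of member data, which is about what the members area holds when the 2^32-entry member space discussed later in this thread is exhausted -- consistent with the error above. A quick size check:

-bash-4.2$ du -sh ../data/pg_multixact/members/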

pg_controldata output:

-bash-4.2$ pg_controldata /var/lib/pgsql/9.4/data/
pg_control version number:            942
Catalog version number:               201409291
Database system identifier:           6421090142329829830
Database cluster state:               in production
pg_control last modified:             Wed 09 Aug 2017 03:48:47 PM WIB
Latest checkpoint location:           589E/9DB366A0
Prior checkpoint location:            589E/95E8DEE0
Latest checkpoint's REDO location:    589E/963FC1B8
Latest checkpoint's REDO WAL file:    000000010000589E00000096
Latest checkpoint's TimeLineID:       1
Latest checkpoint's PrevTimeLineID:   1
Latest checkpoint's full_page_writes: on
Latest checkpoint's NextXID:          0/1683422006
Latest checkpoint's NextOID:          2293738628
Latest checkpoint's NextMultiXactId:  129346573
Latest checkpoint's NextMultiOffset:  4294967295
Latest checkpoint's oldestXID:        784247712
Latest checkpoint's oldestXID's DB:   20101
Latest checkpoint's oldestActiveXID:  0
Latest checkpoint's oldestMultiXid:   11604
Latest checkpoint's oldestMulti's DB: 20101
Time of latest checkpoint:            Wed 09 Aug 2017 03:34:33 PM WIB
Fake LSN counter for unlogged rels:   0/1
Minimum recovery ending location:     0/0
Min recovery ending loc's timeline:   0
Backup start location:                0/0
Backup end location:                  0/0
End-of-backup record required:        no
Current wal_level setting:            minimal
Current wal_log_hints setting:        off
Current max_connections setting:      1200
Current max_worker_processes setting: 8
Current max_prepared_xacts setting:   0
Current max_locks_per_xact setting:   64
Maximum data alignment:               8
Database block size:                  8192
Blocks per segment of large relation: 131072
WAL block size:                       8192
Bytes per WAL segment:                16777216
Maximum length of identifiers:        64
Maximum columns in an index:          32
Maximum size of a TOAST chunk:        1996
Size of a large-object chunk:         2048
Date/time type storage:               64-bit integers
Float4 argument passing:              by value
Float8 argument passing:              by value
Data page checksum version:           0

Thank you

Peter Huncar

Re: [GENERAL] Multixact members limit exceeded

From:
Andres Freund
Date:
Hi,

On 2017-08-09 10:06:48 +0000, Peter Hunčár wrote:
> We started feeding it several weeks ago and everything went smoothly until
> we hit this issue:
>
> 2017-08-09 05:21:50.946 WIB >DETAIL:  This command would create a multixact
> with 2 members, but the remaining space is only enough for 0 members.
> 2017-08-09 05:21:50.946 WIB >HINT:  Execute a database-wide VACUUM in
> database with OID 20101 with reduced vacuum_multixact_freeze_min_age and
> vacuum_multixact_freeze_table_age settings.

Ugh, that's not good.


> I did what the hint proposes, but on a such large database a vacuum takes
> several days.
> We are currently in data loading phase and we are sending only INSERT
> statements, there should be very little UPDATEs or DELETEs.
> Yesterday, the message disappeared shortly, but today it's back (vacuum is
> still running)

I suggest manually vacuuming the tables with the oldest xids first, that
ought to move the limits ahead. A query like
SELECT oid::regclass, relminmxid, mxid_age(relminmxid)
FROM pg_class WHERE relminmxid <> '0'
ORDER BY mxid_age(relminmxid) DESC;
might help you make that determination.
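
For the most urgent tables, a targeted manual vacuum with the multixact freeze ages forced down (as the HINT in the log suggests) should advance relminmxid; the table name below is just a placeholder:

SET vacuum_multixact_freeze_min_age = 0;
SET vacuum_multixact_freeze_table_age = 0;
VACUUM VERBOSE some_big_table;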


> *Is there a way how to prevent/fix this so we can finish the loading (97%
> done), because the performance went down from 100 ops/sec to 15ops/min.*
>
> Most tables have around 150 M rows with toast data.
> There are several huge tables with toast data, currently autovacuumed, I
> guess this is the reason for the performance drop:
>
>  | usename  | application_name | state  | backend_xmin |
>                                     query
>
-+----------+------------------+--------+--------------+----------------------------------------------------------------------
>  | postgres |                  | active |   1683428686 | autovacuum: VACUUM
> pg_toast.pg_toast_14548803 (to prevent wraparound)
>  | postgres |                  | active |   1683428693 | autovacuum: VACUUM
> pg_toast.pg_toast_14548821 (to prevent wraparound)
>  | postgres |                  | active |   1683428705 | autovacuum: VACUUM
> pg_toast.pg_toast_14548828 (to prevent wraparound)
>  | postgres |                  | active |   1683428719 | autovacuum: VACUUM
> pg_toast.pg_toast_14548835 (to prevent wraparound)
>  | postgres |                  | active |   1683428732 | autovacuum: VACUUM
> pg_toast.pg_toast_14549150 (to prevent wraparound)

If these are the most "urgent" tables from the query above, I'd let
autovac finish, otherwise I'd cancel them and manually choose the most
urgent ones.


> After changing the vacuum_cost_limit to 10000 for one night, I saw 200MB/s
> of writes the whole night, but I had to change it back to 2000, because the
> insert perf went to 0.

You have a fair amount of autovacuum workers, perhaps that's also a bit
too much?  I'd consider making autovacuum's cleanup duties *less*
aggressive, but trying to keep the rate of cleanup high.  E.g. by
increasing vacuum_freeze_min_age and vacuum_multixact_freeze_min_age, so
only urgently old stuff gets cleaned up.


Do you know how you came to have a lot of large multixacts?  That's
often indicative of way too many sessions trying to lock the same rows
and such, might be worthwhile to work on that too, to reduce the
frequency at which new ones are needed.


Greetings,

Andres Freund


Re: [GENERAL] Multixact members limit exceeded

From:
Peter Hunčár
Date:
Hi,

Thank you, yes, those are the 'urgent' tables; I'll talk to the developers regarding the locks. I too think there's something 'fishy' going on.

Anyway, could it be that autovacuum blocks a manual vacuum? I ran VACUUM (FULL, VERBOSE) and some tables finished quite fast, with a huge amount of I/O recorded in the monitoring, but some of them seem to be stuck.
Which brings me to the second question: how can I cancel autovacuum?

And the last thing I don't understand:

One particular table before vacuum full:

           relname            | relminmxid | table_size
------------------------------+------------+------------
 delayed_jobs                 | 1554151198 | 21 GB

And after vacuum full:
 
   relname    | relminmxid | table_size
--------------+------------+------------
 delayed_jobs | 1554155465 | 6899 MB

Shouldn't the relminmxid have changed after the VACUUM FULL, or am I misunderstanding something?

Thanks a lot


Re: [GENERAL] Multixact members limit exceeded

From:
Alvaro Herrera
Date:
Peter Hunčár wrote:
> Hi,
>
> Thank you, yes those are the 'urgent' tables, I'd talk to the developers
> regarding the locks.I too think, there's something 'fishy' going on.

I bet you have a lot of subtransactions -- maybe a plpgsql block with an
EXCEPTION clause that's doing something per-tuple.  In certain cases
in 9.3 and 9.4 that can eat a lot of multixacts.  9.5 is much better in
that regard -- I suggest considering an upgrade there.
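
A contrived sketch of the kind of pattern that can do this on 9.3/9.4 (table, column and function names are made up): every loop iteration runs in its own subtransaction because of the EXCEPTION clause, and each subtransaction that locks the same row gets added to that row's multixact.

CREATE OR REPLACE FUNCTION touch_same_row_repeatedly(job_id integer) RETURNS void AS $$
BEGIN
    FOR i IN 1..1000 LOOP
        BEGIN
            -- the BEGIN/EXCEPTION block makes each iteration a new subtransaction
            PERFORM 1 FROM jobs WHERE id = job_id FOR SHARE;
        EXCEPTION WHEN others THEN
            NULL;  -- swallow per-row errors, as such blocks typically do
        END;
    END LOOP;
END;
$$ LANGUAGE plpgsql;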

> Anyway, could it be that autovacuum blocks manual vacuum? Because I ran
> vacuum (full, verbose) and some tables finished quite fast, with huge
> amount of io recorded in the monitoring, but some of them are kind of
> stuck?

Whenever autovacuum is marked "for wraparound", it'll block a manual
vacuum.  An autovacuum worker not so marked would get cancelled by the
manual vacuum.

> Which brings me to the second question, how can I cancel autovacuum?

pg_cancel_backend() should do it, regardless of whether it's for
wraparound or not (but if it is, autovacuum will launch another worker
for the same table quickly afterwards).
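
For example, something along these lines cancels all autovacuum workers currently running:

SELECT pg_cancel_backend(pid)
FROM pg_stat_activity
WHERE query LIKE 'autovacuum:%';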

> One particular table before vacuum full:
>
>            relname            | relminmxid | table_size
> ------------------------------+------------+------------
>  delayed_jobs                 | 1554151198 | 21 GB
>
> And after vacuum full:
>
>    relname    | relminmxid | table_size
> --------------+------------+------------
>  delayed_jobs | 1554155465 | 6899 MB
>
> Shouldn't be the relminmxid changed after vacuum full, or am I not
> understanding something?

But it did change ... the problem is that it didn't change enough (only
4000 multixacts).  Maybe your multixact freeze min age is too high?
Getting rid of 15 GB of bloat is a good side effect, though, I'm sure.

What are the freeze settings?
select name, setting from pg_settings where name like '%vacuum%';

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [GENERAL] Multixact members limit exceeded

From:
Andres Freund
Date:
Hi,

On 2017-08-09 16:30:03 -0400, Alvaro Herrera wrote:

> > One particular table before vacuum full:
> >
> >            relname            | relminmxid | table_size
> > ------------------------------+------------+------------
> >  delayed_jobs                 | 1554151198 | 21 GB
> >
> > And after vacuum full:
> >
> >    relname    | relminmxid | table_size
> > --------------+------------+------------
> >  delayed_jobs | 1554155465 | 6899 MB
> >
> > Shouldn't be the relminmxid changed after vacuum full, or am I not
> > understanding something?

FWIW, VACUUM FULL isn't a good tool to use here. It's commonly way more
expensive than a normal vacuum (it rewrites the entire table, and
rebuilds indexes).


> But it did change ... the problem is that it didn't change enough (only
> 4000 multixacts).  Maybe your multixact freeze min age is too high?
> Getting rid of 15 GB of bloat is a good side effect, though, I'm sure.

I wonder if there are long-running transactions preventing cleanup. I
suggest checking pg_stat_activity, pg_prepared_xacts, and
pg_replication_slots for older stuff.
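
For instance, queries along these lines (oldest first) should surface anything holding back cleanup:

SELECT pid, xact_start, state, query
FROM pg_stat_activity
WHERE xact_start IS NOT NULL
ORDER BY xact_start;

SELECT gid, prepared, owner, database FROM pg_prepared_xacts ORDER BY prepared;

SELECT slot_name, xmin, catalog_xmin, restart_lsn FROM pg_replication_slots;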

Greetings,

Andres Freund


Re: [GENERAL] Multixact members limit exceeded

From:
Peter Hunčár
Date:
Hello,

there are currently no transactions whatsoever; the app is paused. I can even restart the database if needed.
I ran VACUUM FULL because, as I mentioned above, it seemed to me that a manual vacuum did not change the relminmxid of the table.
Unfortunately, an upgrade is not an option :(

                name                 |  setting               
-------------------------------------+-----------             
 autovacuum                          | on                     
 autovacuum_analyze_scale_factor     | 0.1                    
 autovacuum_analyze_threshold        | 50                     
 autovacuum_freeze_max_age           | 200000000              
 autovacuum_max_workers              | 3                      
 autovacuum_multixact_freeze_max_age | 400000000              
 autovacuum_naptime                  | 60                     
 autovacuum_vacuum_cost_delay        | 10                     
 autovacuum_vacuum_cost_limit        | -1                     
 autovacuum_vacuum_scale_factor      | 0.2                    
 autovacuum_vacuum_threshold         | 50                     
 autovacuum_work_mem                 | -1                     
 log_autovacuum_min_duration         | -1                     
 vacuum_cost_delay                   | 0                      
 vacuum_cost_limit                   | 2000                  
 vacuum_cost_page_dirty              | 20                     
 vacuum_cost_page_hit                | 1                      
 vacuum_cost_page_miss               | 10                     
 vacuum_defer_cleanup_age            | 0                      
 vacuum_freeze_min_age               | 50000000               
 vacuum_freeze_table_age             | 150000000              
 vacuum_multixact_freeze_min_age     | 3000000                
 vacuum_multixact_freeze_table_age   | 140000000              

The default vacuum_multixact_freeze_min_age is 5M, but according to the documentation it is silently limited to half of autovacuum_multixact_freeze_max_age.
So I guess I have to bring that 400M down to a much lower number?
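
If you do lower it, note that autovacuum_multixact_freeze_max_age only takes effect after a server restart; the value below is purely illustrative, not a recommendation:

ALTER SYSTEM SET autovacuum_multixact_freeze_max_age = 100000000;

(then restart, or edit postgresql.conf directly and restart).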

Thank you very much

Peter


Re: [GENERAL] Multixact members limit exceeded

From:
Thomas Munro
Date:
On Wed, Aug 9, 2017 at 10:06 PM, Peter Hunčár <hunci@hunci.sk> wrote:
> SELECT relname, age(relminmxid) as mxid_age,

I'm pretty sure you can't use age(xid) to compute the age of a
multixact ID.  Although they have type xid in pg_class and the
age(xid) function will happily subtract your multixact ID from the
next xid, the answer is meaningless.  mxid_age is the function you
want, though I can't remember if it's present in 9.4.
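
A quick way to check whether the function exists on this server, and the original query rewritten to use it if it does:

SELECT count(*) FROM pg_proc WHERE proname = 'mxid_age';

SELECT relname, mxid_age(relminmxid) AS mxid_age,
       pg_size_pretty(pg_table_size(oid)) AS table_size
FROM pg_class
WHERE relkind = 'r' AND relminmxid <> '0' AND pg_table_size(oid) > 100000000
ORDER BY mxid_age(relminmxid) DESC
LIMIT 20;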

On Thu, Aug 10, 2017 at 7:31 AM, Peter Hunčár <hunci@hunci.sk> wrote:
> Thank you, yes those are the 'urgent' tables, I'd talk to the developers
> regarding the locks.I too think, there's something 'fishy' going on.
>
> Anyway, could it be that autovacuum blocks manual vacuum? Because I ran
> vacuum (full, verbose) and some tables finished quite fast, with huge amount
> of io recorded in the monitoring, but some of them are kind of stuck?
> Which brings me to the second question, how can I cancel autovacuum?
>
> And the last thing I don't understand:
>
> One particular table before vacuum full:
>
>            relname            | relminmxid | table_size
> ------------------------------+------------+------------
>  delayed_jobs                 | 1554151198 | 21 GB
>
> And after vacuum full:
>
>    relname    | relminmxid | table_size
> --------------+------------+------------
>  delayed_jobs | 1554155465 | 6899 MB
>
> Shouldn't be the relminmxid changed after vacuum full, or am I not
> understanding something?

How can delayed_jobs have relminmxid = 1,554,155,465?  According to
your pg_controldata output, the active range of multixact IDs in your
cluster is 11,604 -> 129,346,573, so there is some confusion here.
Perhaps this is the output of an expression involving
age(multixactid)?

One thing I noticed is that there are ~4 billion members (that's how
many you have when you run out of member space), but only ~128 million
multixacts, so I think the average multixact has ~32 members.
Considering the way that multixacts grow by copying and extending by
one, by the time you've created a multixact with n members you've
eaten a total of n! member space with an average size of n/2 per
multixact...  So one way to hit that average would be to repeatedly
build ~64 member multixacts, or if mixed with smaller ones then you'd
need to be intermittently building even larger ones.  A thundering
herd of worker processes repeatedly share-locking the same row or
something like that?

--
Thomas Munro
http://www.enterprisedb.com


Re: [GENERAL] Multixact members limit exceeded

From:
Thomas Munro
Date:
On Thu, Aug 10, 2017 at 10:26 AM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:
> eaten a total of n! member space with an average size of n/2 per

Erm, math fail, not n! but 1 + 2 + ... + n.
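
Spelled out: growing a multixact one member at a time to a final size of n creates n multixacts and consumes 1 + 2 + ... + n = n(n+1)/2 member slots in total, i.e. roughly n/2 per multixact on average. An observed average of ~32 members per multixact therefore points at multixacts of roughly 64 members, as estimated in the previous mail.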

--
Thomas Munro
http://www.enterprisedb.com


Re: [GENERAL] Multixact members limit exceeded

From:
Alvaro Herrera
Date:
Thomas Munro wrote:

> One thing I noticed is that there are ~4 billion members (that's how
> many you have when you run out of member space), but only ~128 million
> multixacts, so I think the average multixact has ~32 members.
> Considering the way that multixacts grow by copying and extending by
> one, by the time you've created a multixact with n members you've
> eaten a total of n! member space with an average size of n/2 per
> multixact...  So one way to hit that average would be to repeatedly
> build ~64 member multixacts, or if mixed with smaller ones then you'd
> need to be intermittently building even larger ones.  A thundering
> herd of worker processes repeatedly share-locking the same row or
> something like that?

Note that the behavior of 9.5 is quite different from that of 9.3/9.4;
in the older releases there is some nasty behavior involving multiple
subtransactions of the same transaction grabbing a lock on the same
tuple, whereby the multixact grows without bound.  In 9.5 we fixed that.
There's a patch in the thread for bug #8470 that fixes it for 9.3 and
probably it can be applied to 9.4 too, not sure.  You can see it here:
https://www.postgresql.org/message-id/20150410161704.GH4369@alvh.no-ip.org
Look for the "simple" formulation.  I don't think I posted an updated
version later, but it was never applied.

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [GENERAL] Multixact members limit exceeded

From:
Peter Hunčár
Date:
Hello

it took 10 days to autovacuum the 32 TB TOAST table, and after restarting with the previous (but slightly tuned) autovacuum parameters the database works just fine.
The data import is almost done; there are just several hundred thousand rows left to add :)

The next headache will be the backup of those 110 TB ;)

Thanks a lot for the help!

Peter