Обсуждение: huge SubtransSLRU and SubtransBuffer wait_event

Поиск
Список
Период
Сортировка

huge SubtransSLRU and SubtransBuffer wait_event

От
"James Pang (chaolpan)"
Дата:

Hi,

   We have a Postgresqlv14.8 server, client use Postgresql JDBC connections, today, our server see a lot of  “SubtransBuffer” and “SubtransSLRU” wait_event. Could you help direct me what’s the possible cause and how to resolve this waits ?

 

Thanks,

 

James

Re: huge SubtransSLRU and SubtransBuffer wait_event

От
Laurenz Albe
Дата:
On Thu, 2024-02-01 at 11:50 +0000, James Pang (chaolpan) wrote:
> We have a Postgresqlv14.8 server, client use Postgresql JDBC connections, today,
> our server see a lot of  “SubtransBuffer” and “SubtransSLRU” wait_event.
> Could you help direct me what’s the possible cause and how to resolve this waits ?

Today, the only feasible solution is not to create more than 64 subtransactions
(savepoints or PL/pgSQL EXCEPTION clauses) per transaction.

Don't use extensions or the JDBC driver option to simulate statement level rollback,
that is the road to hell.

Yours,
Laurenz Albe



RE: huge SubtransSLRU and SubtransBuffer wait_event

От
"James Pang (chaolpan)"
Дата:
Today, the only feasible solution is not to create more than 64 subtransactions (savepoints or PL/pgSQL EXCEPTION
clauses)per transaction.
 

Don't use extensions or the JDBC driver option to simulate statement level rollback, that is the road to hell.
    You mean extensions to simulate a subtransaction like pg_background ?  for JDBC driver option to simulate statement
levelrollback, could you share more details ? 
 

Thanks,

James

-----Original Message-----
From: Laurenz Albe <laurenz.albe@cybertec.at> 
Sent: Thursday, February 1, 2024 8:42 PM
To: James Pang (chaolpan) <chaolpan@cisco.com>; pgsql-performance@lists.postgresql.org
Subject: Re: huge SubtransSLRU and SubtransBuffer wait_event

On Thu, 2024-02-01 at 11:50 +0000, James Pang (chaolpan) wrote:
> We have a Postgresqlv14.8 server, client use Postgresql JDBC 
> connections, today, our server see a lot of  “SubtransBuffer” and “SubtransSLRU” wait_event.
> Could you help direct me what’s the possible cause and how to resolve this waits ?

Today, the only feasible solution is not to create more than 64 subtransactions (savepoints or PL/pgSQL EXCEPTION
clauses)per transaction.
 

Don't use extensions or the JDBC driver option to simulate statement level rollback, that is the road to hell.

Yours,
Laurenz Albe

RE: huge SubtransSLRU and SubtransBuffer wait_event

От
"James Pang (chaolpan)"
Дата:
Our case is   1) we use PL/PGSQL procedure1-->procedure2 (update table xxxx;commit);   2) application JDBC client
callprocedure1 (it's a long running job, sometimes it could last > 1hours).   During this time window,  other
PostgresqlJDBC clients (100-200) coming in in same time , then quickly see MultiXactoffset and SubtransSLRU increased
veryquickly. 
 
  Possible to increase  Subtrans SLRU buffer size ?    PL/PGSQL proc1--> procedure2(updates table) it use
substransationin procedure2 ,right? 
 

Thanks,

James

-----Original Message-----
From: James Pang (chaolpan) 
Sent: Thursday, February 1, 2024 11:34 PM
To: Laurenz Albe <laurenz.albe@cybertec.at>; pgsql-performance@lists.postgresql.org
Subject: RE: huge SubtransSLRU and SubtransBuffer wait_event

Today, the only feasible solution is not to create more than 64 subtransactions (savepoints or PL/pgSQL EXCEPTION
clauses)per transaction.
 

Don't use extensions or the JDBC driver option to simulate statement level rollback, that is the road to hell.
    You mean extensions to simulate a subtransaction like pg_background ?  for JDBC driver option to simulate statement
levelrollback, could you share more details ? 
 

Thanks,

James

-----Original Message-----
From: Laurenz Albe <laurenz.albe@cybertec.at>
Sent: Thursday, February 1, 2024 8:42 PM
To: James Pang (chaolpan) <chaolpan@cisco.com>; pgsql-performance@lists.postgresql.org
Subject: Re: huge SubtransSLRU and SubtransBuffer wait_event

On Thu, 2024-02-01 at 11:50 +0000, James Pang (chaolpan) wrote:
> We have a Postgresqlv14.8 server, client use Postgresql JDBC 
> connections, today, our server see a lot of  “SubtransBuffer” and “SubtransSLRU” wait_event.
> Could you help direct me what’s the possible cause and how to resolve this waits ?

Today, the only feasible solution is not to create more than 64 subtransactions (savepoints or PL/pgSQL EXCEPTION
clauses)per transaction.
 

Don't use extensions or the JDBC driver option to simulate statement level rollback, that is the road to hell.

Yours,
Laurenz Albe

Re: huge SubtransSLRU and SubtransBuffer wait_event

От
Alvaro Herrera
Дата:
On 2024-Feb-02, James Pang (chaolpan) wrote:

>   Possible to increase  Subtrans SLRU buffer size ?

Not at present -- you need to recompile after changing
NUM_SUBTRANS_BUFFERS in src/include/access/subtrans.h,
NUM_MULTIXACTOFFSET_BUFFERS and NUM_MULTIXACTMEMBER_BUFFERS in
src/include/access/multixact.h.

There's pending work to let these be configurable in version 17.

>   Our case is   1) we use PL/PGSQL procedure1-->procedure2 (update
>   table xxxx;commit);   2) application JDBC client call procedure1
>   (it's a long running job, sometimes it could last > 1hours).
>   During this time window,  other Postgresql JDBC clients (100-200)
>   coming in in same time , then quickly see MultiXactoffset and
>   SubtransSLRU increased very quickly. 
>   PL/PGSQL proc1--> procedure2(updates table) it use substransation in
>   procedure2 ,right? 

If your functions/procedures use EXCEPTION clauses, that would create
subtransactions also.

-- 
Álvaro Herrera        Breisgau, Deutschland  —  https://www.EnterpriseDB.com/
"No deja de ser humillante para una persona de ingenio saber
que no hay tonto que no le pueda enseñar algo." (Jean B. Say)



Re: huge SubtransSLRU and SubtransBuffer wait_event

От
Lars Aksel Opsahl
Дата:

>From: James Pang (chaolpan) <chaolpan@cisco.com>Sent: Friday, February 2, 2024 7:47 AMTo: Laurenz Albe <laurenz.albe@cybertec.at>; pgsql-performance@lists.postgresql.org <pgsql-performance@lists.postgresql.org>Subject: RE: huge SubtransSLRU and SubtransBuffer wait_event

>

>  Our case is   1) we use PL/PGSQL procedure1-->procedure2 (update table xxxx;commit);   2) application JDBC client call procedure1 (it's a long running job, sometimes it could last > 1hours).   During this time window,  other Postgresql JDBC clients (100-200) coming in in same time , then quickly see MultiXactoffset and SubtransSLRU increased very quickly.


Hi


We had the same problem here https://gitlab.com/nibioopensource/resolve-overlap-and-gap . Here we can have more than 50 threads pushing millions of rows into common tables and one single final Postgis Topology structure as a final step. We also need to run try catch. The code is wrapped into functions and procedures and called from psql .


Just to test we tried compile with a higher number of subtrans locks and that just made this problem appear just a little bit later.


For us the solution was to save temporary results in array like this https://gitlab.com/nibioopensource/resolve-overlap-and-gap/-/commit/679bea2b4b1ba4c9e84923b65c62c32c3aed6c21#a22cbe80eb0e36ea21e4f8036e0a4109b2ff2379_611_617

. The clue is to do as much work as possible without involving any common data structures for instance like using arrays to hold temp results and not use a shared final table before it's really needed.


Then later at a final step we insert all prepared data into a final common data structure and where we also try to avoid try catch when possible. Then system can then run with verry high CPU load for 99% of the work and just at then verry end we start to involve the common database structure.


Another thing to avoid locks is let each thread work on it's down data as much possible, this means breaking up the input and sort what's unique data for this tread and postpone the common data to a later stage. When for instance working with Postgis Topology we actually split data to be sure that not two threads works on the same area and then at later state another thread push shared data/area in to the final data structure.


This steps seems to have solved this problem for us which started out here https://postgrespro.com/list/thread-id/2478202


Lars

Re: huge SubtransSLRU and SubtransBuffer wait_event

От
Nikolay Samokhvalov
Дата:


On Thu, Feb 1, 2024 at 04:42 Laurenz Albe <laurenz.albe@cybertec.at> wrote:
Today, the only feasible solution is not to create more than 64 subtransactions
(savepoints or PL/pgSQL EXCEPTION clauses) per transaction.

Sometimes, a single subtransaction is enough to experience a bad SubtransSLRU spike: 

I think 64+ nesting level is quite rare, but this kind of problem that hits you when you have high XID growth (lots of writes) + long-running transaction is quite easy to bump into. Or this case involving MultiXactIDs: 

Nik

Re: huge SubtransSLRU and SubtransBuffer wait_event

От
Laurenz Albe
Дата:
On Fri, 2024-02-02 at 02:04 -0800, Nikolay Samokhvalov wrote:
> On Thu, Feb 1, 2024 at 04:42 Laurenz Albe <laurenz.albe@cybertec.at> wrote:
> > Today, the only feasible solution is not to create more than 64 subtransactions
> > (savepoints or PL/pgSQL EXCEPTION clauses) per transaction.
>
> I think 64+ nesting level is quite rare

It doesn't have to be 64 *nested* subtransactions.  This is enough:

CREATE TABLE tab (x integer);

DO
$$DECLARE
   i integer;
BEGIN
   FOR i IN 1..70 LOOP
      BEGIN
         INSERT INTO tab VALUES (i);
      EXCEPTION
         WHEN unique_violation THEN
            NULL; -- ignore
      END;
   END LOOP;
END;$$;

Yours,
Laurenz Albe



RE: huge SubtransSLRU and SubtransBuffer wait_event

От
"James Pang (chaolpan)"
Дата:

   We finally identified the cause, a pl/pgsql procedure  proc1 (for 1…5000 loop  call proc2()); proc2 (begin ..exception..end); at the same time, more than 200 sessions coming in milliseconds and do same query during the “call proc1 long running transaction”.  The code change and cutdown the parallel sessions count doing same query at the same time help a lot.

   

   Thanks all.

 

James

 

From: Nikolay Samokhvalov <samokhvalov@gmail.com>
Sent: Friday, February 2, 2024 6:04 PM
To: Laurenz Albe <laurenz.albe@cybertec.at>; pgsql-performance@lists.postgresql.org
Subject: Re: huge SubtransSLRU and SubtransBuffer wait_event

 

 

 

On Thu, Feb 1, 2024 at 04:42 Laurenz Albe <laurenz.albe@cybertec.at> wrote:

Today, the only feasible solution is not to create more than 64 subtransactions
(savepoints or PL/pgSQL EXCEPTION clauses) per transaction.

 

Sometimes, a single subtransaction is enough to experience a bad SubtransSLRU spike: 

 

I think 64+ nesting level is quite rare, but this kind of problem that hits you when you have high XID growth (lots of writes) + long-running transaction is quite easy to bump into. Or this case involving MultiXactIDs: 

 

Nik

Re: FW: huge SubtransSLRU and SubtransBuffer wait_event

От
James Pang
Дата:

 

 From this link, looks like "onfigurable buffer pool and partitioning the SLRU lock" is one the plan,  maybe from v18,19 version,  https://www.postgresql.org/message-id/202402221843.ibzvpndbacbi@alvherre.pgsql


    James  

From: James Pang (chaolpan)
Sent: Tuesday, February 6, 2024 2:59 PM
To: Nikolay Samokhvalov <samokhvalov@gmail.com>; Laurenz Albe <laurenz.albe@cybertec.at>; pgsql-performance@lists.postgresql.org
Subject: RE: huge SubtransSLRU and SubtransBuffer wait_event

 

   We finally identified the cause, a pl/pgsql procedure  proc1 (for 1…5000 loop  call proc2()); proc2 (begin ..exception..end); at the same time, more than 200 sessions coming in milliseconds and do same query during the “call proc1 long running transaction”.  The code change and cutdown the parallel sessions count doing same query at the same time help a lot.

   

   Thanks all.

 

James

 

From: Nikolay Samokhvalov <samokhvalov@gmail.com>
Sent: Friday, February 2, 2024 6:04 PM
To: Laurenz Albe <laurenz.albe@cybertec.at>; pgsql-performance@lists.postgresql.org
Subject: Re: huge SubtransSLRU and SubtransBuffer wait_event

 

 

 

On Thu, Feb 1, 2024 at 04:42 Laurenz Albe <laurenz.albe@cybertec.at> wrote:

Today, the only feasible solution is not to create more than 64 subtransactions
(savepoints or PL/pgSQL EXCEPTION clauses) per transaction.

 

Sometimes, a single subtransaction is enough to experience a bad SubtransSLRU spike: 

 

I think 64+ nesting level is quite rare, but this kind of problem that hits you when you have high XID growth (lots of writes) + long-running transaction is quite easy to bump into. Or this case involving MultiXactIDs: 

 

Nik

Re: FW: huge SubtransSLRU and SubtransBuffer wait_event

От
James Pang
Дата:
>   Possible to increase  Subtrans SLRU buffer size ?

Not at present -- you need to recompile after changing NUM_SUBTRANS_BUFFERS in src/include/access/subtrans.h, NUM_MULTIXACTOFFSET_BUFFERS and NUM_MULTIXACTMEMBER_BUFFERS in src/include/access/multixact.h.

one question:
     we need to increase all SLRU buffers together , MULTIXACT, XACT, Subtrans, COMMIT TS ,  for example, got all of them doubled based on existing size ?   or only increase Subtrans , or Subtrans and multixact ? 

Thanks,

James

James Pang (chaolpan) <chaolpan@cisco.com> 於 2024年3月1日週五 下午2:45寫道:


-----Original Message-----
From: Alvaro Herrera <alvherre@alvh.no-ip.org>
Sent: Friday, February 2, 2024 4:13 PM
To: James Pang (chaolpan) <chaolpan@cisco.com>
Cc: Laurenz Albe <laurenz.albe@cybertec.at>; pgsql-performance@lists.postgresql.org
Subject: Re: huge SubtransSLRU and SubtransBuffer wait_event

On 2024-Feb-02, James Pang (chaolpan) wrote:

>   Possible to increase  Subtrans SLRU buffer size ?

Not at present -- you need to recompile after changing NUM_SUBTRANS_BUFFERS in src/include/access/subtrans.h, NUM_MULTIXACTOFFSET_BUFFERS and NUM_MULTIXACTMEMBER_BUFFERS in src/include/access/multixact.h.

There's pending work to let these be configurable in version 17.

>   Our case is   1) we use PL/PGSQL procedure1-->procedure2 (update
>   table xxxx;commit);   2) application JDBC client call procedure1
>   (it's a long running job, sometimes it could last > 1hours).
>   During this time window,  other Postgresql JDBC clients (100-200)
>   coming in in same time , then quickly see MultiXactoffset and
>   SubtransSLRU increased very quickly.
>   PL/PGSQL proc1--> procedure2(updates table) it use substransation in
>   procedure2 ,right?

If your functions/procedures use EXCEPTION clauses, that would create subtransactions also.

--
Álvaro Herrera        Breisgau, Deutschland  —  https://www.EnterpriseDB.com/
"No deja de ser humillante para una persona de ingenio saber que no hay tonto que no le pueda enseñar algo." (Jean B. Say)

Re: FW: huge SubtransSLRU and SubtransBuffer wait_event

От
Alvaro Herrera
Дата:
On 2024-Mar-01, James Pang wrote:

> one question:
>      we need to increase all SLRU buffers together , MULTIXACT, XACT,
> Subtrans, COMMIT TS ,  for example, got all of them doubled based on
> existing size ?

No need.

> or only increase Subtrans , or Subtrans and multixact ?

Just increase the sizes for the ones that are causing you pain. You can
have a look at pg_stat_slru for some metrics that might be useful in
determining which are those.

-- 
Álvaro Herrera         PostgreSQL Developer  —  https://www.EnterpriseDB.com/



Re: FW: huge SubtransSLRU and SubtransBuffer wait_event

От
James Pang
Дата:
Hi Alvaro ,
 looks like  Xact slru buffer use a different way to control size, do we need to increase Xact and how to increase that ?   we plan to increase to 20 times size of existing buffers,  any side impact to 20 times increase these subtrans ? 
----------------+-------------+-------------+-----------+--------------+-------------+---------+-----------+-------------------------------
 CommitTs        |     1284048 |   387594150 |     54530 |      1305858 |           0 |       0 |         5 | 2024-01-19 05:01:38.900698+00
 MultiXactMember |       30252 | 23852620477 |  48555852 |        26106 |           0 |     127 |         0 | 2024-01-19 05:01:38.900698+00
 MultiXactOffset |       10638 | 23865848376 |  18434993 |         9375 |         127 |     127 |         5 | 2024-01-19 05:01:38.900698+00
 Notify          |           0 |           0 |         0 |            0 |           0 |       0 |         0 | 2024-01-19 05:01:38.900698+00
 Serial          |           0 |           0 |         0 |            0 |           0 |       0 |         0 | 2024-01-19 05:01:38.900698+00
 Subtrans        |      513486 | 12127027243 | 153119082 |       431238 |           0 |       0 |         0 | 2024-01-19 05:01:38.900698+00
 Xact            |       32107 | 22450403108 |  72043892 |        18064 |           0 |       0 |         3 | 2024-01-19 05:01:38.900698+00
 other           |           0 |           0 |         0 |            0 |           0 |       0 |         0 | 2024-01-19 05:01:38.900698+00
(8 rows)

Thanks,

James 

Alvaro Herrera <alvherre@alvh.no-ip.org> 於 2024年3月1日週五 下午3:35寫道:
On 2024-Mar-01, James Pang wrote:

> one question:
>      we need to increase all SLRU buffers together , MULTIXACT, XACT,
> Subtrans, COMMIT TS ,  for example, got all of them doubled based on
> existing size ?

No need.

> or only increase Subtrans , or Subtrans and multixact ?

Just increase the sizes for the ones that are causing you pain. You can
have a look at pg_stat_slru for some metrics that might be useful in
determining which are those.

--
Álvaro Herrera         PostgreSQL Developer  —  https://www.EnterpriseDB.com/