Discussion: vacuum freeze wait_event BufferPin


vacuum freeze wait_event BufferPin

From
abrahim abrahao
Date:
There is a three-day-old vacuum freeze on a partition table with wait_event = BufferPin; there are no transactions (active or idle) older than it that are not blocked, but at least one SELECT query related to this partition table is executing at all times.

This behaviour happens with some partitions, but not always. A plain VACUUM works because it does not wait on BufferPin.
I expected to see vacuum freeze with wait_event = BufferPin only if there were an old transaction related to the table; am I wrong?
Is there a way to figure out which session the vacuum freeze is waiting for?

Below is a list of all sessions at a particular moment in time; at this point there are only SELECT queries, none of them started before the vacuum freeze, and there are no cursors, transaction blocks, or zombie sessions.


SELECT pg.query_start, now() - query_start AS duration, pg.pid AS pg_pid, pg.state, pg.wait_event, backend_type,
       pg.wait_event_type, pg_blocking_pids(pid) AS blocked_by, SPLIT_PART(trim(pg.query), ' ', 1)
FROM pg_stat_activity pg ORDER BY now() - query_start DESC;
          query_start          |    duration     | pg_pid | state  |      wait_event      |         backend_type          | wait_event_type | blocked_by | split_part
-------------------------------+-----------------+--------+--------+----------------------+-------------------------------+-----------------+------------+------------
                               |                 |    249 |        | AutoVacuumMain       | autovacuum launcher           | Activity        | {}         |
                               |                 |    252 |        | LogicalLauncherMain  | logical replication launcher  | Activity        | {}         |
                               |                 |    247 |        | BgWriterMain         | background writer             | Activity        | {}         |
                               |                 |    250 |        | ArchiverMain         | archiver                      | Activity        | {}         |
                               |                 |    246 |        | CheckpointWriteDelay | checkpointer                  | Timeout         | {}         |
                               |                 |    248 |        | WalWriterMain        | walwriter                     | Activity        | {}         |
 2024-01-27 23:10:54.159577+00 | 3 days 18:55:48 | 271039 | active | BufferPin            | client backend                | BufferPin       | {}         | vacuum
 2024-01-30 21:33:50.262265+00 | 20:32:52.490309 | 558810 | idle   | ClientRead           | client backend                | Client          | {}         | SELECT
 2024-01-31 16:53:17.695929+00 | 01:13:25.056645 | 641053 | idle   | ClientRead           | client backend                | Client          | {}         | SELECT
 2024-01-31 18:04:48.929097+00 | 00:01:53.823477 | 645812 | active |                      | client backend                |                 | {}         | SELECT
 2024-01-31 18:05:02.221272+00 | 00:01:40.531302 | 645814 | active |                      | client backend                |                 | {}         | SELECT
 2024-01-31 18:05:04.927131+00 | 00:01:37.825443 | 645925 | active |                      | client backend                |                 | {}         | SELECT
 2024-01-31 18:05:12.088875+00 | 00:01:30.663699 | 645109 | active |                      | client backend                |                 | {}         | SELECT
 2024-01-31 18:05:30.202305+00 | 00:01:12.550269 | 645695 | active |                      | client backend                |                 | {}         | SELECT
 2024-01-31 18:05:47.872601+00 | 00:00:54.879973 | 645924 | active |                      | client backend                |                 | {}         | SELECT
 2024-01-31 18:06:16.425286+00 | 00:00:26.327288 | 645593 | active |                      | client backend                |                 | {}         | SELECT
 2024-01-31 18:06:18.286905+00 | 00:00:24.465669 | 646174 | idle   | ClientRead           | client backend                | Client          | {}         | select
 2024-01-31 18:06:24.878488+00 | 00:00:17.874086 | 644993 | active |                      | client backend                |                 | {}         | SELECT
 2024-01-31 18:06:32.560233+00 | 00:00:10.192341 | 645590 | active |                      | client backend                |                 | {}         | SELECT
 2024-01-31 18:06:34.068919+00 | 00:00:08.683655 | 646175 | idle   | ClientRead           | client backend                | Client          | {}         | select
 2024-01-31 18:06:36.159063+00 | 00:00:06.593511 | 645926 | active |                      | client backend                |                 | {}         | SELECT
 2024-01-31 18:06:40.002643+00 | 00:00:02.749931 |    351 | idle   | ClientRead           | client backend                | Client          | {}         | SELECT
 2024-01-31 18:06:42.5886+00   | 00:00:00.163974 | 645212 | active |                      | client backend                |                 | {}         | WITH
 2024-01-31 18:06:42.717906+00 | 00:00:00.034668 | 645207 | idle   | ClientRead           | client backend                | Client          | {}         | SELECT
 2024-01-31 18:06:42.721052+00 | 00:00:00.031522 | 645817 | idle   |                      | client backend                |                 | {}         | SELECT
 2024-01-31 18:06:42.723673+00 | 00:00:00.028901 | 644916 | idle   | ClientRead           | client backend                | Client          | {}         | SELECT
 2024-01-31 18:06:42.725967+00 | 00:00:00.026607 | 645815 | idle   | ClientRead           | client backend                | Client          | {}         | (SELECT
 2024-01-31 18:06:42.732666+00 | 00:00:00.019908 | 644752 | idle   | ClientRead           | client backend                | Client          | {}         | SELECT
 2024-01-31 18:06:42.733953+00 | 00:00:00.018621 | 645697 | idle   | ClientRead           | client backend                | Client          | {}         | SELECT
 2024-01-31 18:06:42.735012+00 | 00:00:00.017562 | 645193 | idle   | ClientRead           | client backend                | Client          | {}         | SELECT
 2024-01-31 18:06:42.735088+00 | 00:00:00.017486 | 645927 | idle   | ClientRead           | client backend                | Client          | {}         | SELECT
 2024-01-31 18:06:42.736222+00 | 00:00:00.016352 | 645221 | idle   | ClientRead           | client backend                | Client          | {}         | SELECT
 2024-01-31 18:06:42.741013+00 | 00:00:00.011561 | 645813 | active |                      | client backend                |                 | {}         | SELECT
 2024-01-31 18:06:42.745033+00 | 00:00:00.007541 | 645238 | idle   | ClientRead           | client backend                | Client          | {}         | (SELECT
 2024-01-31 18:06:42.747161+00 | 00:00:00.005413 | 645596 | idle   | ClientRead           | client backend                | Client          | {}         | SELECT
 2024-01-31 18:06:42.747703+00 | 00:00:00.004871 | 645232 | idle   | ClientRead           | client backend                | Client          | {}         | SELECT
 2024-01-31 18:06:42.749182+00 | 00:00:00.003392 | 645820 | active |                      | client backend                |                 | {}         | SELECT
 2024-01-31 18:06:42.750896+00 | 00:00:00.001678 | 645594 | idle   | ClientRead           | client backend                | Client          | {}         | SELECT
 2024-01-31 18:06:42.752015+00 | 00:00:00.000559 | 645620 | idle   | ClientRead           | client backend                | Client          | {}         | SELECT
 2024-01-31 18:06:42.752574+00 |        00:00:00 | 642199 | active |                      | client backend                |                 | {}         | SELECT
(40 rows)

SELECT * FROM pgstattuple('pg_catalog.pg_proc');  
 table_len | tuple_count | tuple_len | tuple_percent | dead_tuple_count | dead_tuple_len | dead_tuple_percent | free_space | free_percent
-----------+-------------+-----------+---------------+------------------+----------------+--------------------+------------+--------------
    851968 |        3450 |    799839 |         93.88 |               26 |          11602 |               1.36 |      10960 |         1.29


SELECT * FROM pgstattuple('[table_name]');

 table_len | tuple_count | tuple_len | tuple_percent | dead_tuple_count | dead_tuple_len | dead_tuple_percent | free_space | free_percent
-----------+-------------+-----------+---------------+------------------+----------------+--------------------+------------+--------------
 922165248 |    11679432 | 805880808 |         87.39 |                0 |              0 |                  0 |   31376484 |          3.4


SELECT * FROM pgstatindex('[index_name_01]');
 version | tree_level | index_size | root_block_no | internal_pages | leaf_pages | empty_pages | deleted_pages | avg_leaf_density | leaf_fragmentation
---------+------------+------------+---------------+----------------+------------+-------------+---------------+------------------+--------------------
       4 |          2 |  262348800 |           412 |            113 |      31911 |           0 |             0 |            90.09 |                  0

 SELECT * FROM pgstatindex('[index_name_02]');
 version | tree_level | index_size | root_block_no | internal_pages | leaf_pages | empty_pages | deleted_pages | avg_leaf_density | leaf_fragmentation
---------+------------+------------+---------------+----------------+------------+-------------+---------------+------------------+--------------------
       4 |          2 |  381714432 |           351 |            226 |      46369 |           0 |             0 |             86.9 |               5.67

Re: vacuum freeze wait_event BufferPin

From
Greg Sabino Mullane
Date:
On Wed, Jan 31, 2024 at 2:05 PM abrahim abrahao <a_abrahao@yahoo.com.br> wrote:
There is a three-day-old vacuum freeze on a partition table with wait_event = BufferPin; there are no transactions (active or idle) older than it that are not blocked, but at least one SELECT query related to this partition table is executing at all times.
... 
Is there a way to figure out which session the vacuum freeze is waiting for?

The vacuum needs a chance to get in and make changes to the table, but it's not being blocked at the traditional lock level that shows up in pg_blocking_pids. You can see what is going on with this:

select pid, mode, query_start, query from pg_locks join pg_stat_activity using (pid) where relation::regclass::text = 'mytable' order by 3;

That may show your vacuum process with a ShareUpdateExclusiveLock and some other processes with other locks, probably AccessShareLock. Those other pids need to all finish or be killed - and not have any overlap between them. In other words, that vacuum process needs to have exclusive access to the table for a split second, no matter if the other process locked the table before or after the vacuum started. One crude solution would be to cancel any other existing backends interested in that table:

select pg_cancel_backend(pid), now()-query_start, query from pg_locks join pg_stat_activity using (pid) where relation::regclass::text = 'mytable' and lower(query) !~ 'vacuum';

Not a good long-term solution, but depending on how often the table is updated, you might have other options. Perhaps disable autovacuum for this table and do a manual vacuum (e.g. in a cron script) that kills the other backends as per above, or runs during a time without constant reads on the table. Or have something that is able to pause the application. Or if this is a partitioned table that might get dropped in the future, or at least not queried heavily, do not worry about vacuuming it now.
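
A minimal sketch of that cron approach, assuming 'mytable' as a placeholder name and that briefly cancelling readers is acceptable, would be just two statements:

-- Cancel every non-vacuum backend currently holding a lock on the table.
SELECT pg_cancel_backend(pid)
FROM pg_locks
JOIN pg_stat_activity USING (pid)
WHERE relation::regclass::text = 'mytable'
  AND lower(query) !~ 'vacuum';

-- Freeze immediately, while the window is (hopefully) still open;
-- readers arriving between the two statements can still pin pages first.
VACUUM (FREEZE, VERBOSE) mytable;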

Cheers,
Greg

Re: vacuum freeze wait_event BufferPin

From
abrahim abrahao
Date:
Thanks Greg, I really appreciated your message.
I executed the query you shared, and it shows exactly the same type of lock you described; it helped me a lot. Thanks.
This is usual behaviour on some busy databases, and I am trying to avoid cancelling sessions.

I would also like to double-check my understanding of locks against this documentation (https://www.postgresql.org/docs/14/explicit-locking.html).
Based on my understanding of Table 13.2, SHARE UPDATE EXCLUSIVE (ShareUpdateExclusiveLock) should not be blocked by ACCESS SHARE (AccessShareLock). Am I wrong about that? If I am not, why is the vacuum still blocked?

Note: the information below comes from the link above.
Table-Level Lock Modes
  • SHARE UPDATE EXCLUSIVE (ShareUpdateExclusiveLock)
    • Conflicts with the SHARE UPDATE EXCLUSIVE, SHARE, SHARE ROW EXCLUSIVE, EXCLUSIVE, and ACCESS EXCLUSIVE lock modes. This mode protects a table against concurrent schema changes and VACUUM runs.
    • Acquired by VACUUM (without FULL), ANALYZE, CREATE INDEX CONCURRENTLY, CREATE STATISTICS, COMMENT ON, REINDEX CONCURRENTLY, and certain ALTER INDEX and ALTER TABLE variants (for full details see the documentation of these commands).
  • ACCESS SHARE (AccessShareLock)
    • Conflicts with the ACCESS EXCLUSIVE lock mode only.
    • The SELECT command acquires a lock of this mode on referenced tables. In general, any query that only reads a table and does not modify it will acquire this lock mode.

Note: the information below comes from another database; the first lock was taken before the vacuum started.
select pid, mode, query_start, SPLIT_PART(trim(query), ' ', 1) from pg_locks join pg_stat_activity using (pid) where relation::regclass::text = 'mytable' order by 3;
  pid  |           mode           |          query_start          | split_part
-------+--------------------------+-------------------------------+-------------
   376 | AccessShareLock          | 2024-02-02 08:11:08.938949+00 | SELECT
   508 | ShareUpdateExclusiveLock | 2024-02-02 08:11:17.822287+00 | vacuum
 52767 | AccessShareLock          | 2024-02-02 19:43:40.110489+00 | SELECT
 53137 | AccessShareLock          | 2024-02-02 19:44:19.331633+00 | SELECT
 53460 | AccessShareLock          | 2024-02-02 19:54:00.315714+00 | SELECT
 54203 | AccessShareLock          | 2024-02-02 19:54:39.449686+00 | SELECT
 53164 | AccessShareLock          | 2024-02-02 20:01:26.429547+00 | SELECT
 54002 | AccessShareLock          | 2024-02-02 20:01:32.749586+00 | SELECT
 53583 | AccessShareLock          | 2024-02-02 20:01:34.624046+00 | SELECT



Re: vacuum freeze wait_event BufferPin

From
Greg Sabino Mullane
Date:
On Fri, Feb 2, 2024 at 3:25 PM abrahim abrahao <a_abrahao@yahoo.com.br> wrote:
SHARE UPDATE EXCLUSIVE (ShareUpdateExclusiveLock) should not be blocked by ACCESS SHARE (AccessShareLock). Am I wrong about that? If I am not, why is the vacuum still blocked?

Those locks, with no other context, are somewhat of a red herring. The important part is not that the AccessShare is somehow blocking ShareUpdateExclusive, but that the ShareUpdateExclusive process is NOT blocking new AccessShare processes! In the internals of Postgres, vacuumlazy.c tries to grab a buffer lock (a different concept from all the user-visible locks above). It politely tries to wait[1] until nobody else is grabbing it (aka pinning it), then proceeds. The problem is that other processes are allowed to come along and put a pin in it as well - the vacuum's ShareUpdateExclusive lock does not prevent that.
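
If the pg_buffercache extension is installed, you can watch those pins directly. A diagnostic sketch ('mytable' is a placeholder; it shows how many backends hold a pin on each cached page of the table, though not which backends):

-- List pinned shared buffers belonging to the table in the current database.
SELECT relblocknumber, pinning_backends
FROM pg_buffercache
WHERE reldatabase = (SELECT oid FROM pg_database
                     WHERE datname = current_database())
  AND relfilenode = pg_relation_filenode('mytable')
  AND pinning_backends > 0;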

So the timeline is:

Process X runs a long select and pins the buffer
Process V runs a vacuum freeze and tries to lock the buffer. It detects other pins, so it waits. It assumes that whoever is holding the pin will release it someday.
Process Y runs another long select and also pins the buffer.
Process X ends, and removes its pins.
Process V still cannot move - it just knows there are still pins. Where they come from does not matter.

As long as there is at least one other process holding a pin, the vacuum freeze cannot continue[2].

That's my understanding of the code, anyway. This could be argued as a bug. I am not sure what a solution would be. Cancelling user queries just for a vacuum would not be cool, but we could maybe pause future pin-creating actions somehow?

For the time being, forcing a super-quick moment of no table access would seem to be your best bet, as described earlier.
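
If you script that, a statement_timeout around the manual freeze should act as a safety valve, so it gives up rather than sit on a buffer pin for days (a sketch with an arbitrary limit; the cancelled vacuum can simply be retried at the next window):

SET statement_timeout = '5min';    -- assumption: five minutes is an acceptable wait
VACUUM (FREEZE, VERBOSE) mytable;  -- 'mytable' is a placeholder
RESET statement_timeout;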

Cheers,
Greg

[1] See backend/storage/buffer/bufmgr.c for LockBufferForCleanup()

[2] Quick duplication script:
drop table if exists foobar;
create table foobar as select 1 as id;
alter table foobar set (autovacuum_enabled = off); -- keep autovacuum from interfering
update foobar set id = id; -- leave a dead row version so vacuum needs a cleanup lock on the page

Process 1:
begin; select *, pg_sleep(111111) from foobar;

Process 2:
vacuum(freeze,verbose) foobar; /* blocked */

Process 3:
begin; select *, pg_sleep(333333) from foobar;

Run them in order. Kill Process 1 and Process 2 is still blocked. Kill Process 3 and Process 2 finishes the vacuum.
Note that a regular vacuum (without a freeze) will not get blocked, as it is allowed to simply skip any page it cannot get a cleanup lock on right away, while a freeze vacuum must process every page.

Cheers,
Greg

Re: vacuum freeze wait_event BufferPin

From
abrahim abrahao
Date:

Thanks again Greg, I really appreciated all the information.
