Discussion: Multi-transactional statements and statistics for autovacuum

Multi-transactional statements and statistics for autovacuum

From:
"Igor V.Gnatyuk"
Date:
Hello.

Before procedures appeared in PostgreSQL 11 with the ability to manage 
transactions, a statement could contain only one transaction, so the 
end of the transaction also meant the end of the statement. Apparently, 
this is why the corresponding restriction is described differently in 
different places of the documentation:

https://www.postgresql.org/docs/current/monitoring-stats.html#MONITORING-STATS-VIEWS
"...so a query or transaction still in progress does not affect the 
displayed totals..."
"...counts actions taken so far within the current transaction..."

But now several transactions can be executed within a single SQL 
statement. Meanwhile, the current implementation transfers the 
accumulated statistics to shared memory only at the end of the 
statement, and these statistics are what autovacuum relies on. So in a 
situation where a data-modifying procedure runs for a long time (e.g. 
an infinite event-processing loop, such as a queue implementation), the 
changes committed inside it do not affect the statistics in shared 
memory until the CALL statement finishes. This prevents autovacuum from 
making a timely cleanup decision. The example below illustrates this 
behavior.

Example.

We process the data in the 'test' table. The 'changes' column will show 
the number of row updates:

   CREATE TABLE test (changes int);

Let's insert a row into the table:

   INSERT INTO test VALUES (0);

At each processing step, the value of the 'changes' column is 
incremented. The processing runs in a long-lived loop inside the 
'process' procedure (see below); the actions of each loop iteration are 
committed.

   CREATE PROCEDURE process() AS $$
   DECLARE
     l_chs int;
   BEGIN
     LOOP
       UPDATE test SET changes = changes + 1 RETURNING changes INTO l_chs;
       COMMIT;
       RAISE NOTICE 'changes % -- upd_shared = %, upd_local = %', l_chs,
                    -- statistics in shared memory (considered by autovacuum)
                    (SELECT n_tup_upd FROM pg_stat_all_tables
                     WHERE relname = 'test'),
                    -- statistics within the current transaction
                    (SELECT n_tup_upd FROM pg_stat_xact_all_tables
                     WHERE relname = 'test');
     END LOOP;
   END
   $$ LANGUAGE plpgsql;

Let's call the procedure:

   CALL process();

NOTICE:  changes 1 -- upd_shared = 0, upd_local = 1
NOTICE:  changes 2 -- upd_shared = 0, upd_local = 2
NOTICE:  changes 3 -- upd_shared = 0, upd_local = 3
NOTICE:  changes 4 -- upd_shared = 0, upd_local = 4
NOTICE:  changes 5 -- upd_shared = 0, upd_local = 5
NOTICE:  changes 6 -- upd_shared = 0, upd_local = 6
NOTICE:  changes 7 -- upd_shared = 0, upd_local = 7
NOTICE:  changes 8 -- upd_shared = 0, upd_local = 8
...

If we now observe the cumulative statistics on the 'test' table from 
another session, we see that although updates happen and dead tuples 
accumulate, this information does not reach shared memory:

   SELECT n_tup_upd, n_dead_tup, n_ins_since_vacuum,
          vacuum_count, autovacuum_count
   FROM pg_stat_all_tables WHERE relname = 'test';

     n_tup_upd          | 0
     n_dead_tup         | 0
     n_ins_since_vacuum | 1
     vacuum_count       | 0
     autovacuum_count   | 0
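To see why autovacuum stays idle here, one can compare n_dead_tup with 
the trigger threshold autovacuum uses; per the PostgreSQL documentation, 
that threshold is autovacuum_vacuum_threshold + 
autovacuum_vacuum_scale_factor * reltuples. The following query is an 
illustrative sketch of such a check from another session, not part of 
the original report:

```sql
-- Compare the dead-tuple count autovacuum sees with its vacuum threshold.
-- Threshold formula (from the PostgreSQL documentation):
--   threshold = autovacuum_vacuum_threshold
--             + autovacuum_vacuum_scale_factor * reltuples
SELECT s.relname,
       s.n_dead_tup,
       current_setting('autovacuum_vacuum_threshold')::int
         + current_setting('autovacuum_vacuum_scale_factor')::float8
           * c.reltuples AS vacuum_threshold
FROM pg_stat_all_tables s
JOIN pg_class c ON c.oid = s.relid
WHERE s.relname = 'test';
-- While the CALL is still running, n_dead_tup stays at 0 in shared
-- memory, so this threshold is never crossed.
```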

It would be logical to remove the existing restriction, that is, to 
flush the statistics exactly at transaction completion, even if the 
statement is still running.

-- 
Regards, Igor Gnatyuk
Postgres Professional https://postgrespro.com



Re: Multi-transactional statements and statistics for autovacuum

From:
Egor Rogov
Date:
Hello everybody,

On 12.06.2024 20:13, Igor V.Gnatyuk wrote:
> Hello.
>
> Before procedures appeared in PostgreSQL 11 with the ability to 
> manage transactions, a statement could contain only one transaction, 
> so the end of the transaction also meant the end of the statement. 
> Apparently, this is why the corresponding restriction is described 
> differently in different places of the documentation:
>
> https://www.postgresql.org/docs/current/monitoring-stats.html#MONITORING-STATS-VIEWS 
>
> "...so a query or transaction still in progress does not affect the 
> displayed totals..."
> "...counts actions taken so far within the current transaction..."
>
> But now several transactions can be executed within a single SQL 
> statement. Meanwhile, the current implementation transfers the 
> accumulated statistics to shared memory only at the end of the 
> statement, and these statistics are what autovacuum relies on. So in 
> a situation where a data-modifying procedure runs for a long time 
> (e.g. an infinite event-processing loop, such as a queue 
> implementation), the changes committed inside it do not affect the 
> statistics in shared memory until the CALL statement finishes. This 
> prevents autovacuum from making a timely cleanup decision. The 
> example below illustrates this behavior.


It would be nice to know whether this is considered desired behavior or 
an oversight.

If it's OK that transaction statistics are not accumulated in shared 
memory until the end of the SQL statement, we should at least improve 
the documentation to reflect this.

Although, from my POV, statistics should be sent to shared memory after 
the end of each transaction, regardless of the boundaries of SQL 
statements. With the current implementation, it's not possible to build 
an infinite-loop query-processing routine entirely in Postgres; we have 
to rely on external tools either to implement the processing loop 
(issuing a separate SQL statement for each event) or to schedule vacuum.
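The external-loop workaround mentioned above can be sketched as follows. 
This is a hypothetical reshaping of the original 'process' procedure 
(the name 'process_one' is invented for illustration), not something 
proposed in the thread:

```sql
-- Hypothetical variant: each CALL handles exactly one step and returns,
-- while an external driver (a script, a scheduler, pg_cron, etc.)
-- repeats the CALLs. Since every CALL is then a separate statement, its
-- statistics reach shared memory as soon as it finishes, and autovacuum
-- can react in time.
CREATE PROCEDURE process_one() AS $$
BEGIN
  UPDATE test SET changes = changes + 1;
END
$$ LANGUAGE plpgsql;

-- The external loop then repeatedly issues:
--   CALL process_one();
```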

What's your opinion on this?

