Thread: Autovacuum launcher process launches worker process at high frequency

Autovacuum launcher process launches worker process at high frequency

From
Masahiko Sawada
Date:
Hi all,

I found some strange behaviour in the autovacuum launcher process
during XID anti-wraparound vacuum.

Suppose that a database (say test_db) whose datfrozenxid age is about
to reach autovacuum_freeze_max_age has two tables, T1 and T2.
T1 is very large and frequently updated, so vacuuming it takes a long
time.
T2 is static and already frozen, so vacuum can skip the whole table.
Anti-wraparound vacuum has already been executed on the other databases.

Once the age of datfrozenxid of test_db exceeds
autovacuum_freeze_max_age, the autovacuum launcher launches a worker
process to do anti-wraparound vacuum on test_db.
A worker process assigned to test_db begins to vacuum T1, which takes a
long time.
Meanwhile, another worker process is assigned to test_db, finishes
vacuuming T2, and exits.

After a while, the autovacuum launcher launches a new worker, which is
again assigned to test_db.
But that worker exits quickly because there is no table left for it to
vacuum (T1 is being vacuumed by another worker process).
When a new worker process starts, it sends SIGUSR2 to the launcher
process to wake it up.
Although the launcher process calls WaitLatch() after launching a new
worker, it is woken up and soon launches yet another worker process.
As a result, the launcher process launches new worker processes at an
extremely high frequency regardless of autovacuum_naptime, which
increases CPU usage.

Why does an autovacuum worker need to wake up the launcher process
after it has started?

autovacuum.c:L1604
        /* wake up the launcher */
        if (AutoVacuumShmem->av_launcherpid != 0)
            kill(AutoVacuumShmem->av_launcherpid, SIGUSR2);
 


Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: Autovacuum launcher process launches worker process at high frequency

From
Jeff Janes
Date:
On Wed, Oct 5, 2016 at 7:28 AM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> Hi all,
>
> I found the kind of strange behaviour of the autovacuum launcher
> process when XID anti-wraparound vacuum.
>
> Suppose that a database (say test_db) whose age of frozenxid is about
> to reach max_autovacuum_max_age has three tables T1 and T2.
> T1 is very large and is frequently updated, so vacuum takes long time
> for vacuum.
> T2 is static and already frozen table, thus vacuum can skip to vacuum
> whole table.
> And anti-wraparound vacuum was already executed on other databases.
>
> Once the age of datfrozenxid of test_db exceeded
> max_autovacuum_max_age, autovacuum launcher launches worker process in
> order to do anti-wraparound vacuum on testdb.
> A worker process assigned to test_db begins to vacuum T1, it takes long time.
> Meanwhile another worker process is assigned to test_db and completes
> to vacuum on T2 and exits.
>
> After for while, the autovacuum launcher launches new worker again and
> worker is assigned to test_db again.
> But that worker exits quickly because there is no table we need to
> vacuum. (T1 is being vacuumed by another worker process).
> When new worker process starts, worker process sends SIGUSR2 signal to
> launcher process to wake up him.
> Although the launcher process executes WaitLatch() after launched new
> worker, it is woken up and launches another new worker process soon
> again.

See also this thread, which was never resolved:

https://www.postgresql.org/message-id/flat/CAMkU%3D1yE4YyCC00W_GcNoOZ4X2qxF7x5DUAR_kMt-Ta%3DYPyFPQ%40mail.gmail.com#CAMkU=1yE4YyCC00W_GcNoOZ4X2qxF7x5DUAR_kMt-Ta=YPyFPQ@mail.gmail.com

> As a result, launcher process launches new worker process at extremely
> high frequency regardless of autovacuum_naptime, which increase cpu
> use rate.
>
> Why does auto vacuum worker need to wake up launcher process after started?
>
> autovacuum.c:L1604
>          /* wake up the launcher */
>         if (AutoVacuumShmem->av_launcherpid != 0)
>             kill(AutoVacuumShmem->av_launcherpid, SIGUSR2);



I think that that is so that the launcher can launch multiple workers in quick succession if it has fallen behind schedule. It can't launch them in a tight loop, because its signals to the postmaster would get merged into one signal, so it has to wait for one to get mostly set-up before launching the next.

But it doesn't make any real difference to your scenario, as the short-lived worker will wake the launcher up a few microseconds later anyway, when it realizes it has no work to do and so exits.

Cheers,

Jeff

Re: Autovacuum launcher process launches worker process at high frequency

From
Masahiko Sawada
Date:
On Thu, Oct 6, 2016 at 12:11 AM, Jeff Janes <jeff.janes@gmail.com> wrote:
> On Wed, Oct 5, 2016 at 7:28 AM, Masahiko Sawada <sawada.mshk@gmail.com>
> wrote:
>>
>> Hi all,
>>
>> I found the kind of strange behaviour of the autovacuum launcher
>> process when XID anti-wraparound vacuum.
>>
>> Suppose that a database (say test_db) whose age of frozenxid is about
>> to reach max_autovacuum_max_age has three tables T1 and T2.
>> T1 is very large and is frequently updated, so vacuum takes long time
>> for vacuum.
>> T2 is static and already frozen table, thus vacuum can skip to vacuum
>> whole table.
>> And anti-wraparound vacuum was already executed on other databases.
>>
>> Once the age of datfrozenxid of test_db exceeded
>> max_autovacuum_max_age, autovacuum launcher launches worker process in
>> order to do anti-wraparound vacuum on testdb.
>> A worker process assigned to test_db begins to vacuum T1, it takes long
>> time.
>> Meanwhile another worker process is assigned to test_db and completes
>> to vacuum on T2 and exits.
>>
>> After for while, the autovacuum launcher launches new worker again and
>> worker is assigned to test_db again.
>> But that worker exits quickly because there is no table we need to
>> vacuum. (T1 is being vacuumed by another worker process).
>> When new worker process starts, worker process sends SIGUSR2 signal to
>> launcher process to wake up him.
>> Although the launcher process executes WaitLatch() after launched new
>> worker, it is woken up and launches another new worker process soon
>> again.
>
>
> See also this thread, which was never resolved:
>
>
> https://www.postgresql.org/message-id/flat/CAMkU%3D1yE4YyCC00W_GcNoOZ4X2qxF7x5DUAR_kMt-Ta%3DYPyFPQ%40mail.gmail.com#CAMkU=1yE4YyCC00W_GcNoOZ4X2qxF7x5DUAR_kMt-Ta=YPyFPQ@mail.gmail.com
>
>
>
>>
>> As a result, launcher process launches new worker process at extremely
>> high frequency regardless of autovacuum_naptime, which increase cpu
>> use rate.
>>
>> Why does auto vacuum worker need to wake up launcher process after
>> started?
>>
>> autovacuum.c:L1604
>>          /* wake up the launcher */
>>         if (AutoVacuumShmem->av_launcherpid != 0)
>>             kill(AutoVacuumShmem->av_launcherpid, SIGUSR2);
>
>
>
> I think that that is so that the launcher can launch multiple workers in
> quick succession if it has fallen behind schedule. It can't launch them in a
> tight loop, because its signals to the postmaster would get merged into one
> signal, so it has to wait for one to get mostly set-up before launching the
> next.
>
> But it doesn't make any real difference to your scenario, as the short-lived
> worker will wake the launcher up a few microseconds later anyway, when it
> realizes it has no work to do and so exits.
>

Thank you for the reply.

I also thought that it would be better to have information about how
many tables in each database have not been vacuumed yet.
But I'm not sure how to implement that, and the current optimistic
logic is safer in most situations.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center