Re: Track the amount of time waiting due to cost_delay

From: Imseih (AWS), Sami
Subject: Re: Track the amount of time waiting due to cost_delay
Msg-id: 42B4800A-D8D0-4734-930A-4D94D0EA90C7@amazon.com
In reply to: Re: Track the amount of time waiting due to cost_delay (Bertrand Drouvot <bertranddrouvot.pg@gmail.com>)
Responses: Re: Track the amount of time waiting due to cost_delay
           Re: Track the amount of time waiting due to cost_delay
List: pgsql-hackers

>> 2. the leader being interrupted while waiting is also already happening on master
>> due to the pgstat_progress_parallel_incr_param() calls in
>> parallel_vacuum_process_one_index() (that have been added in
>> 46ebdfe164). It has been the case "only" 36 times during my test case.

46ebdfe164 interrupts the leader's sleep every time a parallel worker reports
progress, and we currently don't handle interrupts by restarting the sleep with
the remaining time. nanosleep does provide the ability to restart with the remaining
time [1], but I don't think it's worth the effort to ensure more accurate
vacuum delays for the leader process.
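
For reference, restarting a sleep with the remaining time looks roughly like
the standalone sketch below. This is not PostgreSQL code: sleep_full() is a
hypothetical helper written only to illustrate the nanosleep() behavior
described in [1], where an interrupted call returns -1 with errno set to EINTR
and writes the unslept time into its second argument.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <errno.h>
#include <time.h>

/*
 * sleep_full() is a hypothetical helper, not part of PostgreSQL.
 *
 * Sleep for at least "usec" microseconds. If a signal interrupts the
 * sleep, resume with the time that nanosleep() reports as remaining,
 * instead of returning early.
 *
 * Returns 0 on success, -1 on any error other than EINTR.
 */
int
sleep_full(long usec)
{
    struct timespec req;
    struct timespec rem;

    assert(usec >= 0);

    req.tv_sec = usec / 1000000L;
    req.tv_nsec = (usec % 1000000L) * 1000L;

    while (nanosleep(&req, &rem) == -1)
    {
        if (errno != EINTR)
            return -1;          /* real failure, e.g. EINVAL */
        req = rem;              /* interrupted: sleep only for what is left */
    }

    return 0;
}
```

The loop costs one extra system call per interruption, which is part of why
the added accuracy may not be worth it for the leader.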


> 1. Having a time based only approach to throttle 

I do agree with a time-based approach overall.


> 1.1) the more parallel workers is used, the less the impact of the leader on
> the vacuum index phase duration/workload is (because the repartition is done
> on more processes).

Did you mean "because the vacuum is done on more processes"?

When a leader is operating on one or more large indexes for the entirety
of the vacuum operation, wouldn't more parallel workers end up
interrupting the leader more often? This is why I think reporting even less
often than once per second (more below) would be better.

> 3. A 1 second reporting "throttling" looks a reasonable threshold as:

> 3.1 the idea is to have a significant impact when the leader could have been
> interrupted say hundred/thousand times per second.

> 3.2 it does not make that much sense for any tools to sample pg_stat_progress_vacuum
> multiple times per second (so a one second reporting granularity seems ok).

I feel 1 second may still be too frequent.
What about 10 seconds (or 30 seconds)?
I think this metric in particular will be mainly useful for vacuum runs that
last for minutes or more, so reporting every 10 or 30 seconds
would still be useful.
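
To make the idea concrete, here is a minimal standalone sketch of time-based
report throttling with a configurable interval. It is not taken from the
patch: DelayReporter and delay_reporter_add() are hypothetical names, and the
caller is assumed to pass in a monotonic timestamp in nanoseconds. Delay time
is accumulated locally and handed back for reporting only once the minimum
interval has elapsed since the last report.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bookkeeping for throttled reporting of delay time. */
typedef struct DelayReporter
{
    int64_t     pending_ns;         /* delay accumulated since last report */
    int64_t     last_report_ns;     /* timestamp of the last report */
    int64_t     min_interval_ns;    /* e.g. 1, 10 or 30 seconds */
} DelayReporter;

/*
 * Record "slept_ns" of delay observed at time "now_ns".
 *
 * Returns true and stores the accumulated delay in *to_report_ns when at
 * least min_interval_ns has passed since the last report; the caller would
 * then send that amount to the leader. Otherwise returns false and keeps
 * accumulating without touching *to_report_ns.
 */
bool
delay_reporter_add(DelayReporter *r, int64_t now_ns, int64_t slept_ns,
                   int64_t *to_report_ns)
{
    r->pending_ns += slept_ns;

    if (now_ns - r->last_report_ns < r->min_interval_ns)
        return false;

    *to_report_ns = r->pending_ns;
    r->pending_ns = 0;
    r->last_report_ns = now_ns;
    return true;
}
```

With min_interval_ns set to 10 seconds, each worker would interrupt the leader
at most once every 10 seconds, however many delay points it passes through.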

It also just occurred to me that pgstat_progress_parallel_incr_param()
should have a code comment noting that it will interrupt the leader process and
cause activity such as a sleep to end early.



Regards,

Sami Imseih
Amazon Web Services (AWS)


[1] https://man7.org/linux/man-pages/man2/nanosleep.2.html


