Re: autovacuum next steps, take 3

From Matthew T. O'Connor
Subject Re: autovacuum next steps, take 3
Date
Msg-id 45F1F316.7020905@zeut.net
In response to autovacuum next steps, take 3  (Alvaro Herrera <alvherre@commandprompt.com>)
Responses Re: autovacuum next steps, take 3  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
My initial reaction is that this looks good to me, but I still have a
few comments below.

Alvaro Herrera wrote:
> Here is a low-level, very detailed description of the implementation of
> the autovacuum ideas we have so far.
> 
> launcher's dealing with databases
> ---------------------------------

[ Snip ]

> launcher and worker interactions

[Snip]

> worker to-do list
> -----------------
> When each worker starts, it determines which tables to process in the
> usual fashion: get pg_autovacuum and pgstat data and compute the
> equations.
> 
> The worker then takes a "snapshot" of what's currently going on in the
> database, by storing worker PIDs, the corresponding table OID that's
> being currently worked, and the to-do list for each worker.

Does a new worker really care about the PID of other workers or what 
table they are currently working on?

> It removes from its to-do list the tables being processed.  Finally, it
> writes the list to disk.

Just to be clear, the new worker removes from its to-do list all the
tables mentioned in the to-do lists of all the other workers?

> The table list will be written to a file in
> PGDATA/vacuum/<database-oid>/todo.<worker-pid>
> The file will consist of table OIDs, in the order in which they are
> going to be vacuumed.
> 
> At this point, vacuuming can begin.

This all sounds good to me so far.
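
A minimal Python sketch of the startup step described above, under
stated assumptions. It follows the broader reading raised in the
question (drop every table that appears in any other worker's list).
The function names, the dict of other workers' lists, and the
one-OID-per-line file format are hypothetical; only the path layout
PGDATA/vacuum/<database-oid>/todo.<worker-pid> and the "OIDs in vacuum
order" content come from the proposal.

```python
import os


def build_todo(candidates, other_todos):
    """Drop every table OID that already appears in another worker's list.

    candidates:  table OIDs in the order this worker would vacuum them.
    other_todos: dict of worker PID -> list of table OIDs (hypothetical shape).
    """
    claimed = set()
    for oids in other_todos.values():
        claimed.update(oids)
    # Preserve the original vacuum order for the tables that remain.
    return [oid for oid in candidates if oid not in claimed]


def todo_path(pgdata, database_oid, worker_pid):
    """Path layout from the proposal: PGDATA/vacuum/<database-oid>/todo.<worker-pid>."""
    return os.path.join(pgdata, "vacuum", str(database_oid), f"todo.{worker_pid}")


def write_todo(pgdata, database_oid, worker_pid, todo):
    """Write the list to disk, one OID per line (the line format is an assumption)."""
    path = todo_path(pgdata, database_oid, worker_pid)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "w") as f:
        for oid in todo:
            f.write(f"{oid}\n")
    return path
```

With candidates [16384, 16390, 16401] and another worker already
holding 16390, the new worker's list would be [16384, 16401].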

> Before processing each table, it scans the WorkerInfos to see if there's
> a new worker, in which case it reads its to-do list to memory.

It's not clear to me why a worker cares that there is a new worker, 
since the new worker is going to ignore all the tables that are already 
claimed by all worker todo lists.

> Then it again fetches the tables being processed by other workers in the
> same database, and for each other worker, removes from its own in-memory
> to-do all those tables mentioned in the other lists that appear earlier
> than the current table being processed (inclusive).  Then it picks the
> next non-removed table in the list.  All of this must be done with the
> Autovacuum LWLock grabbed in exclusive mode, so that no other worker can
> pick the same table (no IO takes place here, because the whole lists
> were saved in memory at the start.)

Again, it's not clear to me what this gains us.  It seems to me that
once a worker has written out its to-do list at startup, it should just
work through it; I don't see the value in workers constantly updating
their to-do lists.  Maybe I'm just missing something.  Can you
enlighten me?
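
For reference, the pruning rule in the quoted paragraph can be sketched
roughly as follows. This is only an illustration of one reading of that
rule, not the proposed implementation: `prune_and_pick`, the
(list, current table) pairs, and the use of a `threading.Lock` as a
stand-in for the Autovacuum LWLock are all assumptions.

```python
import threading

# Stand-in for the Autovacuum LWLock held in exclusive mode (assumption).
autovac_lock = threading.Lock()


def prune_and_pick(my_todo, others):
    """Pick this worker's next table after pruning against other workers.

    my_todo: this worker's remaining table OIDs, in vacuum order.
    others:  list of (todo_list, current_oid) pairs, one per other worker
             in the same database; current_oid is None if that worker has
             not started on any table yet.
    Returns (next_table_or_None, pruned_todo).
    """
    with autovac_lock:
        done_or_busy = set()
        for todo, current in others:
            if current is None or current not in todo:
                continue
            # Everything up to and including the table the other worker is
            # on has been handled, or is being handled, by that worker.
            done_or_busy.update(todo[: todo.index(current) + 1])
        remaining = [oid for oid in my_todo if oid not in done_or_busy]
        next_table = remaining[0] if remaining else None
        return next_table, remaining
```

Note that only the in-memory copies are consulted while the lock is
held, which matches the remark in the proposal that no IO happens at
this point.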

> other things to consider
> ------------------------
> 
> This proposal doesn't deal with the hot tables stuff at all, but that is
> very easy to bolt on later: just change the first phase, where the
> initial to-do list is determined, to exclude "cold" tables.  That way,
> the vacuuming will be fast.  Determining what is a cold table is still
> an exercise to the reader ...

I think we can make this algorithm naturally favor small / hot tables
with one small change: have workers remove the tables they have just
vacuumed from their to-do lists and rewrite those lists to disk.
Assuming the to-do lists are ordered by size ascending, smaller tables
will be made available for inspection by newer workers sooner rather
than later.
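
A rough Python sketch of that change, under stated assumptions: the
names `order_by_size`, `rewrite_todo`, and `run_worker` are
hypothetical, table size is taken as a plain number per OID, and the
write-to-temp-then-rename step is just one way to keep the on-disk list
consistent, not something the proposal specifies.

```python
import os


def order_by_size(table_sizes):
    """Return table OIDs sorted by size ascending (ties broken by OID).

    table_sizes: dict of table OID -> size, in whatever unit is available.
    """
    return sorted(table_sizes, key=lambda oid: (table_sizes[oid], oid))


def rewrite_todo(path, todo):
    """Rewrite the to-do file, one OID per line, via a temp file and rename."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        f.writelines(f"{oid}\n" for oid in todo)
    os.replace(tmp, path)


def run_worker(path, todo, vacuum):
    """Vacuum tables in order, shrinking the on-disk list after each one.

    vacuum: callable taking a table OID (stand-in for the real work).
    """
    todo = list(todo)
    rewrite_todo(path, todo)
    while todo:
        vacuum(todo[0])
        # The table just vacuumed is no longer claimed by this worker, so a
        # newer worker reading the file is free to consider it again.
        todo.pop(0)
        rewrite_todo(path, todo)
```

Because the small tables sit at the front of the list, they drop out of
the file first, which is what lets a newly started worker pick them up
again while this worker is still busy with the large ones.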


