Re: autovacuum next steps, take 3
From | Matthew T. O'Connor |
---|---|
Subject | Re: autovacuum next steps, take 3 |
Date | |
Msg-id | 45F1F316.7020905@zeut.net |
In response to | autovacuum next steps, take 3 (Alvaro Herrera <alvherre@commandprompt.com>) |
Responses | Re: autovacuum next steps, take 3 (Tom Lane <tgl@sss.pgh.pa.us>) |
List | pgsql-hackers |
My initial reaction is that this looks good to me, but I still have a few comments below.

Alvaro Herrera wrote:
> Here is a low-level, very detailed description of the implementation of
> the autovacuum ideas we have so far.
>
> launcher's dealing with databases
> ---------------------------------

[ Snip ]

> launcher and worker interactions

[ Snip ]

> worker to-do list
> -----------------
> When each worker starts, it determines which tables to process in the
> usual fashion: get pg_autovacuum and pgstat data and compute the
> equations.
>
> The worker then takes a "snapshot" of what's currently going on in the
> database, by storing worker PIDs, the corresponding table OID that's
> being currently worked, and the to-do list for each worker.

Does a new worker really care about the PIDs of other workers or what tables they are currently working on?

> It removes from its to-do list the tables being processed. Finally, it
> writes the list to disk.

Just to be clear, the new worker removes from its to-do list all the tables mentioned in the to-do lists of all the other workers?

> The table list will be written to a file in
> PGDATA/vacuum/<database-oid>/todo.<worker-pid>
> The file will consist of table OIDs, in the order in which they are
> going to be vacuumed.
>
> At this point, vacuuming can begin.

This all sounds good to me so far.

> Before processing each table, it scans the WorkerInfos to see if there's
> a new worker, in which case it reads its to-do list to memory.

It's not clear to me why a worker cares that there is a new worker, since the new worker is going to ignore all the tables that are already claimed by all worker to-do lists.

> Then it again fetches the tables being processed by other workers in the
> same database, and for each other worker, removes from its own in-memory
> to-do all those tables mentioned in the other lists that appear earlier
> than the current table being processed (inclusive). Then it picks the
> next non-removed table in the list. All of this must be done with the
> Autovacuum LWLock grabbed in exclusive mode, so that no other worker can
> pick the same table (no IO takes places here, because the whole lists
> were saved in memory at the start.)

Again, it's not clear to me what this gains us. It seems to me that if a worker writes out its to-do list when it starts up, it should just work through it; I don't see the value in workers constantly updating their to-do lists. Maybe I'm just missing something; can you enlighten me?

> other things to consider
> ------------------------
>
> This proposal doesn't deal with the hot tables stuff at all, but that is
> very easy to bolt on later: just change the first phase, where the
> initial to-do list is determined, to exclude "cold" tables. That way,
> the vacuuming will be fast. Determining what is a cold table is still
> an exercise to the reader ...

I think we can make this algorithm naturally favor small / hot tables with one small change: have workers remove the tables they have just vacuumed from their to-do lists and rewrite their to-do lists to disk. Assuming the to-do lists are ordered by size ascending, smaller tables will be made available for inspection by newer workers sooner rather than later.
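
To make the suggested variant concrete, here is a rough sketch in Python. It is only an illustration under stated assumptions, not autovacuum code: the function names (write_todo, read_claimed, start_worker, finish_table) are hypothetical, a temporary directory stands in for PGDATA/vacuum/<database-oid>, table sizes are made-up numbers, and the locking that the Autovacuum LWLock would provide is left out entirely.

```python
import os
import tempfile


def write_todo(vacuum_dir, worker_pid, table_oids):
    """Write a worker's to-do list as todo.<worker-pid>, one table OID per line."""
    os.makedirs(vacuum_dir, exist_ok=True)
    with open(os.path.join(vacuum_dir, f"todo.{worker_pid}"), "w") as f:
        for oid in table_oids:
            f.write(f"{oid}\n")


def read_claimed(vacuum_dir, own_pid):
    """Collect every table OID listed in the other workers' to-do files."""
    claimed = set()
    if not os.path.isdir(vacuum_dir):
        return claimed
    for name in os.listdir(vacuum_dir):
        if not name.startswith("todo.") or name == f"todo.{own_pid}":
            continue
        with open(os.path.join(vacuum_dir, name)) as f:
            claimed.update(int(line) for line in f if line.strip())
    return claimed


def start_worker(vacuum_dir, worker_pid, table_sizes):
    """Build a new worker's to-do list: skip claimed tables, smallest first.

    table_sizes maps table OID -> size (hypothetical stand-in for the
    pg_autovacuum / pgstat computation).
    """
    claimed = read_claimed(vacuum_dir, worker_pid)
    todo = sorted(
        (oid for oid in table_sizes if oid not in claimed),
        key=lambda oid: table_sizes[oid],
    )
    write_todo(vacuum_dir, worker_pid, todo)
    return todo


def finish_table(vacuum_dir, worker_pid, todo, table_oid):
    """After vacuuming a table, drop it from the list and rewrite the file."""
    remaining = [oid for oid in todo if oid != table_oid]
    write_todo(vacuum_dir, worker_pid, remaining)
    return remaining


if __name__ == "__main__":
    vacuum_dir = tempfile.mkdtemp()
    sizes = {16384: 10, 16385: 5000, 16386: 200}  # made-up OIDs and sizes

    first = start_worker(vacuum_dir, 101, sizes)    # claims all three tables
    second = start_worker(vacuum_dir, 102, sizes)   # nothing left to claim
    first = finish_table(vacuum_dir, 101, first, first[0])
    third = start_worker(vacuum_dir, 103, sizes)    # small table is free again
    print(first, second, third)
```

Because each list is sorted by size ascending, a worker finishes and releases its smallest tables first, so a worker that starts later can pick them up again while the earlier worker is still busy with a large table.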