Re: Autovacuum improvements

Поиск
Список
Период
Сортировка
От Alvaro Herrera
Тема Re: Autovacuum improvements
Дата
Msg-id 20070114202719.GC7233@alvh.no-ip.org
обсуждение исходный текст
Ответ на Re: Autovacuum improvements  ("Matthew T. O'Connor" <matthew@zeut.net>)
Ответы Re: Autovacuum improvements  ("Matthew T. O'Connor" <matthew@zeut.net>)
Список pgsql-hackers
Matthew T. O'Connor wrote:

> Alvaro Herrera wrote:
> >I've been thinking how to improve autovacuum so that we can convince
> >more people that it can be enabled by default.  
> 
> I would like to see it enabled by default too, however the reason it 
> isn't already enabled by default is that it caused failures in the 
> regression test when we tried to turn it on during the 8.2 dev cycle and 
> it was too close to beta to fix everything.  All this new machinery is 
> great, but it doesn't address that problem.

See my reply to Tom on this topic.


> >pg_av_igroupmembers
> > groupid    oid
> > month        int
> > dom        int
> > dow        int
> > starttime    timetz
> > endtime    timetz
> 
> This seems to assume that the start and end time for an interval will be 
> on the same day, you probably need to specify a start month, dom, dow, 
> time and an end month, dom, dow and time.

Actually, I was thinking that if you want intervals that cross day
boundaries, you just add more tuples (one which finishes at 23:59:59 and
another which starts at 00:00:00 the next day).

> Since this is modeled after cron, do we allow wild-cards, or any of the 
> other cron tricks like */20 or 1-3,5,7,9-11?

Wildcards yes (using NULL), but not the rest because it would make the
autovacuum code responsible for parsing the values which I don't think
is a good idea.  And it's not normalized anyway.

> Also your notation above is ambiguous, it took me a while to realize 
> that pg_av_igroupmembers.groupid wasn't referencing the id from 
> pg_av_tablegroup.

Hmm, yeah, that one is referencing pg_av_intervalgroup.


> >pg_av_schedule
> > tgroup                oid
> > igroup                oid
> > enabled            bool
> > queue                int
> > vac_base_thresh    int
> > vac_scale_factor    float
> > anl_base_thresh    int
> > anl_scal_factor    float
> > vac_cost_delay        int
> > vac_cost_limit        int
> > freeze_min_age        int
> > freeze_max_age        int
> >
> 
> What is queue for?

Sorry, that was part of the queue stuff which I then deleted :-)


> >So the scheduler, at startup, loads the whole schedule in memory, and
> >then wakes up at reasonable intervals and checks whether these equations
> >hold for some of the tables it's monitoring.  If they do, then launch a
> >new worker process to do the job.
> >
> >We need a mechanism for having the scheduler rescan the schedule when a
> >user modifies the catalog -- maybe having a trigger that sends a signal
> >to the process is good enough (implementation detail: the signal must be
> >routed via the postmaster, since the backend cannot hope to know the
> >scheduler's PID.  This is easy enough to do.)
> 
> This all looks reasonable if not a bit complex.  Question, what happens 
> to the current pg_autovacuum relation?

I had two ideas: one was to make pg_autovacuum hold default config for
all tables not mentioned in any group, so sites which are OK with 8.2's
representation can still use it.  The other idea was to remove it and
replace it with this mechanism.

> Also what about system defaults, will we have a hard coded default 
> interval of always on, and one default table group that contains all the 
> tables with one default entry in pg_av_schedule?

Yes, that's what I had in mind.

> I think we need more discussion on scheduling, we need to make sure this 
> solves the vacuum starvation problem.  Does the launcher process 
> consider each row in pg_av_schedule that applies at the current time 
> separately?  That is say there are three entries in pg_av_schedule that 
> apply right now, does that mean that the launcher can fire off three 
> different vacuums?  Perhaps we need to add a column to pg_av_tablegroup 
> that specifies the max number of concurrent worker processes for this 
> table group.

My idea was to assign each table, or maybe each group, to a queue, and
then have as much workers as there are queues.  So you could put them
all in a single queue and it would mean there can be at most one vacuum
running at any time.  Or you could put each group in a queue, and then
there could be as many workers as there are groups.  Or you could mix.

And also there would be a "autovac concurrency limit", which would be
a GUC var saying how many vacuums to have at any time.

> Also, I don't think we need the concept of queues as described in recent 
> threads.  I think the idea of the queues was the the system would be 
> able to automatically find small tables and vacuum them frequently, in 
> this proposal the admin would have to create a group for small tables 
> and manually add tables to the group and make sure that there are enough 
> worker processes for that group to prevent vacuum starvation.  Perhaps 
> we can create a dynamic group that includes all tables with less than a 
> certain number of rows or blocks?

Yeah, my idea of "queues" was slightly different than the queues that
were being discussed.  I was thinking that queues would just be a means
to group the groups to limit concurrency while at the same time prevent
starvation.

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


В списке pgsql-hackers по дате отправления:

Предыдущее
От: Tom Lane
Дата:
Сообщение: Re: Autovacuum improvements
Следующее
От: Alvaro Herrera
Дата:
Сообщение: Re: Autovacuum improvements