Discussion: buildfarm housekeeping / planning

buildfarm housekeeping / planning

From
Andrew Dunstan
Date:
The buildfarm is now going on six years old (time flies when you're 
having fun!) and the database is now rather large - around 76GB on disk.
We'd like to reduce that quite a lot, especially by purging out the logs 
of old builds. And while the old data isn't publicly accessible, it has 
occasionally been used to run specialised queries to research particular 
issues. It's also arguably a useful historical resource that shouldn't 
be lightly abandoned.

I'd like to get an idea of what the community regards as a reasonable
amount of data to keep online and readily to hand. Six months' worth? A
year? Two years? Is it worth keeping logs of error stages longer than
successful stages? If so, what should the periods be?

One of the things that I'd like to be able to do is full text search (FTS) on the logs.
Part of our plan is to move to a much more modern version of Postgres. 
Keeping the logs to a reasonable size will possibly allow us to provide 
FTS, although I haven't discussed that part with Josh Drake yet, and as 
it's hosted at CMD he does get a say :-)

cheers

andrew
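
The retention question above (one period for successful stages, possibly a longer one for error stages) can be sketched as a small rule. This is only an illustration under stated assumptions: `StageLog`, its fields, and the default periods of roughly six months and one year are hypothetical and do not reflect the actual buildfarm schema or any agreed policy.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class StageLog:
    """Hypothetical record for one build stage; not the real buildfarm schema."""
    animal: str            # buildfarm member name
    stage: str             # e.g. "make" or "check"
    succeeded: bool        # True if the stage completed without error
    finished_at: datetime  # when the stage finished
    text: str              # raw log output


def is_expired(log, now,
               keep_ok=timedelta(days=182),       # assumed: about six months
               keep_failed=timedelta(days=365)):  # assumed: about one year
    """Return True if the log is older than the period for its outcome."""
    keep = keep_ok if log.succeeded else keep_failed
    return now - log.finished_at > keep


def split_for_purge(logs, now, **periods):
    """Partition logs into (kept_online, to_archive); nothing is deleted."""
    kept, to_archive = [], []
    for log in logs:
        (to_archive if is_expired(log, now, **periods) else kept).append(log)
    return kept, to_archive
```

Passing the same value for `keep_ok` and `keep_failed` gives a single retention period for all stages. Logs in `to_archive` are only separated out here, so they could be moved to offline storage rather than dropped.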




Re: buildfarm housekeeping / planning

From
Tom Lane
Date:
Andrew Dunstan <andrew@dunslane.net> writes:
> The buildfarm is now going on six years old (time flies when you're 
> having fun!) and the database is now rather large - around 76GB on disk.
> We'd like to reduce that quite a lot, especially by purging out the logs 
> of old builds. And while the old data isn't publicly accessible, it has 
> occasionally been used to run specialised queries to research particular 
> issues. It's also arguably a useful historical resource that shouldn't 
> be lightly abandoned.

As long as the historical data is kept somewhere, I agree that it
doesn't need to be readily available on-line.  10GB a year is not a lot
of data these days, so it seems like we ought to be able to archive it
indefinitely; but I can see that keeping it available on the web might
run into some money.  (You could also argue that there's no need to
archive more than say five years back, but I think that's a different
discussion.)

> I'd like to get an idea of what the community regards as a reasonable
> amount of data to keep online and readily to hand. Six months' worth? A
> year? Two years? Is it worth keeping logs of error stages longer than
> successful stages? If so, what should the periods be?

Six months is probably plenty, really, especially if that means we can
make the data more available than it is now.  I'm not convinced that
"successful" builds should be purged more quickly, as there's often
reason to look for warnings, funny events in the postmaster log, etc.

> One of the things that I'd like to be able to do is FTS on the logs. 

+1.  +10 even.  I think this'd be a quantum jump in the usefulness of
the log archives.  I frequently wonder things like "what other machines
are showing this warning", and right now it's impractical to research
that.
        regards, tom lane
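
The kind of query described above ("what other machines are showing this warning") can be illustrated with a toy inverted index. This is not PostgreSQL full text search and not how the buildfarm stores its logs; it is a minimal in-memory sketch, and the function names and the `(animal, log_text)` input shape are assumptions made for the example.

```python
import re
from collections import defaultdict

TOKEN = re.compile(r"[a-z0-9_]+")


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return TOKEN.findall(text.lower())


def build_index(logs):
    """Map each token to the set of positions in `logs` that contain it.

    `logs` is a list of (animal, log_text) pairs (hypothetical shape).
    """
    index = defaultdict(set)
    for log_id, (_animal, text) in enumerate(logs):
        for tok in tokenize(text):
            index[tok].add(log_id)
    return index


def animals_matching(logs, index, query):
    """Return animals with at least one log containing every query token."""
    tokens = tokenize(query)
    if not tokens:
        return set()
    hits = set.intersection(*(index.get(t, set()) for t in tokens))
    return {logs[i][0] for i in hits}
```

With an index built once over the retained logs, a call such as `animals_matching(logs, index, "unused variable")` returns the set of members whose logs mention both words. A real deployment would more plausibly rely on the database's own text search features than on application code like this.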