Re: Re: Checkpointer split has broken things dramatically (was Re: DELETE vs TRUNCATE explanation)

From: Tom Lane
Subject: Re: Re: Checkpointer split has broken things dramatically (was Re: DELETE vs TRUNCATE explanation)
Date:
Msg-id: 11099.1342571512@sss.pgh.pa.us
In response to: Re: Checkpointer split has broken things dramatically (was Re: DELETE vs TRUNCATE explanation)  (Craig Ringer <ringerc@ringerc.id.au>)
Responses: Re: Re: Checkpointer split has broken things dramatically (was Re: DELETE vs TRUNCATE explanation)  (Craig Ringer <ringerc@ringerc.id.au>)
List: pgsql-hackers
Craig Ringer <ringerc@ringerc.id.au> writes:
> On 07/18/2012 06:56 AM, Tom Lane wrote:
>> This implies that nobody has done pull-the-plug testing on either HEAD
>> or 9.2 since the checkpointer split went in (2011-11-01)

> That makes me wonder if on top of the buildfarm, extending some 
> buildfarm machines into a "crashfarm" is needed:

Not sure if we need a whole "farm", but certainly having at least one
machine testing this sort of stuff on a regular basis would make me feel
a lot better.

> The main challenge would be coming up with suitable tests to run, ones 
> that could then be checked to make sure nothing was broken.

One fairly simple test scenario could go like this:
* run the regression tests
* pg_dump the regression database
* run the regression tests again
* hard-kill immediately upon completion
* restart database, allow it to perform recovery
* pg_dump the regression database
* diff previous and new dumps; should be the same
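
The steps above could be scripted roughly as follows. This is only a sketch under stated assumptions, not a tested harness: it assumes it runs from a PostgreSQL source tree where `make installcheck` populates a database named `regression`, the `datadir` and `logfile` arguments are hypothetical paths supplied by the caller, and `pg_ctl stop -m immediate` stands in for the hard kill (it skips the shutdown checkpoint but is not a true pull-the-plug or OS crash).

```python
import difflib
import subprocess
from typing import Callable, List, Sequence

Runner = Callable[[Sequence[str]], str]


def run(cmd: Sequence[str]) -> str:
    """Run a command, raise if it fails, and return its stdout."""
    return subprocess.run(cmd, check=True, capture_output=True, text=True).stdout


def dump_diff(before: str, after: str) -> List[str]:
    """Return unified-diff lines between two dumps; an empty list means identical."""
    return list(difflib.unified_diff(
        before.splitlines(), after.splitlines(),
        fromfile="before-crash", tofile="after-recovery", lineterm=""))


def crash_recovery_check(datadir: str, logfile: str, runner: Runner = run) -> List[str]:
    """Follow the scenario step by step and return the diff of the two dumps."""
    regress = ["make", "installcheck"]   # assumption: cwd is a PostgreSQL source tree
    dump = ["pg_dump", "regression"]     # assumption: the regression database name

    runner(regress)                      # run the regression tests
    before = runner(dump)                # pg_dump the regression database
    runner(regress)                      # run the regression tests again
    # Stand-in for the hard kill: immediate shutdown, no shutdown checkpoint.
    runner(["pg_ctl", "-D", datadir, "stop", "-m", "immediate"])
    # Restart and wait (-w) until crash recovery has finished; -l sends the
    # server's output to a log file instead of our captured pipe.
    runner(["pg_ctl", "-D", datadir, "-l", logfile, "-w", "start"])
    after = runner(dump)                 # pg_dump the regression database
    return dump_diff(before, after)      # should be empty
```

A non-empty return value would be the signal to investigate. The `runner` parameter exists only so the sequence can be exercised without a live server.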
 

The main thing this wouldn't cover is discrepancies in user indexes,
since pg_dump doesn't do anything that's likely to result in indexscans
on user tables.  It ought to be enough to detect the sort of system-wide
problem we're talking about here, though.

In general I think the hard part is automated reproduction of an
OS-crash scenario, but your ideas about how to do that sound promising.
Once we have that going, it shouldn't be hard to come up with tests
of the form "do X, hard-crash, recover, check X still looks sane".
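
One possible shape for such tests, as a minimal sketch: each scenario supplies a "do X" step and a "check X" step, while the crash and recovery mechanisms are passed in separately so they can be swapped once a real OS-crash method exists. All names here (`CrashScenario`, `run_scenario`, `hard_crash`, `recover`) are hypothetical and not part of any existing tool.

```python
from dataclasses import dataclass
from typing import Callable, Generic, TypeVar

T = TypeVar("T")


@dataclass
class CrashScenario(Generic[T]):
    name: str
    do: Callable[[], T]          # perform X; return whatever must survive the crash
    check: Callable[[T], bool]   # after recovery: does X still look sane?


def run_scenario(scenario: CrashScenario[T],
                 hard_crash: Callable[[], None],
                 recover: Callable[[], None]) -> bool:
    """do X, hard-crash, recover, check X still looks sane."""
    expected = scenario.do()
    hard_crash()
    recover()
    return scenario.check(expected)
```

Keeping the crash step as a plain callable means the same scenarios could run against a process kill today and a VM power-off later.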

> What else should be checked? The main thing that comes to mind for me is 
> something I've worried about for a while: that Pg might not always 
> handle out-of-disk-space anywhere near as gracefully as it's often 
> claimed to.

+1
        regards, tom lane

