Re: corrupt pages detected by enabling checksums

From Andres Freund
Subject Re: corrupt pages detected by enabling checksums
Date
Msg-id 20130513134922.GB27618@awork2.anarazel.de
In response to Re: corrupt pages detected by enabling checksums  (Jon Nelson <jnelson+pgsql@jamponi.net>)
Responses Re: corrupt pages detected by enabling checksums
List pgsql-hackers
On 2013-05-13 08:45:41 -0500, Jon Nelson wrote:
> On Mon, May 13, 2013 at 8:32 AM, Andres Freund <andres@2ndquadrant.com> wrote:
> > On 2013-05-12 19:41:26 -0500, Jon Nelson wrote:
> >> On Sun, May 12, 2013 at 3:46 PM, Jim Nasby <jim@nasby.net> wrote:
> >> > On 5/10/13 1:06 PM, Jeff Janes wrote:
> >> >>
> >> >> Of course the paranoid DBA could turn off restart_after_crash and do a
> >> >> manual investigation on every crash, but in that case the database would
> >> >> refuse to restart even in the case where it is perfectly clear that all the
> >> >> following WAL belongs to the recycled file and not the current file.
> >> >
> >> >
> >> > Perhaps we should also allow for zeroing out WAL files before reuse (or just
> >> > disable reuse). I know there's a performance hit there, but the reuse idea
> >> > happened before we had bgWriter. Theoretically the overhead of creating a new
> >> > file would always fall to bgWriter and therefore not be a big deal.
> >>
> >> For filesystems like btrfs, re-using a WAL file is suboptimal to
> >> simply creating a new one and removing the old one when it's no longer
> >> required. Using fallocate (or posix_fallocate) (I have a patch for
> >> that!) to create a new one is - by my tests - 28 times faster than the
> >> currently-used method.
> >
> > I don't think the comparison between just fallocate()ing and what we
> > currently do is fair. fallocate() doesn't guarantee that the file is the
> > same size after a crash, so you would still need an fsync() or we
> > couldn't use fdatasync() anymore. And I'd guess the benefits aren't all
> > that big anymore in that case?
> 
> fallocate (16MB) + fsync is still almost certainly faster than
> write+write+write... + fsync.
> The test I performed at the time did exactly that: posix_fallocate + pg_fsync.
Sure, the initial file creation will be faster. But are the actual
individual WAL writes (small, frequently fdatasync()ed) still faster?
That's the critical path currently.
Whether they are pretty much depends on how the filesystem manages
allocated but not initialized blocks...
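
To make the question concrete, here is a rough sketch of how one could measure both halves: the creation cost (zero-filling vs. posix_fallocate, each followed by fsync) and the cost of small in-place writes that are fdatasync()ed one by one. It is only an illustration, not PostgreSQL code: the 8kB write size, the write count and all function names are assumptions, and it needs a Unix where Python exposes os.posix_fallocate and os.fdatasync (e.g. Linux). Results will differ a lot between filesystems, which is exactly the point.

```python
import os
import tempfile
import time

SEGMENT_SIZE = 16 * 1024 * 1024  # 16MB, the segment size mentioned in the thread
BLOCK_SIZE = 8192                # assumed size of one small write


def create_zero_filled(path, size=SEGMENT_SIZE, block=BLOCK_SIZE):
    """Create the file by writing zero blocks, then fsync (write+write+...+fsync)."""
    fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_RDWR, 0o600)
    try:
        zeros = bytes(block)
        written = 0
        while written < size:
            written += os.write(fd, zeros[: size - written])
        os.fsync(fd)
    finally:
        os.close(fd)


def create_fallocated(path, size=SEGMENT_SIZE):
    """Create the file with posix_fallocate, then fsync so the size is durable."""
    fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_RDWR, 0o600)
    try:
        os.posix_fallocate(fd, 0, size)
        os.fsync(fd)
    finally:
        os.close(fd)


def time_small_writes(path, count=64, block=BLOCK_SIZE):
    """Overwrite the first `count` blocks in place, fdatasync after each one.

    Returns the elapsed wall-clock time in seconds.
    """
    fd = os.open(path, os.O_RDWR)
    try:
        payload = b"x" * block
        start = time.perf_counter()
        for _ in range(count):
            os.write(fd, payload)
            os.fdatasync(fd)
        return time.perf_counter() - start
    finally:
        os.close(fd)


def compare(directory, count=64, size=SEGMENT_SIZE):
    """Time creation and the small-write path for both creation methods."""
    results = {}
    methods = (("zero_filled", create_zero_filled),
               ("fallocated", create_fallocated))
    for name, create in methods:
        path = os.path.join(directory, name)
        start = time.perf_counter()
        create(path, size)
        create_s = time.perf_counter() - start
        writes_s = time_small_writes(path, count)
        results[name] = {
            "create_s": create_s,
            "writes_s": writes_s,
            "size": os.path.getsize(path),
        }
    return results


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmpdir:
        for name, row in compare(tmpdir).items():
            print(name, row)
```

The temporary directory should live on the filesystem one actually cares about; a tmpfs-backed /tmp says nothing about how a disk filesystem handles the first write into an allocated but uninitialized block.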

Greetings,

Andres Freund

-- 
 Andres Freund                     http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


