Re: [HACKERS] Checksums by default?

From: Peter Geoghegan
Subject: Re: [HACKERS] Checksums by default?
Msg-id: CAM3SWZTc+4QjysO-Op4ui8hZJro3QG0fN1MFj1NtMuVHQY1sew@mail.gmail.com
In reply to: Re: [HACKERS] Checksums by default?  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses: Re: [HACKERS] Checksums by default?  (Jim Nasby <Jim.Nasby@BlueTreble.com>)
List: pgsql-hackers

On Sat, Jan 21, 2017 at 9:09 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Not at all; I just think that it's not clear that they are a net win
> for the average user, and so I'm unconvinced that turning them on by
> default is a good idea.  I could be convinced otherwise by suitable
> evidence.  What I'm objecting to is turning them on without making
> any effort to collect such evidence.

+1

One insight Jim Gray has in the classic paper "Why Do Computers Stop
and What Can Be Done About It?" [1] is that fault-tolerant hardware is
table stakes, and so most failures are related to operator error, and
to a lesser extent software bugs. The paper is about 30 years old.

I don't recall ever seeing a checksum failure on a Heroku Postgres
database, even though they were enabled as soon as the feature became
available. I have seen a few corruption problems brought to light by
amcheck, though, all of which were due to bugs in software.
Apparently, before I joined Heroku there were real reliability
problems with the storage subsystem that Heroku Postgres runs on (it's
a pluggable storage service from a popular cloud provider -- the
"pluggable" functionality would have made it fairly novel at the
time). These problems were something that the Heroku Postgres team
dealt with about 6 years ago. However, anecdotal evidence suggests
that the reliability of the same storage system *vastly* improved
roughly a year or two later. We still occasionally lose drives, but
they seem to fail fast, in a way that lets us recover easily without
data loss. In practice, Postgres checksums do *not* seem to
catch problems. That's been my experience, at least.

Obviously every additional check helps, and it may be something we can
do without any appreciable downside. I'd like to see a benchmark.

[1] http://www.hpl.hp.com/techreports/tandem/TR-85.7.pdf
-- 
Peter Geoghegan


