Re: Enabling Checksums

From Greg Smith
Subject Re: Enabling Checksums
Date
Msg-id 50D031FA.6030307@2ndQuadrant.com
In reply to Re: Enabling Checksums  (Simon Riggs <simon@2ndQuadrant.com>)
Responses Re: Enabling Checksums
Re: Enabling Checksums
List pgsql-hackers
On 12/18/12 3:17 AM, Simon Riggs wrote:
> Clearly part of the response could involve pg_dump on the damaged
> structure, at some point.

This is the main thing I wanted to try out more, once I have a decent 
corruption generation tool.  If you've corrupted a single record but can 
still pg_dump the remainder, that seems like the best we can do to help 
people recover.  Providing some documentation on how to figure out which 
rows are in that block, presumably by using the contrib inspection 
tools, would be helpful too.
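
As a rough illustration of what a corruption generation tool could look like, here is a minimal sketch that flips a single bit inside one block of a file.  It is not the tool referred to above; the function name `flip_bit` and its parameters are hypothetical, and the 8192-byte block size is an assumption based on PostgreSQL's default page size.

```python
import os

# Assumption: PostgreSQL's default page size. It is a compile-time option,
# so a given build may use a different value.
BLOCK_SIZE = 8192


def flip_bit(path, block_no, byte_offset, bit=0, block_size=BLOCK_SIZE):
    """Flip one bit inside the given block of a file.

    Returns the (old, new) values of the byte that was modified.
    Hypothetical helper for illustration only.
    """
    if not 0 <= byte_offset < block_size:
        raise ValueError("byte_offset must fall inside the block")
    if not 0 <= bit < 8:
        raise ValueError("bit must be in 0..7")
    position = block_no * block_size + byte_offset
    if block_no < 0 or position >= os.path.getsize(path):
        raise ValueError("block lies beyond the end of the file")

    # Open for in-place update so the file length never changes.
    with open(path, "r+b") as f:
        f.seek(position)
        old = f.read(1)[0]
        new = old ^ (1 << bit)
        f.seek(position)
        f.write(bytes([new]))
    return old, new
```

Something like this should only ever be pointed at a copy of a relation file in a stopped, disposable test cluster.  Calling it a second time with the same arguments flips the bit back, which makes it easy to undo a single injected error.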

> Indexes are a good case, because we can/should report the block error, mark the
> index as invalid and then hint that it should be rebuilt.

Marking a whole index invalid because there's one bad entry has enough 
downsides that I'm not sure how much we'd want to automate that.  Not 
having that index available could easily result in an effectively down 
system due to low performance.  The choices are uglier if it's backing a 
unique constraint.

In general, what I hope people will be able to do is switch over to 
their standby server, and then investigate further.  I think it's 
unlikely that people willing to pay for block checksums will only have 
one server.  Having some way to nail down whether the same block is bad on a 
given standby seems like a useful interface we should offer, and it 
shouldn't take too much work.  Ideally you won't find the same 
corruption there.  I'd like a way to check the entirety of a standby for 
checksum issues, ideally run right after it becomes current.  It seems 
the most likely way to see corruption on one of those is to replicate a 
corrupt block.

There is no good way to make the poor soul who has no standby server 
happy here.  You're just choosing between bad alternatives.  The first 
block error is often just that--the first one, to be joined by others 
soon afterward.  My experience with how drives fail says a second error 
is a lot more likely once you've seen one.

-- 
Greg Smith   2ndQuadrant US    greg@2ndQuadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com


