Re: Enabling Checksums

Поиск

Список

Период

Сортировка

От	Greg Smith
Тема	Re: Enabling Checksums
Дата	18 декабря 2012 г. 09:06:22
Msg-id	50D031FA.6030307@2ndQuadrant.com обсуждение исходный текст
Ответ на	Re: Enabling Checksums (Simon Riggs <simon@2ndQuadrant.com>)
Ответы	Re: Enabling Checksums Re: Enabling Checksums
Список	pgsql-hackers

Дерево обсуждения

On 12/18/12 3:17 AM, Simon Riggs wrote:
> Clearly part of the response could involve pg_dump on the damaged
> structure, at some point.

This is the main thing I wanted to try out more, once I have a decent 
corruption generation tool.  If you've corrupted a single record but can 
still pg_dump the remainder, that seems the best we can do to help 
people recover from that.  Providing some documentation on how to figure 
out what rows are in that block, presumably by using the contrib 
inspection tools, would be helpful too.

> Indexes are a good case, because we can/should report the block error, mark the
> index as invalid and then hint that it should be rebuilt.

Marking a whole index invalid because there's one bad entry has enough 
downsides that I'm not sure how much we'd want to automate that.  Not 
having that index available could easily result in an effectively down 
system due to low performance.  The choices are uglier if it's backing a 
unique constraint.

In general, what I hope people will be able to do is switch over to 
their standby server, and then investigate further.  I think it's 
unlikely that people willing to pay for block checksums will only have 
one server.  Having some way to nail down if the same block is bad on a 
given standby seems like a useful interface we should offer, and it 
shouldn't take too much work.  Ideally you won't find the same 
corruption there.  I'd like a way to check the entirety of a standby for 
checksum issues, ideally run right after it becomes current.  It seems 
the most likely way to see corruption on one of those is to replicate a 
corrupt block.

There is no good way to make the poor soul who has no standby server 
happy here.  You're just choosing between bad alternatives.  The first 
block error is often just that--the first one, to be joined by others 
soon afterward.  My experience at how drives fail says the second error 
is a lot more likely after you've seen one.

-- 
Greg Smith   2ndQuadrant US    greg@2ndQuadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com

В списке pgsql-hackers по дате отправления:

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Re: Enabling Checksums