Thread: zero_damaged_pages doesn't work

zero_damaged_pages doesn't work

From
David Boreham
Date:
Is the zero_damaged_pages feature expected to work in 8.3.11 ?

I have a fair bit of evidence that it doesn't (you get nice messages
in the log saying that the page is being zeroed, but the on-disk data
does not change).
I also see quite a few folk reporting similar findings in various forum
and mailing list posts over the past few years.

I can use dd to zero the on-disk data, but it'd be nice to know
definitively if this feature is expected to work, and if so under
what conditions it might not.
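
For reference, the dd approach amounts to overwriting one block-sized region of a file with zeros. A minimal Python sketch of that operation follows, assuming the default PostgreSQL block size of 8192 bytes. The function name `zero_block` and the scratch file are hypothetical and exist only for illustration; the demo never touches a real relation file, and anyone trying this on a cluster would presumably stop the server and keep a copy of the file first.

```python
import os
import tempfile

BLOCK_SIZE = 8192  # PostgreSQL's default block size; an assumption, check your build


def zero_block(path, blkno, block_size=BLOCK_SIZE):
    """Hypothetical helper: overwrite block `blkno` of `path` with zero bytes."""
    with open(path, "r+b") as f:
        f.seek(0, os.SEEK_END)
        if (blkno + 1) * block_size > f.tell():
            raise ValueError("block lies beyond the end of the file")
        f.seek(blkno * block_size)
        f.write(b"\x00" * block_size)
        f.flush()
        os.fsync(f.fileno())  # make sure the zeros actually reach the disk


def demo():
    # Scratch file of three blocks filled with 0xFF; not a real relation file.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"\xff" * (3 * BLOCK_SIZE))
        path = tmp.name
    try:
        zero_block(path, 1)
        with open(path, "rb") as f:
            data = f.read()
    finally:
        os.remove(path)
    return data
```

Calling `demo()` returns the contents of the scratch file after its middle block has been zeroed; the neighbouring blocks are left alone.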

fwiw I am enabling zero_damaged_pages using a set command
in a client session, not in the server's config file. The symptoms
I observe are that a query that previously errored out due to
a bad page header error will succeed when zero_damaged_pages
is enabled, the log says that the page is being zeroed.
However the same query run subsequently without zero_damaged_pages
will again fail, and pg_filedump shows that the on-disk data
hasn't changed.

Thanks.



Re: zero_damaged_pages doesn't work

From
Jeff Davis
Date:
On Mon, 2010-09-27 at 15:07 -0600, David Boreham wrote:
> Is the zero_damaged_pages feature expected to work in 8.3.11 ?
>
> I have a fair bit of evidence that it doesn't (you get nice messages
> in the log saying that the page is being zeroed, but the on-disk data
> does not change).
> I also see quite a few folk reporting similar findings in various forum
> and mailing list posts over the past few years.
>
> I can use dd to zero the on-disk data, but it'd be nice to know
> definitively if this feature is expected to work, and if so under
> what conditions it might not.

It does zero the page in the buffer, but I don't think it marks it as
dirty. So, it never really makes it to disk as all-zeros.

> fwiw I am enabling zero_damaged_pages using a set command
> in a client session, not in the server's config file. The symptoms
> I observe are that a query that previously errored out due to
> a bad page header error will succeed when zero_damaged_pages
> is enabled, the log says that the page is being zeroed.
> However the same query run subsequently without zero_damaged_pages
> will again fail, and pg_filedump shows that the on-disk data
> hasn't changed.

The subsequent queries may succeed if the page is still in the buffer
cache.
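
As a rough illustration of the behaviour described here, the following Python toy models a buffer pool that zeroes a damaged page in memory without marking it dirty. It is a sketch under stated assumptions, not PostgreSQL code: the class `ToyBufferPool`, its methods, and the `b"OK"` header check are hypothetical names invented for this example. The only points taken from the thread are that the zeroing happens in the buffer, that the page is apparently not marked dirty, and that later reads succeed only while the page stays cached.

```python
PAGE_SIZE = 8192  # PostgreSQL's default block size; used here only for flavour


class ToyBufferPool:
    """Hypothetical model of a buffer cache; not PostgreSQL's implementation."""

    def __init__(self, disk):
        self.disk = disk      # block number -> bytes, stands in for the data file
        self.buffers = {}     # block number -> bytearray held in memory
        self.dirty = set()    # blocks that would be written back on eviction

    @staticmethod
    def is_damaged(page):
        # Assumed stand-in for the header check: a good page starts with b"OK".
        # An all-zero page is treated as valid (empty).
        return page[:2] != b"OK" and any(page)

    def read_page(self, blkno, zero_damaged_pages=False):
        if blkno not in self.buffers:
            page = bytearray(self.disk[blkno])
            if self.is_damaged(page):
                if not zero_damaged_pages:
                    raise ValueError(f"invalid page header in block {blkno}")
                page = bytearray(PAGE_SIZE)  # zero the in-memory copy only
                # blkno is deliberately NOT added to self.dirty here.
            self.buffers[blkno] = page
        return self.buffers[blkno]

    def evict(self, blkno):
        page = self.buffers.pop(blkno)
        if blkno in self.dirty:  # clean buffers are dropped without a write
            self.disk[blkno] = bytes(page)
            self.dirty.discard(blkno)


def demo():
    damaged = b"XX" + b"\x07" * (PAGE_SIZE - 2)
    pool = ToyBufferPool({0: damaged})

    pool.read_page(0, zero_damaged_pages=True)   # succeeds, page zeroed in memory
    disk_unchanged = pool.disk[0] == damaged     # on-disk bytes are untouched

    cached_read_ok = not any(pool.read_page(0))  # still cached, so no error

    pool.evict(0)                                # nothing is written back
    try:
        pool.read_page(0)                        # re-read from disk
        fails_again = False
    except ValueError:
        fails_again = True
    return disk_unchanged, cached_read_ok, fails_again
```

In this model the error returns as soon as the clean buffer is evicted and the page is read from disk again, which matches the symptoms reported at the top of the thread.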

zero_damaged_pages is not meant as a recovery tool. It's meant to allow
you to pg_dump whatever data is not damaged, so that you can restore
into a fresh location.

Regards,
    Jeff Davis


Re: zero_damaged_pages doesn't work

From
David Boreham
Date:
  On 9/27/2010 4:40 PM, Jeff Davis wrote:
> It does zero the page in the buffer, but I don't think it marks it as
> dirty. So, it never really makes it to disk as all-zeros.

Ah ha ! This is certainly consistent with the observed behavior.

> zero_damaged_pages is not meant as a recovery tool. It's meant to allow
> you to pg_dump whatever data is not damaged, so that you can restore
> into a fresh location.

It'd be useful for future generations if this were included in the doc.

The latest version :
http://www.postgresql.org/docs/9.0/static/runtime-config-developer.html
still talks about destroying data (which at least to me implies a
persistent change to the on-disk bits) and fails to mention that the
zeroing only occurs in the page pool sans write-back.

If it helps, I'd be happy to contribute some time to fix up the docs,
but imho a simple copy/paste of your text above would be sufficient.



Re: zero_damaged_pages doesn't work

From
Tom Lane
Date:
David Boreham <david_list@boreham.org> writes:
>   On 9/27/2010 4:40 PM, Jeff Davis wrote:
>> zero_damaged_pages is not meant as a recovery tool. It's meant to allow
>> you to pg_dump whatever data is not damaged, so that you can restore
>> into a fresh location.

> It'd be useful for future generations if this were included in the doc.

> The latest version :
> http://www.postgresql.org/docs/9.0/static/runtime-config-developer.html
> still talks about destroying data (which at least to me implies a
> persistent change to the on-disk bits) and fails to mention that the
> zeroing only occurs in the page pool sans write-back.

The reason it tells you that data will be destroyed is that that could
very well happen.  If the system decides to put new data into what will
appear to it to be an empty page, then the damaged data on disk will be
overwritten, and then there's no hope of recovering anything.
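
A hedged illustration of this scenario, written as a standalone Python toy rather than anything resembling PostgreSQL internals. The functions `insert_row` and `write_back`, the dictionaries, and the `b"OK"` header marker are assumptions made up for this sketch. It shows only the ordering of events: once new data lands in the page that looks empty, the page becomes dirty, and the next write-back replaces the damaged bytes on disk.

```python
PAGE_SIZE = 8192  # PostgreSQL's default block size


def insert_row(buffers, dirty, blkno, row):
    """Hypothetical insert: put a row in a page that looks empty and mark it dirty."""
    page = buffers[blkno]
    if any(page):
        raise ValueError("page is not empty in this toy model")
    page[:2] = b"OK"                    # assumed marker for a valid header
    page[2:2 + len(row)] = row
    dirty.add(blkno)


def write_back(disk, buffers, dirty):
    """Flush dirty buffers, as a checkpoint or eviction eventually would."""
    for blkno in sorted(dirty):
        disk[blkno] = bytes(buffers[blkno])
    dirty.clear()


def demo():
    damaged = b"XX" + b"\x07" * (PAGE_SIZE - 2)  # stands in for a corrupted page
    disk = {0: damaged}
    buffers = {0: bytearray(PAGE_SIZE)}          # already zeroed in memory, not dirty
    dirty = set()

    write_back(disk, buffers, dirty)
    untouched = disk[0] == damaged               # nothing dirty, so disk is unchanged

    insert_row(buffers, dirty, 0, b"new row")
    write_back(disk, buffers, dirty)
    overwritten = disk[0] != damaged and disk[0].startswith(b"OKnew row")
    return untouched, overwritten
```

After the second `write_back` the original damaged bytes no longer exist anywhere in the model, which is the sense in which the setting can destroy data.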

Like Jeff said, this is not a recovery tool.  It's certainly not meant
to be something that you keep turned on for any length of time, and so
the possibility of repeat messages is really not a design consideration
at all.

            regards, tom lane

Re: zero_damaged_pages doesn't work

From
David Boreham
Date:
  On 9/27/2010 4:53 PM, Tom Lane wrote:
> The reason it tells you that data will be destroyed is that that could
> very well happen.  If the system decides to put new data into what will
> appear to it to be an empty page, then the damaged data on disk will be
> overwritten, and then there's no hope of recovering anything.
>
> Like Jeff said, this is not a recovery tool.  It's certainly not meant
> to be something that you keep turned on for any length of time, and so
> the possibility of repeat messages is really not a design consideration
> at all.

No argument with any of this, although I'm not the intended audience for
these warnings -- I know what I'm doing ;)

I'm not sure, though, whether you're disagreeing with my
suggestion that the documentation be improved/corrected.
Is that the case ? (if so then I will argue)




Re: zero_damaged_pages doesn't work

From
David Boreham
Date:
  On 9/27/2010 4:53 PM, Tom Lane wrote:
> The reason it tells you that data will be destroyed is that that could
> very well happen.

Re-parsing this, I think there was a mis-communication :

I'm not at all suggesting that the doc should _not_ say that data will
be corrupted.
I'm suggesting that in addition to what it currently says, it also
should say that the on-disk data won't be changed by the page zeroing
mode.

In my searching I found countless people over the past few years who had
been similarly confused into believing that it would write back the
zeroed page to disk.





Re: zero_damaged_pages doesn't work

From
Bruce Momjian
Date:
David Boreham wrote:
>   On 9/27/2010 4:53 PM, Tom Lane wrote:
> > The reason it tells you that data will be destroyed is that that could
> > very well happen.
>
> Re-parsing this, I think there was a mis-communication :
>
> I'm not at all suggesting that the doc should _not_ say that data will
> be corrupted.
> I'm suggesting that in addition to what it currently says, it also
> should say that the on-disk data won't be changed by the page zeroing
> mode.
>
> In my searching I found countless people over the past few years who had
> been similarly confused into believing that it would write back the
> zeroed page to disk.

Based on this discussion from September, I have applied the attached
documentation patch to clarify that pages zeroed by zero_damaged_pages
are not forced to disk, and when to set this parameter off again.

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + It's impossible for everything to be true. +
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 3a0f755..141430c 100644
*** a/doc/src/sgml/config.sgml
--- b/doc/src/sgml/config.sgml
*************** LOG:  CleanUpLock: deleting: lock(0xb7ac
*** 6059,6073 ****
         <para>
          Detection of a damaged page header normally causes
          <productname>PostgreSQL</> to report an error, aborting the current
!         command.  Setting <varname>zero_damaged_pages</> to on causes
!         the system to instead report a warning, zero out the damaged page,
!         and continue processing.  This behavior <emphasis>will destroy data</>,
!         namely all the rows on the damaged page.  But it allows you to get
          past the error and retrieve rows from any undamaged pages that might
!         be present in the table.  So it is useful for recovering data if
          corruption has occurred due to a hardware or software error.  You should
          generally not set this on until you have given up hope of recovering
!         data from the damaged pages of a table.  The
          default setting is <literal>off</>, and it can only be changed
          by a superuser.
         </para>
--- 6059,6075 ----
         <para>
          Detection of a damaged page header normally causes
          <productname>PostgreSQL</> to report an error, aborting the current
!         transaction.  Setting <varname>zero_damaged_pages</> to on causes
!         the system to instead report a warning, zero out the damaged
!         page in memory, and continue processing.  This behavior <emphasis>will destroy data</>,
!         namely all the rows on the damaged page.  However, it does allow you to get
          past the error and retrieve rows from any undamaged pages that might
!         be present in the table.  It is useful for recovering data if
          corruption has occurred due to a hardware or software error.  You should
          generally not set this on until you have given up hope of recovering
!         data from the damaged pages of a table.  Zeroed-out pages are not
!         forced to disk so it is recommended to recreate the table or
!         the index before turning this parameter off again.  The
          default setting is <literal>off</>, and it can only be changed
          by a superuser.
         </para>