Обсуждение: incorrect checksum detected on "global/pg_filenode.map" when VACUUM FULL is executed

Поиск
Список
Период
Сортировка

incorrect checksum detected on "global/pg_filenode.map" when VACUUM FULL is executed

От
Tatsuki Kadomoto
Дата:

Hello,


I see incorrect checksum detected on "global/pg_filenode.map" when "VACUUM FULL" is executed.

The error message didn't repeat. It showed up only once.

Is this expected? Can someone give me a plausible scenario why this happened?


Aug 16 20:51:19 postgres[22329]: [2-1] FATAL:  relation mapping file "global/pg_filenode.map" contains incorrect checksum


It looks like it's pretty rare and I have seen it only once at 2 different machines.


PosgreSQL version is 9.2.4.


Regards,

Tatsuki

Re: incorrect checksum detected on "global/pg_filenode.map" when VACUUM FULL is executed

От
John R Pierce
Дата:
On 8/21/2016 8:27 PM, Tatsuki Kadomoto wrote:


I see incorrect checksum detected on "global/pg_filenode.map" when "VACUUM FULL" is executed.

The error message didn't repeat. It showed up only once.

Is this expected? Can someone give me a plausible scenario why this happened?


Aug 16 20:51:19 postgres[22329]: [2-1] FATAL:  relation mapping file "global/pg_filenode.map" contains incorrect checksum


It looks like it's pretty rare and I have seen it only once at 2 different machines.


PosgreSQL version is 9.2.4.



that suggests unreliable hardware.  does this server not have ECC memory ?


-- 
john r pierce, recycling bits in santa cruz

Re: incorrect checksum detected on "global/pg_filenode.map" when VACUUM FULL is executed

От
Michael Paquier
Дата:
On Mon, Aug 22, 2016 at 12:27 PM, Tatsuki Kadomoto
<tatsuki.kadomoto@proceranetworks.com> wrote:
> I see incorrect checksum detected on "global/pg_filenode.map" when "VACUUM
> FULL" is executed.
>
> The error message didn't repeat. It showed up only once.
>
> Is this expected? Can someone give me a plausible scenario why this
> happened?
> Aug 16 20:51:19 postgres[22329]: [2-1] FATAL:  relation mapping file
> "global/pg_filenode.map" contains incorrect checksum
> It looks like it's pretty rare and I have seen it only once at 2 different
> machines.

You are facing a corruption:
https://wiki.postgresql.org/wiki/Corruption
Be sure that you have a clean backup of your database first!

> PosgreSQL version is 9.2.4.

You are missing a couple of years worth of bug fixes here..
--
Michael


Re: incorrect checksum detected on "global/pg_filenode.map" when VACUUM FULL is executed

От
John R Pierce
Дата:
On 8/21/2016 8:37 PM, Michael Paquier wrote:
PosgreSQL version is 9.2.4.
You are missing a couple of years worth of bug fixes here..

3+ years, to be more specific, 9.2.4 was released in april 2013.

indeed, and several of the bug fixes involved data corruption.   current is 9.2.18


-- 
john r pierce, recycling bits in santa cruz

Re: incorrect checksum detected on "global/pg_filenode.map" when VACUUM FULL is executed

От
Tatsuki Kadomoto
Дата:

John, Michael,


Thanks. The server is Dell PowerEdge R720.


The checksum error only was reported only once so I guess it was automatically fixed immediately.


The log was output only one time while the box was running very long time (> 1 year.).

I have hundreds of the machines with same configuration running very long time as well,

but the incident was only once.


Can we point out a specific bug that can lead to this?

Or HW issue is the most probable cause?

I don't see any HW error logs.


Regards,

Tatsuki


From: pgsql-general-owner@postgresql.org <pgsql-general-owner@postgresql.org> on behalf of John R Pierce <pierce@hogranch.com>
Sent: Monday, August 22, 2016 12:42:41 PM
To: pgsql-general@postgresql.org
Subject: Re: [GENERAL] incorrect checksum detected on "global/pg_filenode.map" when VACUUM FULL is executed
 
On 8/21/2016 8:37 PM, Michael Paquier wrote:
PosgreSQL version is 9.2.4.
You are missing a couple of years worth of bug fixes here..

3+ years, to be more specific, 9.2.4 was released in april 2013.

indeed, and several of the bug fixes involved data corruption.   current is 9.2.18


-- 
john r pierce, recycling bits in santa cruz

Re: incorrect checksum detected on "global/pg_filenode.map" when VACUUM FULL is executed

От
John R Pierce
Дата:
On 8/21/2016 9:13 PM, Tatsuki Kadomoto wrote:
> Can we point out a specific bug that can lead to this?

9.2.6 fixed several data corruption bugs,
https://www.postgresql.org/docs/current/static/release-9-2-6.html

9.2.9 fixed a GiST index corruption problem...
https://www.postgresql.org/docs/current/static/release-9-2-9.html

I would upgrade to 9.2.18, the latest 9.2 version,
https://www.postgresql.org/docs/current/static/release-9-2-18.html


--
john r pierce, recycling bits in santa cruz



Re: incorrect checksum detected on "global/pg_filenode.map" when VACUUM FULL is executed

От
Michael Paquier
Дата:
On Mon, Aug 22, 2016 at 1:31 PM, John R Pierce <pierce@hogranch.com> wrote:
> On 8/21/2016 9:13 PM, Tatsuki Kadomoto wrote:
>>
>> Can we point out a specific bug that can lead to this?
>
>
> 9.2.6 fixed several data corruption bugs,
> https://www.postgresql.org/docs/current/static/release-9-2-6.html
>
> 9.2.9 fixed a GiST index corruption problem...
> https://www.postgresql.org/docs/current/static/release-9-2-9.html
>
> I would upgrade to 9.2.18, the latest 9.2 version,
> https://www.postgresql.org/docs/current/static/release-9-2-18.html

Yep, this is mandatory. You are taking a lot of risks here by only using 9.2.4.
--
Michael


Re: incorrect checksum detected on "global/pg_filenode.map" when VACUUM FULL is executed

От
Tatsuki Kadomoto
Дата:

John, Michael,


Thanks for suggestion for upgrade. I know that's the way to go, but it's not so easy due to circumstances on my side.


Meanwhile, could you tell me what is this "global/pg_filenode.map" for?


Regards,

Tatsuki


From: pgsql-general-owner@postgresql.org <pgsql-general-owner@postgresql.org> on behalf of Michael Paquier <michael.paquier@gmail.com>
Sent: Monday, August 22, 2016 1:34:30 PM
To: John R Pierce
Cc: PostgreSQL mailing lists
Subject: Re: [GENERAL] incorrect checksum detected on "global/pg_filenode.map" when VACUUM FULL is executed
 
On Mon, Aug 22, 2016 at 1:31 PM, John R Pierce <pierce@hogranch.com> wrote:
> On 8/21/2016 9:13 PM, Tatsuki Kadomoto wrote:
>>
>> Can we point out a specific bug that can lead to this?
>
>
> 9.2.6 fixed several data corruption bugs,
> https://www.postgresql.org/docs/current/static/release-9-2-6.html
>
> 9.2.9 fixed a GiST index corruption problem...
> https://www.postgresql.org/docs/current/static/release-9-2-9.html
>
> I would upgrade to 9.2.18, the latest 9.2 version,
> https://www.postgresql.org/docs/current/static/release-9-2-18.html

Yep, this is mandatory. You are taking a lot of risks here by only using 9.2.4.
--
Michael


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Re: incorrect checksum detected on "global/pg_filenode.map" when VACUUM FULL is executed

От
Michael Paquier
Дата:
On Mon, Aug 22, 2016 at 4:45 PM, Tatsuki Kadomoto
<tatsuki.kadomoto@proceranetworks.com> wrote:
> Thanks for suggestion for upgrade. I know that's the way to go, but it's not
> so easy due to circumstances on my side.

Well, I guess it depends on how much you care about your data.

> Meanwhile, could you tell me what is this "global/pg_filenode.map" for?

It is a relation map file referring a list of OID -> relfilenode. You
can look at the comments on top src/backend/utils/cache/relmapper.c
for more details.
--
Michael


Re: incorrect checksum detected on "global/pg_filenode.map" when VACUUM FULL is executed

От
Tatsuki Kadomoto
Дата:

Michael,


Thank you very much.


Regards,

Tatsuki


From: Michael Paquier <michael.paquier@gmail.com>
Sent: Monday, August 22, 2016 5:02:40 PM
To: Tatsuki Kadomoto
Cc: John R Pierce; PostgreSQL mailing lists
Subject: Re: [GENERAL] incorrect checksum detected on "global/pg_filenode.map" when VACUUM FULL is executed
 
On Mon, Aug 22, 2016 at 4:45 PM, Tatsuki Kadomoto
<tatsuki.kadomoto@proceranetworks.com> wrote:
> Thanks for suggestion for upgrade. I know that's the way to go, but it's not
> so easy due to circumstances on my side.

Well, I guess it depends on how much you care about your data.

> Meanwhile, could you tell me what is this "global/pg_filenode.map" for?

It is a relation map file referring a list of OID -> relfilenode. You
can look at the comments on top src/backend/utils/cache/relmapper.c
for more details.
--
Michael

Re: incorrect checksum detected on "global/pg_filenode.map" when VACUUM FULL is executed

От
Kevin Grittner
Дата:
On Mon, Aug 22, 2016 at 3:02 AM, Michael Paquier
<michael.paquier@gmail.com> wrote:
> On Mon, Aug 22, 2016 at 4:45 PM, Tatsuki Kadomoto
> <tatsuki.kadomoto@proceranetworks.com> wrote:
>> Thanks for suggestion for upgrade. I know that's the way to go, but it's not
>> so easy due to circumstances on my side.
>
> Well, I guess it depends on how much you care about your data.

Right.  Make sure that whoever is responsible for this decision
knows that until they upgrade they are running with known bugs
which could render the database unusable without warning.  It
should at least be an informed decision so that the decision-maker
can stand behind it and feel as good as possible about
circumstances should that happen.

You might want to keep a copy of the email or memo in which you
point this out, in case anyone's memory gets foggy during such a
crisis.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: incorrect checksum detected on "global/pg_filenode.map" when VACUUM FULL is executed

От
Tatsuki Kadomoto
Дата:

Michael,


I have read the comment in relmapper.c.


===

Therefore mapped catalogs can only be relocated by operations such as VACUUM FULL
and CLUSTER, which make no transactionally-significant changes: it must be
safe for the new file to replace the old, even if the transaction itself aborts.
===


Does this mean it's a kind of expected to face checksum error when the mapping file is relocated by VACUUM FULL?

I faced the checksum error exactly when VACUUM FULL was running and it has never been detected until now.

Aug 16 20:51:19 postgres[22329]: [2-1] FATAL:  relation mapping file "global/pg_filenode.map" contains incorrect checksum
Aug 16 20:51:19 postgres[22329]: [2-2] STATEMENT:  SELECT id,readbm,writebm,survbm,timeout FROM Users WHERE username='packetlogicd' AND password=md5('xxxxx')


Regards,

Tatsuki


From: Kevin Grittner <kgrittn@gmail.com>
Sent: Tuesday, August 23, 2016 3:07:03 AM
To: Michael Paquier
Cc: Tatsuki Kadomoto; John R Pierce; PostgreSQL mailing lists
Subject: Re: [GENERAL] incorrect checksum detected on "global/pg_filenode.map" when VACUUM FULL is executed
 
On Mon, Aug 22, 2016 at 3:02 AM, Michael Paquier
<michael.paquier@gmail.com> wrote:
> On Mon, Aug 22, 2016 at 4:45 PM, Tatsuki Kadomoto
> <tatsuki.kadomoto@proceranetworks.com> wrote:
>> Thanks for suggestion for upgrade. I know that's the way to go, but it's not
>> so easy due to circumstances on my side.
>
> Well, I guess it depends on how much you care about your data.

Right.  Make sure that whoever is responsible for this decision
knows that until they upgrade they are running with known bugs
which could render the database unusable without warning.  It
should at least be an informed decision so that the decision-maker
can stand behind it and feel as good as possible about
circumstances should that happen.

You might want to keep a copy of the email or memo in which you
point this out, in case anyone's memory gets foggy during such a
crisis.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
www.enterprisedb.com
A relational database management system based on PostgreSQL that adds Oracle compatibility feature, performance and management features.


The Enterprise PostgreSQL Company

Re: incorrect checksum detected on "global/pg_filenode.map" when VACUUM FULL is executed

От
Michael Paquier
Дата:
On Thu, Aug 25, 2016 at 9:48 PM, Tatsuki Kadomoto
<tatsuki.kadomoto@proceranetworks.com> wrote:
> Does this mean it's a kind of expected to face checksum error when the mapping file is relocated by VACUUM FULL?
> I faced the checksum error exactly when VACUUM FULL was running and it has never been detected until now.

No, it's not. This should not happen. And this means that your data
folder is facing some corruption.
--
Michael


Re: incorrect checksum detected on "global/pg_filenode.map" when VACUUM FULL is executed

От
Tatsuki Kadomoto
Дата:
Michael,

In my case, it only detected once.
Does Postgre have any mechanism to correct the mapping file automatically?

Regards,
Tatsuki

Sent from my ASUS


-------- オリジナルのメッセージ --------
送信元:Michael Paquier
送信済み:Fri, 26 Aug 2016 09:05:53 +0900
送信先:Tatsuki Kadomoto
Cc:Kevin Grittner ,John R Pierce ,PostgreSQL mailing lists
件名:Re: [GENERAL] incorrect checksum detected on "global/pg_filenode.map" when VACUUM FULL is executed

On Thu, Aug 25, 2016 at 9:48 PM, Tatsuki Kadomoto
<tatsuki.kadomoto@proceranetworks.com> wrote:
> Does this mean it's a kind of expected to face checksum error when the mapping file is relocated by VACUUM FULL?
> I faced the checksum error exactly when VACUUM FULL was running and it has never been detected until now.

No, it's not. This should not happen. And this means that your data
folder is facing some corruption.
--
Michael

Re: incorrect checksum detected on "global/pg_filenode.map" when VACUUM FULL is executed

От
Michael Paquier
Дата:
On Fri, Aug 26, 2016 at 9:09 AM, Tatsuki Kadomoto
<tatsuki.kadomoto@proceranetworks.com> wrote:
> Does Postgre have any mechanism to correct the mapping file automatically?

When the relation mapping changes, the file gets rewritten at
transaction commit. I guess that the error you saw previously (be
careful, that's a corruption!) got clobbered afterwards.
--
Michael