Обсуждение: missing chunk 0 for toast value ...

Поиск
Список
Период
Сортировка

missing chunk 0 for toast value ...

От
Andrew Hammond
Дата:
I found the following error message in my logfiles.

Oct 24 04:05:57 db-app02 postgres[24640]: [2-1] user=nutricate,db=admin_production ERROR:  missing chunk number 0 for toast value 2411466 in pg_toast_2619
Oct 24 04:05:57 db-app02 postgres[24640]: [2-2] user=nutricate,db=admin_production STATEMENT:  SELECT "devices"."id", "devices"."name", "devices"."info", "devices"."pos_id", "devices"."device_group_id", "devices"."header_id", "devices"."footer_id", "devices"."device_type_id", "devices"."auth_code", "devices"."pos_comm_id", "devices"."printer_comm_id", "devices"."sw_version", "devices"."hw_version", "devices"."device_status", "devices"."entity_id", "devices"."address", "devices"."created_by", "devices"."create_method", "devices"."created_on", "devices"."modified_by", "devices"."updated_on", "devices"."version_id", "devices"."unique_id", "devices"."hw_key", "devices"."config_status", "devices"."activated_on", "devices"."store_id", "devices"."last_status_update", "devices"."loaded_on", "devices"."header2_id", "devices"."footer2_id", "devices"."timezone_id", "devices"."scanner_comm_id", "devices"."public_address", "devices"."hostname", "devices"."update_sw_version", "devices"."proxy_address", "devices"."proxy_type_id", "devices"."build_error",
Oct 24 04:05:57 db-app02 postgres[24640]: [2-3]  "devices"."local_info", "devices"."associated_on" FROM "devices" WHERE ("devices"."entity_id" IN (SELECT U0."id" FROM "entities" U0 WHERE ((U0."lft" > 280  AND U0."rgt" < 2597 ) OR U0."id" = 140 )) AND "devices"."auth_code" = E'0063' )

I tried reproducing it with 

SELECT * FROM devices WHERE ("devices"."entity_id" IN (SELECT U0."id" FROM "entities" U0 WHERE ((U0."lft" > 280  AND U0."rgt" < 2597 ) OR U0."id" = 140 )) AND "devices"."auth_code" = E'0063' )

This returned no rows and did not reproduce the error. I tried reproducing with an even broader approach

CREATE TEMP TABLE foo AS SELECT * FROM devices;
CREATE TEMP TABLE bar AS SELECT * FROM entities;

Still no repro. This is the second time I've seen this error, and the last time it also did not reproduce. I'm wondering a couple of things. First, is there some other way to reproduce this error? Second, would simply deleting the row cause the problem to go away? I wouldn't think so, but why else is it not reproducing? Third, what is the recommended course of action here. We have hourly backups, but this is a production database, so I would prefer not to have to restore. I tested a restore from an hour ago and it ran successfully.

A

Re: missing chunk 0 for toast value ...

От
Merlin Moncure
Дата:
On Tue, Oct 25, 2011 at 3:47 PM, Andrew Hammond
<andrew.george.hammond@gmail.com> wrote:
> I found the following error message in my logfiles.
> Oct 24 04:05:57 db-app02 postgres[24640]: [2-1]
> user=nutricate,db=admin_production ERROR:  missing chunk number 0
> for toast value 2411466 in pg_toast_2619
> Oct 24 04:05:57 db-app02 postgres[24640]: [2-2]
> user=nutricate,db=admin_production STATEMENT:  SELECT "devices"."id",
> "devices"."name", "devices"."info", "devices"."pos_id",
> "devices"."device_group_id", "devices"."header_id", "devices"."footer_id",
> "devices"."device_type_id", "devices"."auth_code", "devices"."pos_comm_id",
> "devices"."printer_comm_id", "devices"."sw_version", "devices"."hw_version",
> "devices"."device_status", "devices"."entity_id", "devices"."address",
> "devices"."created_by", "devices"."create_method", "devices"."created_on",
> "devices"."modified_by", "devices"."updated_on", "devices"."version_id",
> "devices"."unique_id", "devices"."hw_key", "devices"."config_status",
> "devices"."activated_on", "devices"."store_id",
> "devices"."last_status_update", "devices"."loaded_on",
> "devices"."header2_id", "devices"."footer2_id", "devices"."timezone_id",
> "devices"."scanner_comm_id", "devices"."public_address",
> "devices"."hostname", "devices"."update_sw_version",
> "devices"."proxy_address", "devices"."proxy_type_id",
> "devices"."build_error",
> Oct 24 04:05:57 db-app02 postgres[24640]: [2-3]  "devices"."local_info",
> "devices"."associated_on" FROM "devices" WHERE ("devices"."entity_id" IN
> (SELECT U0."id" FROM "entities" U0 WHERE ((U0."lft" > 280  AND U0."rgt" <
> 2597 ) OR U0."id" = 140 )) AND "devices"."auth_code" = E'0063' )
> I tried reproducing it with
> SELECT * FROM devices WHERE ("devices"."entity_id" IN (SELECT U0."id" FROM
> "entities" U0 WHERE ((U0."lft" > 280  AND U0."rgt" < 2597 ) OR U0."id" = 140
> )) AND "devices"."auth_code" = E'0063' )
> This returned no rows and did not reproduce the error. I tried reproducing
> with an even broader approach
> CREATE TEMP TABLE foo AS SELECT * FROM devices;
> CREATE TEMP TABLE bar AS SELECT * FROM entities;
> Still no repro. This is the second time I've seen this error, and the last
> time it also did not reproduce. I'm wondering a couple of things. First, is
> there some other way to reproduce this error? Second, would simply deleting
> the row cause the problem to go away? I wouldn't think so, but why else is
> it not reproducing? Third, what is the recommended course of action here. We
> have hourly backups, but this is a production database, so I would prefer
> not to have to restore. I tested a restore from an hour ago and it ran
> successfully.

hrm, that's bizarre -- version etc?

merlin

Re: missing chunk 0 for toast value ...

От
Tom Lane
Дата:
Andrew Hammond <andrew.george.hammond@gmail.com> writes:
> Oct 24 04:05:57 db-app02 postgres[24640]: [2-1]
> user=nutricate,db=admin_production ERROR:  missing chunk number 0 for
> toast value
> 2411466 in pg_toast_2619

Hmm ... pg_toast_2619 is pg_statistic's toast table.  Is this 9.0.x,
and are you by any chance in the habit of running CLUSTER or VACUUM FULL
on your system catalogs?  Could one have been running in the background
when this happened?

            regards, tom lane

Re: missing chunk 0 for toast value ...

От
Andrew Hammond
Дата:
On Tue, Oct 25, 2011 at 2:20 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Andrew Hammond <andrew.george.hammond@gmail.com> writes:
> Oct 24 04:05:57 db-app02 postgres[24640]: [2-1]
> user=nutricate,db=admin_production ERROR:  missing chunk number 0 for
> toast value
> 2411466 in pg_toast_2619

Hmm ... pg_toast_2619 is pg_statistic's toast table.  Is this 9.0.x,
and are you by any chance in the habit of running CLUSTER or VACUUM FULL
on your system catalogs?  Could one have been running in the background
when this happened?

This is 9.0.4. 

I am not aware of any VACUUM FULL or CLUSTER operations that are scheduled. To the best of my knowledge this DB has never been either clustered or vacuum full'd. The following occur in the log file immediately before the error message above.

Oct 24 04:05:57 db-app02 postgres[24639]: [2-1] user=,db= LOG:  automatic vacuum of table "admin_production.pg_catalog.pg_statistic": index scans: 0
Oct 24 04:05:57 db-app02 postgres[24639]: [2-2] #011pages: 0 removed, 150 remain
Oct 24 04:05:57 db-app02 postgres[24639]: [2-3] #011tuples: 254 removed, 925 remain
Oct 24 04:05:57 db-app02 postgres[24639]: [2-4] #011system usage: CPU 0.01s/0.00u sec elapsed 0.02 sec
Oct 24 04:05:57 db-app02 postgres[24639]: [3-1] user=,db= LOG:  automatic vacuum of table "admin_production.public.devices": index scans: 0
Oct 24 04:05:57 db-app02 postgres[24639]: [3-2] #011pages: 0 removed, 353 remain
Oct 24 04:05:57 db-app02 postgres[24639]: [3-3] #011tuples: 824 removed, 2261 remain
Oct 24 04:05:57 db-app02 postgres[24639]: [3-4] #011system usage: CPU 0.00s/0.00u sec elapsed 0.08 sec
Oct 24 04:05:57 db-app02 postgres[24639]: [4-1] user=,db= LOG:  automatic analyze of table "admin_production.public.devices" system usage: CPU 0.00s/0.08u sec elapsed 0.17 sec

Re: missing chunk 0 for toast value ...

От
Tom Lane
Дата:
Andrew Hammond <andrew.george.hammond@gmail.com> writes:
> The following occur in the log file immediately before the
> error message above.

Hmm, that's pretty interesting ... are there any nearby autovacuums of
pg_toast_2619?

            regards, tom lane

Re: missing chunk 0 for toast value ...

От
Andrew Hammond
Дата:
On Tue, Oct 25, 2011 at 4:11 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Andrew Hammond <andrew.george.hammond@gmail.com> writes:
> The following occur in the log file immediately before the
> error message above.

Hmm, that's pretty interesting ... are there any nearby autovacuums of
pg_toast_2619?

Several, both before and after the error message:

Oct 24 03:49:57 db-app02 postgres[23554]: [4-1] user=,db= LOG:  automatic vacuum of table "admin_production.pg_toast.pg_toast_2619": index scans: 1
Oct 24 03:49:57 db-app02 postgres[23554]: [4-2] #011pages: 0 removed, 206 remain
Oct 24 03:49:57 db-app02 postgres[23554]: [4-3] #011tuples: 220 removed, 608 remain
Oct 24 03:49:57 db-app02 postgres[23554]: [4-4] #011system usage: CPU 0.00s/0.00u sec elapsed 0.10 sec

Oct 24 03:53:57 db-app02 postgres[23800]: [5-1] user=,db= LOG:  automatic vacuum of table "admin_production.pg_toast.pg_toast_2619": index scans: 1
Oct 24 03:53:57 db-app02 postgres[23800]: [5-2] #011pages: 0 removed, 206 remain
Oct 24 03:53:57 db-app02 postgres[23800]: [5-3] #011tuples: 220 removed, 608 remain
Oct 24 03:53:57 db-app02 postgres[23800]: [5-4] #011system usage: CPU 0.01s/0.00u sec elapsed 0.12 sec

Oct 24 03:57:57 db-app02 postgres[24059]: [4-1] user=,db= LOG:  automatic vacuum of table "admin_production.pg_toast.pg_toast_2619": index scans: 1
Oct 24 03:57:57 db-app02 postgres[24059]: [4-2] #011pages: 0 removed, 206 remain
Oct 24 03:57:57 db-app02 postgres[24059]: [4-3] #011tuples: 220 removed, 608 remain
Oct 24 03:57:57 db-app02 postgres[24059]: [4-4] #011system usage: CPU 0.00s/0.00u sec elapsed 0.04 sec 

Oct 24 04:01:57 db-app02 postgres[24394]: [4-1] user=,db= LOG:  automatic vacuum of table "admin_production.pg_toast.pg_toast_2619": index scans: 1
Oct 24 04:01:57 db-app02 postgres[24394]: [4-2] #011pages: 0 removed, 206 remain
Oct 24 04:01:57 db-app02 postgres[24394]: [4-3] #011tuples: 220 removed, 608 remain
Oct 24 04:01:57 db-app02 postgres[24394]: [4-4] #011system usage: CPU 0.00s/0.01u sec elapsed 0.04 sec

Oct 24 04:05:57 db-app02 postgres[24640]: [2-1] user=nutricate,db=admin_production ERROR:  missing chunk number 0 for toast value 2411466 in pg_toast_2619

Oct 24 04:05:57 db-app02 postgres[24639]: [6-1] user=,db= LOG:  automatic vacuum of table "admin_production.pg_toast.pg_toast_2619": index scans: 1
Oct 24 04:05:57 db-app02 postgres[24639]: [6-2] #011pages: 0 removed, 206 remain
Oct 24 04:05:57 db-app02 postgres[24639]: [6-3] #011tuples: 216 removed, 608 remain
Oct 24 04:05:57 db-app02 postgres[24639]: [6-4] #011system usage: CPU 0.00s/0.00u sec elapsed 0.08 sec

Oct 24 04:09:57 db-app02 postgres[24877]: [5-1] user=,db= LOG:  automatic vacuum of table "admin_production.pg_toast.pg_toast_2619": index scans: 1
Oct 24 04:09:57 db-app02 postgres[24877]: [5-2] #011pages: 0 removed, 206 remain
Oct 24 04:09:57 db-app02 postgres[24877]: [5-3] #011tuples: 220 removed, 608 remain
Oct 24 04:09:57 db-app02 postgres[24877]: [5-4] #011system usage: CPU 0.00s/0.00u sec elapsed 0.10 sec

Oct 24 04:13:57 db-app02 postgres[25116]: [6-1] user=,db= LOG:  automatic vacuum of table "admin_production.pg_toast.pg_toast_2619": index scans: 1
Oct 24 04:13:57 db-app02 postgres[25116]: [6-2] #011pages: 0 removed, 206 remain
Oct 24 04:13:57 db-app02 postgres[25116]: [6-3] #011tuples: 220 removed, 608 remain
Oct 24 04:13:57 db-app02 postgres[25116]: [6-4] #011system usage: CPU 0.00s/0.00u sec elapsed 0.12 sec

Re: missing chunk 0 for toast value ...

От
Tom Lane
Дата:
Andrew Hammond <andrew.george.hammond@gmail.com> writes:
> On Tue, Oct 25, 2011 at 4:11 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Hmm, that's pretty interesting ... are there any nearby autovacuums of
>> pg_toast_2619?

> Several, both before and after the error message:

Well, it seems clear that somehow the vacuum deleted a toast tuple that
the other statement was about to fetch, but it's not clear how this
could be.  The pg_statistic fetch must have come from the planner, which
should always be called with a transaction snapshot established, and
that ought to protect it against vacuum deleting anything that could be
visible to SnapshotNow.  Weird.

[ pokes around for a bit ... ]  Hmm, can you say how the failing query
was submitted, exactly?  I'm wondering if it came in via simple Query
(PQexec) or extended-query protocol (anything with parameters).

            regards, tom lane

Re: missing chunk 0 for toast value ...

От
Andrew Hammond
Дата:
On Tue, Oct 25, 2011 at 4:53 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Andrew Hammond <andrew.george.hammond@gmail.com> writes:
> On Tue, Oct 25, 2011 at 4:11 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Hmm, that's pretty interesting ... are there any nearby autovacuums of
>> pg_toast_2619?

> Several, both before and after the error message:

Well, it seems clear that somehow the vacuum deleted a toast tuple that
the other statement was about to fetch, but it's not clear how this
could be.  The pg_statistic fetch must have come from the planner, which
should always be called with a transaction snapshot established, and
that ought to protect it against vacuum deleting anything that could be
visible to SnapshotNow.  Weird.

[ pokes around for a bit ... ]  Hmm, can you say how the failing query
was submitted, exactly?  I'm wondering if it came in via simple Query
(PQexec) or extended-query protocol (anything with parameters).

The command was sent from a python via django-1.2.1 using psycopg2-2.2.1 

Andrew

Re: missing chunk 0 for toast value ...

От
Daniele Varrazzo
Дата:
On Wed, Oct 26, 2011 at 1:04 AM, Andrew Hammond
<andrew.george.hammond@gmail.com> wrote:
> On Tue, Oct 25, 2011 at 4:53 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

>> [ pokes around for a bit ... ]  Hmm, can you say how the failing query
>> was submitted, exactly?  I'm wondering if it came in via simple Query
>> (PQexec) or extended-query protocol (anything with parameters).
>
> The command was sent from a python via django-1.2.1 using psycopg2-2.2.1

PQexec then.

-- Daniele

Re: missing chunk 0 for toast value ...

От
Tom Lane
Дата:
Daniele Varrazzo <daniele.varrazzo@gmail.com> writes:
> On Wed, Oct 26, 2011 at 1:04 AM, Andrew Hammond
> <andrew.george.hammond@gmail.com> wrote:
>> On Tue, Oct 25, 2011 at 4:53 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> [ pokes around for a bit ... ] �Hmm, can you say how the failing query
>>> was submitted, exactly? �I'm wondering if it came in via simple Query
>>> (PQexec) or extended-query protocol (anything with parameters).

>> The command was sent from a python via django-1.2.1 using psycopg2-2.2.1

> PQexec then.

Thanks for the followup.  My current theory is that that's unrelated,
though --- it now seems that this is just a race condition in the use of
toasted pg_statistic entries.  See
http://archives.postgresql.org/pgsql-hackers/2011-10/msg01366.php

            regards, tom lane

Re: missing chunk 0 for toast value ...

От
Tom Lane
Дата:
Andrew Hammond <andrew.george.hammond@gmail.com> writes:
> On Tue, Oct 25, 2011 at 2:20 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Hmm ... pg_toast_2619 is pg_statistic's toast table.  Is this 9.0.x,
>> and are you by any chance in the habit of running CLUSTER or VACUUM FULL
>> on your system catalogs?  Could one have been running in the background
>> when this happened?

> This is 9.0.4.

I've applied a patch for this:
http://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=7f797d27fe0be5200ad5fd5af6cefcee30c8e24a

It turns out that the problem is very significantly harder to reproduce
pre-9.0, which may explain why you hadn't seen it before ... but it does
exist, at least as far back as 8.2, and probably in every version that
has TOAST tables.

            regards, tom lane