Discussion: missing chunk 0 for toast value ...
I found the following error message in my logfiles.
Oct 24 04:05:57 db-app02 postgres[24640]: [2-1] user=nutricate,db=admin_production ERROR: missing chunk number 0 for toast value 2411466 in pg_toast_2619
Oct 24 04:05:57 db-app02 postgres[24640]: [2-2] user=nutricate,db=admin_production STATEMENT: SELECT "devices"."id", "devices"."name", "devices"."info", "devices"."pos_id", "devices"."device_group_id", "devices"."header_id", "devices"."footer_id", "devices"."device_type_id", "devices"."auth_code", "devices"."pos_comm_id", "devices"."printer_comm_id", "devices"."sw_version", "devices"."hw_version", "devices"."device_status", "devices"."entity_id", "devices"."address", "devices"."created_by", "devices"."create_method", "devices"."created_on", "devices"."modified_by", "devices"."updated_on", "devices"."version_id", "devices"."unique_id", "devices"."hw_key", "devices"."config_status", "devices"."activated_on", "devices"."store_id", "devices"."last_status_update", "devices"."loaded_on", "devices"."header2_id", "devices"."footer2_id", "devices"."timezone_id", "devices"."scanner_comm_id", "devices"."public_address", "devices"."hostname", "devices"."update_sw_version", "devices"."proxy_address", "devices"."proxy_type_id", "devices"."build_error",
Oct 24 04:05:57 db-app02 postgres[24640]: [2-3] "devices"."local_info", "devices"."associated_on" FROM "devices" WHERE ("devices"."entity_id" IN (SELECT U0."id" FROM "entities" U0 WHERE ((U0."lft" > 280 AND U0."rgt" < 2597 ) OR U0."id" = 140 )) AND "devices"."auth_code" = E'0063' )
I tried reproducing it with:
SELECT * FROM devices WHERE ("devices"."entity_id" IN (SELECT U0."id" FROM "entities" U0 WHERE ((U0."lft" > 280 AND U0."rgt" < 2597 ) OR U0."id" = 140 )) AND "devices"."auth_code" = E'0063' )
This returned no rows and did not reproduce the error. I tried reproducing it with an even broader approach:
CREATE TEMP TABLE foo AS SELECT * FROM devices;
CREATE TEMP TABLE bar AS SELECT * FROM entities;
Still no repro. This is the second time I've seen this error, and the last time it also did not reproduce. I'm wondering a few things. First, is there some other way to reproduce this error? Second, would simply deleting the row make the problem go away? I wouldn't think so, but why else is it not reproducing? Third, what is the recommended course of action here? We have hourly backups, but this is a production database, so I would prefer not to have to restore. I tested a restore from an hour ago and it ran successfully.
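One further way to dig in (a sketch, not from the thread; the table name pg_toast_2619 and chunk id 2411466 are taken from the error message above, and the probe loop assumes PL/pgSQL is available) is to check whether the chunk is really gone and then force detoasting of each pg_statistic row to find which one references it:

```sql
-- Does the chunk named in the error still exist? (IDs from the log above.)
SELECT chunk_id, chunk_seq, length(chunk_data)
FROM pg_toast.pg_toast_2619
WHERE chunk_id = 2411466;

-- Force detoasting of the array columns row by row; stavalues1..4 are
-- the toastable columns in 9.0. Any row pointing at a missing chunk
-- raises the "missing chunk" error, which we catch and report.
DO $$
DECLARE
    r record;
BEGIN
    FOR r IN SELECT starelid, staattnum FROM pg_statistic LOOP
        BEGIN
            PERFORM length(stavalues1::text), length(stavalues2::text),
                    length(stavalues3::text), length(stavalues4::text)
            FROM pg_statistic
            WHERE starelid = r.starelid AND staattnum = r.staattnum;
        EXCEPTION WHEN OTHERS THEN
            RAISE NOTICE 'problem stats row: starelid=%, staattnum=%',
                         r.starelid, r.staattnum;
        END;
    END LOOP;
END $$;
```

If both probes come back clean, that is consistent with the transient, non-reproducing behavior described below.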
A
On Tue, Oct 25, 2011 at 3:47 PM, Andrew Hammond <andrew.george.hammond@gmail.com> wrote:
> I found the following error message in my logfiles.
>
> Oct 24 04:05:57 db-app02 postgres[24640]: [2-1]
> user=nutricate,db=admin_production ERROR: missing chunk number 0
> for toast value 2411466 in pg_toast_2619
> [...]
> This is the second time I've seen this error, and the last time it
> also did not reproduce. [...] We have hourly backups, but this is a
> production database, so I would prefer not to have to restore.

hrm, that's bizarre -- version etc?

merlin
Andrew Hammond <andrew.george.hammond@gmail.com> writes:
> Oct 24 04:05:57 db-app02 postgres[24640]: [2-1]
> user=nutricate,db=admin_production ERROR: missing chunk number 0 for
> toast value 2411466 in pg_toast_2619

Hmm ... pg_toast_2619 is pg_statistic's toast table. Is this 9.0.x, and are you by any chance in the habit of running CLUSTER or VACUUM FULL on your system catalogs? Could one have been running in the background when this happened?

regards, tom lane
On Tue, Oct 25, 2011 at 2:20 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Andrew Hammond <andrew.george.hammond@gmail.com> writes:
>> Oct 24 04:05:57 db-app02 postgres[24640]: [2-1]
>> user=nutricate,db=admin_production ERROR: missing chunk number 0 for
>> toast value 2411466 in pg_toast_2619
>
> Hmm ... pg_toast_2619 is pg_statistic's toast table. Is this 9.0.x,
> and are you by any chance in the habit of running CLUSTER or VACUUM FULL
> on your system catalogs? Could one have been running in the background
> when this happened?
This is 9.0.4.
I am not aware of any VACUUM FULL or CLUSTER operations that are scheduled. To the best of my knowledge this DB has never been either clustered or vacuum full'd. The following occur in the log file immediately before the error message above.
Oct 24 04:05:57 db-app02 postgres[24639]: [2-1] user=,db= LOG: automatic vacuum of table "admin_production.pg_catalog.pg_statistic": index scans: 0
Oct 24 04:05:57 db-app02 postgres[24639]: [2-2] #011pages: 0 removed, 150 remain
Oct 24 04:05:57 db-app02 postgres[24639]: [2-3] #011tuples: 254 removed, 925 remain
Oct 24 04:05:57 db-app02 postgres[24639]: [2-4] #011system usage: CPU 0.01s/0.00u sec elapsed 0.02 sec
Oct 24 04:05:57 db-app02 postgres[24639]: [3-1] user=,db= LOG: automatic vacuum of table "admin_production.public.devices": index scans: 0
Oct 24 04:05:57 db-app02 postgres[24639]: [3-2] #011pages: 0 removed, 353 remain
Oct 24 04:05:57 db-app02 postgres[24639]: [3-3] #011tuples: 824 removed, 2261 remain
Oct 24 04:05:57 db-app02 postgres[24639]: [3-4] #011system usage: CPU 0.00s/0.00u sec elapsed 0.08 sec
Oct 24 04:05:57 db-app02 postgres[24639]: [4-1] user=,db= LOG: automatic analyze of table "admin_production.public.devices" system usage: CPU 0.00s/0.08u sec elapsed 0.17 sec
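The autovacuum worker (pid 24639) and the failing backend (pid 24640) log in the same second. To confirm how often autovacuum is visiting these catalogs, the relevant settings and per-table activity can be read back (a sketch; on 9.0, pg_stat_all_tables exposes the last-vacuum timestamps but not vacuum counts):

```sql
-- Autovacuum cadence settings in effect
SELECT name, setting
FROM pg_settings
WHERE name IN ('autovacuum', 'autovacuum_naptime',
               'autovacuum_vacuum_threshold',
               'autovacuum_vacuum_scale_factor');

-- When autovacuum last touched pg_statistic and its toast table
SELECT schemaname, relname, last_autovacuum, last_autoanalyze
FROM pg_stat_all_tables
WHERE relname IN ('pg_statistic', 'pg_toast_2619');
```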
Andrew Hammond <andrew.george.hammond@gmail.com> writes:
> The following occur in the log file immediately before the
> error message above.

Hmm, that's pretty interesting ... are there any nearby autovacuums of pg_toast_2619?

regards, tom lane
On Tue, Oct 25, 2011 at 4:11 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Andrew Hammond <andrew.george.hammond@gmail.com> writes:
>> The following occur in the log file immediately before the
>> error message above.
>
> Hmm, that's pretty interesting ... are there any nearby autovacuums of
> pg_toast_2619?
Several, both before and after the error message:
Oct 24 03:49:57 db-app02 postgres[23554]: [4-1] user=,db= LOG: automatic vacuum of table "admin_production.pg_toast.pg_toast_2619": index scans: 1
Oct 24 03:49:57 db-app02 postgres[23554]: [4-2] #011pages: 0 removed, 206 remain
Oct 24 03:49:57 db-app02 postgres[23554]: [4-3] #011tuples: 220 removed, 608 remain
Oct 24 03:49:57 db-app02 postgres[23554]: [4-4] #011system usage: CPU 0.00s/0.00u sec elapsed 0.10 sec
Oct 24 03:53:57 db-app02 postgres[23800]: [5-1] user=,db= LOG: automatic vacuum of table "admin_production.pg_toast.pg_toast_2619": index scans: 1
Oct 24 03:53:57 db-app02 postgres[23800]: [5-2] #011pages: 0 removed, 206 remain
Oct 24 03:53:57 db-app02 postgres[23800]: [5-3] #011tuples: 220 removed, 608 remain
Oct 24 03:53:57 db-app02 postgres[23800]: [5-4] #011system usage: CPU 0.01s/0.00u sec elapsed 0.12 sec
Oct 24 03:57:57 db-app02 postgres[24059]: [4-1] user=,db= LOG: automatic vacuum of table "admin_production.pg_toast.pg_toast_2619": index scans: 1
Oct 24 03:57:57 db-app02 postgres[24059]: [4-2] #011pages: 0 removed, 206 remain
Oct 24 03:57:57 db-app02 postgres[24059]: [4-3] #011tuples: 220 removed, 608 remain
Oct 24 03:57:57 db-app02 postgres[24059]: [4-4] #011system usage: CPU 0.00s/0.00u sec elapsed 0.04 sec
Oct 24 04:01:57 db-app02 postgres[24394]: [4-1] user=,db= LOG: automatic vacuum of table "admin_production.pg_toast.pg_toast_2619": index scans: 1
Oct 24 04:01:57 db-app02 postgres[24394]: [4-2] #011pages: 0 removed, 206 remain
Oct 24 04:01:57 db-app02 postgres[24394]: [4-3] #011tuples: 220 removed, 608 remain
Oct 24 04:01:57 db-app02 postgres[24394]: [4-4] #011system usage: CPU 0.00s/0.01u sec elapsed 0.04 sec
Oct 24 04:05:57 db-app02 postgres[24640]: [2-1] user=nutricate,db=admin_production ERROR: missing chunk number 0 for toast value 2411466 in pg_toast_2619
Oct 24 04:05:57 db-app02 postgres[24639]: [6-1] user=,db= LOG: automatic vacuum of table "admin_production.pg_toast.pg_toast_2619": index scans: 1
Oct 24 04:05:57 db-app02 postgres[24639]: [6-2] #011pages: 0 removed, 206 remain
Oct 24 04:05:57 db-app02 postgres[24639]: [6-3] #011tuples: 216 removed, 608 remain
Oct 24 04:05:57 db-app02 postgres[24639]: [6-4] #011system usage: CPU 0.00s/0.00u sec elapsed 0.08 sec
Oct 24 04:09:57 db-app02 postgres[24877]: [5-1] user=,db= LOG: automatic vacuum of table "admin_production.pg_toast.pg_toast_2619": index scans: 1
Oct 24 04:09:57 db-app02 postgres[24877]: [5-2] #011pages: 0 removed, 206 remain
Oct 24 04:09:57 db-app02 postgres[24877]: [5-3] #011tuples: 220 removed, 608 remain
Oct 24 04:09:57 db-app02 postgres[24877]: [5-4] #011system usage: CPU 0.00s/0.00u sec elapsed 0.10 sec
Oct 24 04:13:57 db-app02 postgres[25116]: [6-1] user=,db= LOG: automatic vacuum of table "admin_production.pg_toast.pg_toast_2619": index scans: 1
Oct 24 04:13:57 db-app02 postgres[25116]: [6-2] #011pages: 0 removed, 206 remain
Oct 24 04:13:57 db-app02 postgres[25116]: [6-3] #011tuples: 220 removed, 608 remain
Oct 24 04:13:57 db-app02 postgres[25116]: [6-4] #011system usage: CPU 0.00s/0.00u sec elapsed 0.12 sec
Andrew Hammond <andrew.george.hammond@gmail.com> writes:
> On Tue, Oct 25, 2011 at 4:11 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Hmm, that's pretty interesting ... are there any nearby autovacuums of
>> pg_toast_2619?
> Several, both before and after the error message:

Well, it seems clear that somehow the vacuum deleted a toast tuple that the other statement was about to fetch, but it's not clear how this could be. The pg_statistic fetch must have come from the planner, which should always be called with a transaction snapshot established, and that ought to protect it against vacuum deleting anything that could be visible to SnapshotNow. Weird.

[ pokes around for a bit ... ] Hmm, can you say how the failing query was submitted, exactly? I'm wondering if it came in via simple Query (PQexec) or extended-query protocol (anything with parameters).

regards, tom lane
On Tue, Oct 25, 2011 at 4:53 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> [ pokes around for a bit ... ] Hmm, can you say how the failing query
> was submitted, exactly? I'm wondering if it came in via simple Query
> (PQexec) or extended-query protocol (anything with parameters).
The command was sent from a Python application via django-1.2.1 using psycopg2-2.2.1.
Andrew
On Wed, Oct 26, 2011 at 1:04 AM, Andrew Hammond <andrew.george.hammond@gmail.com> wrote:
> On Tue, Oct 25, 2011 at 4:53 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> [ pokes around for a bit ... ] Hmm, can you say how the failing query
>> was submitted, exactly? I'm wondering if it came in via simple Query
>> (PQexec) or extended-query protocol (anything with parameters).
>
> The command was sent from a python via django-1.2.1 using psycopg2-2.2.1

PQexec then.

-- Daniele
Daniele Varrazzo <daniele.varrazzo@gmail.com> writes:
> On Wed, Oct 26, 2011 at 1:04 AM, Andrew Hammond
> <andrew.george.hammond@gmail.com> wrote:
>> The command was sent from a python via django-1.2.1 using psycopg2-2.2.1
> PQexec then.

Thanks for the followup. My current theory is that that's unrelated, though --- it now seems that this is just a race condition in the use of toasted pg_statistic entries. See
http://archives.postgresql.org/pgsql-hackers/2011-10/msg01366.php

regards, tom lane
Andrew Hammond <andrew.george.hammond@gmail.com> writes:
> On Tue, Oct 25, 2011 at 2:20 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Hmm ... pg_toast_2619 is pg_statistic's toast table. Is this 9.0.x,
>> and are you by any chance in the habit of running CLUSTER or VACUUM FULL
>> on your system catalogs?
>
> This is 9.0.4.

I've applied a patch for this:
http://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=7f797d27fe0be5200ad5fd5af6cefcee30c8e24a

It turns out that the problem is very significantly harder to reproduce pre-9.0, which may explain why you hadn't seen it before ... but it does exist, at least as far back as 8.2, and probably in every version that has TOAST tables.

regards, tom lane
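For anyone who hits this before running a patched release: because pg_toast_2619 belongs to pg_statistic, the affected data is only cached planner statistics, which ANALYZE rebuilds from scratch. A commonly suggested cleanup is therefore to delete the stats rows for the affected relation and re-analyze (a sketch; requires superuser, and the table name `devices` is illustrative here, not confirmed by the thread):

```sql
-- Drop the statistics rows for the relation whose stats entry
-- triggers the error, then rebuild them.
DELETE FROM pg_statistic WHERE starelid = 'devices'::regclass;
ANALYZE devices;
```

If the error names a user table's toast table rather than pg_statistic's, this shortcut does not apply and real data may be damaged, so a restore from backup is the safer path.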