Обсуждение: PG 8.3.3 - ERROR: lock AccessShareLock on object 16385/16467/0 is already held
PG 8.3.3 - ERROR: lock AccessShareLock on object 16385/16467/0 is already held
От
Michael Milligan
Дата:
Uh oh. This is a first (for me). I got this error on a transaction, it did not crash the server but did abort the transaction (of course). ERROR: lock AccessShareLock on object 16385/16467/0 is already held What was I doing? A large transaction where I was pushing about 20 million rows into two tables using prepared transactions. I have been doing this for some time (over a year) on 8.2.6 without any problems. I may be able to reproduce, but the transaction takes a long time to complete... about a day. I'm running it again now. There was no information in the log. What else can I do to get debug information for you, provided it's reproducable... Some db details: db=> select version(); version ------------------------------------------------------------------------------------------------- PostgreSQL 8.3.3 on amd64-portbld-freebsd6.3, compiled by GCC cc (GCC) 3.4.6 [FreeBSD] 20060305 (1 row) This instance was also compiled with the following changes (pg_config_manual.h): diff --git a/src/include/pg_config_manual.h b/src/include/pg_config_manual.h index 354a3d3..20c40c4 100644 --- a/src/include/pg_config_manual.h +++ b/src/include/pg_config_manual.h @@ -23,7 +23,7 @@ * * Changing BLCKSZ requires an initdb. */ -#define BLCKSZ 8192 +#define BLCKSZ 32768 /* * RELSEG_SIZE is the maximum number of blocks allowed in one disk @@ -69,7 +69,7 @@ * * Changing this requires an initdb. */ -#define NAMEDATALEN 64 +#define NAMEDATALEN 128 /* * Maximum number of arguments to a function. This goes along with a UFS block size of 32768. Regards, Mike -- Michael Milligan -> milli@acmeps.com
Re: PG 8.3.3 - ERROR: lock AccessShareLock on object 16385/16467/0 is already held
От
Heikki Linnakangas
Дата:
Michael Milligan wrote: > Uh oh. This is a first (for me). I got this error on a transaction, it > did not crash the server but did abort the transaction (of course). > > ERROR: lock AccessShareLock on object 16385/16467/0 is already held > > What was I doing? A large transaction where I was pushing about 20 > million rows into two tables using prepared transactions. I have been > doing this for some time (over a year) on 8.2.6 without any problems. Could you provide more details on the transaction? At what point did it fail, during the insertss, or at commit perhaps? Are you really using prepared transactions, aka two-phase commit, or did you mean prepared statements? -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Michael Milligan <milli@acmeps.com> writes: > Uh oh. This is a first (for me). I got this error on a transaction, it > did not crash the server but did abort the transaction (of course). > ERROR: lock AccessShareLock on object 16385/16467/0 is already held Huh, that shouldn't happen. What object is that? The 16385 should be a database OID, and the 16467 is most likely a table's OID within that database. > What was I doing? A large transaction where I was pushing about 20 > million rows into two tables using prepared transactions. I have been > doing this for some time (over a year) on 8.2.6 without any problems. > I may be able to reproduce, but the transaction takes a long time to > complete... about a day. I'm running it again now. I wonder if it's possible that you overflowed nLocks in the local lock table. There are a lot of other things that likely would break first, though, so that doesn't seem very probable. Is this transaction longer than any comparable one you ever did before? regards, tom lane
Re: PG 8.3.3 - ERROR: lock AccessShareLock on object 16385/16467/0 is already held
От
Michael Milligan
Дата:
Heikki Linnakangas wrote: > Michael Milligan wrote: >> Uh oh. This is a first (for me). I got this error on a transaction, it >> did not crash the server but did abort the transaction (of course). >> >> ERROR: lock AccessShareLock on object 16385/16467/0 is already held >> >> What was I doing? A large transaction where I was pushing about 20 >> million rows into two tables using prepared transactions. I have been >> doing this for some time (over a year) on 8.2.6 without any problems. > > Could you provide more details on the transaction? At what point did it > fail, during the insertss, or at commit perhaps? Are you really using > prepared transactions, aka two-phase commit, or did you mean prepared > statements? Yes, mistyped there, prepared _statements_. It failed during an EXECUTE, so during a prepared INSERT statement. Regards, Mike -- Michael Milligan -> milli@acmeps.com
Re: PG 8.3.3 - ERROR: lock AccessShareLock on object 16385/16467/0 is already held
От
Michael Milligan
Дата:
Tom Lane wrote: > Michael Milligan <milli@acmeps.com> writes: >> Uh oh. This is a first (for me). I got this error on a transaction, it >> did not crash the server but did abort the transaction (of course). > >> ERROR: lock AccessShareLock on object 16385/16467/0 is already held > > Huh, that shouldn't happen. What object is that? The 16385 should be > a database OID, and the 16467 is most likely a table's OID within that > database. > >> What was I doing? A large transaction where I was pushing about 20 >> million rows into two tables using prepared transactions. I have been >> doing this for some time (over a year) on 8.2.6 without any problems. > >> I may be able to reproduce, but the transaction takes a long time to >> complete... about a day. I'm running it again now. > > I wonder if it's possible that you overflowed nLocks in the local lock > table. There are a lot of other things that likely would break first, > though, so that doesn't seem very probable. > > Is this transaction longer than any comparable one you ever did before? For 8.3.3, yes. This is only the second such large transaction I've done on 8.3.3, the first one I did was around 6.7 million rows and it worked. And a correction, the transaction that caused this error was inserting 13.5 million rows and failed near the end. Regards, Mike -- Michael Milligan -> milli@acmeps.com
Michael Milligan <milli@acmeps.com> writes: > Tom Lane wrote: >> Huh, that shouldn't happen. What object is that? The 16385 should be >> a database OID, and the 16467 is most likely a table's OID within that >> database. Please answer the above question. > And a correction, the transaction that caused this error was inserting > 13.5 million rows and failed near the end. Once you've determined which table the error message is talking about, please show us what the transaction does with that table. regards, tom lane
Re: PG 8.3.3 - ERROR: lock AccessShareLock on object 16385/16467/0 is already held
От
Michael Milligan
Дата:
Tom Lane wrote: > Michael Milligan <milli@acmeps.com> writes: >> Tom Lane wrote: >>> Huh, that shouldn't happen. What object is that? The 16385 should be >>> a database OID, and the 16467 is most likely a table's OID within that >>> database. > > Please answer the above question. 16385 is the database (db) and 16467 is the table (email) the prepared inserts are executed against. That table (email) is an inherited table definition for partitioned tables that are named by date ranges. E.g., CREATE TABLE email_2008_week27 ( PRIMARY KEY (id), CHECK ( ts >= '2008-06-30'::timestamp AND ts < '2008-07-07'::timestamp ) ) INHERITS (email) > >> And a correction, the transaction that caused this error was inserting >> 13.5 million rows and failed near the end. > > Once you've determined which table the error message is talking about, > please show us what the transaction does with that table. You mean like: BEGIN; PREPARE msg (...) INSERT INTO email VALUES (...); EXECUTE msg (...) EXECUTE msg (...) EXECUTE msg (...) EXECUTE msg (...) ... repeated millions of times COMMIT; Regards, Mike -- Michael Milligan -> milli@acmeps.com
Michael Milligan <milli@acmeps.com> writes: > Tom Lane wrote: >> Once you've determined which table the error message is talking about, >> please show us what the transaction does with that table. > You mean like: > BEGIN; > PREPARE msg (...) INSERT INTO email VALUES (...); > EXECUTE msg (...) > EXECUTE msg (...) > EXECUTE msg (...) > EXECUTE msg (...) > ... repeated millions of times > COMMIT; What I'm wondering about is the sequence of operations that are executed per row. Could it be long enough that the email table is being touched by more than 2 billion separate SQL operations within the transaction? regards, tom lane
Re: PG 8.3.3 - ERROR: lock AccessShareLock on object 16385/16467/0 is already held
От
Michael Milligan
Дата:
Tom Lane wrote: > Michael Milligan <milli@acmeps.com> writes: >> Tom Lane wrote: >>> Once you've determined which table the error message is talking about, >>> please show us what the transaction does with that table. > >> You mean like: > >> BEGIN; >> PREPARE msg (...) INSERT INTO email VALUES (...); >> EXECUTE msg (...) >> EXECUTE msg (...) >> EXECUTE msg (...) >> EXECUTE msg (...) >> ... repeated millions of times >> COMMIT; > > What I'm wondering about is the sequence of operations that are executed > per row. Could it be long enough that the email table is being touched > by more than 2 billion separate SQL operations within the transaction? I can't imagine that, but then again, I'm not a PostgreSQL internals expert either. The partitioned email tables have indexes on them, but in this part of the transaction, all I'm doing is EXECUTEs where the prepared statement is INSERTing a single row. Nothing else fancy with the email table is going on. There are other things going on with values being stored to TEMP tables within the same transaction, would those count toward the 2 billion? I ran the transaction again and it failed again, so it is reproducable. I'm going to try it with a much smaller dataset and see what happens. Regards, Mike -- Michael Milligan -> milli@acmeps.com
Re: PG 8.3.3 - ERROR: lock AccessShareLock on object 16385/16467/0 is already held
От
Michael Milligan
Дата:
Tom Lane wrote: > > What I'm wondering about is the sequence of operations that are executed > per row. Could it be long enough that the email table is being touched > by more than 2 billion separate SQL operations within the transaction? FWIW, I've used the exact same code against PG 8.2.6 and have half a dozen similar transactions that inserted more than 13.5 million rows, with the largest transaction at a little over 25 million rows inserted into the email table. Regards, Mike -- Michael Milligan -> milli@acmeps.com
Michael Milligan <milli@acmeps.com> writes: > FWIW, I've used the exact same code against PG 8.2.6 and have half a > dozen similar transactions that inserted more than 13.5 million rows, > with the largest transaction at a little over 25 million rows inserted > into the email table. Hmph. That seems to eliminate the overflow theory, because 8.2 has essentially the same lock-counting code as 8.3. Unless 8.3 is taking out the lock a heckuva lot more than 8.2 did, but I can't think of a reason for that to happen. Now that we know you can reproduce it, we should think about how to get some information out. Are you in a position to build a locally modified Postgres? I could send you a patch to make that particular error report dump out more information about the lock state, but a patch won't do you any good if you aren't able to build from source. regards, tom lane
Re: PG 8.3.3 - ERROR: lock AccessShareLock on object 16385/16467/0 is already held
От
Michael Milligan
Дата:
Tom Lane wrote: > Michael Milligan <milli@acmeps.com> writes: >> FWIW, I've used the exact same code against PG 8.2.6 and have half a >> dozen similar transactions that inserted more than 13.5 million rows, >> with the largest transaction at a little over 25 million rows inserted >> into the email table. > > Hmph. That seems to eliminate the overflow theory, because 8.2 has > essentially the same lock-counting code as 8.3. Unless 8.3 is taking > out the lock a heckuva lot more than 8.2 did, but I can't think of a > reason for that to happen. > > Now that we know you can reproduce it, we should think about how to get > some information out. Are you in a position to build a locally modified > Postgres? I could send you a patch to make that particular error report > dump out more information about the lock state, but a patch won't do you > any good if you aren't able to build from source. I can rebuild with a patch, sure. I've got a spare machine I can mirror the db over to (same hardware) to debug this. Regards, Mike -- Michael Milligan -> milli@acmeps.com
[ reincluding the mailing list ] Michael Milligan <milli@acmeps.com> writes: > Okay, it reproduces and surprise surprise nLocks does overflow... > ERROR: lock AccessShareLock on object 16385/16467/0 is already held > lock(0x87408a028) id(16385,16467,0,0,0,1) grantMask(a) waitMask(0) > req(2,0,1,0,0,0,0,0)=3 grant(1,0,1,0,0,0,0,0)=2 wait(0) > proclock(0x8743a7508) lock(0x87408a028) method(1) proc(0x8743aada8) hold(a) > locallock(0xb29c78) nLocks(-2147483648) nOwners(2) mOwners(8) Hah. Okay, that shows that we'd never have reproduced it with a small test case. > So I'm guessing this is not a bug? Just that for some reason 8.3.3 is > grabbing more locks than 8.2.6 does and I have to adjust my batch > processes to break things up into chunks of less than 10 million per > transaction... :-/ Well, I'm wondering exactly why 8.3 is taking more locks than 8.2 does, because AFAICS it shouldn't. Could you extract a complete example of what your code does per tuple? I would like to run a small test case and just see where the LockAcquire calls come from in each version. (Or, if you prefer, you can get out gdb and do the legwork yourself...) It's true that breaking up the transaction is likely to be the best short-term answer for you, but I think there might be a bug here. In any case, now that we know that nLocks overflow is actually possible within real-world transaction lengths, it'd behoove us to do something about that in 8.4 or beyond. regards, tom lane
Re: PG 8.3.3 - ERROR: lock AccessShareLock on object 16385/16467/0 is already held
От
Bruce Momjian
Дата:
Tom Lane wrote: > [ reincluding the mailing list ] > > Michael Milligan <milli@acmeps.com> writes: > > Okay, it reproduces and surprise surprise nLocks does overflow... > > > ERROR: lock AccessShareLock on object 16385/16467/0 is already held > > lock(0x87408a028) id(16385,16467,0,0,0,1) grantMask(a) waitMask(0) > > req(2,0,1,0,0,0,0,0)=3 grant(1,0,1,0,0,0,0,0)=2 wait(0) > > proclock(0x8743a7508) lock(0x87408a028) method(1) proc(0x8743aada8) hold(a) > > locallock(0xb29c78) nLocks(-2147483648) nOwners(2) mOwners(8) > > Hah. Okay, that shows that we'd never have reproduced it with a small > test case. > > > So I'm guessing this is not a bug? Just that for some reason 8.3.3 is > > grabbing more locks than 8.2.6 does and I have to adjust my batch > > processes to break things up into chunks of less than 10 million per > > transaction... :-/ > > Well, I'm wondering exactly why 8.3 is taking more locks than 8.2 does, > because AFAICS it shouldn't. Could you extract a complete example of > what your code does per tuple? I would like to run a small test case > and just see where the LockAcquire calls come from in each version. > (Or, if you prefer, you can get out gdb and do the legwork yourself...) > It's true that breaking up the transaction is likely to be the best > short-term answer for you, but I think there might be a bug here. > > In any case, now that we know that nLocks overflow is actually possible > within real-world transaction lengths, it'd behoove us to do something > about that in 8.4 or beyond. Is this a TODO? -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
Bruce Momjian <bruce@momjian.us> writes: > Tom Lane wrote: >> In any case, now that we know that nLocks overflow is actually possible >> within real-world transaction lengths, it'd behoove us to do something >> about that in 8.4 or beyond. > Is this a TODO? Yes, although I'm still waiting for more info from Michael. regards, tom lane
Re: PG 8.3.3 - ERROR: lock AccessShareLock on object 16385/16467/0 is already held
От
Alvaro Herrera
Дата:
Tom Lane wrote: > [ reincluding the mailing list ] > > Michael Milligan <milli@acmeps.com> writes: > > Okay, it reproduces and surprise surprise nLocks does overflow... > > > ERROR: lock AccessShareLock on object 16385/16467/0 is already held > > lock(0x87408a028) id(16385,16467,0,0,0,1) grantMask(a) waitMask(0) > > req(2,0,1,0,0,0,0,0)=3 grant(1,0,1,0,0,0,0,0)=2 wait(0) > > proclock(0x8743a7508) lock(0x87408a028) method(1) proc(0x8743aada8) hold(a) > > locallock(0xb29c78) nLocks(-2147483648) nOwners(2) mOwners(8) > > Hah. Okay, that shows that we'd never have reproduced it with a small > test case. This hasn't been fixed yet, has it? Do you have any ideas on how to actually fix the problem? I wonder if it looks like enlarging nLocks, or you're intending to attempt to reduce the number of locks taken. A customer of ours started hitting this bug too, last week. -- Alvaro Herrera http://www.CommandPrompt.com/ PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Alvaro Herrera <alvherre@commandprompt.com> writes: > Tom Lane wrote: >> Michael Milligan <milli@acmeps.com> writes: >>> Okay, it reproduces and surprise surprise nLocks does overflow... >> >> Hah. Okay, that shows that we'd never have reproduced it with a small >> test case. > This hasn't been fixed yet, has it? Well, it has been addressed anyway... 2008-09-15 21:56 tgl * src/include/storage/: lock.h (REL8_1_STABLE), lock.h (REL8_3_STABLE), lock.h (REL8_0_STABLE), lock.h (REL8_2_STABLE), lock.h: Widen the nLocks counts in local lock tables from int to int64. This forestalls potential overflow when the same table (or other object, but usually tables) is accessed by very many successive queries within a single transaction. Per report from Michael Milligan. Back-patch to 8.0, which is as far back as the patch conveniently applies. There have been no reports of overflow in pre-8.3 releases, but clearly the risk existed all along. (Michael's report suggests that 8.3 may consume lock counts faster than prior releases, but with no test case to look at it's hard to be sure about that. Widening the counts seems a good future-proofing measure in any event.) > A customer of ours started hitting this bug too, last week. Tut tut, customer not on latest point release? regards, tom lane
Re: PG 8.3.3 - ERROR: lock AccessShareLock on object 16385/16467/0 is already held
От
Alvaro Herrera
Дата:
Tom Lane wrote: > 2008-09-15 21:56 tgl > > * src/include/storage/: lock.h (REL8_1_STABLE), lock.h > (REL8_3_STABLE), lock.h (REL8_0_STABLE), lock.h (REL8_2_STABLE), > lock.h: Widen the nLocks counts in local lock tables from int to > int64. Doh, I didn't see it because I only checked the change log for lock.c, not lock.h :-( > > A customer of ours started hitting this bug too, last week. > > Tut tut, customer not on latest point release? Wouldn't be a first for them (we try hard, they don't necessarily heed advice). Thanks! -- Alvaro Herrera http://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc.