Discussion: Hung SQL Update Linux Redhat 4U5 Postgres 8.3.1

Hung SQL Update Linux Redhat 4U5 Postgres 8.3.1

From:
Craig Vosburgh
Date:
All,

I'm hoping for some help figuring out what is going on with our Postgres
implementation.  We have an application that uses Hibernate to persist
objects into a Postgres database.  We've run into a semi-repeatable
problem (every other or every third test run) where we issue an UPDATE
statement against a single row in a table and that update just hangs
(we've let it go for more than 12 hours to see if it would eventually
complete, and it did not).  While the update is hung, the system has a
negligible CPU load, with postgres not even showing up in a top listing.

We've dumped the locks and it shows that all locks have been granted so
it appears that it is not a lock that is standing in our way.  We've
also gone in via psql while the update is hung and were able to perform
an update on the offending table without issue.  Finally, we have also
enabled statement_timeout, set it to five minutes, and it does time out
the hung update and return to normal processing by rolling back the
offending transaction, but that's not a viable solution for us.
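
For reference, the checks boil down to something like this (an illustrative
sketch, not our exact monitoring SQL):

    -- look for any lock requests that have not been granted
    SELECT locktype, relation, transactionid, pid, mode, granted
      FROM pg_locks
     WHERE NOT granted;

    -- the five-minute statement cap we set, expressed in milliseconds
    SET statement_timeout = 300000;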

Anyone have any words of wisdom on how to track this down?
Thanks,
-Craig

Re: Hung SQL Update Linux Redhat 4U5 Postgres 8.3.1

From:
Tom Lane
Date:
Craig Vosburgh <craig.vosburgh@cassatt.com> writes:
> We've dumped the locks and it shows that all locks have been granted so
> it appears that it is not a lock that is standing in our way.  We've
> also gone in via psql while the update is hung and were able to perform
> an update on the offending table without issue.  Finally, we have also
> enabled the statement_timeout and set it to five minutes and it does
> timeout the hung update and return to normal processing by rolling back
> the offending transaction but that's not a viable solution for us.

> Anyone have any words of wisdom on how to track this down?

For starters, attach to the hung backend with gdb and get a stack trace ...
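
For example (substitute the stuck backend's PID, from ps or pg_stat_activity):

    $ gdb -p <pid-of-stuck-backend>
    (gdb) bt
    (gdb) detach
    (gdb) quit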

            regards, tom lane

Re: Hung SQL Update Linux Redhat 4U5 Postgres 8.3.1

From:
Craig Vosburgh
Date:
>> Craig Vosburgh <craig(dot)vosburgh(at)cassatt(dot)com> writes:
>> We've dumped the locks and it shows that all locks have been granted so
>> it appears that it is not a lock that is standing in our way.  We've
>> also gone in via psql while the update is hung and were able to perform
>> an update on the offending table without issue.  Finally, we have also
>> enabled the statement_timeout and set it to five minutes and it does
>> timeout the hung update and return to normal processing by rolling back
>> the offending transaction but that's not a viable solution for us.
>>
>> Anyone have any words of wisdom on how to track this down?
>
> For starters, attach to the hung backend with gdb and get a stack trace ...
>
>            regards, tom lane

Tom,

I got gdb installed on the errant node and ran a backtrace (I'm guessing
that is what you were looking for when you said stack trace) against the
process that shows in the process table as executing the hung SQL command.

The backtrace is:
(gdb) bt
#0  0x0088b7a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
#1  0x0096f3ab in semop () from /lib/tls/libc.so.6
#2  0x081ba8f8 in PGSemaphoreLock ()
#3  0x081e4d9e in ProcSleep ()
#4  0x081e1db3 in GrantAwaitedLock ()
#5  0x0832f984 in twophase_recover_callbacks ()
#6  0x00000006 in ?? ()
#7  0x00000003 in ?? ()
#8  0x091c3770 in ?? ()
#9  0x091c3758 in ?? ()
#10 0xbfe63158 in ?? ()
#11 0xbfe634a0 in ?? ()
#12 0xbfe634b0 in ?? ()
#13 0x0832f984 in twophase_recover_callbacks ()
#14 0x08e8dc40 in ?? ()
#15 0x00000006 in ?? ()
#16 0xbff32c72 in ?? ()
#17 0x08e68758 in ?? ()
#18 0xb792a9f8 in ?? ()
#19 0xbfe631b8 in ?? ()
#20 0xbfe630d0 in ?? ()
#21 0x081e1d88 in GrantAwaitedLock ()
#22 0x081d6722 in MarkBufferDirty ()
(gdb)

Thanks for the help,
-Craig


Re: Hung SQL Update Linux Redhat 4U5 Postgres 8.3.1

From:
Tom Lane
Date:
Craig Vosburgh <craig.vosburgh@cassatt.com> writes:
> Got GDB installed on the errant node and did a back trace call (I'm guessing
> that is what you were looking for when you said stack trace) on the process
> that shows in the process table as executing the hung SQL command.

> The backtrace is:
> (gdb) bt
> #0  0x0088b7a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
> #1  0x0096f3ab in semop () from /lib/tls/libc.so.6
> #2  0x081ba8f8 in PGSemaphoreLock ()
> #3  0x081e4d9e in ProcSleep ()
> #4  0x081e1db3 in GrantAwaitedLock ()
> #5  0x0832f984 in twophase_recover_callbacks ()
> #6  0x00000006 in ?? ()

Hmm.  gdb is lying to you, because GrantAwaitedLock doesn't call
ProcSleep --- probably line 4 should refer to WaitOnLock instead.
You could try installing a non-symbol-stripped postgres executable
(or in RPM environments, install the matching debuginfo RPM) to
get a more trustworthy backtrace.
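
(Something like the following; the exact debuginfo package and file name
depend on where your postgres build came from.)

    $ rpm -ivh postgresql-debuginfo-8.3.1-<release>.<arch>.rpm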

However, what I suspect you are looking at is just a run-of-the-mill
lock wait.  You *sure* there's no ungranted locks showing in pg_locks?

            regards, tom lane

Re: Hung SQL Update Linux Redhat 4U5 Postgres 8.3.1

From:
Craig Vosburgh
Date:
Tom,

Yeah, about that lock I *said* I didn't have...  We had been joining
through pg_class and pg_tables to pick up some additional data, and it
turns out the row in pg_locks that is not granted has no relation, so it
was filtered out by the join.
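
A left join would have kept those rows visible; something like this sketch
rather than the inner join we had:

    -- rows with a null relation survive the left join
    SELECT l.locktype, l.pid, l.mode, l.granted, c.relname
      FROM pg_locks l
      LEFT JOIN pg_class c ON c.oid = l.relation
     WHERE NOT l.granted;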

So, now that I have accurate data (just doing a select * from pg_locks
where granted = 'f'), I do indeed have an ungranted lock that is causing
the holdup.  However, now I have a couple of other questions.

First, I thought I had seen Postgres recognize a deadlock in the past, log
an error, and then move on.  Do I need to do something with the more
recent code base to get that behavior?  I think the last time I saw that
was back on an 8.1.x code base.

Second, any words of wisdom to help run to ground who's keeping me from
getting the lock on the offending row?  The row in pg_locks that shows
granted = false does not belong to a database or relation (both are null),
so I can't join through to get the table info from pg_tables.

Thanks for all the help,
-Craig

On 5/12/08 12:16 PM, "Tom Lane" <tgl@sss.pgh.pa.us> wrote:

> Craig Vosburgh <craig.vosburgh@cassatt.com> writes:
>> Got GDB installed on the errant node and did a back trace call (I'm guessing
>> that is what you were looking for when you said stack trace) on the process
>> that shows in the process table as executing the hung SQL command.
>
>> The backtrace is:
>> (gdb) bt
>> #0  0x0088b7a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
>> #1  0x0096f3ab in semop () from /lib/tls/libc.so.6
>> #2  0x081ba8f8 in PGSemaphoreLock ()
>> #3  0x081e4d9e in ProcSleep ()
>> #4  0x081e1db3 in GrantAwaitedLock ()
>> #5  0x0832f984 in twophase_recover_callbacks ()
>> #6  0x00000006 in ?? ()
>
> Hmm.  gdb is lying to you, because GrantAwaitedLock doesn't call
> ProcSleep --- probably line 4 should refer to WaitOnLock instead.
> You could try installing a non-symbol-stripped postgres executable
> (or in RPM environments, install the matching debuginfo RPM) to
> get a more trustworthy backtrace.
>
> However, what I suspect you are looking at is just a run-of-the-mill
> lock wait.  You *sure* there's no ungranted locks showing in pg_locks?
>
> regards, tom lane


Re: Hung SQL Update Linux Redhat 4U5 Postgres 8.3.1

From:
Tom Lane
Date:
Craig Vosburgh <craig.vosburgh@cassatt.com> writes:
> Second, any words of wisdom to help run to ground who's keeping me from
> getting the lock on the offending row?  The row in the lock table that shows
> granted false does not show as belonging to a database or relation (both
> null) so I can't join through to get the table info from pg_table.

What's the locktype?  If (as I suspect) it's a transaction or
virtualtransaction lock, then which process holds that lock and what's
it doing?  For that matter, what is the stuck process doing?
pg_stat_activity output should be sufficient here.
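
Roughly (a minimal sketch):

    -- what kind of lock is the backend waiting for?
    SELECT locktype, transactionid, virtualtransaction, pid, mode
      FROM pg_locks
     WHERE NOT granted;

    -- and what is every backend doing right now?
    SELECT procpid, waiting, xact_start, current_query
      FROM pg_stat_activity;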

            regards, tom lane

Re: Hung SQL Update Linux Redhat 4U5 Postgres 8.3.1

From:
Craig Vosburgh
Date:
>
> What's the locktype?
>

Yep, locktype is transaction.

> If (as I suspect) it's a transaction or
> virtualtransaction lock, then which process holds that lock and what's
> it doing?

As for which process owns that lock, I'm not sure how to find that out
(sorry, newbie here).  I can find the PID that is waiting for the lock,
and the table/row it is trying to update, but I can't figure out which
process actually holds the lock that is causing the issue.

> For that matter, what is the stuck process doing?
> pg_stat_activity output should be sufficient here.
>
> regards, tom lane

I've enabled stats tracking, and the hung process shows the following row
in pg_stat_activity:

 datid         | 16384
 datname       | collage
 procpid       | 20938
 usesysid      | 10
 usename       | postgres
 current_query | update ips set address=$1, usage=$2, subnet_rips_id=$3,
               |   hostname=$4, errored=$5, errorReason=$6, modinfo=$7,
               |   name=$8, description=$9 where bmo_id=$10
 waiting       | t
 xact_start    | 2008-05-11 17:25:04.484224-06
 query_start   | 2008-05-11 17:25:04.528319-06
 backend_start | 2008-05-11 17:02:00.016083-06
 client_addr   | 127.0.0.1
 client_port   | 49056

Everything else in the view is either <IDLE> or <IDLE> in transaction.

Thanks again,
-Craig

On 5/12/08 1:11 PM, "Tom Lane" <tgl@sss.pgh.pa.us> wrote:



Re: Hung SQL Update Linux Redhat 4U5 Postgres 8.3.1

From:
Tom Lane
Date:
Craig Vosburgh <craig.vosburgh@cassatt.com> writes:
> As for which process owns that lock, I'm not sure how to find that out
> (sorry newbie).  I can find the PID that is waiting for that lock and I can
> find the table/row that appears to be waiting for the lock to perform the
> action but I can't figure out which process actually owns the lock that is
> causing the issue.

It's whichever one actually holds that same lock according to pg_locks
(ie, there's an entry that matches on locktype and transactionid but
has granted = 't'; then join the pid of that entry against
pg_stat_activity).
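
In SQL terms, something like this sketch (for a transaction-type lock,
using the 8.3 column names):

    -- pair each ungranted transaction lock with the backend holding the same xid
    SELECT w.pid AS waiting_pid,
           h.pid AS holding_pid,
           a.usename,
           a.xact_start,
           a.current_query
      FROM pg_locks w
      JOIN pg_locks h
        ON h.locktype = w.locktype
       AND h.transactionid = w.transactionid
       AND h.granted
      JOIN pg_stat_activity a
        ON a.procpid = h.pid
     WHERE w.locktype = 'transaction'
       AND NOT w.granted;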

            regards, tom lane
