Query, usually running <300ms, sometimes hangs for hours

Поиск
Список
Период
Сортировка
От Markus Wollny
Тема Query, usually running <300ms, sometimes hangs for hours
Дата
Msg-id 28011CD60FB1724DBA4442E38277F62613E2A056@hermes.computec.de
обсуждение исходный текст
Ответы Re: Query, usually running <300ms, sometimes hangs for hours  (Tom Lane <tgl@sss.pgh.pa.us>)
Список pgsql-general
Hi!

We're currently still on PostgreSQL 8.3.7 and are experiencing a strange
problem since a couple of days. I have a suspicion on what is causing it
(probably not PostgreSQL) and I'd like to hear your opinion before
taking my findings to the Railo bugtracker.

We're running queries like this every couple of minutes as scheduled
task in Railo 3.3.0.022 rc running on Tomcat 6.0.18; connection is made
via JDBC:

Select distinct a.article_id, art.article_type_id
, CASE when max(lc.count) is null Then 0 else max(lc.count) END as likes
, CASE WHEN count_reply is null THEN 0 ELSE count_reply END as
count_reply
from babel_pcgames.article art inner join babel_pcgames.article_category
a on a.article_id = art.article_id
left join modules.likebutton_counter lc on (lc.uid = a.entity_id and
lc.site_id = $1 and lc.type_id = 2)
left join modules.article_comments ac on (ac.article_id = a.article_id
and ac.board_id in (13))
where a.article_id in ($2,[...],$2715)
Group by a.article_id, ac.count_reply, art.article_type_id
Order by a.article_id

Strange thing now that's happening occasionally since last Friday is,
that one or more of these queries is hanging for several hours without
completing. When I actually execute it in parallel to the already
running query, it's coming back fine after about 300ms or less. When I
try to kill -TERM [pid] on this backend, nothing happens. I actually
have to kill -9 [pid] to make it go away which of course causes the
Postmaster to go into recovery mode and thus leads to a short downtime
of the server.

I know, ~3,000 elements in the IN clause are quite a lot, but we haven't
seen any problems with this before and I don't think that this is
actually causing it - this same type of query has been running unchanged
for more than six months now. I think that the "kill [pid]" not having
any effect is quite suspicious. So I tried a backtrace with gdb; I
haven't got full debugging support in my installation, but I get the
functions nevertheless, which is better than nothing, I suppose:

(gdb) bt
#0 0x00007f58e3394505 in send () from /lib/libc.so.6
#1 0x0000000000546249 in internal_flush ()
#2 0x000000000054635d in internal_putbytes ()
#3 0x00000000005463bc in pq_putmessage ()
#4 0x00000000005479c4 in pq_endmessage ()
#5 0x0000000000452eb0 in printtup ()
#6 0x0000000000526408 in ExecutorRun ()
#7 0x00000000005bfe50 in PortalRunSelect ()
#8 0x00000000005c14b9 in PortalRun ()
#9 0x00000000005be025 in PostgresMain ()
#10 0x0000000000591ed2 in ServerLoop ()
#11 0x0000000000592bb4 in PostmasterMain ()
#12 0x0000000000548268 in main ()

I haven't got much experience reading stack traces, but I suspect that
what this means is that PostgreSQL actually has done its job with the
query and is now hanging in sending the result back to the client. This
would indicate some sort of evil client behavior, i.e. connection still
open but nobody answering on the other end. I expect that this might in
fact explain why a kill -TERM on the backend doesn't have any effect as
control over this process is currently being handled by the network
stack kernel-side and there's some sort of network interrupt wait in
effect here.

I deployed the Railo patch from 3.3.0.018.rc to 3.3.0.022.rc on Friday,
July 22nd, so the coincidence of this issue happening for the first time
and the Railo patch in effect is another indication that the actual
cause of the problem is client-side.

Could anybody shed some more light on what I'm seeing here? I'm
currently trying to run the very same job on another CFML-type server
and haven't had a single issue for a couple of hours now, but as this
has been hitting us only every couple of hours at best I'm still not
completely sure that my assumptions have been correct.

Kind regards

  Markus


BAM! Der COMPUTEC Games Award 2011 - Jetzt abstimmen fur die besten Games: www.bamaward.de

Computec Media AG
Sitz der Gesellschaft und Registergericht: Furth (HRB 8818)
Vorstandsmitglieder: Albrecht Hengstenberg (Vorsitzender) und Rainer Rosenbusch
Vorsitzender des Aufsichtsrates: Jurg Marquard
Umsatzsteuer-Identifikationsnummer: DE 812 575 276



В списке pgsql-general по дате отправления:

Предыдущее
От: Pavel Stehule
Дата:
Сообщение: Re: Implementing "thick"/"fat" databases
Следующее
От: Frank Lanitz
Дата:
Сообщение: Re: Implementing "thick"/"fat" databases