Обсуждение: Immortal backend .... (kill -9 ?..)

Поиск
Список
Период
Сортировка

Immortal backend .... (kill -9 ?..)

От
"Rajesh Kumar Mallah." (by way of Rajesh Kumar Mallah.
Дата:
Hi

i have just discovered  a query in my database system that would take
extremely long durations to finish. the query plan is like:

explain SELECT source_id , cnt from (SELECT  source_id,count(source_id) as
 cnt from email_source group by source_id ) subsel join sources
 using(source_id) order by source_id
tradein_clients-# ;
NOTICE:  QUERY PLAN:
Merge Join  (cost=145610.94..147300.41 rows=112596 width=16)
  ->  Sort  (cost=145597.57..145597.57 rows=112596 width=4)
        ->  Subquery Scan subsel  (cost=130520.56..136150.35 rows=112596
 width=4) ->  Aggregate  (cost=130520.56..136150.35 rows=112596 width=4) ->
 Group  (cost=130520.56..133335.46 rows=1125957 width=4) ->  Sort
 (cost=130520.56..130520.56 rows=1125957 width=4) ->  Seq Scan on
 email_source  (cost=0.00..17346.57 rows=1125957 width=4) ->  Sort
 (cost=13.37..13.37 rows=213 width=4)
        ->  Seq Scan on sources  (cost=0.00..5.13 rows=213 width=4)
EXPLAIN

unfortunately it has been running since  yesterday and i am
discovering it now.

now even kill -9 <pid of backend> does not releases it below is
the current status of backends:

postgres  8594  0.0  0.0 139416 148 ?        S    May11   0:25
 /usr/local/pgsql/bin/postmaster postgres  8595  1.3  0.1 140408 1040 ?
 S    May11  78:47 postgres: stats buffer process postgres  8597  1.1  0.0
 139704 324 ?        S    May11  65:18 postgres: stats collector process
 postgres 32444  0.0  0.0 238700   0 ?        DW   May14   0:20 postgres:
 tradein tradein_clients 130.94.20.27 SELECT postgres 32482  0.0  0.0 139808
  0 ?        DW   May14   0:00 postgres: checkpoint subprocess postgres 16907
  0.0  0.3 140296 3912 ?       S    17:51   0:00 postgres: postgres
 tradein_clients 203.196.129.235 idle postgres 20382  1.3  4.1 140816 42588 ?
      S    18:27   0:01 postgres: tradein tradein_clients 130.94.20.27 idle
 in transaction postgres 20388  0.5  6.7 141200 68940 ?      S    18:27
 0:00 postgres: tradein tradein_clients 130.94.20.27 idle in transaction

The pid of the backend that is executing the query is 32444 .
also note that pid 32482 is also (running?) is since may 14
 and 32444 also  is (running?) since may 14.
(is there any relation between them?)
what is checkpoint subprocess?

the load average is
[root@linux10320 root2]# uptime
  6:30pm  up 4 days,  6:26,  1 user,  load average: 11.08, 10.54, 10.78
[root@linux10320 root2]#

can anyone tell me what might me causing so much load in the system,
or someway to get see what is going on to make such a load?


also if i were to bring down postgresql what should have been
a safe (and effective) way.

Disk I/O is like:

# iostat -d 2 6
Linux 2.4.9 (linux10320)        05/15/02

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
dev8-0           32.25       450.01       445.92  166025844  164517760
dev8-1           34.26       434.75       640.21  160395794  236199552

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
dev8-0            2.50        24.00        12.00         48         24
dev8-1            1.00         0.00        12.00          0         24

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
dev8-0           13.00         0.00       112.00          0        224
dev8-1           13.00         0.00       112.00          0        224

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
dev8-0           23.50         0.00       200.00          0        400
dev8-1           23.50         0.00       200.00          0        400

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
dev8-0            3.00         0.00        44.00          0         88
dev8-1            3.00         0.00        44.00          0         88

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
dev8-0            3.00         0.00        36.00          0         72
dev8-1            3.00         0.00        36.00          0         72
[root@linux10320 root2]#

Regds
Mallah.









--
Rajesh Kumar Mallah,
Project Manager (Development)
Infocom Network Limited, New Delhi
phone: +91(11)6152172 (221) (L) ,9811255597 (M)

Visit http://www.trade-india.com ,
India's Leading B2B eMarketplace.


Re: Immortal backend .... (kill -9 ?..)

От
Tom Lane
Дата:
"Rajesh Kumar Mallah." <mallah@trade-india.com>(by way of Rajesh Kumar Mallah. <mallah@trade-india.com>) writes:
> now even kill -9 <pid of backend> does not releases it below is
> the current status of backends:

I think there is something broken about your system.  It is not normal
to be in a state where you have unkillable processes.  I'm wondering
about disk drive problems leading to I/O requests that never complete,
myself...

            regards, tom lane

Re: Immortal backend .... (kill -9 ?..)

От
"Rajesh Kumar Mallah."
Дата:
On Wednesday 15 May 2002 07:46 pm, Tom Lane wrote:
> "Rajesh Kumar Mallah." <mallah@trade-india.com>(by way of Rajesh Kumar Mallah.
 <mallah@trade-india.com>) writes:
> > now even kill -9 <pid of backend> does not releases it below is
> > the current status of backends:
>

I have checked by diskdrive there are no badblocks at least
in the drive. It is 34+34 GB SCSI IN raid 1 configuration.

But does "iostat" hints too much disk i/o at the moment?

is there anyway i can bring the load avergage to normal < 1
with rebooting and fscking...

can i give you more information ?


regds
mallah.







the system load has be > 10 for past 1 hrs at least

> I think there is something broken about your system.  It is not normal
> to be in a state where you have unkillable processes.  I'm wondering
> about disk drive problems leading to I/O requests that never complete,
> myself...
>
>             regards, tom lane
>
> ---------------------------(end of broadcast)---------------------------
> TIP 5: Have you checked our extensive FAQ?
>
> http://www.postgresql.org/users-lounge/docs/faq.html

--
Rajesh Kumar Mallah,
Project Manager (Development)
Infocom Network Limited, New Delhi
phone: +91(11)6152172 (221) (L) ,9811255597 (M)

Visit http://www.trade-india.com ,
India's Leading B2B eMarketplace.



Re: Immortal backend .... (kill -9 ?..)

От
"Rajesh Kumar Mallah."
Дата:
Hi,

Two of the backends are showing status "D"  which "man ps"
describes as "uninterruptible sleep (usually IO)"

since kill -9 does not get me rid of these should i expect
postmaster which is still the parent process for those two
spoilt kids to terminate them?

regds
Mallah



On Wednesday 15 May 2002 07:46 pm, Tom Lane wrote:
> "Rajesh Kumar Mallah." <mallah@trade-india.com>(by way of Rajesh Kumar Mallah. <mallah@trade-india.com>) writes:
> > now even kill -9 <pid of backend> does not releases it below is
> > the current status of backends:
>
> I think there is something broken about your system.  It is not normal
> to be in a state where you have unkillable processes.  I'm wondering
> about disk drive problems leading to I/O requests that never complete,
> myself...
>
>             regards, tom lane

--
Rajesh Kumar Mallah,
Project Manager (Development)
Infocom Network Limited, New Delhi
phone: +91(11)6152172 (221) (L) ,9811255597 (M)

Visit http://www.trade-india.com ,
India's Leading B2B eMarketplace.



Re: Immortal backend .... (kill -9 ?..)

От
Jeremy Buchmann
Дата:
Rajesh Kumar Mallah. wrote:
> Hi,
>
> Two of the backends are showing status "D"  which "man ps"
> describes as "uninterruptible sleep (usually IO)"
>
> since kill -9 does not get me rid of these should i expect
> postmaster which is still the parent process for those two
> spoilt kids to terminate them?

That does sound like a hardware-related problem.  I saw the same thing
on a flaky Jaz drive.  The only thing I could do was reboot, which got
rid of the offending processes.  But that won't fix your hardware problem.

--Jeremy

> On Wednesday 15 May 2002 07:46 pm, Tom Lane wrote:
>
>>"Rajesh Kumar Mallah." <mallah@trade-india.com>(by way of Rajesh Kumar Mallah. <mallah@trade-india.com>) writes:
>>
>>>now even kill -9 <pid of backend> does not releases it below is
>>>the current status of backends:
>>
>>I think there is something broken about your system.  It is not normal
>>to be in a state where you have unkillable processes.  I'm wondering
>>about disk drive problems leading to I/O requests that never complete,
>>myself...
>>
>>            regards, tom lane
>
>