Обсуждение: Postmaster won't -HUP
Hello out there,
I'm having a problem with a production server. Actually, there are two
problems. The semi-trivial problem is that Postgres won't die using the
service mechanism. As root, I "service postgres stop" and then "service
postgres start" after a reasonable wait. The restart will earn me a
"StreamServerPort: cannot bind to port" which indicates tht the process
never died. A ps ax confirms the persistance of postmaster. When I kill -9
the processes (postmaster, the /bin/sh -c postgres, and logger) they
process will claim to start with "service postgres start" but it reports no
PID and doesn't show up in ps ax. It is clearly not running at this point.
The real problem, which caused all this debugging, is that twice so far,
for no apparent reason, I have pegged the processors on the server. The
machine has two 500mHz processors with 256 MB ram. I have a hard time
believing that one small query can bring that machine to its knees, but it
has, twice. The queries were run through a hard coded php front end (for
testing purposes). Any insight on these two problems would be appreciated
greatly.
thanks,
Jerry Lynde
Jerry Lynde
IT - Invocation/Evocation, Banishing, et al.
"In E-commerce, the E stands for Evil."
Due Diligence Inc.
http://www.diligence.com
Phone: (406) 728-0001 x232
Fax: (406) 728-0006
In version 7.0 postgres waits for all clients to close their connections before exiting. Before it just quit. Jerry Lynde wrote: > > Hello out there, > > I'm having a problem with a production server. Actually, there are two > problems. The semi-trivial problem is that Postgres won't die using the > service mechanism. As root, I "service postgres stop" and then "service > postgres start" after a reasonable wait. The restart will earn me a > "StreamServerPort: cannot bind to port" which indicates tht the process > never died. A ps ax confirms the persistance of postmaster. When I kill -9 > the processes (postmaster, the /bin/sh -c postgres, and logger) they > process will claim to start with "service postgres start" but it reports no > PID and doesn't show up in ps ax. It is clearly not running at this point. > > The real problem, which caused all this debugging, is that twice so far, > for no apparent reason, I have pegged the processors on the server. The > machine has two 500mHz processors with 256 MB ram. I have a hard time > believing that one small query can bring that machine to its knees, but it > has, twice. The queries were run through a hard coded php front end (for > testing purposes). Any insight on these two problems would be appreciated > greatly. > > thanks, > > Jerry Lynde > > Jerry Lynde > IT - Invocation/Evocation, Banishing, et al. > "In E-commerce, the E stands for Evil." > Due Diligence Inc. > http://www.diligence.com > Phone: (406) 728-0001 x232 > Fax: (406) 728-0006
Jerry Lynde wrote: > > I'm having a problem with a production server. Actually, there are two > problems. The semi-trivial problem is that Postgres won't die using the > service mechanism. As root, I "service postgres stop" and then "service > postgres start" after a reasonable wait. The restart will earn me a > "StreamServerPort: cannot bind to port" which indicates tht the process > never died. A ps ax confirms the persistance of postmaster. When I kill -9 > the processes (postmaster, the /bin/sh -c postgres, and logger) they > process will claim to start with "service postgres start" but it reports no > PID and doesn't show up in ps ax. It is clearly not running at this point. > The real problem, which caused all this debugging, is that twice so far, > for no apparent reason, I have pegged the processors on the server. The > machine has two 500mHz processors with 256 MB ram. I have a hard time > believing that one small query can bring that machine to its knees, but it > has, twice. The queries were run through a hard coded php front end (for > testing purposes). Any insight on these two problems would be appreciated > greatly. Hello Jerry, What version are you running? What does your server log show? Any other logs? Can you show us the small query, # of rows, etc? Regards, Ed Loehr
Jerry Lynde wrote: > > Hello out there, > > I'm having a problem with a production server. Actually, there are two > problems. The semi-trivial problem is that Postgres won't die using the > service mechanism. As root, I "service postgres stop" and then "service > postgres start" after a reasonable wait. The restart will earn me a > "StreamServerPort: cannot bind to port" which indicates tht the process > never died. A ps ax confirms the persistance of postmaster. When I kill -9 > the processes (postmaster, the /bin/sh -c postgres, and logger) they > process will claim to start with "service postgres start" but it reports no > PID and doesn't show up in ps ax. It is clearly not running at this point. Hmm, on debian it always dies, though it does takes a while sometimes. Also, try to avoid kill -9 because then it can't clean up shared memory, locks, temporary files, etc. Try kill -INT or -TERM. > The real problem, which caused all this debugging, is that twice so far, > for no apparent reason, I have pegged the processors on the server. The > machine has two 500mHz processors with 256 MB ram. I have a hard time > believing that one small query can bring that machine to its knees, but it > has, twice. The queries were run through a hard coded php front end (for > testing purposes). Any insight on these two problems would be appreciated > greatly. Heh. Yes, I also wonder why "select * from bigtable" where bigtable is a 400,000 row table crashes the machine also :). (This is pg_dump btw). Maybe you should post the query. Actually, there is one other situation I've killed the machine nearly. I have two big tables, "bigtable" and "largetable". Then do this query: select * from bigtable where largetable.a = 1; (ofcourse the actual query was much longer but this one demonstrates the problem). It actually does a join between those two tables even though only one is mentioned in the from part. Running explain over this told that this would take a *very* long time to complete. This is probably not a problem in newer versions though. What version are you using? -- Martijn van Oosterhout <kleptog@cupid.suninternet.com> http://cupid.suninternet.com/~kleptog/
Wow! Thanks for all the replies!
The version I'm running is 7.0.0
It could certainly be that the connections are remaining open and the
server isn't dying due to that. I'll play with it a little when I get the
chance, probably this weekend (when clients aren't hitting the server.)
Thanks for the tip on -INT and -TERM.
As for the query I'm running, it was simply select * from bigtable (about
2-300k lines) where
firstname=<obscure fname> and
lastname=<obscure lname> and
DOB=<the exact DOB for the above named individual>
Essentially, the query was hard-coded to pull a specific record from the
data and disply it on the page. While the dataset is medium to large
(IMHO) it shouldn't peg two 500's. The problem happened the first time
reliably when I ran the query. I was using phpdb for the php interface (I
have stopped using that for the time being, because it maxes the system
every time.)
Jerry Lynde
IT - Invocation/Evocation, Banishing, et al.
"In E-commerce, the E stands for Evil."
Due Diligence Inc.
http://www.diligence.com
Phone: (406) 728-0001 x232
Fax: (406) 728-0006
Jerry Lynde wrote: > > As for the query I'm running, it was simply select * from bigtable (about > 2-300k lines) where > firstname=<obscure fname> and > lastname=<obscure lname> and > DOB=<the exact DOB for the above named individual> What indices do you have on those fields? Regards, Ed Loehr
Ed Loehr <eloehr@austin.rr.com> writes:
> Jerry Lynde wrote:
>>
>> As for the query I'm running, it was simply select * from bigtable (about
>> 2-300k lines) where
>> firstname=<obscure fname> and
>> lastname=<obscure lname> and
>> DOB=<the exact DOB for the above named individual>
> What indices do you have on those fields?
And is it using them --- ie, what does EXPLAIN say about the query?
regards, tom lane
At 12:11 PM 6/1/00 -0500, Ed Loehr wrote: >Jerry Lynde wrote: > > > > As for the query I'm running, it was simply select * from > bigtable (about > > 2-300k lines) where > > firstname=<obscure > fname> and > > lastname=<obscure > lname> and > > DOB=<the exact DOB for > the above named individual> > >What indices do you have on those fields? > >Regards, >Ed Loehr They are all indexed, the DOB index is actually DOBYear DOBDay and DOBMonth and all 5 fields are indexed
At 01:21 PM 6/1/00 -0400, Tom Lane wrote: >Ed Loehr <eloehr@austin.rr.com> writes: > > Jerry Lynde wrote: > >> > >> As for the query I'm running, it was simply select * from bigtable (about > >> 2-300k lines) where > >> firstname=<obscure fname> and > >> lastname=<obscure lname> and > >> DOB=<the exact DOB for the above named individual> > > > What indices do you have on those fields? > >And is it using them --- ie, what does EXPLAIN say about the query? > > regards, tom lane It started indexing on DOBMonth until I did a VACUUM ANALYZE, after which it uses the lastname index
Jerry Lynde wrote: > > At 12:11 PM 6/1/00 -0500, Ed Loehr wrote: > >Jerry Lynde wrote: > > > > > > As for the query I'm running, it was simply select * from > > bigtable (about > > > 2-300k lines) where > > > firstname=<obscure > > fname> and > > > lastname=<obscure > > lname> and > > > DOB=<the exact DOB for > > the above named individual> > > > >What indices do you have on those fields? > > > >Regards, > >Ed Loehr > > They are all indexed, the DOB index is actually DOBYear DOBDay and > DOBMonth and all 5 fields are indexed Do you have 5 indexes or do you have an index that spans more than one field?
At 05:19 PM 6/1/00 -0400, Joseph Shraibman wrote: >Jerry Lynde wrote: > > > > At 12:11 PM 6/1/00 -0500, Ed Loehr wrote: > > >Jerry Lynde wrote: > > > > > > > > As for the query I'm running, it was simply select * from > > > bigtable (about > > > > 2-300k lines) where > > > > firstname=<obscure > > > fname> and > > > > lastname=<obscure > > > lname> and > > > > DOB=<the exact DOB for > > > the above named individual> > > > > > >What indices do you have on those fields? > > > > > >Regards, > > >Ed Loehr > > > > They are all indexed, the DOB index is actually DOBYear DOBDay and > > DOBMonth and all 5 fields are indexed > >Do you have 5 indexes or do you have an index that spans more than one >field? Sorry for being less than explicit. There are 5 separate indices, one per field. Jerry Lynde IT - Invocation/Evocation, Banishing, et al. "In E-commerce, the E stands for Evil." Due Diligence Inc. http://www.diligence.com Phone: (406) 728-0001 x232 Fax: (406) 728-0006
Jerry Lynde <jlynde@diligence.com> writes:
>>>> They are all indexed, the DOB index is actually DOBYear DOBDay and
>>>> DOBMonth and all 5 fields are indexed
>>
>> Do you have 5 indexes or do you have an index that spans more than one
>> field?
> Sorry for being less than explicit. There are 5 separate indices, one per
> field.
So your query is really something more like
... WHERE firstname = 'joe' AND lastname = 'blow' AND
DOByear = 1999 AND DOBmonth = 1 AND DOBday = 1
?
The problem here is that only one index can be used in any individual
scan. If I were the optimizer I'd probably figure that lastname is
going to be the most selective of the five available choices, too.
I'd suggest storing the DOB as *one* field of type 'date'. You can
pull out the subparts for display with date_part() when you need to,
but for searches you'll be a lot better off with
WHERE DOB = '1999-01-01'
regards, tom lane
At 05:58 PM 6/1/00 -0400, you wrote:
Jerry Lynde <jlynde@diligence.com> writes:
>>>> They are all indexed, the DOB index is actually DOBYear DOBDay and
>>>> DOBMonth and all 5 fields are indexed
>>
>> Do you have 5 indexes or do you have an index that spans more than one
>> field?
> Sorry for being less than explicit. There are 5 separate indices, one per
> field.
So your query is really something more like
... WHERE firstname = 'joe' AND lastname = 'blow' AND
DOByear = 1999 AND DOBmonth = 1 AND DOBday = 1
?
yes
The problem here is that only one index can be used in any individual
scan. If I were the optimizer I'd probably figure that lastname is
going to be the most selective of the five available choices, too.
and it did, and that's ok
I'd suggest storing the DOB as *one* field of type 'date'. You can
pull out the subparts for display with date_part() when you need to,
but for searches you'll be a lot better off with
WHERE DOB = '1999-01-01'
regards, tom lane
Thanks for the tip. I might indeed take that approach in the future,
however that's not really the problem I'm trying to tackle right now.
Indexing by Last Name is fine with me, currently. What's not working for me
is the part where the dual pentium 500 machine with 256MB RAM goes into
deep thought indefinitely for one simple hard-coded query.
I used to think that the problem was due to the phpdb module that I was
invoking, since the behavior exhibited itself consistently doing the
aforementioned query with the phpdb module. Using nothing but straight php
I have been able to make the query run smoothly.
The reason I no longer believe the problem was tied to phpdb is that the
behavior with the processors (all processor time devoted to user processes)
happened when I was not making use of phpdb anymore. In fact I wasn't even
making queries at the time, so it may not be tied to postgres at all, but I
suspect it might, since the problem happens at random currently, but was
consistent and predictable with the phpdb-driven postgres query.
Jerry Lynde <jlynde@diligence.com> writes:
> Thanks for the tip. I might indeed take that approach in the future,
> however that's not really the problem I'm trying to tackle right now.
> Indexing by Last Name is fine with me, currently. What's not working for me
> is the part where the dual pentium 500 machine with 256MB RAM goes into
> deep thought indefinitely for one simple hard-coded query.
Ah, sorry ... I've been seeing so many optimizer questions lately that
I tend to zero right in on anything that looks like a misoptimization
issue.
I'm not aware of any reason that a query such as you describe would
tend to hang up the machine. It would be useful to know what you see
in "top" or some other monitoring program when the problem happens.
Is there just one backend process sucking all the CPU time? More than
one? Is the process(es) memory usage stable, or climbing?
An even more useful bit of info is a stack trace from a backend that's
suffering the problem: if you do a "kill -ABORT" on it you should get
a coredump and be able to backtrace with gdb. (Note this will cause
a database system restart, ie all the other backends will commit
harakiri too, so I wouldn't advise doing it during normal usage of the
system.)
regards, tom lane