Discussion: BUG #2225: Backend crash -- BIG table
The following bug has been logged online:
Bug reference: 2225
Logged by: Patrick Rotsaert
Email address: patrick.rotsaert@arrowup.be
PostgreSQL version: 8.1.2
Operating system: Linux
Description: Backend crash -- BIG table
Details:
Situation:
---------
Database with 1 table:
CREATE TABLE pptran
(
cchk int4,
trid char(14),
csnr char(13),
amount int4,
date date,
"time" time,
lane int4,
"type" int4,
data char(24),
stat int4,
skip int4,
retry int4,
points_bc int4,
points_chip int4,
trid_tid int2,
trid_seq int2
);
This table contains approx. 36 million records!
CREATE INDEX pptran_trid
ON pptran
USING btree (trid);
Problem:
-------
Executing a select query causes the backend to crash. This is the output
from the psql frontend:
pointspp=# select trid, count(*) from pptran group by trid having count(*)
> 1;
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.
!>
This error appears after several minutes. During execution, 'top' shows
about 4% CPU usage and 98% memory usage of the postmaster process.
At the time of the crash, the server logs:
LOG: server process (PID 21815) was terminated by signal 9
LOG: terminating any other active server processes
FATAL: the database system is in recovery mode
LOG: all server processes terminated; reinitializing
LOG: database system was interrupted at 2006-01-30 15:04:31 CET
LOG: checkpoint record is at 3/275944AC
LOG: redo record is at 3/275944AC; undo record is at 0/0; shutdown TRUE
LOG: next transaction ID: 5415; next OID: 16444
LOG: next MultiXactId: 1; next MultiXactOffset: 0
LOG: database system was not properly shut down; automatic recovery in
progress
LOG: record with zero length at 3/275944F0
LOG: redo is not required
LOG: database system is ready
LOG: transaction ID wrap limit is 2147484146, limited by database
"postgres"
Platform info:
-------------
- PostgreSQL version 8.1.2
- Linux Slackware 9.1.0, Kernel 2.4.22
- /proc/cpuinfo
processor : 0
vendor_id : GenuineIntel
cpu family : 15
model : 2
model name : Intel(R) Pentium(R) 4 CPU 2.20GHz
stepping : 4
cpu MHz : 2192.973
cache size : 512 KB
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 2
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca
cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm
bogomips : 4377.80
- /proc/meminfo
total: used: free: shared: buffers: cached:
Mem: 926220288 33148928 893071360 0 1163264 17268736
Swap: 2048376832 18296832 2030080000
MemTotal: 904512 kB
MemFree: 872140 kB
MemShared: 0 kB
Buffers: 1136 kB
Cached: 15288 kB
SwapCached: 1576 kB
Active: 5824 kB
Inactive: 15820 kB
HighTotal: 0 kB
HighFree: 0 kB
LowTotal: 904512 kB
LowFree: 872140 kB
SwapTotal: 2000368 kB
SwapFree: 1982500 kB
"Patrick Rotsaert" <patrick.rotsaert@arrowup.be> writes: > At the time of the crash, the server logs: > LOG: server process (PID 21815) was terminated by signal 9 You're running on a Linux machine with memory overcommit enabled. Turn that off, or nothing will ever work very reliably --- the OOM killer is entirely capable of zapping innocent processes that have nothing to do with the one eating too much memory; and even when it kills the right process, "kill -9" is not IMHO an acceptable way for the system to tell a process it can't have any more memory. See http://www.postgresql.org/docs/8.1/static/kernel-resources.html#AEN18105 regards, tom lane
on 31/01/2006 16:18 Tom Lane wrote :
>"Patrick Rotsaert" <patrick.rotsaert@arrowup.be> writes:
>>At the time of the crash, the server logs:
>> LOG: server process (PID 21815) was terminated by signal 9
>
>You're running on a Linux machine with memory overcommit enabled.
>Turn that off, or nothing will ever work very reliably --- the OOM
>killer is entirely capable of zapping innocent processes that have
>nothing to do with the one eating too much memory; and even when it
>kills the right process, "kill -9" is not IMHO an acceptable way for
>the system to tell a process it can't have any more memory. See
>http://www.postgresql.org/docs/8.1/static/kernel-resources.html#AEN18105
>
> regards, tom lane

Syslog indeed shows:

kernel: Out of Memory: Killed process 21815 (postmaster).

Looking at the kernel source mm/mmap.c, the function
`int vm_enough_memory(long pages)' does 1 simple test:

	/* Sometimes we want to use more memory than we have. */
	if (sysctl_overcommit_memory)
		return 1;

But /proc/sys/vm/overcommit_memory reads `0', so my guess is that
overcommit is not enabled... right?

Any hints?

Thanks,
Patrick Rotsaert
Patrick Rotsaert <patrick.rotsaert@arrowup.be> writes:
> on 31/01/2006 16:18 Tom Lane wrote :
>> http://www.postgresql.org/docs/8.1/static/kernel-resources.html#AEN18105

> But /proc/sys/vm/overcommit_memory reads `0', so my guess is that
> overcommit is not enabled... right?

Please read the reference I pointed you to.

regards, tom lane
Tom Lane wrote:
>Patrick Rotsaert <patrick.rotsaert@arrowup.be> writes:
>>on 31/01/2006 16:18 Tom Lane wrote :
>>>http://www.postgresql.org/docs/8.1/static/kernel-resources.html#AEN18105
>
>>But /proc/sys/vm/overcommit_memory reads `0', so my guess is that
>>overcommit is not enabled... right?
>
>Please read the reference I pointed you to.
>
> regards, tom lane

I did read it, very carefully. The proposed fix will only work in 2.6
kernels. Mine is a 2.4 and upgrading it is not an option. The document
suggests to look at the kernel source for 2.4 kernels. I did that, as I
wrote in the previous mail. Setting the overcommit parameter to '2', or
any value for that matter, won't do any good because in this kernel, it
is only tested if it is non-zero. On my system, the parameter is 0, so
overcommit is *not* enabled. I don't know what else I can do.

The other proposed option is to install more memory. Sorry, not an
option, 1GB has to be sufficient.

Apart from the overcommit subject, why is postgres consuming so much
memory? Should the solution of this problem not be searched for here?

Thanks,
Patrick Rotsaert
Patrick Rotsaert wrote:
> On my system, the parameter is 0, so overcommit is *not* enabled.

0 means overcommit is enabled. You want to set it to something other
than 0 to prevent overcommitting and the consequent surprising process
deaths. Exactly what other values are accepted varies, but 0 isn't the
one for you.

--
Peter Eisentraut
http://developer.postgresql.org/~petere/
On Wed, Feb 01, 2006 at 17:57:50 +0100,
  Patrick Rotsaert <patrick.rotsaert@arrowup.be> wrote:
> I did read it, very carefully. The proposed fix will only work in 2.6
> kernels. Mine is a 2.4 and upgrading it is not an option. The document
> suggests to look at the kernel source for 2.4 kernels. I did that, as I
> wrote in the previous mail. Setting the overcommit parameter to '2', or
> any value for that matter, won't do any good because in this kernel, it
> is only tested if it is non-zero. On my system, the parameter is 0, so
> overcommit is *not* enabled. I don't know what else I can do.
>
> The other proposed option is to install more memory. Sorry, not an
> option, 1GB has to be sufficient.

Is there some reason you can't add more swap space?

> Apart from the overcommit subject, why is postgres consuming so much
> memory? Should the solution of this problem not be searched for here?

How do you know it is Postgres that is using lots of memory? The OOM
killer doesn't just kill off memory hogs, so you can't just assume the
processes being killed tells you which processes were using lots of
memory.

The memory that Postgres uses is controlled in postgresql.conf. One
particular gotcha is that sortmem is per sort, so if you have a number
of concurrent sorts you might be using more memory than you expected.
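For reference, the per-sort memory setting Bruno mentions is spelled work_mem in 8.1 (it was sort_mem before 8.0), and it limits each individual sort or hash step rather than the backend as a whole, which is why concurrent sorts multiply. A minimal psql sketch for checking and capping it per session; the 8192 is only an illustrative value, interpreted as kilobytes in 8.1:

-- Current per-sort/per-hash memory budget (kilobytes in 8.1).
SHOW work_mem;

-- Illustrative cap of roughly 8 MB per sort or hash step, for this session only.
SET work_mem = 8192;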
On Mon, 30 Jan 2006, Patrick Rotsaert wrote:

> Problem:
> -------
> Executing a select query causes the backend to crash. This is the output
> from the psql frontend:
>
> pointspp=# select trid, count(*) from pptran group by trid having count(*)
> > 1;
> server closed the connection unexpectedly
> This probably means the server terminated abnormally
> before or while processing the request.
> The connection to the server was lost. Attempting reset: Failed.
> !>

One question is what does the explain (without analyze) plan look like
for the above and are the row estimates valid in the case of one of the
hash plans.

Failing that, how many rows should the above return?
On Fri, 3 Feb 2006, Patrick Rotsaert wrote:

> >One question is what does the explain (without analyze) plan look like for
> >the above and are the row estimates valid in the case of one of the hash
> >plans.
>
> pointspp=# explain select trid, count(*) from pptran group by trid
> having count(*) > 1;
> QUERY PLAN
> --------------------------------------------------------------------------
> HashAggregate (cost=1311899.28..1311902.78 rows=200 width=18)
> Filter: (count(*) > 1)
> -> Seq Scan on pptran (cost=0.00..1039731.02 rows=36289102 width=18)
> (3 rows)

Hmm, if you do an enable_hashagg=false and then run the query (without
explain) does it work then?

> >Failing that, how many rows should the above return?
>
> That is exactly what I am trying to find out. I can only guess that, but
> it should not be more than a couple of 10k rows.

Okay, so it's unlikely to be a client side memory issue (if it were
returning a lot of rows, that could also be an issue).
Stephan Szabo <sszabo@megazone.bigpanda.com> writes:
>> pointspp=# explain select trid, count(*) from pptran group by trid
>> having count(*) > 1;
>> QUERY PLAN
>> --------------------------------------------------------------------------
>> HashAggregate (cost=1311899.28..1311902.78 rows=200 width=18)
>> Filter: (count(*) > 1)
>> -> Seq Scan on pptran (cost=0.00..1039731.02 rows=36289102 width=18)
>> (3 rows)
>>> Failing that, how many rows should the above return?
>> That is exactly what I am trying to find out. I can only guess that, but
>> it should not be more than a couple of 10k rows.
The problem is that the HashAgg will have to maintain a counter for
every distinct value of trid, not just those that occur more than
once. So if there are a huge number of one-time-only values you could
still blow out memory (and HashAgg doesn't currently know how to spill
to disk).
That "rows=200" estimate looks suspiciously like a default. Has this
table been ANALYZEd recently? I'd expect the planner not to choose
HashAgg if it has a more realistic estimate of the number of groups.
regards, tom lane
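To put numbers on Tom's point: with no statistics the planner falls back on a default guess of 200 groups, so it assumes the hash table will be tiny; with roughly 36 million mostly-unique trid values, even a few dozen bytes of per-entry overhead puts the real hash table well past the 1 GB of RAM in this machine. A sketch of how to refresh the statistics and see what the planner then believes about trid (pg_stats is the standard statistics view; the columns shown are just the interesting ones):

-- Gather (or refresh) planner statistics for the table.
ANALYZE pptran;

-- n_distinct is the planner's estimate of the number of distinct trid
-- values; a negative value means "this fraction of the total row count".
SELECT n_distinct, null_frac, avg_width
FROM pg_stats
WHERE tablename = 'pptran'
  AND attname = 'trid';

With a realistic group estimate the planner abandons HashAggregate in favour of the sort-based GroupAggregate plan that appears in the next message, which can spill to disk instead of exhausting memory.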
Patrick Rotsaert <patrick.rotsaert@arrowup.be> writes:
> I did a vacuum analyze, now the explain gives different results.
> pointspp=# explain select trid, count(*) from pptran group by trid
> having count(*) > 1;
> QUERY PLAN
> --------------------------------------------------------------------------------
> GroupAggregate (cost=9842885.29..10840821.57 rows=36288592 width=18)
> Filter: (count(*) > 1)
> -> Sort (cost=9842885.29..9933606.77 rows=36288592 width=18)
> Sort Key: trid
> -> Seq Scan on pptran (cost=0.00..1039725.92 rows=36288592
> width=18)
> (5 rows)
OK that looks more reasonable.
> pointspp=# select trid, count(*) from pptran group by trid having
> count(*) > 1;
> ERROR: could not write block 661572 of temporary file: No space left on
> device
> HINT: Perhaps out of disk space?
> I have 5.1GB of free disk space. If this is the cause, I have a
> problem... or is there another way to extract (and remove) duplicate rows?
Time to buy more disk :-(
regards, tom lane
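A quick sanity check on the numbers, assuming the temporary files use the default 8 kB block size: failing at block 661572 means the sort had already written about 661572 x 8 kB, roughly 5.4 GB (5.0 GiB), which is right around the 5.1 GB of free space Patrick reports. One way to gauge the data volume being sorted is to check the table's on-disk size with the size functions that ship with 8.1 (a sketch; the temporary file holds only the sorted column plus per-tuple overhead, so it is smaller than the heap, but evidently not small enough here):

-- On-disk size of the pptran heap (8.1 provides pg_relation_size/pg_size_pretty).
SELECT pg_size_pretty(pg_relation_size('pptran'));

Either way, Bruno's suggestion later in the thread of processing the key space in slices avoids needing one huge temporary file at all.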
>Is there some reason you can't add more swap space?

Yes, disk space. I have about 2 GB of swap space enabled.

>How do you know it is Postgres that is using lots of memory? The OOM killer
>doesn't just kill off memory hogs, so you can't just assume the processes
>being killed tells you which processes were using lots of memory.

I keep 'top' running while I launch the query. One single postmaster
climbs to the top, claiming 98% MEM and about 8% CPU.

>The memory that Postgres uses is controlled in postgresql.conf. One particular
>gotcha is that sortmem is per sort, so if you have a number of concurrent
>sorts you might be using more memory than you expected.

I am very sure there are no other queries running at the same time. This
is a development machine of which I have full control.
>One question is what does the explain (without analyze) plan look like for
>the above and are the row estimates valid in the case of one of the hash
>plans.
>
>
pointspp=# explain select trid, count(*) from pptran group by trid
having count(*) > 1;
QUERY PLAN
--------------------------------------------------------------------------
HashAggregate (cost=1311899.28..1311902.78 rows=200 width=18)
Filter: (count(*) > 1)
-> Seq Scan on pptran (cost=0.00..1039731.02 rows=36289102 width=18)
(3 rows)
>Failing that, how many rows should the above return?
>
>
That is exactly what I am trying to find out. I can only guess that, but
it should not be more than a couple of 10k rows.
>0 means overcommit is enabled. You want to set it to something other
>than 0 to prevent overcommitting and the consequent suprising process
>deaths. Exactly what other values are accepted varies, but 0 isn't the
>one for you.
>
>
>
I do not understand how 0 could mean overcommit is enabled. I do not
know how it is in recent kernels, but the source code of the 2.4 kernel
I use is this:
int vm_enough_memory(long pages)
{
	unsigned long free;

	/* Sometimes we want to use more memory than we have. */
	if (sysctl_overcommit_memory)
		return 1;
	// ...
}
seems pretty straightforward to me.
I also did a recursive grep through all of the kernel source and this is
the only place where this parameter is used.
I tried setting the parameter to 1, but it did not make any difference.
>Hmm, if you do an enable_hashagg=false and then run the query (without
>explain) does it work then?

pointspp=# set enable_hashagg = false;
SET
pointspp=# select trid, count(*) from pptran group by trid having
count(*) > 1;
ERROR: could not write block 661582 of temporary file: No space left on
device
HINT: Perhaps out of disk space?

Still does not work, but it no longer consumes that same amount of memory.
>The problem is that the HashAgg will have to maintain a counter for
>every distinct value of trid, not just those that occur more than
>once. So if there are a huge number of one-time-only values you could
>still blow out memory (and HashAgg doesn't currently know how to spill
>to disk).
>
>
One-time-only values are in my case more probable, so it will use a lot
of counters.
>That "rows=200" estimate looks suspiciously like a default. Has this
>table been ANALYZEd recently? I'd expect the planner not to choose
>HashAgg if it has a more realistic estimate of the number of groups.
>
> regards, tom lane
>
>
I did a vacuum analyze, now the explain gives different results.
pointspp=# vacuum analyze;
VACUUM
pointspp=# explain select trid, count(*) from pptran group by trid
having count(*) > 1;
QUERY PLAN
--------------------------------------------------------------------------------
GroupAggregate (cost=9842885.29..10840821.57 rows=36288592 width=18)
Filter: (count(*) > 1)
-> Sort (cost=9842885.29..9933606.77 rows=36288592 width=18)
Sort Key: trid
-> Seq Scan on pptran (cost=0.00..1039725.92 rows=36288592
width=18)
(5 rows)
pointspp=# select trid, count(*) from pptran group by trid having
count(*) > 1;
ERROR: could not write block 661572 of temporary file: No space left on
device
HINT: Perhaps out of disk space?
I have 5.1GB of free disk space. If this is the cause, I have a
problem... or is there another way to extract (and remove) duplicate rows?
On Fri, Feb 03, 2006 at 19:38:04 +0100,
  Patrick Rotsaert <patrick.rotsaert@arrowup.be> wrote:
>
> I have 5.1GB of free disk space. If this is the cause, I have a
> problem... or is there another way to extract (and remove) duplicate rows?

How about processing a subset of the ids in one pass and then making
multiple passes to check all of the ids. As long as you don't have to
use chunks that are too small, this might work for you.
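A sketch of Bruno's multi-pass idea in SQL. The split on the first character of trid is purely hypothetical (it assumes the leading characters are reasonably evenly distributed); the point is that each pass groups only a slice of the key space, so the sort's temporary file stays comfortably inside the 5.1 GB of free space:

-- Side table to collect the duplicates found by each pass (hypothetical name).
CREATE TABLE pptran_dups (trid char(14), cnt int4);

-- Pass 1: only trid values whose first character is '0'.  Repeat with
-- '1', '2', ... until the whole key space has been covered.
INSERT INTO pptran_dups
SELECT trid, count(*)
FROM pptran
WHERE substr(trid, 1, 1) = '0'
GROUP BY trid
HAVING count(*) > 1;

If the leading characters are skewed, a range condition on a longer prefix does the same job; once pptran_dups is populated, the offending rows can be examined or removed by joining it back against pptran.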