Обсуждение: How to repair Pg 9.1?
Recently I started a hard query (ALTER TABLE) and then Postgresql was killed by OOM killer. Since that clients lose their connection and Postgresql says in log: 2012-04-15 16:46:02 MSK LOG: server process (PID 825) was terminated by signal 11: Segmentation fault 2012-04-15 16:46:02 MSK LOG: terminating any other active server processes 2012-04-15 16:46:02 MSK WARNING: terminating connection because of crash of another server process 2012-04-15 16:46:02 MSK DETAIL: The postmaster has commanded this server process to roll back the current transactionand exit, because another server process exited abnormally and possibly corrupted shared memory. 2012-04-15 16:46:02 MSK HINT: In a moment you should be able to reconnect to the database and repeat your command. 2012-04-15 16:46:02 MSK WARNING: terminating connection because of crash of another server process 2012-04-15 16:46:02 MSK DETAIL: The postmaster has commanded this server process to roll back the current transactionand exit, because another server process exited abnormally and possibly corrupted shared memory. 2012-04-15 16:46:02 MSK HINT: In a moment you should be able to reconnect to the database and repeat your command. 2012-04-15 16:46:02 MSK WARNING: terminating connection because of crash of another server process 2012-04-15 16:46:02 MSK DETAIL: The postmaster has commanded this server process to roll back the current transactionand exit, because another server process exited abnormally and possibly corrupted shared memory. 2012-04-15 16:46:02 MSK HINT: In a moment you should be able to reconnect to the database and repeat your command. 2012-04-15 16:46:02 MSK LOG: all server processes terminated; reinitializing 2012-04-15 16:46:02 MSK LOG: database system was interrupted; last known up at 2012-04-15 16:45:17 MSK 2012-04-15 16:46:02 MSK LOG: database system was not properly shut down; automatic recovery in progress 2012-04-15 16:46:02 MSK LOG: consistent recovery state reached at 58/21C44D18 2012-04-15 16:46:02 MSK LOG: redo starts at 58/21C44D18 2012-04-15 16:46:02 MSK LOG: record with zero length at 58/21CA7E20 2012-04-15 16:46:02 MSK LOG: redo done at 58/21CA7DE0 2012-04-15 16:46:02 MSK LOG: last completed transaction was at log time 2012-04-15 16:46:02.620188+04 2012-04-15 16:46:02 MSK LOG: autovacuum launcher started 2012-04-15 16:46:02 MSK LOG: database system is ready to accept connections What can I do in the situation? Postgresql 9.1/Debian wheezy -- . ''`. Dmitry E. Oboukhov : :’ : email: unera@debian.org jabber://UNera@uvw.ru `. `~’ GPGKey: 1024D / F8E26537 2006-11-21 `- 1B23 D4F8 8EC0 D902 0555 E438 AB8C 00CF F8E2 6537
Вложения
On 15.4.2012 14:52, Dmitry E. Oboukhov wrote: > Recently I started a hard query (ALTER TABLE) and then Postgresql was > killed by OOM killer. > > Since that clients lose their connection and Postgresql says in log: > > 2012-04-15 16:46:02 MSK LOG: server process (PID 825) was terminated by signal 11: Segmentation fault > 2012-04-15 16:46:02 MSK LOG: terminating any other active server processes > 2012-04-15 16:46:02 MSK WARNING: terminating connection because of crash of another server process > 2012-04-15 16:46:02 MSK DETAIL: The postmaster has commanded this server process to roll back the current transactionand exit, because another server process exited abnormally and possibly corrupted shared memory. > 2012-04-15 16:46:02 MSK HINT: In a moment you should be able to reconnect to the database and repeat your command. > 2012-04-15 16:46:02 MSK WARNING: terminating connection because of crash of another server process > 2012-04-15 16:46:02 MSK DETAIL: The postmaster has commanded this server process to roll back the current transactionand exit, because another server process exited abnormally and possibly corrupted shared memory. > 2012-04-15 16:46:02 MSK HINT: In a moment you should be able to reconnect to the database and repeat your command. > 2012-04-15 16:46:02 MSK WARNING: terminating connection because of crash of another server process > 2012-04-15 16:46:02 MSK DETAIL: The postmaster has commanded this server process to roll back the current transactionand exit, because another server process exited abnormally and possibly corrupted shared memory. > 2012-04-15 16:46:02 MSK HINT: In a moment you should be able to reconnect to the database and repeat your command. > 2012-04-15 16:46:02 MSK LOG: all server processes terminated; reinitializing > 2012-04-15 16:46:02 MSK LOG: database system was interrupted; last known up at 2012-04-15 16:45:17 MSK > 2012-04-15 16:46:02 MSK LOG: database system was not properly shut down; automatic recovery in progress > 2012-04-15 16:46:02 MSK LOG: consistent recovery state reached at 58/21C44D18 > 2012-04-15 16:46:02 MSK LOG: redo starts at 58/21C44D18 > 2012-04-15 16:46:02 MSK LOG: record with zero length at 58/21CA7E20 > 2012-04-15 16:46:02 MSK LOG: redo done at 58/21CA7DE0 > 2012-04-15 16:46:02 MSK LOG: last completed transaction was at log time 2012-04-15 16:46:02.620188+04 > 2012-04-15 16:46:02 MSK LOG: autovacuum launcher started > 2012-04-15 16:46:02 MSK LOG: database system is ready to accept connections > > > What can I do in the situation? Well, what happened is quite clear. One of the processes was terminated by the OOM killer. The database noticed this and responded properly by restarting the database. That caused a proper recovery - you don't need to "repair" the database. What you need to do is change the machine configuration so that the OOM errors do not happen. I don't know what's running on the machine, but this usually means tweaking the vm.* parameters at the kernel level and shared_buffers/work_mem/maintenance_work_mem at the PostgreSQL level. In this particular situation (ALTER TABLE) I'd use a low value for the maintenance_work_mem parameter - it'll run longer but won't need that much memory. Tomas
>> 2012-04-15 16:46:02 MSK LOG: server process (PID 825) was terminated by signal 11: Segmentation fault ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ >> What can I do in the situation? > Well, what happened is quite clear. One of the processes was terminated > by the OOM killer. The database noticed this and responded properly by > restarting the database. That caused a proper recovery - you don't need > to "repair" the database. > What you need to do is change the machine configuration so that the OOM > errors do not happen. I don't know what's running on the machine, but > this usually means tweaking the vm.* parameters at the kernel level and > shared_buffers/work_mem/maintenance_work_mem at the PostgreSQL level. > In this particular situation (ALTER TABLE) I'd use a low value for the > maintenance_work_mem parameter - it'll run longer but won't need that > much memory. > Tomas I've just upgraded my postgresql upto current wheezy version (9.1.3-2/Debian): I wanted to get backtrace so it was upgraded when I installed postgresql-dbg package. After upgrading it discontinued segfault and began work fine. :) -- . ''`. Dmitry E. Oboukhov : :’ : email: unera@debian.org jabber://UNera@uvw.ru `. `~’ GPGKey: 1024D / F8E26537 2006-11-21 `- 1B23 D4F8 8EC0 D902 0555 E438 AB8C 00CF F8E2 6537
Вложения
"Dmitry E. Oboukhov" <unera@debian.org> writes: > Recently I started a hard query (ALTER TABLE) and then Postgresql was > killed by OOM killer. > ... > 2012-04-15 16:46:02 MSK LOG: server process (PID 825) was terminated by signal 11: Segmentation fault You sure it was an OOM kill? The OOM killer does its dirty work with signal 9 (SIGKILL), not signal 11 (SIGSEGV). regards, tom lane
>> Recently I started a hard query (ALTER TABLE) and then Postgresql was >> killed by OOM killer. >> ... >> 2012-04-15 16:46:02 MSK LOG: server process (PID 825) was terminated by signal 11: Segmentation fault > You sure it was an OOM kill? The OOM killer does its dirty work with > signal 9 (SIGKILL), not signal 11 (SIGSEGV). OOM killer was when I did 'ALTER TABLE' (in the first time), the log was since the time for each client's connecting. Sorry, my bad English, I could not explain cleanly :) -- . ''`. Dmitry E. Oboukhov : :’ : email: unera@debian.org jabber://UNera@uvw.ru `. `~’ GPGKey: 1024D / F8E26537 2006-11-21 `- 1B23 D4F8 8EC0 D902 0555 E438 AB8C 00CF F8E2 6537
Вложения
On 15.4.2012 16:16, Dmitry E. Oboukhov wrote: >>> 2012-04-15 16:46:02 MSK LOG: server process (PID 825) was terminated by signal 11: Segmentation fault > > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ >>> What can I do in the situation? > >> Well, what happened is quite clear. One of the processes was terminated >> by the OOM killer. The database noticed this and responded properly by >> restarting the database. That caused a proper recovery - you don't need >> to "repair" the database. > >> What you need to do is change the machine configuration so that the OOM >> errors do not happen. I don't know what's running on the machine, but >> this usually means tweaking the vm.* parameters at the kernel level and >> shared_buffers/work_mem/maintenance_work_mem at the PostgreSQL level. > >> In this particular situation (ALTER TABLE) I'd use a low value for the >> maintenance_work_mem parameter - it'll run longer but won't need that >> much memory. > >> Tomas > > I've just upgraded my postgresql upto current wheezy version > (9.1.3-2/Debian): I wanted to get backtrace so it was upgraded when I > installed postgresql-dbg package. After upgrading it discontinued > segfault and began work fine. > :) I haven't noticed it was a segfault - I kinda believed you it actually was an OOM kill, but that does not cause segfaults. That's a whole different story ... It you were getting that whenever you run the ALTER TABLE and it stopped after upgrading to 9.1.3, then there was something wrong in the version you've been using before and it got fixed in 9.1.3. Or maybe it still exists but it does not happen so often. It'd be nice to know what version were you using and a detailed description of what you've done - table structure, size, exact ALTER TABLE command etc. Tomas