Discussion: Restore killing the backend
Hello,

I'm trying to migrate from 7.1.3 to 7.2.1. I pg_dump'ed all the data from 7.1.3, yet when I try

psql dbname < dumpfile

it crashes, taking out postgres with it.

I am running under RedHat Linux 7.3 (kernel-smp-2.4.18-5). This has happened both with the RPM from RedHat (on the CD) and the latest one off the website (postgresql-7.2.1-2PGDG).

Attached is the output from two different runs. The error occurs when performing the COPY command to repopulate the data. Also, this dump will successfully import into a 7.1.3 db.

I've looked in the dump file for "3835" (which it says it can't process) and the only occurrence of that string is within a phone number (text field).

Finally, after the error I usually have to restart the machine, as I get segmentation faults from other apps - which hints to me that there's a problem somewhere in an OS level library.

Any ideas?

Thanks,
-Steve

<errors>------------------------
On Mon, Jul 29, 2002 at 04:38:36PM -0400, Stephen Bacon wrote:
> I pg_dump'ed all the data from 7.1.3, yet when I try psql dbname < dumpfile
> it crashes, taking out postgres with it.

> Finally, after the error I usually have to restart the machine, as I get
> segmentation faults from other apps - which hints to me that there's a
> problem somewhere in an OS level library.
>
> Any ideas?

Test your memory. It's possible you have some bad memory, and you happen to exercise it by reading the big dumpfile and writing into the database. Once the bad bit is getting used, you keep running into it; hence the subsequent segfaults.

There have been _a lot_ of hardware-related problems reported lately. For production use, if your data is worth anything at all, buy ECC memory. It's worth it, even though it's expensive.

A

--
----
Andrew Sullivan                         87 Mowat Avenue
Liberty RMS                             Toronto, Ontario Canada
<andrew@libertyrms.info>                M6K 3E3
                                        +1 416 646 3304 x110
> Test your memory. It's possible you have some bad memory, and you
<snip>
> For production use, if your data is worth anything at all, buy ECC
> memory. It's worth it, even though it's expensive.

Hello again.

Indeed we are using ECC memory - although if I knew how to test it, I would - is there a utility for doing this?

For some reason the errors did not appear in my original post, so here they are again (hopefully). I've also included the tail of the postgres log file (I had debug_level set to 2) which shows (what I think is) a lot of WAL activity and recommendations to increase WAL_FILES.

I increased wal_files to 8, wal_buffers to 15 and checkpoint_segments to 3 and yet the problem still occurs.

Next I'm going to break my import up into separate files and do it step by step, but in the meantime does anyone have any ideas?

Thanks,
-Steve

*** Run X ***
Hello *again*

Strangeness... something seems to be stripping the logs off the bottom of my last two posts (so I'm trying a different email client). Here's the third try - apologies for the wasted bandwidth.

-Steve

<crash with default WAL settings>

[[[output from psql attempting import]]]

<snip>

NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index 'tbloutdischassess_pkey' for table 'tbloutdischassess'
CREATE
NOTICE: copy: line 4232076, Message from PostgreSQL backend:
        The Postmaster has informed me that some other backend
        died abnormally and possibly corrupted shared memory.
        I have rolled back the current transaction and am
        going to terminate your database system connection and exit.
        Please reconnect to the database system and repeat your query.
server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.
lost synchronization with server, resetting connection
connection to server was lost

[[[and a second attempt:]]]

<snip>

NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index 'tblnatldischtier_pkey' for table 'tblnatldischtier'
CREATE
NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index 'tbloutdischassess_pkey' for table 'tbloutdischassess'
CREATE
ERROR: copy: line 5706, pg_atoi: error in ""3835": can't parse ""3835"
lost synchronization with server, resetting connection
ERROR: copy: line 1008, Bad float4 input format 'LN'
lost synchronization with server, resetting connection
server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.
lost synchronization with server, resetting connection
connection to server was lost

[[[ level 2 debug messages ]]]

DEBUG: shmem_exit(0)
DEBUG: exit(0)
DEBUG: reaping dead processes
DEBUG: child process (pid 11024) exited with exit code 0
DEBUG: recycled transaction log file 0000000000000025
DEBUG: recycled transaction log file 0000000000000026
DEBUG: recycled transaction log file 0000000000000024
DEBUG: proc_exit(0)
DEBUG: shmem_exit(0)
DEBUG: exit(0)
DEBUG: reaping dead processes
DEBUG: child process (pid 11025) exited with exit code 0
FATAL 2: XLogWrite: write request 0/2D10C000 is past end of log 0/2D0FE000
DEBUG: proc_exit(2)
DEBUG: shmem_exit(2)
DEBUG: exit(2)
DEBUG: reaping dead processes
DEBUG: child process (pid 11026) exited with exit code 2
DEBUG: server process (pid 11026) exited with exit code 2
DEBUG: terminating any other active server processes
DEBUG: CleanupProc: sending SIGQUIT to process 11009
NOTICE: copy: line 4232076, Message from PostgreSQL backend:
        The Postmaster has informed me that some other backend
        died abnormally and possibly corrupted shared memory.
        I have rolled back the current transaction and am
        going to terminate your database system connection and exit.
        Please reconnect to the database system and repeat your query.
DEBUG: reaping dead processes
DEBUG: child process (pid 11009) exited with exit code 1
DEBUG: all server processes terminated; reinitializing shared memory and semaphores
DEBUG: shmem_exit(0)
invoking IpcMemoryCreate(size=728809472)

[[[ level 2 debug messages w/ increased WAL buffers ]]]

<snip>

DEBUG: StartTransactionCommand
DEBUG: query: COPY "tblirfpai_quality" FROM stdin;
DEBUG: ProcessUtility: COPY "tblirfpai_quality" FROM stdin;
DEBUG: reaping dead processes
DEBUG: child process (pid 4433) was terminated by signal 11
DEBUG: server process (pid 4433) was terminated by signal 11
DEBUG: terminating any other active server processes
DEBUG: all server processes terminated; reinitializing shared memory and semaphores
DEBUG: shmem_exit(0)
invoking IpcMemoryCreate(size=417505280)
DEBUG: database system was interrupted at 2002-07-31 14:21:39 EDT
DEBUG: checkpoint record is at 1/9400808C
DEBUG: redo record is at 1/9300702C; undo record is at 0/0; shutdown FALSE
DEBUG: next transaction id: 1778; next oid: 39371344
DEBUG: database system was not properly shut down; automatic recovery in progress
DEBUG: redo starts at 1/9300702C
DEBUG: reaping dead processes
DEBUG: startup process (pid 4434) was terminated by signal 11
DEBUG: aborting startup due to startup process failure
DEBUG: proc_exit(1)
DEBUG: shmem_exit(1)
DEBUG: exit(1)
On Wed, Jul 31, 2002 at 03:15:37PM -0400, Stephen Bacon wrote:
>
> Indeed we are using ECC memory - although if I knew how to test it, I
> would - is there a utility for doing this?

If you are using ECC, and your hardware and software support it, you should see errors in your syslogs if ECC is having trouble. Anyway, the whole point of ECC is to prevent bad data from getting through, so I doubt very much that's the problem.

If you're using an x86 architecture, you can try memtest86:

http://www.memtest86.com/

A

--
----
Andrew Sullivan                         87 Mowat Avenue
Liberty RMS                             Toronto, Ontario Canada
<andrew@libertyrms.info>                M6K 3E3
                                        +1 416 646 3304 x110
On Wed, Jul 31, 2002 at 03:45:19PM -0400, Stephen Bacon wrote:
> <crash with default WAL settings>
>
> [[[output from psql attempting import]]]
>
> <snip>
>
> NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index
> 'tbloutdischassess_pkey' for table 'tbloutdischassess'
> CREATE
> NOTICE: copy: line 4232076, Message from PostgreSQL backend:
> The Postmaster has informed me that some other backend
                                      ^^^^^^^^^^
Who else is connected?

> ERROR: copy: line 5706, pg_atoi: error in ""3835": can't parse ""3835"
                                            ^
Looks like a delimiting problem.

> ERROR: copy: line 1008, Bad float4 input format 'LN'
                              ^^^^^                ^^
It appears your data source is bad. Maybe you need to have a look at it with an editor?

A

--
----
Andrew Sullivan                         87 Mowat Avenue
Liberty RMS                             Toronto, Ontario Canada
<andrew@libertyrms.info>                M6K 3E3
                                        +1 416 646 3304 x110
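Having "a look at it with an editor" can be partly automated. What follows is only a rough sketch, not something posted in the thread: it assumes the dump holds data blocks that start with a `COPY ... FROM stdin;` line (as in the debug log above) and end with a `\.` line, that fields are tab-delimited, and that a physical line ending in an unescaped backslash continues on the next line. The names `find_suspect_rows`, `_continues` and `_outliers` are made up for illustration. It reports rows whose field count differs from the most common count in their block.

```python
from collections import Counter


def _continues(text: str) -> bool:
    """True if the physical line ends with an unescaped (odd) backslash."""
    trailing = len(text) - len(text.rstrip("\\"))
    return trailing % 2 == 1


def _outliers(rows):
    """Return rows whose field count differs from the block's most common one."""
    if not rows:
        return []
    expected = Counter(n for _, n in rows).most_common(1)[0][0]
    return [(lineno, n, expected) for lineno, n in rows if n != expected]


def find_suspect_rows(dump_text: str):
    """Return (line number, field count, expected count) for odd-looking rows.

    Lines are split on "\n" only, so the caller must not have translated
    other line endings beforehand.
    """
    suspects = []
    rows = None     # None while outside a COPY block (assumed block layout)
    pending = None  # (first line number, text so far) of a continued row
    for lineno, text in enumerate(dump_text.split("\n"), start=1):
        if rows is None:
            if text.startswith("COPY ") and text.endswith(" FROM stdin;"):
                rows = []
            continue
        if pending is None and text == "\\.":
            # End-of-data marker: evaluate the block that just finished.
            suspects.extend(_outliers(rows))
            rows = None
            continue
        start, prefix = pending if pending else (lineno, "")
        joined = prefix + text
        if _continues(text):
            pending = (start, joined + "\n")
        else:
            rows.append((start, joined.count("\t") + 1))
            pending = None
    return suspects
```

The dump should be read without newline translation before it is passed in, for example with `open(path, newline="\n")`. A reported line number only says where to start looking; it is not proof that the row is wrong.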
Well!

Indeed the problem *was* with the data (I know; assuming makes an ass...)

I had assumed the data I was importing was that which had been pg_dumped from the 7.1.3 db and so was "correct". It turns out that after the pg_dump it had been processed through a perl script to:

1) convert the string ":60.00" to ":59.99" - pg_dump had a bug with roundoff of seconds and psql would choke trying to set a time with a seconds value of 60
2) convert the old style col. default of 'timestamp(now())' to CURRENT_TIMESTAMP

However - some of our tables contain data with embedded ^M's (ASCII 13) (because they're being populated from a web page.) Well, the perl script would interpret the ^M as a newline and end up converting the two lines:

114<TAB>3<TAB>re: Updates<TAB>Can you read over^M\
the help files?<TAB>6523

to three lines:

114<TAB>3<TAB>re: Updates<TAB>Can you read over
\
the help files?<TAB>6523

so the first line would have truncated data, and the second line would continue the mess. This would obviously cause problems during import.

What I can't figure out is why it would kill the back-end (and usually end up making the OS (RedHat Linux 7.3) unstable). I've been trying to make a small example that repeatably shows this but of course it won't crash! It just (properly) gives a "Fail to add null" error. I'm going to try and make a more complex setup to get this thing to "reliably" cause the crash because it seems there's a bug in here somewhere. I'll post that when I get it.

So anyways... for those of you out there migrating up where you process the dump before psql db < dumpfile - watch out for embedded ^Ms!

Thanks,
-Steve
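For anyone post-processing a dump in the same way, here is a minimal sketch of a rewrite pass that leaves embedded ^M's alone. It is in Python, not the Perl used above (that script is not shown in the thread), and the two byte patterns in `SUBSTITUTIONS` are assumptions based on the description in the post, not the real patterns. The idea is to treat the dump as bytes and end lines only at LF, so a CR stays inside its field.

```python
import io

# Hypothetical stand-ins for the two substitutions described in the post;
# the exact patterns used by the original Perl script are not shown.
SUBSTITUTIONS = [
    (b":60.00", b":59.99"),
    (b"'timestamp(now())'", b"CURRENT_TIMESTAMP"),
]

# The row from the post: a CR (^M) inside a text field, followed by the
# backslash-newline that continues the row on the next physical line.
SAMPLE = b"114\t3\tre: Updates\tCan you read over\r\\\nthe help files?\t6523\n"


def fix_line(line: bytes) -> bytes:
    """Apply the substitutions to one LF-terminated line, leaving CRs alone."""
    for old, new in SUBSTITUTIONS:
        line = line.replace(old, new)
    return line


def convert_dump(src, dst) -> None:
    """Copy the binary stream src to dst, rewriting it line by line.

    Iterating a binary stream ends lines only at LF, so an embedded CR
    stays inside its field instead of starting a new line.
    """
    for line in src:
        dst.write(fix_line(line))


def demo() -> bytes:
    """Run the sample row through convert_dump and return the result."""
    src, dst = io.BytesIO(SAMPLE), io.BytesIO()
    convert_dump(src, dst)
    return dst.getvalue()
```

With real files, both would be opened in binary mode (`open(path, "rb")` and `open(path, "wb")`). By contrast, Python's `str.splitlines()` treats a bare CR as a line boundary, so it would split the sample row into three pieces, much like the three-line result shown above; whether the Perl script failed for the same reason is not stated in the thread.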