Обсуждение: Strange \copy failure

Поиск
Список
Период
Сортировка

Strange \copy failure

От
Steve Woodcock
Дата:
This is on postgresql 7.0.2, RH6.2.

I'm doing a \copy from psql of a 2035 line file, and the backend is
hanging (no CPU being used). Control-C doesn't cancel the command and
I have to Control-Z and 'kill %p' to get psql to terminate. The
interesting thing is that if I chop the input file in half, it
works. I can \copy the first half, and \copy the second half. Even
more interesting, if I take the first 2000 lines of the file it works,
and if I take the last 2000 lines of the file it works. This is a 2035
line file, so it makes me thing there isn't anything wrong with the
data in it.

For what it's worth, this file has been copied in daily for quite a
few months without problems.

Trying to \copy the whole file, the last few lines of postmaster
output are:

010614.14:00:38.425 [19753] query: COPY "allocation_header" FROM STDIN
010614.14:00:38.425 [19753] parser outputs:

{ QUERY :command 5  :utility ?  :resultRelation 0 :into <> :isPortal false :isBinary false :isTemp false :unionall
false:distinctClause <> :sortClause <> :rtable <> :targetlist <> :qual <> :groupClause <> :havingQual <> :hasAggs false
:hasSubLinksfalse :unionClause <> :intersectClause <> :limitOffset <> :limitCount <> :rowMark <>} 

010614.14:00:38.425 [19753] after rewriting:
{ QUERY
   :command 5
   :utility ?
   :resultRelation 0
   :into <>
   :isPortal false
   :isBinary false
   :isTemp false
   :unionall false
   :distinctClause <>
   :sortClause <>
   :rtable <>
   :targetlist <>
   :qual <>
   :groupClause <>
   :havingQual <>
   :hasAggs false
   :hasSubLinks false
   :unionClause <>
   :intersectClause <>
   :limitOffset <>
   :limitCount <>
   :rowMark <>
   }

010614.14:00:38.426 [19753] ProcessUtility: COPY "allocation_header" FROM STDIN
010614.14:00:38.427 [19753] NOTICE:  copy: line 1, CopyReadNewline: extra fields ignored
010614.14:00:38.428 [19753] NOTICE:  copy: line 2, CopyReadNewline: extra fields ignored
010614.14:00:38.428 [19753] NOTICE:  copy: line 3, CopyReadNewline: extra fields ignored
...
010614.14:00:38.598 [19753] NOTICE:  copy: line 837, CopyReadNewline: extra fields ignored
010614.14:00:38.598 [19753] NOTICE:  copy: line 838, CopyReadNewline: extra fields ignored
010614.14:00:38.599 [19753] NOTICE:  copy: line 839, CopyReadNewline: extra fields ignored
010614.14:00:38.599 [19753] NOTICE:  copy: line 840, CopyReadNewline: extra fields ignored

...and then it just sits there.

(the file has more columns in it than the database table, but this
shouldn't be a problem should it?)

If I do an SQL COPY rather than a \copy, it works fine.

I tried stopping and starting the postmaster, dropping and recreating
the database (with just empty tables, no data), and also creating a
new database (ie. different name), but nothing works. I'm a bit stuck
for ideas now!

In case it's useful here's the output from ipcs:

------ Shared Memory Segments --------
key       shmid     owner     perms     bytes     nattch    status
0x10feed01 0         root      644       25224     5
0x0052e2ca 8271873   postgres  700       144       1
0x0052e2c1 8271874   postgres  600       17227776  1
0x0052e2c7 8271875   postgres  600       66060     1
0x00280267 4         root      644       1048576   2

------ Semaphore Arrays --------
key       semid     owner     perms     nsems     status
0x00feed00 0         root      644       2
0x0052e2ce 54273     postgres  600       16
0x0052e2cf 54274     postgres  600       16
0x00280269 3         root      666       14

------ Message Queues --------
key       msqid     owner     perms     used-bytes  messages

When the postmaster is stopped, all the postgres owned lines go away,
so I don't think I've got any unattached ones lying around.

Does anybody have any ideas for what I can do to fix this?

Regards, Steve Woodcock

Re: Strange \copy failure

От
Steve Woodcock
Дата:
Tom Lane <tgl@sss.pgh.pa.us> writes:
> That is strange.  Try attaching to the hung backend with gdb and getting
> a backtrace, viz:
>
>     gdb /path/to/postgres-executable
>     gdb> attach PID-of-backend-process
>     gdb> bt
>     gdb> quit
>
> Might also be a good idea to do similarly for the psql process, in case
> that's where the problem is.
>
>             regards, tom lane

(You'll notice I've upgraded to 7.1.2 (PGDG RPMS))

This is the backend:

[root@soitsrv03 /root]# gdb /usr/bin/postgres
GNU gdb 19991004
Copyright 1998 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i386-redhat-linux"...(no debugging symbols found)...
(gdb) attach 21499
Attaching to program: /usr/bin/postgres, Pid 21499
Reading symbols from /usr/lib/libssl.so.0...(no debugging symbols found)...done.
Reading symbols from /usr/lib/libcrypto.so.0...(no debugging symbols found)...done.
Reading symbols from /usr/lib/libz.so.1...(no debugging symbols found)...done.
Reading symbols from /lib/libcrypt.so.1...done.
Reading symbols from /lib/libresolv.so.2...done.
Reading symbols from /lib/libnsl.so.1...done.
Reading symbols from /lib/libdl.so.2...done.
Reading symbols from /lib/libm.so.6...done.
Reading symbols from /usr/lib/libreadline.so.3...done.
Reading symbols from /lib/libc.so.6...done.
Reading symbols from /lib/ld-linux.so.2...done.
Reading symbols from /lib/libtermcap.so.2...done.
Reading symbols from /usr/lib/gconv/ISO8859-1.so...done.
Reading symbols from /lib/libnss_files.so.2...done.
0x4025fc62 in __libc_send () from /lib/libc.so.6
(gdb) bt
#0  0x4025fc62 in __libc_send () from /lib/libc.so.6
#1  0x80c7ea2 in pq_flush ()
#2  0x8139092 in elog ()
#3  0x80a849c in DoCopy ()
#4  0x80a7fba in DoCopy ()
#5  0x80a751f in DoCopy ()
#6  0x80fe500 in ProcessUtility ()
#7  0x80fca25 in pg_exec_query_string ()
#8  0x80fdab6 in PostgresMain ()
#9  0x80e8035 in PostmasterMain ()
#10 0x80e7c1c in PostmasterMain ()
#11 0x80e6e77 in PostmasterMain ()
#12 0x80e6826 in PostmasterMain ()
#13 0x80c89fe in main ()
#14 0x401c59cb in __libc_start_main (main=0x80c8890 <main>, argc=3, argv=0xbffffb44, init=0x8065a78 <_init>,
    fini=0x8145f2c <_fini>, rtld_fini=0x4000aea0 <_dl_fini>, stack_end=0xbffffb3c) at
../sysdeps/generic/libc-start.c:92
(gdb) quit
The program is running.  Quit anyway (and detach it)? (y or n) y
Detaching from program: /usr/bin/postgres, Pid 21499

This is psql:

[root@soitsrv03 /root]# gdb `which psql`
GNU gdb 19991004
Copyright 1998 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i386-redhat-linux"...(no debugging symbols found)...
(gdb) attach 21498
Attaching to program: /usr/bin/psql, Pid 21498
Reading symbols from /usr/lib/libpq.so.2...(no debugging symbols found)...done.
Reading symbols from /usr/lib/libssl.so.0...(no debugging symbols found)...done.
Reading symbols from /usr/lib/libcrypto.so.0...(no debugging symbols found)...done.
Reading symbols from /usr/lib/libz.so.1...(no debugging symbols found)...done.
Reading symbols from /lib/libcrypt.so.1...done.
Reading symbols from /lib/libresolv.so.2...done.
Reading symbols from /lib/libnsl.so.1...done.
Reading symbols from /lib/libdl.so.2...done.
Reading symbols from /lib/libm.so.6...done.
Reading symbols from /usr/lib/libreadline.so.3...done.
Reading symbols from /lib/libc.so.6...done.
Reading symbols from /lib/ld-linux.so.2...done.
Reading symbols from /lib/libtermcap.so.2...done.
Reading symbols from /lib/libnss_files.so.2...done.
0x4026b15e in __select () from /lib/libc.so.6
(gdb) bt
#0  0x4026b15e in __select () from /lib/libc.so.6
#1  0x4002487c in ?? () from /usr/lib/libpq.so.2
#2  0x4001e0ff in pqFlush () from /usr/lib/libpq.so.2
#3  0x4001c8b9 in PQendcopy () from /usr/lib/libpq.so.2
#4  0x804fc91 in strcpy () at ../sysdeps/generic/strcpy.c:30
#5  0x804f9ae in strcpy () at ../sysdeps/generic/strcpy.c:30
#6  0x804a822 in strcpy () at ../sysdeps/generic/strcpy.c:30
#7  0x804a52e in strcpy () at ../sysdeps/generic/strcpy.c:30
#8  0x804f150 in strcpy () at ../sysdeps/generic/strcpy.c:30
#9  0x80502b6 in strcpy () at ../sysdeps/generic/strcpy.c:30
#10 0x401d69cb in __libc_start_main (main=0x804fcc0 <strcpy+22720>, argc=1, argv=0xbffffae4, init=0x8049be0 <_init>,
    fini=0x805797c <_fini>, rtld_fini=0x4000aea0 <_dl_fini>, stack_end=0xbffffadc) at
../sysdeps/generic/libc-start.c:92

This may be relevant: when I C-z the psql session and 'kill %p', psql
terminates and the backend continues copying records from line
841... and copies to the end. Subsequently (in a new psql session) I
can see all the records have been loaded properly.

Hope this helps.


Regards, Steve Woodcock

Re: Strange \copy failure

От
Tom Lane
Дата:
Steve Woodcock <swoodcock@scholastic.co.uk> writes:
> (gdb) bt
> #0  0x4025fc62 in __libc_send () from /lib/libc.so.6
> #1  0x80c7ea2 in pq_flush ()
> #2  0x8139092 in elog ()
> #3  0x80a849c in DoCopy ()
> #4  0x80a7fba in DoCopy ()
> #5  0x80a751f in DoCopy ()
> #6  0x80fe500 in ProcessUtility ()

Hmm, it would seem the backend is trying to send an elog message to
psql, and psql isn't taking it.  I bet all those "line too long"
messages caught up with you.  I'm not sure that libpq is designed
to absorb NOTICE messages that come in during a COPY operation;
probably, if you get so many as to fill the kernel-level buffers for
the socket, you have a problem.

Anyone want to try to fix that?

            regards, tom lane

Re: Strange \copy failure

От
Tom Lane
Дата:
Steve Woodcock <swoodcock@scholastic.co.uk> writes:
> This is on postgresql 7.0.2, RH6.2.
> I'm doing a \copy from psql of a 2035 line file, and the backend is
> hanging (no CPU being used). Control-C doesn't cancel the command and
> I have to Control-Z and 'kill %p' to get psql to terminate. The
> interesting thing is that if I chop the input file in half, it
> works. I can \copy the first half, and \copy the second half. Even
> more interesting, if I take the first 2000 lines of the file it works,
> and if I take the last 2000 lines of the file it works. This is a 2035
> line file, so it makes me thing there isn't anything wrong with the
> data in it.

That is strange.  Try attaching to the hung backend with gdb and getting
a backtrace, viz:

    gdb /path/to/postgres-executable
    gdb> attach PID-of-backend-process
    gdb> bt
    gdb> quit

Might also be a good idea to do similarly for the psql process, in case
that's where the problem is.

            regards, tom lane