Обсуждение: [Re] Re: [Re] Re: Unknown winsock error 10061while dumping a big database

Поиск
Список
Период
Сортировка

[Re] Re: [Re] Re: Unknown winsock error 10061while dumping a big database

От
"Cyril VELTER"
Дата:
De : mailto:dev@archonet.com
> Cyril VELTER wrote:
> >
> >     maintenance_work_mem and work_mem are set to their default value (16M /
1M).
> > Does Copy use any of these values ?  The doc only state sort operations for

> > work_mem and vacuum / create index / alter table add foreing key for
> > maintenance_work_mem.
>
> You'll probably want to increase both of those (assuming you have a few
> gigs of RAM).

    The server have 4G of ram. But doing a search in the source code it does not
seem that these values are used during a copy operation. I will try to increase
these values.

>
> >     BTW, just dumping this table fail with the same error.
>
> Hmm - what version of 8.2 are you running on Windows?
>
> It's just that 8.2.4 has a fix that says:
>
> Allow pg_dump to do binary backups larger than two gigabytes on Windows
> (Magnus)
>
> Now, whether that will affect you I'm not sure, since you said you were
> dumping from Linux, with the server on Windows.

    I don't think so. The dump stop at 75GB (the 2GB limit is not a problem), I'm
running 8.2.5 on both the client and the server and the dump is done on a linux
machine.


    Cyril VELTER


Re: [Re] Re: [Re] Re: Unknown winsock error 10061while dumping a big database

От
Richard Huxton
Дата:
Cyril VELTER wrote:
>> It's just that 8.2.4 has a fix that says:
>>
>> Allow pg_dump to do binary backups larger than two gigabytes on Windows
>> (Magnus)
>>
>> Now, whether that will affect you I'm not sure, since you said you were
>> dumping from Linux, with the server on Windows.
>
>     I don't think so. The dump stop at 75GB (the 2GB limit is not a problem), I'm
> running 8.2.5 on both the client and the server and the dump is done on a linux
> machine.

Can you select all the rows in that table, or does that cause an error too?

--
   Richard Huxton
   Archonet Ltd

Re: [Re] Re: [Re] Re: Unknown winsock error 10061while dumping a big database

От
Tom Lane
Дата:
"Cyril VELTER" <cyril.velter@metadys.com> writes:
>     The server have 4G of ram. But doing a search in the source code it does not
> seem that these values are used during a copy operation.

They aren't.  The "out of memory" complaint sounds more like it might be
due to corrupt data, ie the olde scenario where a variable-width field's
length word contains a ridiculously large value.  I don't know how that
would lead to a winsock error, though.

            regards, tom lane

[Re] Re: [Re] Re: [Re] Re: Unknown winsock error 10061while dumping a big database

От
"Cyril VELTER"
Дата:
De : mailto:tgl@sss.pgh.pa.us
> "Cyril VELTER" <cyril.velter@metadys.com> writes:
> >     The server have 4G of ram. But doing a search in the source code it does
not
> > seem that these values are used during a copy operation.
>
> They aren't.  The "out of memory" complaint sounds more like it might be
> due to corrupt data, ie the olde scenario where a variable-width field's
> length word contains a ridiculously large value.  I don't know how that
> would lead to a winsock error, though.

    Yes this is very strange. The corruption scenario is plausible as the dump
stop at nearly the same place each time (it's a live database so the place is
not exactly the same).

    Also, the database have been upgraded recently (2007-11-09) from 8.0.13 to
8.2.5 so I suppose the corruption have occured between this date and now ?

    I have run the following query to get all record with one field over 10MB :

select
p2,length(p18155),length(p18154),length(p4065),length(p4083),length(p4020),lengt
h(p4074),length(p3857),length(p32402),length(p5512),length(p18175) from c2674
where length(p18155)>10000000 or length(p18154)>10000000 or
length(p4065)>10000000 or length(p4083)>10000000 or length(p4020)>10000000 or
length(p4074)>10000000 or length(p3857)>10000000 or length(p32402)>10000000 or
length(p5512)>10000000 or length(p18175)>10000000;

    The biggest value is 28034379.

    Is length() supposed to return the very high length in case of corruption ?

    Is there anythning else i can do ?

    Thanks,


    Cyril VELTER


Table definition :


 Column |            Type             | Modifiers
--------+-----------------------------+-----------
 p2     | bigint                      |
 p9     | boolean                     |
 p3337  | integer                     |
 p18155 | text                        |
 p18154 | text                        |
 p17561 | bigint                      |
 p4065  | text                        |
 p689   | bigint                      |
 p43711 | integer                     |
 p4083  | text                        |
 p4020  | text                        |
 p4029  | text                        |
 p4218  | timestamp without time zone |
 p4074  | text                        |
 p45127 | bigint                      |
 p3857  | text                        |
 p7     | timestamp without time zone |
 p6     | bigint                      |
 p5     | timestamp without time zone |
 p32402 | text                        |
 p5512  | bytea                       |
 p18175 | bytea                       |
 p42542 | bigint                      |
 p17562 | integer                     |
 p4     | bigint                      |
Indexes:
    "idx_2674_p2" UNIQUE, btree (p2)
    "idx_2674_p17562" btree (p17562)
    "idx_2674_p4" btree (p4)
    "idx_2674_p4029" btree (p4029)
    "idx_2674_p4218" btree (p4218)
    "idx_2674_p42542" btree (p42542)
    "idx_2674_p45127" btree (p45127)
    "idx_2674_p5" btree (p5)
    "idx_2674_p6" btree (p6)
    "idx_2674_p689" btree (p689)
    "idx_2674_p7" btree (p7)

Row count=1320365


Re: [Re] Re: [Re] Re: [Re] Re: Unknown winsock error 10061while dumping a big database

От
Richard Huxton
Дата:
Cyril VELTER wrote:
>
>     I have run the following query to get all record with one field over 10MB :
>
> select
> p2,length(p18155),length(p18154),length(p4065),length(p4083),length(p4020),lengt
> h(p4074),length(p3857),length(p32402),length(p5512),length(p18175) from c2674
> where length(p18155)>10000000 or length(p18154)>10000000 or
> length(p4065)>10000000 or length(p4083)>10000000 or length(p4020)>10000000 or
> length(p4074)>10000000 or length(p3857)>10000000 or length(p32402)>10000000 or
> length(p5512)>10000000 or length(p18175)>10000000;
>
>     The biggest value is 28034379.
>
>     Is length() supposed to return the very high length in case of corruption ?

You'd have thought it would. The odd thing (if it is data corruption) is
that you would expect to see something in the server logs about a
failure to allocate 12345412234124 bytes of memory or some such. Whereas
all you get is this winsock error.

>     Is there anythning else i can do ?

Could you try CREATE some_table AS SELECT * FROM c2674 WHERE ... to copy
the large rows within the database. If that fails, the table is
corrupted but you can identify the problem rows and work around them
while you dump the data.

If it doesn't fail, that suggests (to me, anyway) you've found a bug
somewhere in the communication between server and client (which would
explain the winsock error).

--
   Richard Huxton
   Archonet Ltd

[Re] Re: [Re] Re: [Re] Re: [Re] Re: Unknown winsock error 10061while dumping a big database

От
"Cyril VELTER"
Дата:
De : mailto:dev@archonet.com
> Cyril VELTER wrote:
> >
> >     Is length() supposed to return the very high length in case of corruption
?
>
> You'd have thought it would. The odd thing (if it is data corruption) is
> that you would expect to see something in the server logs about a
> failure to allocate 12345412234124 bytes of memory or some such. Whereas
> all you get is this winsock error.

    I have another theory. The message printed by pg_dump :

    "pg_dump: Error message from server: out of memory"

    is printed in dumpTableData_copy (pg_dump.c) : write_msg(NULL, "Error message
from server: %s", PQerrorMessage(g_conn));

    There are serveral places in libpq where the conn error might be set to "out
of memory". I've also discovered that the machine running pg_dump is pretty
tight in ram (256M) and that no swap file is active (oversight after a disk
upgrade). May be this error is simply pg_dump running out of memory and not the
server. This would also explain that the server only report a socket error
(though I would have expected WSAECONNRESET (10054) instead of WSAECONNREFUSED
(10061)).


    The 8.0.13 pg_dump used to work just fine on the exact same machine. Maybe 8.2
pg_dump need more memory (or maybe a memory leak ?).

    I'm running the dump again after adding some swap space and will monitor
memory usage.

    What do you think ?

> >     Is there anythning else i can do ?
>
> Could you try CREATE some_table AS SELECT * FROM c2674 WHERE ... to copy
> the large rows within the database. If that fails, the table is
> corrupted but you can identify the problem rows and work around them
> while you dump the data.


    I will try that this week end if my theory prove wrong (I need to make disk
space available on the server for that, the table is 60GB).

    Thanks,

    Cyril


Re: [Re] Re: [Re] Re: [Re] Re: [Re] Re: Unknown winsock error 10061while dumping a big database

От
Richard Huxton
Дата:
Cyril VELTER wrote:
> De : mailto:dev@archonet.com
>> Cyril VELTER wrote:
>>>     Is length() supposed to return the very high length in case of corruption
> ?
>> You'd have thought it would. The odd thing (if it is data corruption) is
>> that you would expect to see something in the server logs about a
>> failure to allocate 12345412234124 bytes of memory or some such. Whereas
>> all you get is this winsock error.
>
>     I have another theory. The message printed by pg_dump :
>
>     "pg_dump: Error message from server: out of memory"
>
>     is printed in dumpTableData_copy (pg_dump.c) : write_msg(NULL, "Error message
> from server: %s", PQerrorMessage(g_conn));
>
>     There are serveral places in libpq where the conn error might be set to "out
> of memory". I've also discovered that the machine running pg_dump is pretty
> tight in ram (256M) and that no swap file is active (oversight after a disk
> upgrade). May be this error is simply pg_dump running out of memory and not the
> server. This would also explain that the server only report a socket error
> (though I would have expected WSAECONNRESET (10054) instead of WSAECONNREFUSED
> (10061)).

Aha - that sounds likely. If you're dumping multi-megabyte rows I can
see how you'd get into problems on a client with a small amount of RAM.

>     The 8.0.13 pg_dump used to work just fine on the exact same machine. Maybe 8.2
> pg_dump need more memory (or maybe a memory leak ?).

Might just be slightly higher RAM usage in your particular case. It
could be there were some trade-offs between size and speed.

>     I'm running the dump again after adding some swap space and will monitor
> memory usage.
>
>     What do you think ?

I think you've found the problem. If you're short of RAM though you
might also have difficulty restoring the dump.

You could run pg_dump on the Windows server and copy its output to the
RAM-limited Linux box. You could even run pg_restore from the Windows
box - if you don't have a direct channel to the database you can use the
putty ssh-client to create a tunnelled connection to the Linux box.

--
   Richard Huxton
   Archonet Ltd