Discussion: Troubles dumping a very large table.

Troubles dumping a very large table.

From
Ted Allen
Date:
(NOTE: I tried sending this email from my excite account and it appears
to have been blocked for whatever reason.  But if the message does get
double posted, sorry for the inconvenience.)

Hey all,

Merry Christmas Eve, Happy Holidays, and all that good stuff.  At my
work, I'm trying to upgrade my system from 8.1 to 8.3, and I'm dumping
a few large static tables ahead of time to limit the amount of downtime
during the upgrade.  The trouble is, when I dump the largest table,
which is 1.1 Tb with indexes, I keep getting the following error at the
same point in the dump.

pg_dump: SQL command failed
pg_dump: Error message from server: ERROR:  invalid string enlargement
request size 1
pg_dump: The command was: COPY public.large_table (id, data) TO stdout;

As you can see, the table is two columns, one column is an integer, and
the other is bytea.   Each cell in the data column can be as large as
600mb (we had bigger rows as well but we thought they were the source of
the trouble and moved them elsewhere to be dealt with separately.)
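
For reference, a query along these lines can show which values are the biggest (just a sketch; the table and column names are the ones described above, and the LIMIT is arbitrary):

-- largest bytea values first; octet_length reports the uncompressed size in bytes
SELECT id, octet_length(data) AS data_bytes
FROM public.large_table
ORDER BY data_bytes DESC
LIMIT 20;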

We are dumping the table using this command.

/var/lib/pgsql-8.3.5/bin/pg_dump -O -x -t large_table mydb | gzip -c > large_table.pgsql.gz

Originally we tried dumping the table with
'/var/lib/pgsql-8.3.5/bin/pg_dump  -O -x -t -F c > large_table.dump' but
that was too CPU intensive and slowed down other db processes too much.
It failed using that command as well, but I believe it is because we did
not have enough postgres temp hard drive space.  We have since symlinked
the postgres temp space to a much bigger file system.
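
If most of that CPU went into pg_dump's built-in compression, one variant worth considering is the archive format with compression turned down; this is only a sketch (the -Z level is a guess, and -Z 0 would disable compression entirely):

/var/lib/pgsql-8.3.5/bin/pg_dump -O -x -Z 1 -F c -t large_table -f large_table.dump mydb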

The stats of the db server are as follows:

Processors:  4x Opteron 2.4 GHz cores
Memory: 16 GB
Disks: 42x 15K SCSI 146 GB disks.

Also, the large table has been vacuumed recently.  Lastly, we are dumping the table over NFS to a very large SATA array.


Thanks again and Happy Holidays,
Ted


Re: Troubles dumping a very large table.

From
Tom Lane
Date:
Ted Allen <tallen@blackducksoftware.com> writes:
> during the upgrade.  The trouble is, when I dump the largest table,
> which is 1.1 Tb with indexes, I keep getting the following error at the
> same point in the dump.

> pg_dump: SQL command failed
> pg_dump: Error message from server: ERROR:  invalid string enlargement
> request size 1
> pg_dump: The command was: COPY public.large_table (id, data) TO stdout;

> As you can see, the table is two columns, one column is an integer, and
> the other is bytea.   Each cell in the data column can be as large as
> 600mb (we had bigger rows as well but we thought they were the source of
> the trouble and moved them elsewhere to be dealt with separately.)

600mb measured how?  I have a feeling the problem is that the value
exceeds 1Gb when converted to text form...

            regards, tom lane
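
As a small illustration of that expansion, with made-up bytes rather than the real table: in 8.3 the text output of bytea is the escape format, so each non-printable byte becomes a four-character \nnn sequence, and COPY then escapes the backslashes again on top of that.

-- three raw bytes become twelve characters once rendered as escaped text
SELECT octet_length(decode('00ff10', 'hex'))       AS raw_bytes,     -- 3
       octet_length(decode('00ff10', 'hex')::text) AS escaped_bytes; -- 12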

Re: Troubles dumping a very large table.

From
Ted Allen
Date:
600mb measured by get_octet_length on data.  If there is a better way to measure the row/cell size, please let me know
because we thought it was the >1Gb problem too.  We thought we were being conservative by getting rid of the larger rows
but I guess we need to get rid of even more.

Thanks,
Ted
________________________________________
From: Tom Lane [tgl@sss.pgh.pa.us]
Sent: Wednesday, December 24, 2008 12:49 PM
To: Ted Allen
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Troubles dumping a very large table.

Ted Allen <tallen@blackducksoftware.com> writes:
> during the upgrade.  The trouble is, when I dump the largest table,
> which is 1.1 Tb with indexes, I keep getting the following error at the
> same point in the dump.

> pg_dump: SQL command failed
> pg_dump: Error message from server: ERROR:  invalid string enlargement
> request size 1
> pg_dump: The command was: COPY public.large_table (id, data) TO stdout;

> As you can see, the table is two columns, one column is an integer, and
> the other is bytea.   Each cell in the data column can be as large as
> 600mb (we had bigger rows as well but we thought they were the source of
> the trouble and moved them elsewhere to be dealt with separately.)

600mb measured how?  I have a feeling the problem is that the value
exceeds 1Gb when converted to text form...

                        regards, tom lane

Re: Troubles dumping a very large table.

From
Tom Lane
Date:
Ted Allen <tallen@blackducksoftware.com> writes:
> 600mb measured by get_octet_length on data.  If there is a better way to measure the row/cell size, please let me
> know because we thought it was the >1Gb problem too.  We thought we were being conservative by getting rid of the larger
> rows but I guess we need to get rid of even more.

Yeah, the average expansion of bytea data in COPY format is about 3X :-(
So you need to get the max row length down to around 300mb.  I'm curious
how you got the data in to start with --- were the values assembled on
the server side?

            regards, tom lane
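
Going by that ~3X figure, a query along these lines could flag the rows that still risk crossing the 1GB text limit (the 300mb threshold simply applies the rule of thumb above; table and column names as earlier in the thread):

-- rows whose COPY text form would likely exceed ~1GB at roughly 3X expansion
SELECT id, octet_length(data) AS data_bytes
FROM public.large_table
WHERE octet_length(data) > 300 * 1024 * 1024
ORDER BY data_bytes DESC;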

Re: Troubles dumping a very large table.

From
"Merlin Moncure"
Date:
On Fri, Dec 26, 2008 at 12:38 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Ted Allen <tallen@blackducksoftware.com> writes:
>> 600mb measured by get_octet_length on data.  If there is a better way to measure the row/cell size, please let me
>> know because we thought it was the >1Gb problem too.  We thought we were being conservative by getting rid of the larger
>> rows but I guess we need to get rid of even more.
>
> Yeah, the average expansion of bytea data in COPY format is about 3X :-(
> So you need to get the max row length down to around 300mb.  I'm curious
> how you got the data in to start with --- were the values assembled on
> the server side?

Wouldn't binary style COPY be more forgiving in this regard?  (if so,
the OP might have better luck running COPY BINARY)...

This also goes for libpq traffic... large (>1mb) bytea values definitely want
to be passed using the binary switch in the protocol.

merlin
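
A sketch of what that could look like for this table, piping binary COPY output through gzip much like the existing text dump (file names are invented; the result can only be reloaded with COPY ... FROM ... BINARY on a compatible server):

# dump: binary COPY avoids the text escaping entirely
psql -d mydb -c "COPY public.large_table (id, data) TO STDOUT BINARY" | gzip -c > large_table.copy.bin.gz
# restore: feed the data back through binary COPY
gunzip -c large_table.copy.bin.gz | psql -d mydb -c "COPY public.large_table (id, data) FROM STDIN BINARY"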

Re: Troubles dumping a very large table.

From
Tom Lane
Date:
"Merlin Moncure" <mmoncure@gmail.com> writes:
> On Fri, Dec 26, 2008 at 12:38 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Yeah, the average expansion of bytea data in COPY format is about 3X :-(
>> So you need to get the max row length down to around 300mb.  I'm curious
>> how you got the data in to start with --- were the values assembled on
>> the server side?

> Wouldn't binary style COPY be more forgiving in this regard?  (if so,
> the OP might have better luck running COPY BINARY)...

Yeah, if he's willing to use COPY BINARY directly.  AFAIR there is not
an option to get pg_dump to use it.  But maybe "pg_dump -s" together
with a manual dump of the table data is the right answer.  It probably
beats shoving some of the rows aside as he's doing now...

            regards, tom lane
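
Put together, that might look something like the following (a sketch only; file names are invented, and the schema dump will also contain the table's indexes, which are usually cheaper to build after the data is loaded):

# schema only, via pg_dump
/var/lib/pgsql-8.3.5/bin/pg_dump -s -O -x -t large_table mydb > large_table.schema.sql
# data separately, via binary COPY as suggested above
psql -d mydb -c "COPY public.large_table (id, data) TO STDOUT BINARY" | gzip -c > large_table.data.gz

On the target cluster, the schema file would be restored with psql -f and the data fed back in with COPY ... FROM STDIN BINARY.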

Re: Troubles dumping a very large table.

From
Ted Allen
Date:
I was hoping to use pg_dump and not have to do a manual dump, but if that
latest solution (moving rows >300mb elsewhere and dealing with them
later) does not work, I'll try that.

Thanks everyone.

Merlin Moncure wrote:
> On Fri, Dec 26, 2008 at 12:38 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
>> Ted Allen <tallen@blackducksoftware.com> writes:
>>
>>> 600mb measured by get_octet_length on data.  If there is a better way to measure the row/cell size, please let me
>>> know because we thought it was the >1Gb problem too.  We thought we were being conservative by getting rid of the larger
>>> rows but I guess we need to get rid of even more.
>>>
>> Yeah, the average expansion of bytea data in COPY format is about 3X :-(
>> So you need to get the max row length down to around 300mb.  I'm curious
>> how you got the data in to start with --- were the values assembled on
>> the server side?
>>
>
> Wouldn't binary style COPY be more forgiving in this regard?  (if so,
> the OP might have better luck running COPY BINARY)...
>
> This also goes for libpq traffic... large (>1mb) bytea values definitely want
> to be passed using the binary switch in the protocol.
>
> merlin
>


Re: Troubles dumping a very large table.

From
Dimitri Fontaine
Date:
Hi,

On Friday, December 26, 2008, Tom Lane wrote:
> Yeah, if he's willing to use COPY BINARY directly.  AFAIR there is not
> an option to get pg_dump to use it.

Would it be possible to consider such an additional switch to pg_dump?

Of course the DBA has to know when to use it safely, but if the plan is to be
able to restore the dump later on the same machine to recover from some human
error (oops, forgot the WHERE clause on that DELETE statement), it seems it
would be a good idea.

Regards,
--
dim
