Re: pg_dump / copy bugs with "big lines" ?

From: Jim Nasby
Subject: Re: pg_dump / copy bugs with "big lines" ?
Date:
Msg-id: 5519E025.3030704@BlueTreble.com
In reply to: pg_dump / copy bugs with "big lines" ?  (Ronan Dunklau <ronan.dunklau@dalibo.com>)
Responses: Re: pg_dump / copy bugs with "big lines" ?  (Ronan Dunklau <ronan.dunklau@dalibo.com>)
List: pgsql-hackers
On 3/30/15 5:46 AM, Ronan Dunklau wrote:
> Hello hackers,
>
> I've tried my luck on pgsql-bugs before, with no success, so I'm reporting
> these problems here.
>
> The documentation mentions the following limits for sizes:
>
> Maximum Field Size      1 GB
> Maximum Row Size        1.6 TB
>
> However, it seems like rows bigger than 1GB can't be COPYed out:
>
> ro=# create table test_text (c1 text, c2 text);
> CREATE TABLE
> ro=# insert into test_text (c1) VALUES (repeat('a', 536870912));
> INSERT 0 1
> ro=# update test_text set c2 = c1;
> UPDATE 1
>
> Then, trying to dump or copy that results in the following error:
>
> ro=# COPY test_text TO '/tmp/test';
> ERROR:  out of memory
> DÉTAIL : Cannot enlarge string buffer containing 536870913 bytes by 536870912
> more bytes.
>
> In fact, the same thing happens when using a simple SELECT:
>
> ro=# select * from test_text ;
> ERROR:  out of memory
> DÉTAIL : Cannot enlarge string buffer containing 536870922 bytes by 536870912
> more bytes.
>
> In the case of COPY, the server uses a StringInfo to output the row. The
> problem is, a StringInfo is capped to MAX_ALLOC_SIZE (1GB - 1), but a row
> should be able to hold much more than that.

Yeah, shoving a whole row into one StringInfo is ultimately going to 
limit a row to 1GB, which is a far cry from what the docs claim. There 
are also going to be problems with FE/BE communications, because things 
like pq_sendbyte all use a StringInfo as a buffer too. So while Postgres 
can store a 1.6TB row, you're going to find a bunch of stuff that 
doesn't work past around 1GB.
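
For reference, the check that produces that error is in 
enlargeStringInfo(). A paraphrased sketch (not the exact source; see 
src/backend/lib/stringinfo.c):

void
enlargeStringInfo(StringInfo str, int needed)
{
    if (needed < 0)             /* should not happen */
        elog(ERROR, "invalid string enlargement request size: %d", needed);

    /* refuse to grow the buffer past MaxAllocSize (1GB - 1) */
    if (((Size) needed) >= (MaxAllocSize - (Size) str->len))
        ereport(ERROR,
                (errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
                 errmsg("out of memory"),
                 errdetail("Cannot enlarge string buffer containing %d bytes by %d more bytes.",
                           str->len, needed)));

    /* ... otherwise round the new size up and repalloc str->data ... */
}

That's why COPY and a plain SELECT die at the same point: the whole row 
gets serialized into one such buffer before it ever reaches the client.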

> So, is this a bug ? Or is there a caveat I would have missed in the
> documentation ?

I suppose that really depends on your point of view. The real question 
is whether we think it's worth fixing, or whether it's a good idea to 
change the behavior of StringInfo.

StringInfo uses ints to store length, so that could possibly be 
changed, but then you'd just error out on MaxAllocSize instead.
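
For context, the relevant pieces look roughly like this (paraphrased 
from lib/stringinfo.h and utils/memutils.h):

typedef struct StringInfoData
{
    char   *data;       /* the buffer itself */
    int     len;        /* current length -- an int to begin with */
    int     maxlen;     /* allocated size of data */
    int     cursor;     /* used when reading from the buffer */
} StringInfoData;

/* palloc/repalloc refuse any request this big or bigger */
#define MaxAllocSize    ((Size) 0x3fffffff)     /* 1 gigabyte - 1 */

So even if len were widened, every repalloc of the buffer would still 
have to stay under MaxAllocSize.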

Now perhaps those could both be relaxed, but certainly not to the extent 
that you can shove an entire 1.6TB row into an output buffer.

The other issue is that there are a LOT of places in the code that 
blindly copy detoasted data around, so while we technically support 1GB 
toasted values, you're probably going to be quite unhappy with 
performance. I'm actually surprised you haven't already seen this with 
500MB objects.

So long story short, I'm not sure how worthwhile it would be to try and 
fix this. We probably should improve the docs though.

Have you looked at using large objects for what you're doing? (Note that 
those have their own set of challenges and limitations.)
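
If it helps, here's a minimal libpq sketch of streaming a value into a 
large object in chunks, so nothing ever has to fit into a single 1GB 
buffer. The connection string, chunk size, and total size are just 
placeholders, and error handling is omitted:

#include <stdio.h>
#include <string.h>
#include "libpq-fe.h"
#include "libpq/libpq-fs.h"     /* INV_READ / INV_WRITE */

int
main(void)
{
    PGconn     *conn = PQconnectdb("dbname=ro");
    char        chunk[8192];
    long long   written;
    Oid         loid;
    int         fd;

    if (PQstatus(conn) != CONNECTION_OK)
        return 1;

    memset(chunk, 'a', sizeof(chunk));

    /* large-object operations must run inside a transaction */
    PQclear(PQexec(conn, "BEGIN"));

    loid = lo_creat(conn, INV_READ | INV_WRITE);
    fd = lo_open(conn, loid, INV_WRITE);

    /* write well past 1GB without ever building the value in one buffer */
    for (written = 0; written < 2LL * 1024 * 1024 * 1024; written += sizeof(chunk))
        lo_write(conn, fd, chunk, sizeof(chunk));

    lo_close(conn, fd);
    PQclear(PQexec(conn, "COMMIT"));

    printf("large object OID: %u\n", loid);
    PQfinish(conn);
    return 0;
}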

> We also hit a second issue, this time related to bytea encoding.

There are probably several other places where this type of thing could 
be a problem. I'm thinking of conversions in particular.
-- 
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com


