Hello.
The documentation mentions the following limits for sizes:
Maximum Field Size 1 GB
Maximum Row Size 1.6 TB
However, it seems like rows bigger than 1GB can't be COPYed out:
ro=# create table test_text (c1 text, c2 text);
CREATE TABLE
ro=# insert into test_text (c1) VALUES (repeat('a', 536870912));
INSERT 0 1
ro=# update test_text set c2 = c1;
UPDATE 1
Then, trying to dump or copy that results in the following error:
ro=# COPY test_text TO '/tmp/test';
ERROR: out of memory
DETAIL: Cannot enlarge string buffer containing 536870913 bytes by 536870912 more bytes.
In fact, the same thing happens when using a dumb SELECT:
ro=# select * from test_text ;
ERROR: out of memory
DETAIL: Cannot enlarge string buffer containing 536870922 bytes by 536870912 more bytes.
In the case of COPY, the server uses a StringInfo to output the row. The problem is, a StringInfo is capped at MaxAllocSize (1 GB - 1), but a row should be able to hold much more than that.
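For reference, here is a back-of-the-envelope sketch of the arithmetic behind the error above (the constant name and value are taken from the PostgreSQL source, src/include/utils/memutils.h, as I read it):

```python
# MaxAllocSize is 0x3fffffff, i.e. 1 GB - 1: the ceiling for a single palloc,
# and therefore for a StringInfo buffer.
MAX_ALLOC = 2**30 - 1

current = 536870913  # bytes already in the buffer (first 512 MB field + overhead)
extra = 536870912    # bytes the second 512 MB field would append

print(current + extra)              # 1073741825
print(current + extra > MAX_ALLOC)  # True: enlargeStringInfo has to give up
```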
So, is this a bug? Or is there a caveat I missed in the documentation?
We also hit a second issue, this time related to bytea encoding.
This test case is a bit more complicated, since I had to use an external (client) program to insert my data. It involves inserting a string that fits into 1 GB when encoded in escape format, but is larger than that in hex, and another string which fits in 1 GB using the hex format, but is larger than that in escape:
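To make the asymmetry concrete, here is a rough sketch of the per-byte output cost of the two bytea text encodings for the two payloads (the payload lengths are taken from the script below):

```python
# escape output: printable bytes stay 1 byte, but a zero byte becomes the
# 4-byte sequence "\000"; hex output: every byte becomes 2 hex digits,
# after a 2-byte "\x" prefix.
MAX_ALLOC = 2**30 - 1                # 1 GB - 1, the server allocation cap

ascii_payload = 200 * 3_000_000      # 600 MB of "a" bytes
zero_payload = 300 * 1_000_000       # 300 MB of zero bytes

escape_ascii = 1 * ascii_payload     #   600_000_000 -> fits
hex_ascii = 2 + 2 * ascii_payload    # 1_200_000_002 -> over the cap
escape_zero = 4 * zero_payload       # 1_200_000_000 -> over the cap
hex_zero = 2 + 2 * zero_payload      #   600_000_002 -> fits

print(escape_ascii < MAX_ALLOC < hex_ascii)  # True
print(hex_zero < MAX_ALLOC < escape_zero)    # True
```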
from psycopg2 import connect
from io import BytesIO

conn = connect(dbname="ro")
cur = conn.cursor()
fullcontent = BytesIO()

# Write a binary string that weighs less
# than 1 GB when escape encoded, but more than
# that if hex encoded
for i in range(200):
    fullcontent.write(b"aaa" * 1000000)
fullcontent.seek(0)
cur.copy_from(fullcontent, "test_bytea")
fullcontent.seek(0)
fullcontent.truncate()

# Write another binary string that weighs
# less than 1 GB when hex encoded, but more than
# that if escape encoded
cur.execute("SET bytea_output = 'hex'")
fullcontent.write(b"\\\\x")
for i in range(300):
    fullcontent.write(b"00" * 1000000)
fullcontent.seek(0)
cur.copy_from(fullcontent, "test_bytea")
cur.execute("COMMIT;")
cur.close()
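As a sanity check on the script above (a sketch, assuming the loop counts as written): both raw values are well under the 1 GB field-size limit, so the COPY FROM calls succeed, and it is only the text re-encoding on output that overflows.

```python
# Raw stored sizes of the two bytea values.
first_raw = 200 * 3_000_000          # 600 MB of "aaa..." stored verbatim
second_raw = (300 * 2_000_000) // 2  # 600 M hex digits decode to 300 MB of zeros

for size in (first_raw, second_raw):
    print(size < 2**30 - 1)          # True: both fit in a single field
```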
I couldn't find an invocation of pg_dump which would allow me to dump both rows:
ro@ronan_laptop /tmp % PGOPTIONS="-c bytea_output=escape" pg_dump -Fc > /dev/null
pg_dump: Dumping the contents of table "test_bytea" failed: PQgetResult() failed.
pg_dump: Error message from server: ERROR: invalid memory alloc request size 1200000001
pg_dump: The command was: COPY public.test_bytea (c1) TO stdout;
ro@ronan_laptop /tmp % PGOPTIONS="-c bytea_output=hex" pg_dump -Fc > /dev/null
pg_dump: Dumping the contents of table "test_bytea" failed: PQgetResult() failed.
pg_dump: Error message from server: ERROR: invalid memory alloc request size 1200000003
pg_dump: The command was: COPY public.test_bytea (c1) TO stdout;
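The two request sizes line up with the encoded lengths plus one terminating byte, which suggests (my reading, not confirmed against the server code) that each dump trips over the row whose chosen encoding is the larger one:

```python
# escape dump fails on the all-zeros row: 300 M bytes * 4 ("\000" each) + 1
print(4 * 300_000_000 + 1)      # 1200000001
# hex dump fails on the "aaa..." row: 2-byte "\x" + 600 M bytes * 2 digits + 1
print(2 + 2 * 600_000_000 + 1)  # 1200000003
```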
Using a COPY with binary format works:
ro=# COPY test_bytea TO '/tmp/test' WITH BINARY;
I'm really surprised by those results: did I do anything wrong? If it is indeed a bug, I'm surprised it wasn't discovered before.
Thank you.
--
Ronan Dunklau
http://dalibo.com - http://dalibo.org