Обсуждение: BUG #14384: pg_dump uses excessive amounts of memory for LOBs

Поиск
Список
Период
Сортировка

BUG #14384: pg_dump uses excessive amounts of memory for LOBs

От
boleslaw.ziobrowski@yahoo.pl
Дата:
VGhlIGZvbGxvd2luZyBidWcgaGFzIGJlZW4gbG9nZ2VkIG9uIHRoZSB3ZWJz
aXRlOgoKQnVnIHJlZmVyZW5jZTogICAgICAxNDM4NApMb2dnZWQgYnk6ICAg
ICAgICAgIEJvbGVrIFppb2Jyb3dza2kKRW1haWwgYWRkcmVzczogICAgICBi
b2xlc2xhdy56aW9icm93c2tpQHlhaG9vLnBsClBvc3RncmVTUUwgdmVyc2lv
bjogOS41LjQKT3BlcmF0aW5nIHN5c3RlbTogICBVYnVudHUgMTQuMDQuNSBM
VFMgCkRlc2NyaXB0aW9uOiAgICAgICAgCgpwZ19kdW1wIHNlZW1zIHRvIGFs
bG9jYXRlIG1lbW9yeSBwcm9wb3J0aW9uYWwgdG8gdGhlIG51bWJlciBvZiBy
b3dzIGluCnBnX2xhcmdlb2JqZWN0IChub3QgbmVjZXNzYXJpbHkgY29ycmVs
YXRlZCB3aXRoIHNpemUgb2YgdGhlc2Ugb2JqZWN0cykgLAplLmcuIDQgR0Ig
Zm9yIGEgZmV3IG1pbGxpb25zIGFuZCBhYm91dCAyNiBHQiBmb3IgMzAgbWls
bGlvbnMgb2YgbG9icyAuICANCg0KU3RlcHMgdG8gcmVwcm9kdWNlIDoNCnBz
cWwgOg0KDQpjcmVhdGUgdGFibGUgbGFyZ2Vfb2JqZWN0X3Rlc3QoIGEgaW50
LCBsb2JqIG9pZCApOw0KDQppbnNlcnQgaW50byBsYXJnZV9vYmplY3RfdGVz
dCBzZWxlY3QgYS5pLGxvX2Zyb21fYnl0ZWEoMCwKRSdcXHhmZmZmZmYwMDAw
MDAwMDAwJykgZnJvbSBnZW5lcmF0ZV9zZXJpZXMoMSw1MDAwMDAwKSBhcyBh
KGkpIDsNCg0KY2xpOg0KDQpwZ19kdW1wIHBvc3RncmVzID4gL3RtcC9kdW1w
LnNxbA0KDQp0b3AgLW8gIiVNRU0iDQoKCg==

Re: BUG #14384: pg_dump uses excessive amounts of memory for LOBs

От
Tom Lane
Дата:
boleslaw.ziobrowski@yahoo.pl writes:
> pg_dump seems to allocate memory proportional to the number of rows in
> pg_largeobject (not necessarily correlated with size of these objects) ,

Yes, it does.  It also allocates memory proportional to the number of,
eg, tables, or any other DB object for that matter.

This is a consequence of the fact that blobs grew owners and privileges
in 9.0.  pg_dump uses its usual per-object infrastructure to keep track
of that.  The argument was that this'd be okay because if your large
objects are, well, large, then there couldn't be so many of them that
the space consumption would be fatal.  I had doubts about that at the
time, but I think we're more or less locked into it now.  It would
take a lot of restructuring to change it, and we'd lose functionality
too, because we couldn't have a separate TOC entry per blob.  That
means no ability to select out individual blobs during pg_restore.

TL;DR: blobs are not exactly lightweight objects.  If you want something
with less overhead, maybe you should just store the data in a plain
bytea column.

            regards, tom lane

Re: BUG #14384: pg_dump uses excessive amounts of memory for LOBs

От
Bolek Ziobrowski
Дата:
Tom Lane wrote:
>
> TL;DR: blobs are not exactly lightweight objects.  If you want something
> with less overhead, maybe you should just store the data in a plain
> bytea column.
>

Yes, I tested this case with bytea/TOAST in place of LOB column and
pg_dump used just a few MB .

Normally (when dumping a database with hundreds of tables) it uses
insignificant amount of memory so I was surprised when it allocated
about 26GB while processing a 100GB database with 30*10^6 LOBs (caused
swapping and almost triggered OOM killer).

The 'obvious' (for database this size) and easy solution was to use
pg_basebackup - and it sped up the process almost 40 times .

I found a similar case here
(https://www.postgresql.org/message-id/524C3163.1050502%40iqbuzz.ru) so
I think it would be good to warn users about this behavior in pg_dump
documentation (to save a few hours of the third person who triggers this
issue in ... October 2019).

Thank you,
Bolek Ziobrowski