Discussion: Improving pg_dump performance when handling large numbers of LOBs

Improving pg_dump performance when handling large numbers of LOBs

From
Wyatt Tellis
Date:
Hi,

We've inherited a series of legacy PG 12 clusters that each contain a database we need to migrate to a PG 15 cluster. Each database contains about 150 million large objects totaling about 250 GB. When using pg_dump we've found that it takes a couple of weeks to dump out this much data. We've tried using the --jobs option with the directory format, but that seems to save each LOB separately, which makes moving the resulting dump to another location unwieldy. Has anyone else had to deal with dumping a database with this many LOBs? Are there any suggestions for how to improve performance?
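For reference, the directory-format approach described above can be combined with an archive step so the many per-LOB files travel as a single unit. A rough sketch; the database names (`legacy12`, `new15`) and paths are placeholders:

```shell
# Parallel dump in directory format; each LOB still becomes its own file,
# but tar consolidates the output tree into one movable archive.
pg_dump --format=directory --jobs=8 --file=/backup/legacy12.dir legacy12
tar -C /backup -czf /backup/legacy12.dir.tgz legacy12.dir

# On the PG 15 side: unpack, then restore in parallel.
# tar -C /backup -xzf legacy12.dir.tgz
# pg_restore --jobs=8 --dbname=new15 /backup/legacy12.dir
```

This makes the dump easier to move but does not speed it up; as the replies note, the per-object overhead is the real cost.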

Thanks,

Wyatt

Re: Improving pg_dump performance when handling large numbers of LOBs

From
Ron Johnson
Date:
On Mon, Feb 5, 2024 at 2:01 PM Wyatt Tellis <wyatt.tellis@gmail.com> wrote:
Hi,

We've inherited a series of legacy PG 12 clusters that each contain a database that we need to migrate to a PG 15 cluster. Each database contains about 150 million large objects totaling about 250GB.

250*10^9 / (150*10^6) = 1667 bytes.  That's tiny.

Am I misunderstanding you?
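Ron's back-of-the-envelope figure is easy to reproduce with plain awk, using only the numbers quoted above:

```shell
# 250 GB spread across 150 million large objects -> average bytes per LOB.
awk 'BEGIN { printf "%.0f bytes per LOB\n", 250e9 / 150e6 }'
# -> 1667 bytes per LOB
```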

Re: Improving pg_dump performance when handling large numbers of LOBs

From
Adrian Klaver
Date:


On 2/5/24 11:35 AM, Ron Johnson wrote:
On Mon, Feb 5, 2024 at 2:01 PM Wyatt Tellis <wyatt.tellis@gmail.com> wrote:
Hi,

We've inherited a series of legacy PG 12 clusters that each contain a database that we need to migrate to a PG 15 cluster. Each database contains about 150 million large objects totaling about 250GB.

250*10^9 / (150*10^6) = 1667 bytes.  That's tiny.

Am I misunderstanding you?


I think it's less about the size of the individual objects than the number of them (150 million).

AFAIK large objects can't be handled by COPY, so they have to be transferred individually.

-- 
Adrian Klaver
adrian.klaver@aklaver.com
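The scale Adrian describes can be confirmed directly from the system catalogs on the source cluster; `legacy12` below is a placeholder database name:

```shell
# One row per large object; counting 150M rows is slow but harmless.
psql legacy12 -c "SELECT count(*) AS lob_count FROM pg_largeobject_metadata;"

# On-disk size of the catalog table that stores the LOB data.
psql legacy12 -c "SELECT pg_size_pretty(pg_total_relation_size('pg_largeobject'));"
```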

Re: Improving pg_dump performance when handling large numbers of LOBs

From
Wyatt Tellis
Date:
Yes, the LOBs themselves are tiny, but there are a lot of them (~150 million) which seem to be slowing down pg_dump.  Note, we did not design/build this system and agree that use of LOBs for this purpose was not necessary.

Wyatt

On Mon, Feb 5, 2024 at 11:36 AM Ron Johnson <ronljohnsonjr@gmail.com> wrote:
On Mon, Feb 5, 2024 at 2:01 PM Wyatt Tellis <wyatt.tellis@gmail.com> wrote:
Hi,

We've inherited a series of legacy PG 12 clusters that each contain a database that we need to migrate to a PG 15 cluster. Each database contains about 150 million large objects totaling about 250GB.

250*10^9 / (150*10^6) = 1667 bytes.  That's tiny.

Am I misunderstanding you?

Re: Improving pg_dump performance when handling large numbers of LOBs

From
Andreas Joseph Krogh
Date:
On Monday, 5 February 2024 at 20:55:43, Wyatt Tellis <wyatt.tellis@gmail.com> wrote:

Yes, the LOBs themselves are tiny, but there are a lot of them (~150 million) which seem to be slowing down pg_dump.  Note, we did not design/build this system and agree that use of LOBs for this purpose was not necessary.

Well, the data is there nonetheless; is it an option to convert it to bytea before migration?
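If schema changes were an option, the suggestion might look roughly like this; the table `docs` and its `content` OID column are hypothetical stand-ins:

```shell
# Inline each large object into a bytea column via lo_get(), which reads
# a LOB by OID server-side.  With 150M rows the UPDATE should be batched.
psql legacy12 <<'SQL'
ALTER TABLE docs ADD COLUMN content_bytes bytea;
UPDATE docs SET content_bytes = lo_get(content);
-- After verifying the copy, reclaim the large objects:
-- SELECT lo_unlink(content) FROM docs;
SQL
```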

 

--
Andreas Joseph Krogh
CTO / Partner - Visena AS
Mobile: +47 909 56 963
 

Re: Improving pg_dump performance when handling large numbers of LOBs

From
Wyatt Tellis
Date:
No, we don't have the ability to make schema changes, and the schema in the PG 15 copy needs to match what's in the PG 12 versions.

Wyatt

On Mon, Feb 5, 2024 at 12:05 PM Andreas Joseph Krogh <andreas@visena.com> wrote:
On Monday, 5 February 2024 at 20:55:43, Wyatt Tellis <wyatt.tellis@gmail.com> wrote:

Yes, the LOBs themselves are tiny, but there are a lot of them (~150 million) which seem to be slowing down pg_dump.  Note, we did not design/build this system and agree that use of LOBs for this purpose was not necessary.

Well, the data is there nonetheless; is it an option to convert it to bytea before migration?

 

--
Andreas Joseph Krogh
CTO / Partner - Visena AS
Mobile: +47 909 56 963
 

Re: Improving pg_dump performance when handling large numbers of LOBs

From
Ron Johnson
Date:
Just checking... and I sympathize with your situation.

On Mon, Feb 5, 2024 at 2:56 PM Wyatt Tellis <wyatt.tellis@gmail.com> wrote:
Yes, the LOBs themselves are tiny, but there are a lot of them (~150 million) which seem to be slowing down pg_dump.  Note, we did not design/build this system and agree that use of LOBs for this purpose was not necessary.

Wyatt

On Mon, Feb 5, 2024 at 11:36 AM Ron Johnson <ronljohnsonjr@gmail.com> wrote:
On Mon, Feb 5, 2024 at 2:01 PM Wyatt Tellis <wyatt.tellis@gmail.com> wrote:
Hi,

We've inherited a series of legacy PG 12 clusters that each contain a database that we need to migrate to a PG 15 cluster. Each database contains about 150 million large objects totaling about 250GB.

250*10^9 / (150*10^6) = 1667 bytes.  That's tiny.

Am I misunderstanding you?

Re: Improving pg_dump performance when handling large numbers of LOBs

From
Andreas Joseph Krogh
Date:
On Monday, 5 February 2024 at 21:10:53, Wyatt Tellis <wyatt.tellis@gmail.com> wrote:

No, we don't have the ability to make schema changes and the schema in the PG15 copy needs to match what's in the PG 12 versions

Well then, I guess it boils down to how badly the ones in charge want this migration…

 

--
Andreas Joseph Krogh
CTO / Partner - Visena AS
Mobile: +47 909 56 963
 

Re: Improving pg_dump performance when handling large numbers of LOBs

From
Ken Marshall
Date:
On Mon, Feb 05, 2024 at 09:17:53PM +0100, Andreas Joseph Krogh wrote:
> 
> On Monday, 5 February 2024 at 21:10:53, Wyatt Tellis <
> wyatt.tellis@gmail.com <mailto:wyatt.tellis@gmail.com>> wrote:
> 
> No, we don't have the ability to make schema changes and the schema in the 
> PG15 copy needs to match what's in the PG 12 versions
> 
> Well then, I guess it boils down to how badly the ones in charge want this 
> migration…
> 
> --
> Andreas Joseph Krogh
> CTO / Partner - Visena AS
> Mobile: +47 909 56 963
> andreas@visena.com <mailto:andreas@visena.com>
> www.visena.com <https://www.visena.com>
>  <https://www.visena.com>

Hi,

Can you use logical replication? Then you aren't under the gun to make
a fast copy.

Regards,
Ken



Re: Improving pg_dump performance when handling large numbers of LOBs

From
Tom Lane
Date:
Wyatt Tellis <wyatt.tellis@gmail.com> writes:
> Yes, the LOBs themselves are tiny, but there are a lot of them (~150
> million) which seem to be slowing down pg_dump.  Note, we did not
> design/build this system and agree that use of LOBs for this purpose was
> not necessary.

I don't know of anything much you can do at the user level.  But there
is a patchset under development to improve pg_dump's behavior with
tons of blobs:

https://www.postgresql.org/message-id/842242.1706287466@sss.pgh.pa.us

That's intended for v17, and I'm not sure how tough it might be to
adapt to v15 pg_dump, but maybe you could make that happen.  In any
case, more eyeballs reviewing that patchset would be welcome.

            regards, tom lane



Re: Improving pg_dump performance when handling large numbers of LOBs

From
Adrian Klaver
Date:
On 2/5/24 12:32, Ken Marshall wrote:
> On Mon, Feb 05, 2024 at 09:17:53PM +0100, Andreas Joseph Krogh wrote:
>>
>> On Monday, 5 February 2024 at 21:10:53, Wyatt Tellis <
>> wyatt.tellis@gmail.com <mailto:wyatt.tellis@gmail.com>> wrote:
>>
>> No, we don't have the ability to make schema changes and the schema in the
>> PG15 copy needs to match what's in the PG 12 versions
>>
>> Well then, I guess it boils down to how badly the ones in charge want this
>> migration…
>>
>> --
>> Andreas Joseph Krogh
>> CTO / Partner - Visena AS
>> Mobile: +47 909 56 963
>> andreas@visena.com <mailto:andreas@visena.com>
>> www.visena.com <https://www.visena.com>
>>   <https://www.visena.com>
> 
> Hi,
> 
> Can you use logical replication? Then you aren't under the gun to make
> a fast copy.

https://www.postgresql.org/docs/current/logical-replication-restrictions.html

"Large objects (see Chapter 35) are not replicated. There is no 
workaround for that, other than storing data in normal tables."


> 
> Regards,
> Ken
> 
> 

-- 
Adrian Klaver
adrian.klaver@aklaver.com




Re: Improving pg_dump performance when handling large numbers of LOBs

From
Shaheed Haque
Date:
Might it be worth a modest amount of time using some basic profiling to see where the time is going? A week is a looonnngg time, even for 150e6 operations. For example, if there is an unexpectedly high I/O load, some temporary M.2 storage might help?
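A minimal starting point for that profiling, assuming shell access to the host running pg_dump (database name and paths are placeholders):

```shell
# Wall-clock vs CPU time plus fault/context-switch counters for the dump:
/usr/bin/time -v pg_dump --format=directory --jobs=8 \
    --file=/backup/legacy12.dir legacy12

# Meanwhile, on the server side, see what the dump sessions are waiting on;
# pg_dump sets its application_name, so its sessions are easy to spot:
psql -c "SELECT pid, state, wait_event_type, wait_event
           FROM pg_stat_activity
          WHERE application_name = 'pg_dump';"
```

If the server sessions are mostly idle while the clock runs, the bottleneck is on the dumping side (per-object round trips) rather than raw I/O.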

On Tue, 6 Feb 2024, 01:36 Ron Johnson, <ronljohnsonjr@gmail.com> wrote:
On Mon, Feb 5, 2024 at 2:01 PM Wyatt Tellis <wyatt.tellis@gmail.com> wrote:
Hi,

We've inherited a series of legacy PG 12 clusters that each contain a database that we need to migrate to a PG 15 cluster. Each database contains about 150 million large objects totaling about 250GB.

250*10^9 / (150*10^6) = 1667 bytes.  That's tiny.

Am I misunderstanding you?