Обсуждение: How to get a more RSYNC compatible output of pg_dump?

Поиск
Список
Период
Сортировка

How to get a more RSYNC compatible output of pg_dump?

От
Thorsten Schöning
Дата:
Hi everyone,

for various historical reasons I maintain a database containing large
file uploads, which makes uncompressed output of pg_dump ~200 GiB in
size currently. I'm storing that dump to some NAS and am trying to
forward it from there using RSYNC to multiple different additional
offsite USB disks.

I'm doing the same with the files directory of Postgres already after
taking BTRFS snapshots etc. and for those files things work pretty
well with RSYNC. Lots of files are skipped entirely, some are slightly
updated in-place, some updates are a bit larger depending on the
actual changes and when RSYNC executed last etc.

Though, with the large dumps it seems to me that with every slight
change in the actual data the entire dump gets downloaded again. I'm
already using uncompressed dumps in the hope that the output is more
stable and RSYNC better able to recognize unchanged parts. But I guess
that most changes in the dumped data simply result in all subsequent
data being that misplaced compared to what RSYNC reads against, that
it's like downloading the whole file again in the end.

Is that simply the way it is or are there some optimizations possible
when using pg_dump? Am using Postgres 11 and don't see anything which
seems to help in this use-case.

Thanks!

Mit freundlichen Grüßen

Thorsten Schöning

--
AM-SoFT IT-Service - Bitstore Hameln GmbH
Mitglied der Bitstore Gruppe - Ihr Full-Service-Dienstleister für IT und TK

E-Mail: Thorsten.Schoening@AM-SoFT.de
Web:    http://www.AM-SoFT.de/

Tel:   05151-  9468- 0
Tel:   05151-  9468-55
Fax:   05151-  9468-88
Mobil:  0178-8 9468-04

AM-SoFT IT-Service - Bitstore Hameln GmbH, Brandenburger Str. 7c, 31789 Hameln
AG Hannover HRB 221853 - Geschäftsführer: Janine Galonska


Für Rückfragen stehe ich Ihnen jederzeit zur Verfügung.

Mit freundlichen Grüßen,

Thorsten Schöning


Telefon: +49 (0)515 94 68 - 0
Fax:
E-Mail: TSchoening@am-soft.de

AM-Soft IT-Service - Bitstore Hameln GmbH
Brandenburger Straße 7c
31789 Hameln

Diese E-Mail enthält vertrauliche und/oder rechtlich geschützte Informationen und ist ausschliesslich für den
Adressatenbestimmt. Jeglicher Zugriff auf diese E-Mail durch andere Personen als den Adressaten ist untersagt. Wenn Sie
nichtder richtige Adressat sind oder diese E-Mail irrtümlich erhalten haben, informieren Sie bitte sofort den Absender
undvernichten Sie diese E-Mail. Sollten Sie nicht der für diese E-Mail bestimmte Adressat sein, ist Ihnen jede
Veröffentlichung,Vervielfältigung oder Weitergabe wie auch das Ergreifen oder Unterlassen von Massnahmen im Vertrauen
auferlangte Information untersagt.  

This e-mail may contain confidential and/or privileged information and is intended solely for the addressee. Access to
thisemail by anyone else is unauthorized. If you are not the intended recipient (or have received this e-mail in error)
pleasenotify the sender immediately and destroy this e-mail. If you are not the intended recipient, any disclosure,
copying,distribution or any action taken or omitted to be taken in reliance on it, is prohibited and may be unlawful.  

Hinweise zum Datenschutz: bitstore.group/datenschutz






Re: How to get a more RSYNC compatible output of pg_dump?

От
Holger Jakobs
Дата:
Am 16.05.22 um 09:56 schrieb Thorsten Schöning:
> Hi everyone,
>
> for various historical reasons I maintain a database containing large
> file uploads, which makes uncompressed output of pg_dump ~200 GiB in
> size currently. I'm storing that dump to some NAS and am trying to
> forward it from there using RSYNC to multiple different additional
> offsite USB disks.
>
> I'm doing the same with the files directory of Postgres already after
> taking BTRFS snapshots etc. and for those files things work pretty
> well with RSYNC. Lots of files are skipped entirely, some are slightly
> updated in-place, some updates are a bit larger depending on the
> actual changes and when RSYNC executed last etc.
>
> Though, with the large dumps it seems to me that with every slight
> change in the actual data the entire dump gets downloaded again. I'm
> already using uncompressed dumps in the hope that the output is more
> stable and RSYNC better able to recognize unchanged parts. But I guess
> that most changes in the dumped data simply result in all subsequent
> data being that misplaced compared to what RSYNC reads against, that
> it's like downloading the whole file again in the end.
>
> Is that simply the way it is or are there some optimizations possible
> when using pg_dump? Am using Postgres 11 and don't see anything which
> seems to help in this use-case.
>
> Thanks!
>
> Mit freundlichen Grüßen
>
> Thorsten Schöning
>
Hi Thorsten,

This is an rsync question, not a pg_dump question.

If you want to sync a new version of a file without transferring the 
whole thing, you have to use the option -c or --checksum.

This works well only if some blocks of the file have changed, while most 
others haven't. This won't be the case of a pg_dump.

So I don't see a way of re-syncing the way you expect it to.

Regards,

Holger


-- 
Holger Jakobs, Bergisch Gladbach, Tel. +49-178-9759012


Вложения

Re: How to get a more RSYNC compatible output of pg_dump?

От
hubert depesz lubaczewski
Дата:
On Mon, May 16, 2022 at 09:56:34AM +0200, Thorsten Schöning wrote:
> Is that simply the way it is or are there some optimizations possible
> when using pg_dump? Am using Postgres 11 and don't see anything which
> seems to help in this use-case.

What I'd suggest:
1. Use -Fd, to have each table separately
2. Use -Z0 to avoid normal compression
3. Use `gzip --rsyncable` on the data, to make the backup smaller for
   transmit (though you will need to unpack it to restore), or play with
   other compressors, like xz

depesz



Re: How to get a more RSYNC compatible output of pg_dump?

От
Thorsten Schöning
Дата:
Guten Tag Holger Jakobs,
am Montag, 16. Mai 2022 um 12:52 schrieben Sie:

> If you want to sync a new version of a file without transferring
> the whole thing, you have to use the option -c or --checksum.

No, --checksum is only for (slowly) recognizing changes at all,
transfer is differential by default with or without that option.

>  -c, --checksum              skip based on checksum, not mod-time & size

https://linux.die.net/man/1/rsync

> This works well only if some blocks of the file have changed, while
> most others haven't. This won't be the case of a pg_dump.

It is the case for some dumps I'm creating, but those are a lot
smaller and don't store large objects like files. Therefore I was
wondering about the order in which pg_dump outputs tables, rows etc.,
e.g. if that is the same always or changing with each exec for some
reason.

Mit freundlichen Grüßen

Thorsten Schöning

--
AM-SoFT IT-Service - Bitstore Hameln GmbH
Mitglied der Bitstore Gruppe - Ihr Full-Service-Dienstleister für IT und TK

E-Mail: Thorsten.Schoening@AM-SoFT.de
Web:    http://www.AM-SoFT.de/

Tel:   05151-  9468- 0
Tel:   05151-  9468-55
Fax:   05151-  9468-88
Mobil:  0178-8 9468-04

AM-SoFT IT-Service - Bitstore Hameln GmbH, Brandenburger Str. 7c, 31789 Hameln
AG Hannover HRB 221853 - Geschäftsführer: Janine Galonska


Für Rückfragen stehe ich Ihnen jederzeit zur Verfügung.

Mit freundlichen Grüßen,

Thorsten Schöning


Telefon: +49 (0)515 94 68 - 0
Fax:
E-Mail: TSchoening@am-soft.de

AM-Soft IT-Service - Bitstore Hameln GmbH
Brandenburger Straße 7c
31789 Hameln

Diese E-Mail enthält vertrauliche und/oder rechtlich geschützte Informationen und ist ausschliesslich für den
Adressatenbestimmt. Jeglicher Zugriff auf diese E-Mail durch andere Personen als den Adressaten ist untersagt. Wenn Sie
nichtder richtige Adressat sind oder diese E-Mail irrtümlich erhalten haben, informieren Sie bitte sofort den Absender
undvernichten Sie diese E-Mail. Sollten Sie nicht der für diese E-Mail bestimmte Adressat sein, ist Ihnen jede
Veröffentlichung,Vervielfältigung oder Weitergabe wie auch das Ergreifen oder Unterlassen von Massnahmen im Vertrauen
auferlangte Information untersagt.  

This e-mail may contain confidential and/or privileged information and is intended solely for the addressee. Access to
thisemail by anyone else is unauthorized. If you are not the intended recipient (or have received this e-mail in error)
pleasenotify the sender immediately and destroy this e-mail. If you are not the intended recipient, any disclosure,
copying,distribution or any action taken or omitted to be taken in reliance on it, is prohibited and may be unlawful.  

Hinweise zum Datenschutz: bitstore.group/datenschutz






Re: How to get a more RSYNC compatible output of pg_dump?

От
Holger Jakobs
Дата:
Am 16.05.22 um 14:50 schrieb Thorsten Schöning:
> Guten Tag Holger Jakobs,
> am Montag, 16. Mai 2022 um 12:52 schrieben Sie:
>
>> If you want to sync a new version of a file without transferring
>> the whole thing, you have to use the option -c or --checksum.
> No, --checksum is only for (slowly) recognizing changes at all,
> transfer is differential by default with or without that option.
Yes, the option causes rsync to also detect changes if the size and 
timestamps haven't changed.
>>   -c, --checksum              skip based on checksum, not mod-time & size
> https://linux.die.net/man/1/rsync
>
>> This works well only if some blocks of the file have changed, while
>> most others haven't. This won't be the case of a pg_dump.
> It is the case for some dumps I'm creating, but those are a lot
> smaller and don't store large objects like files. Therefore I was
> wondering about the order in which pg_dump outputs tables, rows etc.,
> e.g. if that is the same always or changing with each exec for some
> reason.
>
If only one byte close to the beginning of the file is added, this would 
shift all following bytes by 1, so no block of the file will be 
identical and have to be re-transferred.

Therfore, the only option is creating separate files for each table. 
This would transfer only those files where the content of the table has 
been changed since the last transmission. There has been another answer 
by depesz suggesting that.

Regards,

Holger

-- 
Holger Jakobs, Bergisch Gladbach, Tel. +49-178-9759012


Вложения

Re: How to get a more RSYNC compatible output of pg_dump?

От
Thorsten Schöning
Дата:
Guten Tag hubert depesz lubaczewski,
am Montag, 16. Mai 2022 um 14:28 schrieben Sie:

> 1. Use -Fd, to have each table separately

Good hint, but doesn't work (easily) in my context. I'm streaming
dumps through SSH to store as plain file or in some contexts even into
a named pipe backed up by BorgBackup/BorgMatic. Though, with
BorgBackup de-duplication seems to work pretty well, so I might focus
on RSYNC.

Streaming through SSH using TAR-format and UNTAR locally afterwards
with some additional DIR-handling might be an option... Need to think
about this, thanks!

Mit freundlichen Grüßen

Thorsten Schöning

--
AM-SoFT IT-Service - Bitstore Hameln GmbH
Mitglied der Bitstore Gruppe - Ihr Full-Service-Dienstleister für IT und TK

E-Mail: Thorsten.Schoening@AM-SoFT.de
Web:    http://www.AM-SoFT.de/

Tel:   05151-  9468- 0
Tel:   05151-  9468-55
Fax:   05151-  9468-88
Mobil:  0178-8 9468-04

AM-SoFT IT-Service - Bitstore Hameln GmbH, Brandenburger Str. 7c, 31789 Hameln
AG Hannover HRB 221853 - Geschäftsführer: Janine Galonska


Für Rückfragen stehe ich Ihnen jederzeit zur Verfügung.

Mit freundlichen Grüßen,

Thorsten Schöning


Telefon: +49 (0)515 94 68 - 0
Fax:
E-Mail: TSchoening@am-soft.de

AM-Soft IT-Service - Bitstore Hameln GmbH
Brandenburger Straße 7c
31789 Hameln

Diese E-Mail enthält vertrauliche und/oder rechtlich geschützte Informationen und ist ausschliesslich für den
Adressatenbestimmt. Jeglicher Zugriff auf diese E-Mail durch andere Personen als den Adressaten ist untersagt. Wenn Sie
nichtder richtige Adressat sind oder diese E-Mail irrtümlich erhalten haben, informieren Sie bitte sofort den Absender
undvernichten Sie diese E-Mail. Sollten Sie nicht der für diese E-Mail bestimmte Adressat sein, ist Ihnen jede
Veröffentlichung,Vervielfältigung oder Weitergabe wie auch das Ergreifen oder Unterlassen von Massnahmen im Vertrauen
auferlangte Information untersagt.  

This e-mail may contain confidential and/or privileged information and is intended solely for the addressee. Access to
thisemail by anyone else is unauthorized. If you are not the intended recipient (or have received this e-mail in error)
pleasenotify the sender immediately and destroy this e-mail. If you are not the intended recipient, any disclosure,
copying,distribution or any action taken or omitted to be taken in reliance on it, is prohibited and may be unlawful.  

Hinweise zum Datenschutz: bitstore.group/datenschutz






Re: How to get a more RSYNC compatible output of pg_dump?

От
Thorsten Schöning
Дата:
Guten Tag Thorsten Schöning,
am Montag, 16. Mai 2022 um 18:48 schrieben Sie:

> Streaming through SSH using TAR-format and UNTAR locally afterwards
> with some additional DIR-handling might be an option... Need to think
> about this, thanks!

Things seem to be pretty easy in the end and by piping one is able to
avoid additional local copies as well:

> mkdir -p [...]
> pushd    [...]
> ssh      [...] | tar --extract --no-seek --overwrite
> popd

> pg_restore --dbname=test --format=d DB_DIR

The only downside of this approach seems to be lots of files in one
directory and am not sure currently if extraneous files are correctly
deleted. But better having too many than too few files... ;-)

Mit freundlichen Grüßen

Thorsten Schöning

--
AM-SoFT IT-Service - Bitstore Hameln GmbH
Mitglied der Bitstore Gruppe - Ihr Full-Service-Dienstleister für IT und TK

E-Mail: Thorsten.Schoening@AM-SoFT.de
Web:    http://www.AM-SoFT.de/

Tel:   05151-  9468- 0
Tel:   05151-  9468-55
Fax:   05151-  9468-88
Mobil:  0178-8 9468-04

AM-SoFT IT-Service - Bitstore Hameln GmbH, Brandenburger Str. 7c, 31789 Hameln
AG Hannover HRB 221853 - Geschäftsführer: Janine Galonska


Für Rückfragen stehe ich Ihnen jederzeit zur Verfügung.

Mit freundlichen Grüßen,

Thorsten Schöning


Telefon: +49 (0)515 94 68 - 0
Fax:
E-Mail: TSchoening@am-soft.de

AM-Soft IT-Service - Bitstore Hameln GmbH
Brandenburger Straße 7c
31789 Hameln

Diese E-Mail enthält vertrauliche und/oder rechtlich geschützte Informationen und ist ausschliesslich für den
Adressatenbestimmt. Jeglicher Zugriff auf diese E-Mail durch andere Personen als den Adressaten ist untersagt. Wenn Sie
nichtder richtige Adressat sind oder diese E-Mail irrtümlich erhalten haben, informieren Sie bitte sofort den Absender
undvernichten Sie diese E-Mail. Sollten Sie nicht der für diese E-Mail bestimmte Adressat sein, ist Ihnen jede
Veröffentlichung,Vervielfältigung oder Weitergabe wie auch das Ergreifen oder Unterlassen von Massnahmen im Vertrauen
auferlangte Information untersagt.  

This e-mail may contain confidential and/or privileged information and is intended solely for the addressee. Access to
thisemail by anyone else is unauthorized. If you are not the intended recipient (or have received this e-mail in error)
pleasenotify the sender immediately and destroy this e-mail. If you are not the intended recipient, any disclosure,
copying,distribution or any action taken or omitted to be taken in reliance on it, is prohibited and may be unlawful.  

Hinweise zum Datenschutz: bitstore.group/datenschutz






Re: How to get a more RSYNC compatible output of pg_dump?

От
Jeff Janes
Дата:


Though, with the large dumps it seems to me that with every slight
change in the actual data the entire dump gets downloaded again. I'm
already using uncompressed dumps in the hope that the output is more
stable and RSYNC better able to recognize unchanged parts. But I guess
that most changes in the dumped data simply result in all subsequent
data being that misplaced compared to what RSYNC reads against, that
it's like downloading the whole file again in the end.

For me the rsync differential algorithm on dump files just works with no special preparations.  I think you need to investigate this from the rsync side, not PostgreSQL.  For example, if you change a dump file just by adding or deleting one character near the beginning, how does rsync respond to that? 

Cheers,

Jeff