Discussion: pg_dump Running Slow


pg_dump Running Slow

From: Yongye Serkfem
Date:
Hello Engineers,
I trust you are all doing well. I need help improving the speed of pg_dump. I took a pg_dump of a 1TB database, which took almost a whole day. I used this command: "pg_dump -U postgres -d dynamic -f /backups/". Also, how do I check on the progress of the process?

Sincerely yours,
Yongye Serkfem

Re: pg_dump Running Slow

From: Adrian Klaver
Date:


On 1/16/24 10:55 AM, Yongye Serkfem wrote:
> Hello Engineers,
> I trust you are all doing well. I need help improving the speed of
> pg_dump. I took a pg_dump of a 1TB database, which took almost a whole
> day. I used this command: "pg_dump -U postgres -d dynamic -f
> /backups/". Also, how do I check on the progress of the process?


https://www.postgresql.org/docs/current/app-pgdump.html

"
-j njobs
--jobs=njobs

Run the dump in parallel by dumping njobs tables simultaneously. This option may reduce the time needed to perform the dump but it also increases the load on the database server. You can only use this option with the directory output format because this is the only output format where multiple processes can write their data at the same time.

pg_dump will open njobs + 1 connections to the database, so make sure your max_connections setting is high enough to accommodate all connections.

<...>

"


"

-v
--verbose

Specifies verbose mode. This will cause pg_dump to output detailed object comments and start/stop times to the dump file, and progress messages to standard error. Repeating the option causes additional debug-level messages to appear on standard error.

"



Re: pg_dump Running Slow

From: Jim Nasby
Date:
On 1/16/24 12:55 PM, Yongye Serkfem wrote:
> Hello Engineers,
> I trust you are all doing well. I need help improving the speed
> of pg_dump. I took a pg_dump of a 1TB database, which took almost a
> whole day. I used this command: "pg_dump -U postgres -d dynamic -f
> /backups/". Also, how do I check on the progress of the process?

The advantage of pg_dump is that it's very simple and you'd have to
work really hard to end up with a backup that won't restore. As such, I
*always* recommend that it be *part* of your disaster recovery plan. It
also directly supports restoring just a single table, as sketched below.
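For instance, restoring one table from a custom- or directory-format
archive might look like this (the table name "orders" is hypothetical,
and the archive path assumes the dump command shown earlier):

  pg_restore -U postgres -d dynamic -t orders /backups/dynamic.dump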

The flip side is that all you get from pg_dump is (in one form or
another) just a bunch of SQL that you'll then have to execute to 
restore. That process can be quite slow for anything but a very small 
cluster. (And as you've seen, pg_dump itself can be pretty slow.) While 
--jobs allows parallelizing that work, it's still a very expensive process.
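A parallel restore sketch, again assuming the directory-format archive
from the earlier example and an illustrative job count:

  pg_restore -U postgres -d dynamic -j 4 /backups/dynamic.dump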

For clusters of any significant size you'll also want to look at some 
form of binary backup (aka: PITR) [1]. Since that operates directly at 
the OS level it's much faster than pg_dump on large clusters. The 
downside is there are lots of subtle ways to set it up incorrectly, 
resulting in backups that won't restore. Instead of trying to manage it 
by hand, I strongly recommend using a tool that's meant for managing 
Postgres binary backups. pgBackRest[2] and barman[3] are two popular 
choices for that.
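As a rough sketch of what a pgBackRest setup involves (the stanza name
"demo" is made up, and a real setup also needs a pgbackrest.conf with
repository and pg1-path settings; see the pgBackRest docs):

  # in postgresql.conf:
  archive_mode = on
  archive_command = 'pgbackrest --stanza=demo archive-push %p'

  # then, after configuring pgbackrest.conf:
  pgbackrest --stanza=demo stanza-create
  pgbackrest --stanza=demo --type=full backup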

Most importantly: ALWAYS TEST YOUR BACKUPS. As I mentioned, it's pretty 
hard to screw up pg_dump, but you don't want to be trying to figure out 
how to restore in the middle of a disaster. Given all the subtleties 
involved with PITR I would actually recommend you test restoring *every* 
base backup you take, especially if you're not using one of the tools to 
manage it.
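For a pg_dump archive, a minimal restore test is simply restoring into
a scratch database (the names here are illustrative):

  createdb -U postgres restore_test
  pg_restore -U postgres -d restore_test /backups/dynamic.dump

For PITR you'd instead spin up a separate instance and restore the base
backup plus WAL into it, which is exactly where the tools above help.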

BTW, one of the biggest advantages to those tools is that a lot of 
people use them, so any bugs (in the tool or in Postgres itself) are 
more likely to be found and eliminated. It's also a big advantage for 
cloud providers (as well as tools like PGO[4]): with many thousands of 
instances running regular backups it's harder for bugs to remain hidden.

1: https://www.postgresql.org/docs/current/continuous-archiving.html
2: https://pgbackrest.org/
3: https://pgbarman.org/
4: https://github.com/CrunchyData/postgres-operator
-- 
Jim Nasby, Data Architect, Austin TX