Re: Shortest offline window on database migration

Поиск
Список
Период
Сортировка
От Stephen Frost
Тема Re: Shortest offline window on database migration
Дата
Msg-id 20190604201835.GV2480@tamriel.snowman.net
обсуждение исходный текст
Ответ на Re: Shortest offline window on database migration  (Haroldo Kerry <hkerry@callix.com.br>)
Список pgsql-performance
Greetings,

* Haroldo Kerry (hkerry@callix.com.br) wrote:
> The bottleneck at dump is CPU (a single one, on a 44 thread server), as we
> are using the -Fc option, that does not allow multiple jobs.
> We tried some time ago to use the --jobs option of pg_dump but it was
> slower, even with more threads. Our guess is the sheer volume of files
> outweighs the processing gains of using a compressed file output. Also
> pg_dump even with multiple jobs spends a lot of time (1h+) on the "reading
> dependency data" section that seems to be single threaded (our application
> is multi-tenant and we use schemas to separate tenant data, hence we have a
> lot of tables).

You might want to reconsider using the separate-schemas-for-tenants
approach.  This isn't the only annoyance you can run into with lots and
lots of tables.  That said, are you using the newer version of pg_dump
(which is what you should be doing when migrating to a newer version of
PG, always)?  We've improved it over time, though I can't recall off-hand
if this particular issue was improved of in-between the releases being
discussed here.  Of course, lots of little files and dealing with them
could drag down performance when working in parallel.  Still a bit
surprised that it's ending up slower than -Fc.

> We are creating the replica using :
> docker exec pg-2 pg_basebackup -h 192.168.0.107 -U replication -P --xlog -D
> /var/lib/postgresql/data_9.6
> and it is taking 1h10m , instead of the 2h I reported initially, because we
> were using rsync with checksums to do it, after experimenting with
> pg_basebackup we found out it is faster, rsync was taking 1h just to
> calculate all checksums. Thanks for your insight on this taking too long.

So, it's a bit awkward still, unfortunately, but you can use pgbackrest
to effectively give you a parallel-replica-build.  The steps are
something like:

Get pgbackrest WAL archiving up and going, with the repo on the
destination server/filesystem, but have 'compress=n' in the
pgbackrest.conf for the repo.

Run: pgbackrest --stanza=mydb --type=full --process-max=8 backup

Once that's done, just do:

mv /path/to/repo/backup/mydb/20190605-120000F/pg_data /new/pgdata
chmod -R g-rwx /new/pgdata

Then in /new/pgdata, create a recovery.conf file like:

restore_command = 'pgbackrest --stanza=mydb archive-get %f "%p"'

And start up the DB server.

We have some ideas about how make that whole thing cleaner but the
rewrite into C has delayed our efforts, perhaps once that's done (this
fall), we can look at it.

Of course, you won't have an actual backup of the new database server at
that point yet, so you'll want to clean things up and make that happen
ASAP.  Another option, which is what I usually recommend, is just to
take a new backup (properly) and then do a restore from it, but that'll
obviously take longer since there's two copies being done instead of one
(though you can parallelize to your heart's content, so it can still be
quite fast if you have enough CPU and I/O).

Thanks,

Stephen

Вложения

В списке pgsql-performance по дате отправления:

Предыдущее
От: Vasilis Ventirozos
Дата:
Сообщение: Strange query behaviour between 9.4 and 12beta1
Следующее
От: Tomas Vondra
Дата:
Сообщение: Re: Strange query behaviour between 9.4 and 12beta1