Обсуждение: Unexpected trouble from pg_basebackup

Поиск

Список

Период

Сортировка

Unexpected trouble from pg_basebackup

От

otheus uibk

Дата:

04 октября 2016 г., 23:28:39

I recently updated my systems from pg 9.1.8 to 9.5.3. A pg_dumpall was used to migrate the data. Now I'm trying to re-establish replication between master and slave. I'm getting stuck.

When I run pg_basebackup (via a script which worked flawlessly on 9.1.8, AND via command line, ala "manual mode"), it creates the PGDATA directory structure, some files, but soon gets stuck. It writes fewer than 40 MB of a 20GB database. Further, using tcpdump to watch packets on the wire, I confirm no data is going across.

However, it's clear a connection has been made and the process was started. On the master, I see the process table:

postgres: wal sender process pgsync A.B.C.D(42821) sending backup

Meanwhile, on the client, I see no output updates. Here's the command:

pg_basebackup -x -P -v -D $PGDATA -w

PGUSER, PGPASSWORD and PGHOST are set accordingly. The User is a user with the REPLICATION attribute. Just in case, I dropped and re-created the user.

So that's question A.

Question B is related. In attempting to verify that the permissions and HBA were set correctly, I attempted to do a pg_dump using the same PGUSER and PGHOST. What I got surprised me:

$ pg_dump -s onyxeditor

pg_dump: [archiver (db)] query failed: ERROR: permission denied for relation licence

pg_dump: [archiver (db)] query was: LOCK TABLE public.licence IN ACCESS SHARE MODE

This is the replication user. Other schemas worked just fine, but not this one. Is this expected behavior?

Kind regards,

Otheus

otheus.uibk@gmail.com

otheus.shelling@uibk.ac.at

Re: Unexpected trouble from pg_basebackup

От

otheus uibk

Дата:

04 октября 2016 г., 23:43:24

After a 3 to 4 minute delay, pg_basebackup started doing it's thing and finished within a few minutes. So now the question is: why the startup delay?

Re: Unexpected trouble from pg_basebackup

От

Magnus Hagander

Дата:

04 октября 2016 г., 23:49:50

On Tue, Oct 4, 2016 at 10:42 PM, otheus uibk <otheus.uibk@gmail.com> wrote:

After a 3 to 4 minute delay, pg_basebackup started doing it's thing and finished within a few minutes. So now the question is: why the startup delay?

Sounds to me like it's doing a CHECKPOINT with spreading, which make it take time. Try with "-c fast" and see if the problem goes away.

Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/

Re: Unexpected trouble from pg_basebackup

От

Magnus Hagander

Дата:

05 октября 2016 г., 21:34:59

On Wed, Oct 5, 2016 at 3:55 PM, otheus uibk <otheus.uibk@gmail.com> wrote:

On Tue, Oct 4, 2016 at 10:49 PM, Magnus Hagander <magnus@hagander.net> wrote:

On Tue, Oct 4, 2016 at 10:42 PM, otheus uibk <otheus.uibk@gmail.com> wrote:
After a 3 to 4 minute delay, pg_basebackup started doing it's thing and finished within a few minutes. So now the question is: why the startup delay?

Sounds to me like it's doing a CHECKPOINT with spreading, which make it take time. Try with "-c fast" and see if the problem goes away.

Maybe not too far off.

2016-10-04 17:43:40.620 GMT         57eb90a0.6e07 402 00000 LOG: checkpoint complete: wrote 12799 buffers (1.0%); 0 transaction log file(s) added, 0 removed, 5 recycled; write=1389.348 s, sync=0.033 s, total=1389.400 s; sync files=240, longest=0.003 s, average=0.000 s; distance=92915 kB, estimate=129373 kB
2016-10-04 18:20:31.714 GMT         57eb90a0.6e07 403 00000 LOG: checkpoint starting: time
2016-10-04 18:39:19.870 GMT         57eb90a0.6e07 404 00000 LOG: checkpoint complete: wrote 10265 buffers (0.8%); 0 transaction log file(s) added, 0 removed, 4 recycled; write=1128.118 s, sync=0.023 s, total=1128.155 s; sync files=190, longest=0.002 s, average=0.000 s; distance=73419 kB, estimate=123778 kB
2016-10-04 19:07:12.647 GMT [unknown] pgsync 57f3fde0.52e3 2 00000 LOG: replication connection authorized: user=pgsync
2016-10-04 19:07:12.703 GMT         57eb90a0.6e07 405 00000 LOG: checkpoint starting: force wait
2016-10-04 19:20:32.879 GMT [unknown]     pgsync 57f40100.5891 2 00000 LOG: replication connection authorized: user=pgsync
2016-10-04 19:23:05.249 GMT         57eb90a0.6e07 406 00000 LOG: checkpoint complete: wrote 8638 buffers (0.7%); 0 transaction log file(s) added, 0 removed, 5 recycled; write=952.514 s, sync=0.016 s, total=952.546 s; sync files=238, longest=0.002 s, average=0.000 s; distance=68257 kB, estimate=118226 kB
2016-10-04 19:23:05.249 GMT         57eb90a0.6e07 407 00000 LOG: checkpoint starting: force wait
2016-10-04 19:28:52.232 GMT         57eb90a0.6e07 408 00000 LOG: checkpoint complete: wrote 3102 buffers (0.2%); 0 transaction log file(s) added, 0 removed, 0 recycled; write=346.957 s, sync=0.018 s, total=346.982 s; sync files=149, longest=0.002 s, average=0.000 s; distance=36016 kB, estimate=110005 kB
2016-10-04 19:30:31.922 GMT [unknown] pgsync 57f40357.5c70 2 00000 LOG: replication connection authorized: user=pgsync
2016-10-04 19:30:31.932 GMT              57eb90a0.6e07 409 00000 LOG: checkpoint starting: force wait
2016-10-04 19:33:40.857 GMT         57eb90a0.6e07 410 00000 LOG: checkpoint complete: wrote 1763 buffers (0.1%); 0 transaction log file(s) added, 0 removed, 1 recycled; write=188.886 s, sync=0.030 s, total=188.924 s; sync files=125, longest=0.004 s, average=0.000 s; distance=13135 kB, estimate=100318 kB

OK, so what was happening is that the checkpoints were taking 5 to 15 minutes minutes, and aborted basebackups were triggering new checkpoints which waited on the previous ones.

Is it possible the new environment is not high-performance enough??

possibly relevant configuration change:
checkpoint_timeout=1h

pg_basebackup will always send in a checkpoint. PostgreSQL will time that to take approximately checkpoint_timeout * checkpoint_completion_target time. So with checkpoint_timeout set to a very high value, it will by default target something like 30 minutes before it even gets started. The only reason it takes as *little* as it does is that your system is lightly loaded.

You can do a fast checkpoint with the parameter I suggested before, or you can tune your checkpoint_timeout to be something that gives you a more reasonable time.

Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Обсуждение: Unexpected trouble from pg_basebackup

Unexpected trouble from pg_basebackup

Re: Unexpected trouble from pg_basebackup

Re: Unexpected trouble from pg_basebackup

Re: Unexpected trouble from pg_basebackup