Обсуждение: pg_basebackup and replication slots

Поиск
Список
Период
Сортировка

pg_basebackup and replication slots

От
Peter Eisentraut
Дата:
I wonder why pg_basebackup doesn't have any support for replication slots.

When relying on replication slots to hang on to WAL data, there is a gap
between when pg_basebackup finishes and streaming replication is started
where WAL data could be thrown away by the primary.

Looking at the code, the -X stream method could easily specify a
replication slot.  (Might be nice if it could also create it in the same
run.)



Re: pg_basebackup and replication slots

От
Heikki Linnakangas
Дата:
On 05/21/2015 03:42 PM, Peter Eisentraut wrote:
> I wonder why pg_basebackup doesn't have any support for replication slots.
>
> When relying on replication slots to hang on to WAL data, there is a gap
> between when pg_basebackup finishes and streaming replication is started
> where WAL data could be thrown away by the primary.
>
> Looking at the code, the -X stream method could easily specify a
> replication slot.  (Might be nice if it could also create it in the same
> run.)

Yeah, would be nice. The automatically-created slot should also be 
automatically removed if the backup is aborted for any reason, i.e. if 
the connection dies.

- Heikki




Re: pg_basebackup and replication slots

От
Peter Eisentraut
Дата:
On 5/21/15 8:42 AM, Peter Eisentraut wrote:
> I wonder why pg_basebackup doesn't have any support for replication slots.
>
> When relying on replication slots to hang on to WAL data, there is a gap
> between when pg_basebackup finishes and streaming replication is started
> where WAL data could be thrown away by the primary.
>
> Looking at the code, the -X stream method could easily specify a
> replication slot.  (Might be nice if it could also create it in the same
> run.)

Here is a patch set for this.

(If you're looking at the patch and wondering why there is no code to
actually do anything with the replication slot, that's because the code
that does the WAL streaming is already aware of replication slots
because of the pg_receivexlog functionality that it also serves.  So all
that needs to be done is set replication_slot.)

See also the concurrent discussion on (optionally?) initializing
restart_lsn when the replication slot is created

(http://www.postgresql.org/message-id/flat/B8D538AC5587C84898B261FCB8C7D8A41FDE1017@ex10-mbx-36009.ant.amazon.com#B8D538AC5587C84898B261FCB8C7D8A41FDE1017@ex10-mbx-36009.ant.amazon.com),
which might have an impact on the details of this change.



Вложения

Re: pg_basebackup and replication slots

От
Michael Paquier
Дата:
On Wed, Jul 1, 2015 at 10:33 AM, Peter Eisentraut wrote:
> (If you're looking at the patch and wondering why there is no code to
> actually do anything with the replication slot, that's because the code
> that does the WAL streaming is already aware of replication slots
> because of the pg_receivexlog functionality that it also serves.  So all
> that needs to be done is set replication_slot.)

This is cool to see this patch taking shape.

+    my ($stdout, $stderr);
+    run [ 'psql', '-X', '-A', '-t', '-q', '-d', $dbname, '-f', '-' ],
'<', \$sql, '>', \$stdout, '2>', \$stderr or die;
+    chomp $stdout;
+    return $stdout;

Could it be possible to chomp and return $stderr as well here? It
seems useful to me to perform sanity checks after calling psql.
-- 
Michael



Re: pg_basebackup and replication slots

От
Peter Eisentraut
Дата:
On 7/1/15 8:37 AM, Michael Paquier wrote:
> On Wed, Jul 1, 2015 at 10:33 AM, Peter Eisentraut wrote:
>> (If you're looking at the patch and wondering why there is no code to
>> actually do anything with the replication slot, that's because the code
>> that does the WAL streaming is already aware of replication slots
>> because of the pg_receivexlog functionality that it also serves.  So all
>> that needs to be done is set replication_slot.)
> 
> This is cool to see this patch taking shape.
> 
> +    my ($stdout, $stderr);
> +    run [ 'psql', '-X', '-A', '-t', '-q', '-d', $dbname, '-f', '-' ],
> '<', \$sql, '>', \$stdout, '2>', \$stderr or die;
> +    chomp $stdout;
> +    return $stdout;
> 
> Could it be possible to chomp and return $stderr as well here? It
> seems useful to me to perform sanity checks after calling psql.

Sure, makes sense.




Re: pg_basebackup and replication slots

От
Michael Paquier
Дата:
On Wed, Jul 1, 2015 at 11:35 PM, Peter Eisentraut <peter_e@gmx.net> wrote:
> On 7/1/15 8:37 AM, Michael Paquier wrote:
>> On Wed, Jul 1, 2015 at 10:33 AM, Peter Eisentraut wrote:
>>> (If you're looking at the patch and wondering why there is no code to
>>> actually do anything with the replication slot, that's because the code
>>> that does the WAL streaming is already aware of replication slots
>>> because of the pg_receivexlog functionality that it also serves.  So all
>>> that needs to be done is set replication_slot.)
>>
>> This is cool to see this patch taking shape.
>>
>> +    my ($stdout, $stderr);
>> +    run [ 'psql', '-X', '-A', '-t', '-q', '-d', $dbname, '-f', '-' ],
>> '<', \$sql, '>', \$stdout, '2>', \$stderr or die;
>> +    chomp $stdout;
>> +    return $stdout;
>>
>> Could it be possible to chomp and return $stderr as well here? It
>> seems useful to me to perform sanity checks after calling psql.
>
> Sure, makes sense.

OK, so here is more input about this set of patches.

Patch 1 looks good. It adds some tests to cover pg_basebackup -R and
checks if standby_mode and primary_conninfo are set correctly.
Patch 2 also looks fine. It adds tests for pg_basebackup -X. Both
patches are independent on the feature.

Regarding patch 3, I have more comments:
1) I think that documentation should clearly mention that if -R and -S
are used together, a primary_slot_name entry is added in the
recovery.conf generated with the slot name defined.
2)sub psql{       my ($dbname, $sql) = @_;
-       run [ 'psql', '-X', '-q', '-d', $dbname, '-f', '-' ], '<', \$sql or die;
+       my ($stdout, $stderr);
+       run [ 'psql', '-X', '-A', '-t', '-q', '-d', $dbname, '-f', '-'
], '<', \$sql, '>', \$stdout, '2>', \$stderr or die;
+       chomp $stdout;
+       return $stdout;}
RewindTest.pm has a routine called check_query defined. I would be
great to extend a bit more psql than what I thought previously by
returning from it ($result, $stdout, $strerr) and make check_query
directly use it. This way we have a unique interface to run psql and
capture output. I don't mind writing this refactoring patch on top of
your set if that's necessary.
3)
+command_ok([ 'pg_basebackup', '-D', "$tempdir/backupxs_sl_R", '-X',
'stream', '-S', 'slot1', '-R' ],
+       'pg_basebackup with replication slot and -R runs');
+$recovery_conf = slurp_file "$tempdir/backupR/recovery.conf";
+like(slurp_file("$tempdir/backupxs_sl_R/recovery.conf"),
slurp_file is slurping an incorrect file and $recovery_conf is used
nowhere after, so you could simply remove this line.
Regards,
-- 
Michael



Re: pg_basebackup and replication slots

От
Peter Eisentraut
Дата:
On 7/2/15 3:37 AM, Michael Paquier wrote:
> Regarding patch 3, I have more comments:
> 1) I think that documentation should clearly mention that if -R and -S
> are used together, a primary_slot_name entry is added in the
> recovery.conf generated with the slot name defined.

Updated proposal attached.

> 2)
>  sub psql
>  {
>         my ($dbname, $sql) = @_;
> -       run [ 'psql', '-X', '-q', '-d', $dbname, '-f', '-' ], '<', \$sql or die;
> +       my ($stdout, $stderr);
> +       run [ 'psql', '-X', '-A', '-t', '-q', '-d', $dbname, '-f', '-'
> ], '<', \$sql, '>', \$stdout, '2>', \$stderr or die;
> +       chomp $stdout;
> +       return $stdout;
>  }
> RewindTest.pm has a routine called check_query defined. I would be
> great to extend a bit more psql than what I thought previously by
> returning from it ($result, $stdout, $strerr) and make check_query
> directly use it. This way we have a unique interface to run psql and
> capture output. I don't mind writing this refactoring patch on top of
> your set if that's necessary.

Let's do that as a separate patch.  Adding multiple return values makes
calling awkward, so I'd like to sort out the actual use cases before we
do that.

> 3)
> +command_ok([ 'pg_basebackup', '-D', "$tempdir/backupxs_sl_R", '-X',
> 'stream', '-S', 'slot1', '-R' ],
> +       'pg_basebackup with replication slot and -R runs');
> +$recovery_conf = slurp_file "$tempdir/backupR/recovery.conf";
> +like(slurp_file("$tempdir/backupxs_sl_R/recovery.conf"),
> slurp_file is slurping an incorrect file and $recovery_conf is used
> nowhere after, so you could simply remove this line.

Yeah, that was some leftover garbage.


Вложения

Re: pg_basebackup and replication slots

От
Michael Paquier
Дата:
On Wed, Jul 22, 2015 at 10:15 AM, Peter Eisentraut wrote:
> On 7/2/15 3:37 AM, Michael Paquier wrote:
>> 2)
>>  sub psql
>>  {
>>         my ($dbname, $sql) = @_;
>> -       run [ 'psql', '-X', '-q', '-d', $dbname, '-f', '-' ], '<', \$sql or die;
>> +       my ($stdout, $stderr);
>> +       run [ 'psql', '-X', '-A', '-t', '-q', '-d', $dbname, '-f', '-'
>> ], '<', \$sql, '>', \$stdout, '2>', \$stderr or die;
>> +       chomp $stdout;
>> +       return $stdout;
>>  }
>> RewindTest.pm has a routine called check_query defined. I would be
>> great to extend a bit more psql than what I thought previously by
>> returning from it ($result, $stdout, $strerr) and make check_query
>> directly use it. This way we have a unique interface to run psql and
>> capture output. I don't mind writing this refactoring patch on top of
>> your set if that's necessary.
>
> Let's do that as a separate patch.  Adding multiple return values makes
> calling awkward, so I'd like to sort out the actual use cases before we
> do that.

OK.

>> 3)
>> +command_ok([ 'pg_basebackup', '-D', "$tempdir/backupxs_sl_R", '-X',
>> 'stream', '-S', 'slot1', '-R' ],
>> +       'pg_basebackup with replication slot and -R runs');
>> +$recovery_conf = slurp_file "$tempdir/backupR/recovery.conf";
>> +like(slurp_file("$tempdir/backupxs_sl_R/recovery.conf"),
>> slurp_file is slurping an incorrect file and $recovery_conf is used
>> nowhere after, so you could simply remove this line.
>
> Yeah, that was some leftover garbage.

OK, thanks for the updated versions. Those ones look good to me.

Now, do we plan to do something about the creation of a slot. I
imagine that it would be useful if we could have --create-slot to
create a slot when beginning a base backup and if-not-exists to
control the error flow. There is already CreateReplicationSlot for
this purpose in streamutils.c so most of the work is already done.
Also, when a base backup fails for a reason or another, we should try
to drop the slot in disconnect_and_exit() if it has been created by
pg_basebackup. if if-not-exists is true and the slot already existed
when beginning, we had better not dropping it perhaps...
-- 
Michael



Re: pg_basebackup and replication slots

От
Peter Eisentraut
Дата:
On 7/22/15 12:43 AM, Michael Paquier wrote:
> OK, thanks for the updated versions. Those ones look good to me.

Committed, thanks.

> Now, do we plan to do something about the creation of a slot. I
> imagine that it would be useful if we could have --create-slot to
> create a slot when beginning a base backup and if-not-exists to
> control the error flow. There is already CreateReplicationSlot for
> this purpose in streamutils.c so most of the work is already done.
> Also, when a base backup fails for a reason or another, we should try
> to drop the slot in disconnect_and_exit() if it has been created by
> pg_basebackup. if if-not-exists is true and the slot already existed
> when beginning, we had better not dropping it perhaps...

I think it would be worthwhile to work on that, but in my mind it's a
separate feature, and I don't have any use for it, so I'm not going to
rush into it.





Re: pg_basebackup and replication slots

От
Simon Riggs
Дата:
On 22 July 2015 at 05:43, Michael Paquier <michael.paquier@gmail.com> wrote:
 
Now, do we plan to do something about the creation of a slot. I
imagine that it would be useful if we could have --create-slot to
create a slot when beginning a base backup and if-not-exists to
control the error flow. There is already CreateReplicationSlot for
this purpose in streamutils.c so most of the work is already done.
Also, when a base backup fails for a reason or another, we should try
to drop the slot in disconnect_and_exit() if it has been created by
pg_basebackup. if if-not-exists is true and the slot already existed
when beginning, we had better not dropping it perhaps...

What is the purpose of creating a temporary slot?

My understand was that slots are supposed to be persistent so we should create them before use.

If we create a slot when one is needed and then drop automatically on session disconnect (as Heikki suggest), what benefit does using a slot give us?
 
--
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: pg_basebackup and replication slots

От
Andres Freund
Дата:
On 2015-07-29 08:57:38 +0100, Simon Riggs wrote:
> On 22 July 2015 at 05:43, Michael Paquier <michael.paquier@gmail.com> wrote:
> 
> 
> > Now, do we plan to do something about the creation of a slot. I
> > imagine that it would be useful if we could have --create-slot to
> > create a slot when beginning a base backup and if-not-exists to
> > control the error flow. There is already CreateReplicationSlot for
> > this purpose in streamutils.c so most of the work is already done.
> > Also, when a base backup fails for a reason or another, we should try
> > to drop the slot in disconnect_and_exit() if it has been created by
> > pg_basebackup. if if-not-exists is true and the slot already existed
> > when beginning, we had better not dropping it perhaps...
> 
> 
> What is the purpose of creating a temporary slot?

> If we create a slot when one is needed and then drop automatically on
> session disconnect (as Heikki suggest), what benefit does using a slot give
> us?

The goal is to have a reliable pg_basebackup that doesn't fail due to
missing WAL (which can easily happen even with -X) . That seems like a
sane desire.  The point of using a temporary slot is to not have a
leftover slot afterwards, reserving resources. Especially important if
the basebackup actually failed...



Re: pg_basebackup and replication slots

От
Simon Riggs
Дата:
On 29 July 2015 at 09:09, Andres Freund <andres@anarazel.de> wrote:
On 2015-07-29 08:57:38 +0100, Simon Riggs wrote:
> On 22 July 2015 at 05:43, Michael Paquier <michael.paquier@gmail.com> wrote:
>
>
> > Now, do we plan to do something about the creation of a slot. I
> > imagine that it would be useful if we could have --create-slot to
> > create a slot when beginning a base backup and if-not-exists to
> > control the error flow. There is already CreateReplicationSlot for
> > this purpose in streamutils.c so most of the work is already done.
> > Also, when a base backup fails for a reason or another, we should try
> > to drop the slot in disconnect_and_exit() if it has been created by
> > pg_basebackup. if if-not-exists is true and the slot already existed
> > when beginning, we had better not dropping it perhaps...
>
>
> What is the purpose of creating a temporary slot?

> If we create a slot when one is needed and then drop automatically on
> session disconnect (as Heikki suggest), what benefit does using a slot give
> us?

The goal is to have a reliable pg_basebackup that doesn't fail due to
missing WAL (which can easily happen even with -X) . That seems like a
sane desire. 

Agreed, that is sane. Slots are good.
 
The point of using a temporary slot is to not have a
leftover slot afterwards, reserving resources. Especially important if
the basebackup actually failed...

Creating a slot and then deleting it if the session disconnects does not successfully provide the functionality desired above.

--
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: pg_basebackup and replication slots

От
Andres Freund
Дата:
On 2015-07-29 09:17:04 +0100, Simon Riggs wrote:
> On 29 July 2015 at 09:09, Andres Freund <andres@anarazel.de> wrote:
> > The point of using a temporary slot is to not have a
> > leftover slot afterwards, reserving resources. Especially important if
> > the basebackup actually failed...
> >
> 
> Creating a slot and then deleting it if the session disconnects does not
> successfully provide the functionality desired above.

Uh? If you create the slot, start streaming, and then start the
basebackup, it does. It does *not* guarantee that the base backup can be
converted into a replica, but it's sufficient to guarantee it can
brought out of recovery.



Re: pg_basebackup and replication slots

От
Simon Riggs
Дата:
<div dir="ltr">On 29 July 2015 at 11:43, Andres Freund <span dir="ltr"><<a href="mailto:andres@anarazel.de"
target="_blank">andres@anarazel.de</a>></span>wrote:<br /><div class="gmail_extra"><div
class="gmail_quote"><blockquoteclass="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc
solid;padding-left:1ex">On2015-07-29 09:17:04 +0100, Simon Riggs wrote:<br /><span class="">> On 29 July 2015 at
09:09,Andres Freund <<a href="mailto:andres@anarazel.de">andres@anarazel.de</a>> wrote:<br /> > > The point
ofusing a temporary slot is to not have a<br /> > > leftover slot afterwards, reserving resources. Especially
importantif<br /> > > the basebackup actually failed...<br /> > ><br /> ><br /> > Creating a slot and
thendeleting it if the session disconnects does not<br /> > successfully provide the functionality desired above.<br
/><br/></span>Uh? If you create the slot, start streaming, and then start the<br /> basebackup, it does. It does *not*
guaranteethat the base backup can be<br /> converted into a replica, but it's sufficient to guarantee it can<br />
broughtout of recovery.<br /></blockquote></div><div class="gmail_extra"><br /></div><div class="gmail_extra">Perhaps
weare misunderstanding the word "it" here. "it can be brought out of recovery"?</div><div class="gmail_extra"><br
/></div><divclass="gmail_extra">You appear to be saying that a backup that disconnects before completion is useful in
someway. How so?</div><div class="gmail_extra"><br /></div>If the slot is cleaned up on disconnect, as suggested, then
youend up with half a backup and WAL is cleaned up. The only possible use for slots is to reserve resources (as you
say);the resources will clearly not be reserved if we drop the slot on disconnect. What use is that?<br /></div><div
class="gmail_extra"><br/></div><div class="gmail_extra">-- <br /><div class="gmail_signature"><div dir="ltr"><span
style="color:rgb(136,136,136)">SimonRiggs                </span><a href="http://www.2ndquadrant.com/"
style="color:rgb(17,85,204)"target="_blank">http://www.2ndQuadrant.com/</a><br style="color:rgb(136,136,136)" /><span
style="color:rgb(136,136,136)">PostgreSQLDevelopment, 24x7 Support, Remote DBA, Training & Services</span><br
/></div></div></div></div>

Re: pg_basebackup and replication slots

От
Andres Freund
Дата:
On 2015-07-29 12:47:01 +0100, Simon Riggs wrote:
> On 29 July 2015 at 11:43, Andres Freund <andres@anarazel.de> wrote:
> 
> > On 2015-07-29 09:17:04 +0100, Simon Riggs wrote:
> > > On 29 July 2015 at 09:09, Andres Freund <andres@anarazel.de> wrote:
> > > > The point of using a temporary slot is to not have a
> > > > leftover slot afterwards, reserving resources. Especially important if
> > > > the basebackup actually failed...
> > > >
> > >
> > > Creating a slot and then deleting it if the session disconnects does not
> > > successfully provide the functionality desired above.
> >
> > Uh? If you create the slot, start streaming, and then start the
> > basebackup, it does. It does *not* guarantee that the base backup can be
> > converted into a replica, but it's sufficient to guarantee it can
> > brought out of recovery.
> >
> 
> Perhaps we are misunderstanding the word "it" here. "it can be brought out
> of recovery"?

The finished base backup.

> You appear to be saying that a backup that disconnects before completion is
> useful in some way. How so?

I'm not trying to say that at all.

As far as I understand this subthread the goal is to have a
pg_basebackup that internally creates a slot, so it can guarantee that
all the required WAL is present till streamed out by -X
stream/fetch. The problem with just creating a slot is that it'd "leak"
if pg_basebackup is killed, or the connection breaks.  The idea with the
ephemeral slot is that it'd automatically kept alive until the base
backup is complete, but would also automatically be dropped if the base
backup fails.

Makes sense?

We pretty much have all the required infrastructure for that in slot.c,
it's just not exposed to the replication protocol.

Greetings,

Andres Freund



Re: pg_basebackup and replication slots

От
Simon Riggs
Дата:
On 29 July 2015 at 13:00, Andres Freund <andres@anarazel.de> wrote:
 
As far as I understand this subthread the goal is to have a
pg_basebackup that internally creates a slot, so it can guarantee that
all the required WAL is present till streamed out by -X
stream/fetch. The problem with just creating a slot is that it'd "leak"
if pg_basebackup is killed, or the connection breaks.  The idea with the
ephemeral slot is that it'd automatically kept alive until the base
backup is complete, but would also automatically be dropped if the base
backup fails.

Makes sense?

So this would be needed when creating a standalone backup that would not be persistently connected to the master, yet we want to bring it up as a live/writable server in a single command, and we want to make it easy to script in case our script is killed?

--
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: pg_basebackup and replication slots

От
Andres Freund
Дата:
On 2015-07-29 13:45:22 +0100, Simon Riggs wrote:
> So this would be needed when creating a standalone backup that would not be
> persistently connected to the master, yet we want to bring it up as a
> live/writable server in a single command

I'm not understanding what you mean with 'single command'
here. Currently there's no way to do this safely without configuring a
high enough wal_keep_segments.

You can (since yesterday) manually create a slot, run pg_basebackup, and
drop the slot. But that's not safe if your script is interrupted
somehow. Since many base backups are run in a non-interactive fashion
asking for intervention to clean up in that case imo is not an
acceptable answer.

> and we want to make it easy to script in case our script is killed?

Or the connection dies/is killed, or the server is restarted, or ...



Re: pg_basebackup and replication slots

От
Peter Eisentraut
Дата:
On 7/29/15 8:00 AM, Andres Freund wrote:
> As far as I understand this subthread the goal is to have a
> pg_basebackup that internally creates a slot, so it can guarantee that
> all the required WAL is present till streamed out by -X
> stream/fetch. The problem with just creating a slot is that it'd "leak"
> if pg_basebackup is killed, or the connection breaks.  The idea with the
> ephemeral slot is that it'd automatically kept alive until the base
> backup is complete, but would also automatically be dropped if the base
> backup fails.

I don't think anyone actually said that.  I think the goal was to add
the functionality that pg_basebackup can optionally create a
(persistent) replication slot, both for its own use and for later use by
streaming.  Just so you don't have to call pg_create...slot() or
pg_receivexlog separately to create the slot.  And it was pointed out
that this created slot would need to be removed if pg_basebackup failed
for some reason.

What you describe also seems useful, but is a different sub-issue.