Thread: [GENERAL] pg_basebackup error: replication slot "pg_basebackup_2194" already exists


[GENERAL] pg_basebackup error: replication slot "pg_basebackup_2194" already exists

From
Ludovic Vaugeois-Pepin
Date:
I ran into the issue described below with 10.0 beta. The error I got is:

pg_basebackup: could not create temporary replication slot
"pg_basebackup_2194": ERROR:  replication slot "pg_basebackup_2194"
already exists

A race condition? Or maybe I am doing something wrong.





Release:
    Name        : postgresql10-server
    Version     : 10.0
    Release     : beta1PGDG.rhel7


Test Type:
    Functional testing of a pacemaker resource agent
(https://github.com/ulodciv/pgha)


Test Detail:
    During context/environment setup, pg_basebackup is invoked (in
parallel) from multiple virtual machines. The backups are then started
as asynchronously replicated hot standbys.


Platform:
    Centos 7.3


Installation Method:
    yum -y install
https://download.postgresql.org/pub/repos/yum/testing/10/redhat/rhel-7-x86_64/pgdg-redhat10-10-1.noarch.rpm
    yum -y install postgresql10-server postgresql10-contrib


Platform Detail:


Test Procedure:

    Have pg_basebackup run simultaneously on multiple hosts against
the same instance eg:

        pg_basebackup -h test4 -p 5432 -D /var/lib/pgsql/10/data -U repl1 -Xs
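
A minimal Python sketch of this procedure, assuming the harness reaches each standby over ssh as the postgres user. The host name test6 and the helper names are hypothetical; test4, test5, and the pg_basebackup arguments are the ones from this report.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor

# Command from the report; test4 is the instance being backed up.
BASEBACKUP = ["pg_basebackup", "-h", "test4", "-p", "5432",
              "-D", "/var/lib/pgsql/10/data", "-U", "repl1", "-Xs"]


def ssh_run(host, argv):
    """Run argv on host over ssh (assumes key-based login as postgres)."""
    return subprocess.run(["ssh", f"postgres@{host}", *argv],
                          capture_output=True, text=True)


def backup_in_parallel(hosts, run=ssh_run):
    """Start one pg_basebackup per host at the same time.

    Returns a dict mapping each host to whatever `run` returned for it.
    `run` is injectable so the function can be tried without real machines.
    """
    with ThreadPoolExecutor(max_workers=len(hosts)) as pool:
        futures = {host: pool.submit(run, host, BASEBACKUP) for host in hosts}
        return {host: fut.result() for host, fut in futures.items()}


# Hypothetical usage: results = backup_in_parallel(["test5", "test6"])
```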


Failure?

E               deploylib.deployer_error.DeployerError:
postgres@test5: got exit status 1 for:
E               pg_basebackup -h test4 -p 5432 -D
/var/lib/pgsql/10/data -U repl1 -Xs
E               stderr: pg_basebackup: could not create temporary
replication slot "pg_basebackup_2194": ERROR:  replication slot
"pg_basebackup_2194" already exists
E               pg_basebackup: child process exited with error 1
E               pg_basebackup: removing data directory "/var/lib/pgsql/10/data"


Test Results:


Comments:
    This seems to be new with 10. I recently began testing the
pacemaker resource agent against PG 10. I never had (or noticed) this
failure with 9.6.1 and 9.6.2.

--
Ludovic


Re: [GENERAL] pg_basebackup error: replication slot "pg_basebackup_2194" already exists

From
Kenneth Marshall
Date:
On Tue, May 30, 2017 at 09:14:41PM +0200, Ludovic Vaugeois-Pepin wrote:
> I ran into the issue described below with 10.0 beta. The error I got is:
>
> pg_basebackup: could not create temporary replication slot
> "pg_basebackup_2194": ERROR:  replication slot "pg_basebackup_2194"
> already exists
>
> A race condition? Or maybe I am doing something wrong.
>
>
> Release:
>     Name        : postgresql10-server
>     Version     : 10.0
>     Release     : beta1PGDG.rhel7
>
>
> Test Type:
>     Functional testing of a pacemaker resource agent
> (https://github.com/ulodciv/pgha)
>
>
> Test Detail:
>     During context/environment setup, pg_basebackup is invoked (in
> parallel) from multiple virtual machines. The backups are then started
> as asynchronously replicated hot standbys.
>
>
> Platform:
>     Centos 7.3
>
>
> Installation Method:
>     yum -y install
> https://download.postgresql.org/pub/repos/yum/testing/10/redhat/rhel-7-x86_64/pgdg-redhat10-10-1.noarch.rpm
>     yum -y install postgresql10-server postgresql10-contrib
>
>
> Platform Detail:
>
>
> Test Procedure:
>
>     Have pg_basebackup run simultaneously on multiple hosts against
> the same instance eg:
>
>         pg_basebackup -h test4 -p 5432 -D /var/lib/pgsql/10/data -U repl1 -Xs
>
>
> Failure?
>
> E               deploylib.deployer_error.DeployerError:
> postgres@test5: got exit status 1 for:
> E               pg_basebackup -h test4 -p 5432 -D
> /var/lib/pgsql/10/data -U repl1 -Xs
> E               stderr: pg_basebackup: could not create temporary
> replication slot "pg_basebackup_2194": ERROR:  replication slot
> "pg_basebackup_2194" already exists
> E               pg_basebackup: child process exited with error 1
> E               pg_basebackup: removing data directory "/var/lib/pgsql/10/data"
>
>
> Test Results:
>
>
> Comments:
>     This seems to be new with 10. I recently began testing the
> pacemaker resource agent against PG 10. I never had (or noticed) this
> failure with 9.6.1 and 9.6.2.
>
> --
> Ludovic


Hi,

Version 10 creates a temporary replication slot for you unless you
specify a slot name or use the --no-slot option:

https://www.postgresql.org/docs/10/static/app-pgbasebackup.html

Regards,
Ken
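
For a harness that launches several backups at once, one way to sidestep the temporary slot is the --no-slot option documented on the page linked above. A minimal sketch of building such a command line; the function name and its parameters are hypothetical, while the host, port, directory, and user are the ones from the report.

```python
def basebackup_argv(host, port, datadir, user, temp_slot=True):
    """Build a pg_basebackup command line that streams WAL (-X stream).

    With temp_slot=False the documented --no-slot option is appended, so
    version 10 does not create a temporary replication slot.
    """
    argv = ["pg_basebackup", "-h", host, "-p", str(port),
            "-D", datadir, "-U", user, "-X", "stream"]
    if not temp_slot:
        argv.append("--no-slot")
    return argv


print(" ".join(basebackup_argv("test4", 5432, "/var/lib/pgsql/10/data",
                               "repl1", temp_slot=False)))
# prints: pg_basebackup -h test4 -p 5432 -D /var/lib/pgsql/10/data -U repl1 -X stream --no-slot
```

Without a slot, the server may recycle WAL that the backup still needs, so this option trades one possible failure for another.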


Re: [GENERAL] pg_basebackup error: replication slot "pg_basebackup_2194" already exists

From
Magnus Hagander
Date:
On Tue, May 30, 2017 at 9:14 PM, Ludovic Vaugeois-Pepin <ludovicvp@gmail.com> wrote:
> I ran into the issue described below with 10.0 beta. The error I got is:
>
> pg_basebackup: could not create temporary replication slot
> "pg_basebackup_2194": ERROR:  replication slot "pg_basebackup_2194"
> already exists
>
> A race condition? Or maybe I am doing something wrong.
>
>
> Release:
>     Name        : postgresql10-server
>     Version     : 10.0
>     Release     : beta1PGDG.rhel7
>
>
> Test Type:
>     Functional testing of a pacemaker resource agent
> (https://github.com/ulodciv/pgha)
>
>
> Test Detail:
>     During context/environment setup, pg_basebackup is invoked (in
> parallel) from multiple virtual machines. The backups are then started
> as asynchronously replicated hot standbys.
>
>
> Platform:
>     Centos 7.3
>
>
> Installation Method:
>     yum -y install
> https://download.postgresql.org/pub/repos/yum/testing/10/redhat/rhel-7-x86_64/pgdg-redhat10-10-1.noarch.rpm
>     yum -y install postgresql10-server postgresql10-contrib
>
>
> Platform Detail:
>
>
> Test Procedure:
>
>     Have pg_basebackup run simultaneously on multiple hosts against
> the same instance eg:
>
>         pg_basebackup -h test4 -p 5432 -D /var/lib/pgsql/10/data -U repl1 -Xs
>
>
> Failure?
>
> E               deploylib.deployer_error.DeployerError:
> postgres@test5: got exit status 1 for:
> E               pg_basebackup -h test4 -p 5432 -D
> /var/lib/pgsql/10/data -U repl1 -Xs
> E               stderr: pg_basebackup: could not create temporary
> replication slot "pg_basebackup_2194": ERROR:  replication slot
> "pg_basebackup_2194" already exists
> E               pg_basebackup: child process exited with error 1
> E               pg_basebackup: removing data directory "/var/lib/pgsql/10/data"
>
>
> Test Results:
>
>
> Comments:
>     This seems to be new with 10. I recently began testing the
> pacemaker resource agent against PG 10. I never had (or noticed) this
> failure with 9.6.1 and 9.6.2.

Hah, that's an interesting failure. In the name of the slot, the 2194 comes from the pid -- but it's the pid of pg_basebackup.

I assume you're not running the two pg_basebackup processes on the same machine? Is it predictable when this happens (meaning that the pid value is actually predictable), or do you have to run it a large number of times before it happens?
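
To illustrate the diagnosis: if the slot name is built only from the client's pid, two clients on different machines that happen to have the same pid ask the server for the same slot. The following is a toy model, not PostgreSQL code; the FakeServer class and the slot_name helper are hypothetical.

```python
class FakeServer:
    """Stand-in for the primary: slot names must be unique per server."""

    def __init__(self):
        self.slots = set()

    def create_slot(self, name):
        if name in self.slots:
            raise ValueError(f'replication slot "{name}" already exists')
        self.slots.add(name)


def slot_name(pid):
    # Naming scheme described above: a fixed prefix plus the client's pid.
    return f"pg_basebackup_{pid}"


server = FakeServer()
server.create_slot(slot_name(2194))        # first VM, pid 2194
try:
    server.create_slot(slot_name(2194))    # second VM, same pid
except ValueError as err:
    print("ERROR:", err)
```

Nothing in the name identifies the client machine, so the server cannot tell the two requests apart.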

--

Fwd: [GENERAL] pg_basebackup error: replication slot "pg_basebackup_2194" already exists

From
Ludovic Vaugeois-Pepin
Date:
On 30 May 2017 at 9:32 PM, "Magnus Hagander" <magnus@hagander.net> wrote:

On Tue, May 30, 2017 at 9:14 PM, Ludovic Vaugeois-Pepin
<ludovicvp@gmail.com> wrote:
>
> I ran into the issue described below with 10.0 beta. The error I got is:
>
> pg_basebackup: could not create temporary replication slot
> "pg_basebackup_2194": ERROR:  replication slot "pg_basebackup_2194"
> already exists
>
> A race condition? Or maybe I am doing something wrong.
>
>
>
>
>
> Release:
>     Name        : postgresql10-server
>     Version     : 10.0
>     Release     : beta1PGDG.rhel7
>
>
> Test Type:
>     Functional testing of a pacemaker resource agent
> (https://github.com/ulodciv/pgha)
>
>
> Test Detail:
>     During context/environment setup, pg_basebackup is invoked (in
> parallel) from multiple virtual machines. The backups are then started
> as asynchronously replicated hot standbys.
>
>
> Platform:
>     Centos 7.3
>
>
> Installation Method:
>     yum -y install
> https://download.postgresql.org/pub/repos/yum/testing/10/redhat/rhel-7-x86_64/pgdg-redhat10-10-1.noarch.rpm
>     yum -y install postgresql10-server postgresql10-contrib
>
>
> Platform Detail:
>
>
> Test Procedure:
>
>     Have pg_basebackup run simultaneously on multiple hosts against
> the same instance eg:
>
>         pg_basebackup -h test4 -p 5432 -D /var/lib/pgsql/10/data -U repl1 -Xs
>
>
> Failure?
>
> E               deploylib.deployer_error.DeployerError:
> postgres@test5: got exit status 1 for:
> E               pg_basebackup -h test4 -p 5432 -D
> /var/lib/pgsql/10/data -U repl1 -Xs
> E               stderr: pg_basebackup: could not create temporary
> replication slot "pg_basebackup_2194": ERROR:  replication slot
> "pg_basebackup_2194" already exists
> E               pg_basebackup: child process exited with error 1
> E               pg_basebackup: removing data directory "/var/lib/pgsql/10/data"
>
>
> Test Results:
>
>
> Comments:
>     This seems to be new with 10. I recently began testing the
> pacemaker resource agent against PG 10. I never had (or noticed) this
> failure with 9.6.1 and 9.6.2.


> Hah, that's an interesting failure. In the name of the slot, the 2194
> comes from the pid -- but it's the pid of pg_basebackup.
>
> I assume you're not running the two pg_basebackup processes on the same machine?

Indeed, I run it from two VMs that were created from the same .ova
(packaged VM).


> Is it predictable when this happens (meaning that the pid value is
> actually predictable), or do you have to run it a large number of
> times before it happens?

I ran into this once, however I have been running tests on 10.0 for a
couple of days or so.

My guess is that the two hosts ended up using the same pid when
running the backup.


Re: [GENERAL] pg_basebackup error: replication slot "pg_basebackup_2194" already exists

From
Ludovic Vaugeois-Pepin
Date:
On Tue, May 30, 2017 at 9:32 PM, Magnus Hagander <magnus@hagander.net> wrote:
> On Tue, May 30, 2017 at 9:14 PM, Ludovic Vaugeois-Pepin
> <ludovicvp@gmail.com> wrote:
>>
>> I ran into the issue described below with 10.0 beta. The error I got is:
>>
>> pg_basebackup: could not create temporary replication slot
>> "pg_basebackup_2194": ERROR:  replication slot "pg_basebackup_2194"
>> already exists
>>
>> A race condition? Or maybe I am doing something wrong.
>>
>>
>>
>>
>>
>> Release:
>>     Name        : postgresql10-server
>>     Version     : 10.0
>>     Release     : beta1PGDG.rhel7
>>
>>
>> Test Type:
>>     Functional testing of a pacemaker resource agent
>> (https://github.com/ulodciv/pgha)
>>
>>
>> Test Detail:
>>     During context/environment setup, pg_basebackup is invoked (in
>> parallel) from multiple virtual machines. The backups are then started
>> as asynchronously replicated hot standbys.
>>
>>
>> Platform:
>>     Centos 7.3
>>
>>
>> Installation Method:
>>     yum -y install
>>
>> https://download.postgresql.org/pub/repos/yum/testing/10/redhat/rhel-7-x86_64/pgdg-redhat10-10-1.noarch.rpm
>>     yum -y install postgresql10-server postgresql10-contrib
>>
>>
>> Platform Detail:
>>
>>
>> Test Procedure:
>>
>>     Have pg_basebackup run simultaneously on multiple hosts against
>> the same instance eg:
>>
>>         pg_basebackup -h test4 -p 5432 -D /var/lib/pgsql/10/data -U repl1
>> -Xs
>>
>>
>> Failure?
>>
>> E               deploylib.deployer_error.DeployerError:
>> postgres@test5: got exit status 1 for:
>> E               pg_basebackup -h test4 -p 5432 -D
>> /var/lib/pgsql/10/data -U repl1 -Xs
>> E               stderr: pg_basebackup: could not create temporary
>> replication slot "pg_basebackup_2194": ERROR:  replication slot
>> "pg_basebackup_2194" already exists
>> E               pg_basebackup: child process exited with error 1
>> E               pg_basebackup: removing data directory
>> "/var/lib/pgsql/10/data"
>>
>>
>> Test Results:
>>
>>
>> Comments:
>>     This seems to be new with 10. I recently began testing the
>> pacemaker resource agent against PG 10. I never had (or noticed) this
>> failure with 9.6.1 and 9.6.2.
>
>
> Hah, that's an interesting failure. In the name of the slot, the 2194 comes
> from the pid -- but it's the pid of pg_basebackup.
>
> I assume you're not running the two pg_basebackup processes on the same
> machine? Is it predictable when this happens (meaning that the pid value is
> actually predictable), or do you have to run it a large number of times
> before it happens?


Indeed, I run it from two VMs that were created from the same .ova
(packaged VM).
I ran into this once, however I have been running tests on 10.0 for a
couple of days or so.

My guess is that the two hosts ended up using the same pid when
running the backup.
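
Since the collision appears to be rare, a test harness could simply retry when it sees this specific error. A sketch under that assumption; run_with_retry and its parameters are hypothetical, and how soon a retry gets a different pid depends on the system.

```python
import subprocess
import time


def run_with_retry(argv, attempts=3, delay=1.0, run=subprocess.run):
    """Run argv, retrying only when stderr reports a slot-name collision.

    Returns the result of the last run. `run` is injectable so the logic
    can be tried without a PostgreSQL server.
    """
    for attempt in range(1, attempts + 1):
        result = run(argv, capture_output=True, text=True)
        collided = ("replication slot" in result.stderr
                    and "already exists" in result.stderr)
        if result.returncode == 0 or not collided or attempt == attempts:
            return result
        time.sleep(delay)  # a new process normally gets a different pid


# Hypothetical usage, with the command from the report:
# run_with_retry(["pg_basebackup", "-h", "test4", "-p", "5432",
#                 "-D", "/var/lib/pgsql/10/data", "-U", "repl1", "-Xs"])
```

The failure output in the report shows pg_basebackup removing the data directory after the error, so a retry starts from a clean target.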