Re: pglogical - logical replication contrib module

Поиск
Список
Период
Сортировка
От Petr Jelinek
Тема Re: pglogical - logical replication contrib module
Дата
Msg-id 569925DA.7090102@2ndquadrant.com
обсуждение исходный текст
Ответ на Re: pglogical - logical replication contrib module  (Steve Singer <steve@ssinger.info>)
Список pgsql-hackers
On 2016-01-09 19:30, Steve Singer wrote:\
> I am going to send my comments/issues out in batches as I find them
> instead of waiting till I look over everything.
>

Thanks for looking at this! Yes going in batches/steps makes sense, this 
is huge patch.

>
> I find this part of the documentation a bit unclear
>
>
> +Once the provider node is setup, subscribers can be subscribed to it.
> First the
> +subscriber node must be created:
> +
> +    SELECT pglogical.create_node(
> +        node_name := 'subscriber1',
> +        dsn := 'host=thishost port=5432 dbname=db'
> +    );
> +
>
> My initial reading was that I should execute this on the provider node.
> Perhaps instead
> -----------------
> Once the provider node is setup you can then create subscriber nodes.
> Create the subscriber nodes and
> then execute the following commands on each subscriber node
>
> create extension pglogical
>
> select pglogical.create_node(node_name:='subsriberX',dsn:='host=thishost
> dbname=db port=5432');
>
> -------------------

Makes sense I guess, this is probably relic of how this internally 
evolved (we used to have providers and subscribers before we merged them 
into nodes).

>
> Also the documentation for create_subscription talks about
>
> +  - `synchronize_structure` - specifies if to synchronize structure from
> +    provider to the subscriber, default true
>

Not sure what's your comment on this.

>
>
> I did the following
>
> test2=# select pglogical.create_subscription(subscription_name:='default
> sub',provider_dsn:='host=localhost dbname=test1 port=5436');
>   create_subscription
> ---------------------
>             247109879
>
>
> Which then resulted in the following showing up in my  PG log
>
> LOG:  worker process: pglogical apply 16542:247109879 (PID 4079) exited
> with exit code 1
> ERROR:  replication slot name "pgl_test2_test1_default sub" contains
> invalid character
> HINT:  Replication slot names may only contain lower case letters,
> numbers, and the underscore character.
> FATAL:  could not send replication command "CREATE_REPLICATION_SLOT
> "pgl_test2_test1_default sub" LOGICAL pglogical_output": status
> PGRES_FATAL_ERROR: ERROR:  replication slot name
> "pgl_test2_test1_default sub" contains invalid character
>      HINT:  Replication slot names may only contain lower case letters,
> numbers, and the underscore character.
>
>
> The create_subscription command should check if the subscription name is
> valid (meets the rules that will be applied against the slot command).
>

Yes, fixed. Also added some other sensitization code since we also use 
dbname in slot name and that can contain whatever.

> I wondered how I could fix my mistake.
>
> The docs say
>
> +- `pglogical.pglogical_drop_subscription(subscription_name name,
> ifexists bool)`
> +  Disconnects the subscription and removes it from the catalog.
> +
>
> test2=# select pglogical.pglogical_drop_subscription('default sub', true);
> ERROR:  function pglogical.pglogical_drop_subscription(unknown, boolean)
> does not exist
>
>
> The command is actually called pglogical.drop_subscription the docs
> should be fixed to show the actual command name
>

Yep, got this from other people as well, fixed.

>
> I then wanted to add a second table to my database. ('b').
>
> select pglogical.replication_set_add_table('default','public.b',true);
>   replication_set_add_table
> ---------------------------
>   t
> (1 row)
>
> In my pglog I then got
>
> LOG:  starting sync of table public.b for subscriber defaultsub
> ERROR:  replication slot name "pgl_test2_test1_defaultsub_public.b"
> contains invalid character
> HINT:  Replication slot names may only contain lower case letters,
> numbers, and the underscore character.
> FATAL:  could not send replication command "CREATE_REPLICATION_SLOT
> "pgl_test2_test1_defaultsub_public.b" LOGICAL pglogical_output": status
> PGRES_FATAL_ERROR: ERROR:  replication slot name
> "pgl_test2_test1_defaultsub_public.b" contains invalid character
>      HINT:  Replication slot names may only contain lower case letters,
> numbers, and the underscore character.
>

Right, needed the sensitization as well (I am actually using the hash 
now as there is only 8 chars left anyway).

>
> I then did
>
> test1=# select
> pglogical.replication_set_remove_table('default','public.b');
>   replication_set_remove_table
> ------------------------------
>   t
> (1 row)
>
>
> but my log still keep repeating the error, so I tried connecting to the
> replica and did the same
>
> test2=# select
> pglogical.replication_set_remove_table('default','public.b');
> ERROR:  replication set mapping -303842815:16726 not found
>
> Is there any way to recover from this situation?
>

Not really, there is no api yet to remove table from synchronization 
process so you'd have to manually delete row from 
pglogical.local_sync_status on subscriber, kill the sync process and 
remove the slot. I will think about what would be good api to solve this.

> The documenation says I can drop a replication set, maybe that will let
> replication continue.
>
> +- `pglogical.delete_replication_set(set_name text)`
> +  Removes the replication set.
> +
>
> select pglogical.delete_replication_set('default');
> ERROR:  function pglogical.delete_replication_set(unknown) does not exist
> LINE 1: select pglogical.delete_replication_set('default');
>                 ^
> HINT:  No function matches the given name and argument types. You might
> need to add explicit type casts.
>
> The function is actually pglogical.drop_replication_set , the docs
> should be updated.
> (note that didn't fix my problem either but then dropping the
> subscription did seem to work).

Yeah it doesn't as the problem is on subscriber, replication sets only 
affect provider. And fixed the docs.

>
>
>
> I then re-added the default set to the origin and resubscribed my replica
>
>
>
> test2=#  select
> pglogical.create_subscription(subscription_name:='defaultsub',provider_dsn:='host=localhost
> dbname=test1 port=5436');
>   create_subscription
> ---------------------
>            2974019075
>
>
> I then saw a bunch of
> LOG:  worker process: pglogical apply 16542:2974019075 (PID 26778)
> exited with exit code 1
> ERROR:  subscriber defaultsub initialization failed during
> nonrecoverable step (s), please try the setup again
> LOG:  worker process: pglogical apply 16542:2974019075 (PID 26779)
> exited with exit code 1
>
> in the log but then those stopped and I see
>
> test2=# select  pglogical.show_subscription_status();
> show_subscription_status
>
> --------------------------------------------------------------------------------
>
> --------------------------------------------------
>   (defaultsub,down,test1,"host=localhost dbname=test1
> port=5436",pgl_test2_test1_
> defaultsub,"{default,default_insert_only}",{all})
> (1 row)
>
>
> I'm not really sure what to do to 'recover' my cluster at this point so
> I'll send this off and rebuild my cluster and start over.
>

I think the problem here is that you resubscribed with 
syncrhonize_structure := true while the conflicting structure already 
existed, that option only works correctly when there is no conflicting 
structure (we don't try to make diffs or anything, just dump/restore). 
Recovering should be drop the uninitialized subscription and create new 
one where you don't synchronize structure.


--  Petr Jelinek                  http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training &
Services



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Benedikt Grundmann
Дата:
Сообщение: Re: Death by regexp_replace
Следующее
От: Tom Lane
Дата:
Сообщение: Re: [BUGS] about test_parser installation failure problem(PostgreSQL in 9.5.0)?