Re: speed up a logical replica setup
От | Shubham Khanna |
---|---|
Тема | Re: speed up a logical replica setup |
Дата | |
Msg-id | CAHv8Rj+5mzK9Jt+7ECogJzfm5czvDCCd5jO1_rCx0bTEYpBE5g@mail.gmail.com обсуждение исходный текст |
Ответ на | RE: speed up a logical replica setup ("Hayato Kuroda (Fujitsu)" <kuroda.hayato@fujitsu.com>) |
Ответы |
RE: speed up a logical replica setup
|
Список | pgsql-hackers |
On Thu, Feb 15, 2024 at 4:53 PM Hayato Kuroda (Fujitsu) <kuroda.hayato@fujitsu.com> wrote: > > Dear Euler, > > > Policy) > > > > Basically, we do not prohibit to connect to primary/standby. > > primary_slot_name may be changed during the conversion and > > tuples may be inserted on target just after the promotion, but it seems no issues. > > > > API) > > > > -D (data directory) and -d (databases) are definitively needed. > > > > Regarding the -P (a connection string for source), we can require it for now. > > But note that it may cause an inconsistency if the pointed not by -P is different > > from the node pointde by primary_conninfo. > > > > As for the connection string for the target server, we can choose two ways: > > a) > > accept native connection string as -S. This can reuse the same parsing > > mechanism as -P, > > but there is a room that non-local server is specified. > > > > b) > > accept username/port as -U/-p > > (Since the policy is like above, listen_addresses would not be overwritten. Also, > > the port just specify the listening port). > > This can avoid connecting to non-local, but more options may be needed. > > (E.g., Is socket directory needed? What about password?) > > > > Other discussing point, reported issue) > > > > Points raised by me [1] are not solved yet. > > > > * What if the target version is PG16-? > > * What if the found executables have diffent version with pg_createsubscriber? > > * What if the target is sending WAL to another server? > > I.e., there are clusters like `node1->node2-.node3`, and the target is node2. > > * Can we really cleanup the standby in case of failure? > > Shouldn't we suggest to remove the target once? > > * Can we move outputs to stdout? > > Based on the discussion, I updated the patch set. Feel free to pick them and include. > Removing -P patch was removed, but removing -S still remained. > > Also, while testing the patch set, I found some issues. > > 1. > Cfbot got angry [1]. This is because WIFEXITED and others are defined in <sys/wait.h>, > but the inclusion was removed per comment. Added the inclusion again. > > 2. > As Shubham pointed out [3], when we convert an intermediate node of cascading replication, > the last node would stuck. This is because a walreciever process requires nodes have the same > system identifier (in WalReceiverMain), but it would be changed by pg_createsubscriebr. > > 3. > Moreover, when we convert a last node of cascade, it won't work well. Because we cannot create > publications on the standby node. > > 4. > If the standby server was initialized as PG16-, this command would fail. > Because the API of pg_logical_create_replication_slot() were changed. > > 5. > Also, used pg_ctl commands must have same versions with the instance. > I think we should require all the executables and servers must be a same major version. > > Based on them, below part describes attached ones: > > V20-0001: same as Euler's patch, v17-0001. > V20-0002: Update docs per recent changes. Same as v19-0002 > V20-0003: Modify the alignment of codes. Same as v19-0003 > V20-0004: Change an argument of get_base_conninfo. Same as v19-0004 > === experimental patches === > V20-0005: Add testcases. Same as v19-0004 > V20-0006: Update a comment above global variables. Same as v19-0005 > V20-0007: Address comments from Vignesh. Some parts you don't like > are reverted. > V20-0008: Fix error message in get_bin_directory(). Same as v19-0008 > V20-0009: Remove -S option. Refactored from v16-0007 > V20-0010: Add check versions of executables and the target, per above and [4] > V20-0011: Detect a disconnection while waiting the recovery, per [4] > V20-0012: Avoid running pg_createsubscriber for cascade physical replication, per above. > > [1]: https://cirrus-ci.com/task/4619792833839104 > [2]: https://www.postgresql.org/message-id/CALDaNm1r9ZOwZamYsh6MHzb%3D_XvhjC_5XnTAsVecANvU9FOz6w%40mail.gmail.com > [3]: https://www.postgresql.org/message-id/CAHv8RjJcUY23ieJc5xqg6-QeGr1Ppp4Jwbu7Mq29eqCBTDWfUw%40mail.gmail.com > [4]: https://www.postgresql.org/message-id/TYCPR01MB1207713BEC5C379A05D65E342F54B2%40TYCPR01MB12077.jpnprd01.prod.outlook.com I found a couple of issues, while verifying the cascaded replication with the following scenarios: Scenario 1) Create cascade replication like node1->node2->node3 without using replication slots (attached cascade_3node_setup_without_slots has the script for this): Then I ran pg_createsubscriber by specifying primary as node1 and standby as node3, this scenario runs successfully. I was not sure if this should be supported or not? Scenario 2) Create cascade replication like node1->node2->node3 using replication slots (attached cascade_3node_setup_with_slots has the script for this): Here, slot name was used as slot1 for node1 to node2 and slot2 for node2 to node3. Then I ran pg_createsubscriber by specifying primary as node1 and standby as node3. In this case pg_createsubscriber fails with the following error: pg_createsubscriber: error: could not obtain replication slot information: got 0 rows, expected 1 row [Inferior 1 (process 2623483) exited with code 01] This is failing because slot name slot2 is used between node2->node3 but pg_createsubscriber is checked for slot1, the slot which is used for replication between node1->node2. Thoughts? Thanks and Regards, Shubham Khanna.
Вложения
В списке pgsql-hackers по дате отправления:
Предыдущее
От: Daniel GustafssonДата:
Сообщение: Re: Replace current implementations in crypt() and gen_salt() to OpenSSL
Следующее
От: Daniel GustafssonДата:
Сообщение: Re: Allow passing extra options to initdb for tests