Re: speed up a logical replica setup

Поиск
Список
Период
Сортировка
От Peter Eisentraut
Тема Re: speed up a logical replica setup
Дата
Msg-id 085203c4-580d-4c50-90e3-e47249e14585@eisentraut.org
обсуждение исходный текст
Ответ на RE: speed up a logical replica setup  ("Hayato Kuroda (Fujitsu)" <kuroda.hayato@fujitsu.com>)
Ответы RE: speed up a logical replica setup
Список pgsql-hackers
On 18.03.24 06:43, Hayato Kuroda (Fujitsu) wrote:
> 02.
> 
> "The main difference between the logical replication setup and pg_createsubscriber
> is the initial data copy."
> 
> Grammarly suggests:
> "The initial data copy is the main difference between the logical replication
> setup and pg_createsubscriber."

I think that change is worse.

> 09.
> 
> "The source server must accept connections from the target server. The source server must not be in recovery."
> 
> Grammarly suggests:
> "The source server must accept connections from the target server and not be in recovery."

I think the previous version is better.

> 17.
> 
> "specifies promote"
> 
> We can do double-quote for the word promote.

The v30 patch has <literal>promote</literal>, which I think is adequate.

> 19.
> 
> "is accepting read-write transactions"
> 
> Grammarly suggests:
> "accepts read-write transactions"

I like the first one better.

> 20.
> 
> New options must be also documented as well. This helps not only users but also
> reviewers.
> (Sometimes we cannot identify that the implementation is intentinal or not.)

Which ones are missing?

> 21.
> 
> Also, not sure the specification is good. I preferred to specify them by format
> string. Because it can reduce the number of arguments and I cannot find use cases
> which user want to control the name of objects.
> 
> However, your approach has a benefit which users can easily identify the generated
> objects by pg_createsubscriber. How do other think?

I think listing them explicitly is better for the first version.  It's 
simpler to implement and more flexible.

> 22.
> 
> ```
> #define    BASE_OUTPUT_DIR        "pg_createsubscriber_output.d"
> ```
> 
> No one refers the define.

This is gone in v30.

> 23.
>   
> ```
> }            CreateSubscriberOptions;
> ...
> }            LogicalRepInfo;
> ```
> 
> Declarations after the "{" are not needed, because we do not do typedef.

Yeah, this is actually wrong, because as it is written now, it defines 
global variables.

> 22.
> 
> While seeing definitions of functions, I found that some pointers are declared
> as const, but others are not. E.g., "char *lsn" in setup_recovery() won' be
> changed but not the constant. Is it just missing or is there another rule?

Yes, more could be done here.  I have attached a patch for this.  (This 
also requires the just-committed 48018f1d8c.)

> 24.
> 
> ```
> /* standby / subscriber data directory */
> static char *subscriber_dir = NULL;
> ```
> 
> It is bit strange that only subscriber_dir is a global variable. Caller requires
> the CreateSubscriberOptions as an argument, except cleanup_objects_atexit() and
> main. So, how about makeing `CreateSubscriberOptions opt` to global one?

Fewer global variables seem preferable.  Only make global as needed.

> 30.
> 
> ```
>         if (num_replslots == 0)
>             dbinfo[i].replslotname = pg_strdup(genname);
> ```
> 
> I think the straightforward way is to use the name of subscription if no name
> is specified. This follows the rule for CREATE SUBSCRIPTION.

right

> 31.
> 
> ```
>         /* Create replication slot on publisher */
>         if (lsn)
>             pg_free(lsn);
> ```
> 
> I think allocating/freeing memory is not so efficient.
> Can we add a flag to create_logical_replication_slot() for controlling the
> returning value (NULL or duplicated string)? We can use the condition (i == num_dbs-1)
> as flag.

Nothing is even using the return value of 
create_logical_replication_slot().  I think this can be removed altogether.

> 34.
> 
> ```
>         {"config-file", required_argument, NULL, 1},
>         {"publication", required_argument, NULL, 2},
>         {"replication-slot", required_argument, NULL, 3},
>         {"subscription", required_argument, NULL, 4},
> ```
> 
> The ordering looks strange for me. According to pg_upgarade and pg_basebackup,
> options which do not have short notation are listed behind.
> 
> 35.
> 
> ```
>     opt.sub_port = palloc(16);
> ```
> 
> Per other lines, pg_alloc() should be used.

Even better psprintf().

> 37.
> 
> ```
>     /* Register a function to clean up objects in case of failure */
>     atexit(cleanup_objects_atexit);
> ```
> 
> Sorry if we have already discussed. I think the registration can be moved just
> before the boot of the standby. Before that, the callback will be no-op.

But it can also stay where it is.  What is the advantage of moving it later?

> 40.
> 
> ```
>      * XXX this code was extracted from BootStrapXLOG().
> ```
> 
> So, can we extract the common part to somewhere? Since system identifier is related
> with the controldata file, I think it can be located in controldata_util.c.

Let's leave it as is for this PG release.

Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Ashutosh Bapat
Дата:
Сообщение: Re: Memory consumed by child SpecialJoinInfo in partitionwise join planning
Следующее
От: Alexander Korotkov
Дата:
Сообщение: Re: Add pg_basetype() function to obtain a DOMAIN base type