RE: speed up a logical replica setup

Поиск
Список
Период
Сортировка
От Hayato Kuroda (Fujitsu)
Тема RE: speed up a logical replica setup
Дата
Msg-id TY3PR01MB9889FEDBBF74A9F79CA7CC87F57A2@TY3PR01MB9889.jpnprd01.prod.outlook.com
обсуждение исходный текст
Ответ на RE: speed up a logical replica setup  ("Hayato Kuroda (Fujitsu)" <kuroda.hayato@fujitsu.com>)
Ответы Re: speed up a logical replica setup  ("Euler Taveira" <euler@eulerto.com>)
Список pgsql-hackers
Dear hackers,

Here are comments for v8 patch set. I may revise them by myself,
but I want to post here to share all of them.

01.
```
/* Options */
static char *pub_conninfo_str = NULL;
static SimpleStringList database_names = {NULL, NULL};
static int    wait_seconds = DEFAULT_WAIT;
static bool retain = false;
static bool dry_run = false;
```

Just to confirm - is there a policy why we store the specified options? If you
want to store as global ones, username and port should follow (my fault...).
Or, should we have a structure to store them?

02.
```
{"subscriber-conninfo", required_argument, NULL, 'S'},
```

This is my fault, but "--subscriber-conninfo" is still remained. It should be
removed if it is not really needed.

03.
```
{"username", required_argument, NULL, 'u'},
```

Should we accept 'U' instead of 'u'?

04.
```
{"dry-run", no_argument, NULL, 'n'},
```

I'm not sure why the dry_run mode exists. In terms pg_resetwal, it shows the
which value would be changed based on the input. As for the pg_upgrade, it checks
whether the node can be upgraded for now. I think, we should have the checking
feature, so it should be renamed to --check. Also, the process should exit earlier
at that time.

05.
I felt we should accept some settings from enviroment variables, like pg_upgrade.
Currently, below items should be acceted.

- data directory
- username
- port
- timeout

06.
```
pg_logging_set_level(PG_LOG_WARNING);
```

If the default log level is warning, there are no ways to output debug logs.
(-v option only raises one, so INFO would be output)
I think it should be PG_LOG_INFO.

07.
Can we combine verifications into two functions, e.g., check_primary() and check_standby/check_subscriber()?

08.
Not sure, but if we want to record outputs by pg_subscriber, the sub-directory
should be created. The name should contain the timestamp.

09.
Not sure, but should we check max_slot_wal_keep_size of primary server? It can
avoid to fail starting of logical replicaiton.

10.
```
    nslots_new = nslots_old + dbarr.ndbs;

    if (nslots_new > max_replication_slots)
    {
        pg_log_error("max_replication_slots (%d) must be greater than or equal to "
                     "the number of replication slots required (%d)", max_replication_slots, nslots_new);
        exit(1);
    }
```

I think standby server must not have replication slots. Because subsequent
pg_resetwal command discards all the WAL file, so WAL records pointed by them
are removed. Currently pg_resetwal does not raise ERROR at that time.

11.
```
    /*
     * Stop the subscriber if it is a standby server. Before executing the
     * transformation steps, make sure the subscriber is not running because
     * one of the steps is to modify some recovery parameters that require a
     * restart.
     */
    if (stat(pidfile, &statbuf) == 0)
```

I kept just in case, but I'm not sure it is still needed. How do you think?
Removing it can reduce an inclusion of pidfile.h.

12.
```
        pg_ctl_cmd = psprintf("\"%s/pg_ctl\" stop -D \"%s\" -s",
                              standby.bindir, standby.pgdata);
        rc = system(pg_ctl_cmd);
        pg_ctl_status(pg_ctl_cmd, rc, 0);
```


There are two places to stop the instance. Can you divide it into a function?

13.
```
     * A temporary replication slot is not used here to avoid keeping a
     * replication connection open (depending when base backup was taken, the
     * connection should be open for a few hours).
     */
    conn = connect_database(primary.base_conninfo, dbarr.perdb[0].dbname);
    if (conn == NULL)
        exit(1);
    consistent_lsn = create_logical_replication_slot(conn, true,
                                                     &dbarr.perdb[0]);
```

I didn't notice the comment, but still not sure the reason. Why we must reserve
the slot until pg_subscriber finishes? IIUC, the slot would be never used, it
is created only for getting a consistent_lsn. So we do not have to keep.
Also, just before, logical replication slots for each databases are created, so
WAL records are surely reserved.

14.

```
    pg_log_info("starting the subscriber");
    start_standby_server(&standby, subport, server_start_log);
```

This info should be in the function.

15.
```
    /*
     * Create a subscription for each database.
     */
    for (i = 0; i < dbarr.ndbs; i++)
```

This can be divided into a function, like create_all_subscriptions().

16.
My fault: usage() must be updated.

17. use_primary_slot_name
```
    if (PQntuples(res) != 1)
    {
        pg_log_error("could not obtain replication slot information: got %d rows, expected %d row",
                     PQntuples(res), 1);
        return NULL;
    }
```

Error message should be changed. I think this error means the standby has wrong primary_slot_name, right?

18. misc
Sometimes the pid of pg_subscriber is referred. It can be stored as global variable.

19.
C99-style has been allowed, so loop variables like "i" can be declared in the for-statement, like

```
for (int i = 0; i < MAX; i++)
```

20.
Some comments, docs, and outputs must be fixed when the name is changed.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED


В списке pgsql-hackers по дате отправления:

Предыдущее
От: Sutou Kouhei
Дата:
Сообщение: Re: Make COPY format extendable: Extract COPY TO format implementations
Следующее
От: Bharath Rupireddy
Дата:
Сообщение: Re: Improve WALRead() to suck data directly from WAL buffers when possible