Re: [HACKERS] Logical Replication WIP
От | Peter Eisentraut |
---|---|
Тема | Re: [HACKERS] Logical Replication WIP |
Дата | |
Msg-id | 245f59d7-62d2-6ae3-6650-c2b4df51dd33@2ndquadrant.com обсуждение исходный текст |
Ответ на | Re: [HACKERS] Logical Replication WIP (Petr Jelinek <petr.jelinek@2ndquadrant.com>) |
Список | pgsql-hackers |
Comments on 0004-Add-logical-replication-workers-v16.patch.gz: I didn't find any major problems. At times while I was testing strange things it was not clear why "nothing is happening". I'll do some more checking in that direction. Fixup patch attached that enhances some error messages, fixes some typos, and other minor changes. See also comments below. --- The way check_max_logical_replication_workers() is implemented creates potential ordering dependencies in postgresql.conf. For example, max_logical_replication_workers = 100 max_worker_processes = 200 fails, but if you change the order, it works. The existing check_max_worker_processes() has the same problem, but I suspect because it only checks against MAX_BACKENDS, nobody has ever seriously hit that limit. I suggest just removing the check. If you set max_logical_replication_workers higher than max_worker_processes and you hit the lower limit, then whatever is controlling max_worker_processes should complain with its own error message. --- The default for max_logical_replication_workers is 4, which seems very little. Maybe it should be more like 10 or 20. The "Quick setup" section recommends changing it to 10. We should at least be consistent there: If you set a default value that is not 0, then it should enough that we don't need to change it again in the Quick setup. (Maybe the default max_worker_processes should also be raised?) +max_logical_replication_workers = 10 # one per subscription + one per instance needed on subscriber I think this is incorrect (copied from max_worker_processes?). The launcher does not count as one of the workers here. On a related note, should the minimum not be 0 instead of 1? --- About the changes to libpqrcv_startstreaming(). The timeline is not really an option in the syntax. Just passing in a string that is pasted in the final command creates too much coupling, I think. I would keep the old timeline (TimeLineID tli) argument, and make the options const char * [], and let startstreaming() assemble the final string, including commas and parentheses. It's still not a perfect abstraction, because you need to do the quoting yourself, but much better. (Alternatively, get rid of the startstreaming call and just have callers use libpqrcv_PQexec directly.) --- Some of the header files are named inconsistently with their .c files. I think src/include/replication/logicalworker.h should be split into logicalapply.h and logicallauncher.h. Not sure about worker_internal.h. Maybe rename apply.c to worker.c? (I'm also not fond of throwing publicationcmds.h and subscriptioncmds.h together into replicationcmds.h. Maybe that could be changed, too.) --- Various FATAL errors in logical/relation.c when the target relation is not in the right state. Could those not be ERRORs? The behavior is the same at the moment because background workers terminate on uncaught exceptions, but that should eventually be improved. A FATAL error will lead to a LOG: unexpected EOF on standby connection on the publisher, because the process just dies without protocol shutdown. (And then it reconnects and tries again. So we might as well not die and just retry again.) --- In LogicalRepRelMapEntry, rename rel to localrel, so it's clearer in the code using this struct. (Maybe reloid -> localreloid) --- Partitioned tables are not supported in either publications or as replication targets. This is expected but should be fixed before the final release. --- In apply.c: The comment in apply_handle_relation() makes a point that the schema validation is done later, but does not tell why. The answer is probably because it doesn't matter and it's more convenient, but it should be explained in the comment. See XXX comment in logicalrep_worker_stop(). The get_flush_position() return value is not intuitive from the function name. Maybe make that another pointer argument for clarity. reread_subscription() complains if the subscription name was changed. I don't know why that is a problem. --- In launcher.c: pg_stat_get_subscription should hold LogicalRepWorkerLock around the whole loop, so that it doesn't get inconsistent results when workers change during the loop. --- In relation.c: Inconsistent use of uint32 vs LogicalRepRelId. Pick one. :) -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Вложения
В списке pgsql-hackers по дате отправления: