Re: [HACKERS] Logical Replication WIP
От | Petr Jelinek |
---|---|
Тема | Re: [HACKERS] Logical Replication WIP |
Дата | |
Msg-id | 2c297716-ac22-ba5c-24ae-7435c0ba85f2@2ndquadrant.com обсуждение исходный текст |
Ответ на | Re: [HACKERS] Logical Replication WIP (Peter Eisentraut <peter.eisentraut@2ndquadrant.com>) |
Ответы |
Re: [HACKERS] Logical Replication WIP
(Erik Rijkers <er@xs4all.nl>)
Re: [HACKERS] Logical Replication WIP (Peter Eisentraut <peter.eisentraut@2ndquadrant.com>) |
Список | pgsql-hackers |
Hi, finally got to this (multiple emails squashed into one). On 04/01/17 18:46, Peter Eisentraut wrote: > Some small patches for 0002-Add-SUBSCRIPTION-catalog-and-DDL-v16.patch.gz: > Merged thanks. > In CreateSubscription(), I don't think we should connect to the remote > if no slot creation is requested. Arguably, the point of that option is > to not make network connections. (That is what my documentation patch > above claims, in any case.) > Agreed and done. > I don't know why we need to check the PostgreSQL version number of the > remote. We should rely on the protocol version number, and we should > just make it work. When PG 11 comes around, subscribing from PG 10 to a > publisher on PG 11 should just work without any warnings, IMO. > Also agreed and removed. > 003-Define-logical-replication-protocol-and-output-plugi-v16.patch.gz > looks good now, documentation is clear now. > > Another fixup patch to remove excessive includes. Thanks merged. > Comments on 0004-Add-logical-replication-workers-v16.patch.gz: > > I didn't find any major problems. At times while I was testing strange > things it was not clear why "nothing is happening". I'll do some more > checking in that direction. > > Fixup patch attached that enhances some error messages, fixes some > typos, and other minor changes. See also comments below. > Merged. > > The way check_max_logical_replication_workers() is implemented creates > potential ordering dependencies in postgresql.conf. For example, > > max_logical_replication_workers = 100 > max_worker_processes = 200 > > fails, but if you change the order, it works. The existing > check_max_worker_processes() has the same problem, but I suspect because > it only checks against MAX_BACKENDS, nobody has ever seriously hit that > limit. > > I suggest just removing the check. If you set > max_logical_replication_workers higher than max_worker_processes and you > hit the lower limit, then whatever is controlling max_worker_processes > should complain with its own error message. > Good point, removed. > > The default for max_logical_replication_workers is 4, which seems very > little. Maybe it should be more like 10 or 20. The "Quick setup" > section recommends changing it to 10. We should at least be > consistent there: If you set a default value that is not 0, then it > should enough that we don't need to change it again in the Quick > setup. (Maybe the default max_worker_processes should also be > raised?) Well, it's 4 because max_worker_processes is 8, I think default max_worker_processes should be higher than max_logical_replication_workers so that's why I picked 4. If we are okay wit bumping the max_worker_processes a bit, I am all for increasing max_logical_replication_workers as well. The quick setup mentions 10 mainly for consistency with slots and wal senders (those IMHO should also not be 0 by default at this point...). > > +max_logical_replication_workers = 10 # one per subscription + one per > instance needed on subscriber > > I think this is incorrect (copied from max_worker_processes?). The > launcher does not count as one of the workers here. > > On a related note, should the minimum not be 0 instead of 1? > Eh, yes. > > About the changes to libpqrcv_startstreaming(). The timeline is not > really an option in the syntax. Just passing in a string that is > pasted in the final command creates too much coupling, I think. I > would keep the old timeline (TimeLineID tli) argument, and make the > options const char * [], and let startstreaming() assemble the final > string, including commas and parentheses. It's still not a perfect > abstraction, because you need to do the quoting yourself, but much > better. (Alternatively, get rid of the startstreaming call and just > have callers use libpqrcv_PQexec directly.) > I did this somewhat differently, with struct that defines options and has different union members for physical and logical replication. What do you think of that? > > Some of the header files are named inconsistently with their .c files. > I think src/include/replication/logicalworker.h should be split into > logicalapply.h and logicallauncher.h. Okay. > Not sure about > worker_internal.h. Maybe rename apply.c to worker.c? > Hmm I did that, seems reasonably okay. Original patch in fact had both worker.c and apply.c and I eventually moved the worker.c functions to either apply.c or launcher.c. > (I'm also not fond of throwing publicationcmds.h and > subscriptioncmds.h together into replicationcmds.h. Maybe that could > be changed, too) Okay. > > Various FATAL errors in logical/relation.c when the target relation is > not in the right state. Could those not be ERRORs? The behavior is > the same at the moment because background workers terminate on > uncaught exceptions, but that should eventually be improved. > Seems like you changed this in your patch. I don't have any objections. > > In LogicalRepRelMapEntry, rename rel to localrel, so it's clearer in > the code using this struct. (Maybe reloid -> localreloid) > Okay. > > Partitioned tables are not supported in either publications or as > replication targets. This is expected but should be fixed before the > final release. > Yes, that will need some discussion about corner case behaviour. For example, have partitioned table 'foo' which is in publication, then you have table 'bar' which is not in publication, you attach it to the partitioned table 'foo', should it automatically be added to publication? Then you detach it, should it then be removed from publication? What if 'bar' was in publication before it was attached/detached to/from 'foo'? What if 'foo' wasn't in publication but 'bar' was? Should we allow ONLY syntax for partitioned table when they are being added and removed? Sadly current partitioning section of the docs doesn't provide any guidance in terms of precedents for other actions here as it still speaks about using inheritance and check constraints directly instead of the new feature. My proposal would be to let partitions to be added/removed to/from publications normally (as they are now) and have them also check if parent table is published in case they aren't (ie, if partitioned table is in some publications, all partitions are implicitly as well without adding them to the pg_publication_rel catalog, but they also keep their own membership in publications as well individually there). That would mean we don't allow ONLY syntax for partitioned tables. One scenario where I am on the fence is what should happen here if we do ALTER PUBLICATION ... DROP TABLE partitioned_table in case that partitioned_table also contains partition which was explicitly added to the publication, should it keep its own membership or should it be removed? Maybe we could allow the ONLY clause only for DROP but not for ADD? > > In apply.c: > > The comment in apply_handle_relation() makes a point that the schema > validation is done later, but does not tell why. The answer is > probably because it doesn't matter and it's more convenient, but it > should be explained in the comment. Yes I noticed, I tried to explain. > > See XXX comment in logicalrep_worker_stop(). Yes that was a good point. > > The get_flush_position() return value is not intuitive from the > function name. Maybe make that another pointer argument for clarity. Okay. > reread_subscription() complains if the subscription name was changed. > I don't know why that is a problem. Because we don't have ALTER SUBSCRIPTION RENAME currently. Maybe should be Assert? > > In launcher.c: > > pg_stat_get_subscription should hold LogicalRepWorkerLock around the > whole loop, so that it doesn't get inconsistent results when workers > change during the loop. > Done. > In relation.c: > > Inconsistent use of uint32 vs LogicalRepRelId. Pick one. Done. Attached is new version with your changes merged and above suggestions applied. It still does not support partitioned tables and does the filtering using FirstNormalObjectId. -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Вложения
В списке pgsql-hackers по дате отправления: