logical changeset generation v3
Hi, In response to this you will soon find the 14 patches that currently implement $subject. I'll go over each one after showing off for a bit: Start postgres: Start postgres instance (with pg_hba.conf allowing replication cons): $ postgres -D ~/tmp/pgdev-lcr \ -c wal_level=logical \ -c max_wal_senders=10 \ -c max_logical_slots=10\ -c wal_keep_segments=100 \ -c log_line_prefix="[%p %x] " Start the changelog receiver: $ pg_receivellog -h /tmp -f /dev/stderr -d postgres -v Generate changes: $ psql -h /tmp postgres <<EOF DROP TABLE IF EXISTS replication_example; CREATE TABLE replication_example(id SERIAL PRIMARY KEY, somedata int, text varchar(120)); -- plain insert INSERT INTO replication_example(somedata, text) VALUES (1, 1); -- plain update UPDATE replication_example SET somedata = - somedata WHERE id = (SELECT currval('replication_example_id_seq')); -- plain delete DELETE FROM replication_example WHERE id = (SELECT currval('replication_example_id_seq')); -- wrapped in a transaction BEGIN; INSERT INTO replication_example(somedata, text) VALUES (1, 1); UPDATE replication_example SET somedate = - somedata WHERE id = (SELECT currval('replication_example_id_seq')); DELETE FROM replication_example WHERE id = (SELECT currval('replication_example_id_seq')); COMMIT; -- dont write out aborted data BEGIN; INSERT INTO replication_example(somedata, text) VALUES (2, 1); UPDATE replication_example SET somedate = - somedata WHERE id = (SELECT currval('replication_example_id_seq')); DELETE FROM replication_example WHERE id = (SELECT currval('replication_example_id_seq')); ROLLBACK; -- add a column BEGIN; INSERT INTO replication_example(somedata, text) VALUES (3, 1); ALTER TABLE replication_example ADD COLUMN bar int; INSERT INTO replication_example(somedata, text, bar) VALUES (3, 1, 1); COMMIT; -- once more outside INSERT INTO replication_example(somedata, text, bar) VALUES (4, 1, 1); -- DDL with table rewrite BEGIN; INSERT INTO replication_example(somedata, text) VALUES (5, 1); ALTER TABLE replication_example RENAME COLUMN text TO somenum; INSERT INTO replication_example(somedata, somenum) VALUES (5, 2); ALTER TABLE replication_example ALTER COLUMN somenum TYPE int4 USING (somenum::int4); INSERT INTO replication_example(somedata, somenum) VALUES (5, 3); COMMIT; EOF And the results printed by llog: BEGIN 16556826 COMMIT 16556826 BEGIN 16556827 table "replication_example_id_seq": INSERT: sequence_name[name]:replication_example_id_seq last_value[int8]:1 start_value[int8]:1increment_by[int8]:1 max_value[int8]:9223372036854775807 min_value[int8]:1 cache_value[int8]:1 log_cnt[int8]:0is_cycled[bool]:f is_called[bool]:f COMMIT 16556827 BEGIN 16556828 table "replication_example": INSERT: id[int4]:1 somedata[int4]:1 text[varchar]:1 COMMIT 16556828 BEGIN 16556829 table "replication_example": UPDATE: id[int4]:1 somedata[int4]:-1 text[varchar]:1 COMMIT 16556829 BEGIN 16556830 table "replication_example": DELETE (pkey): id[int4]:1 COMMIT 16556830 BEGIN 16556833 table "replication_example": INSERT: id[int4]:4 somedata[int4]:3 text[varchar]:1 table "replication_example": INSERT: id[int4]:5 somedata[int4]:3 text[varchar]:1 bar[int4]:1 COMMIT 16556833 BEGIN 16556834 table "replication_example": INSERT: id[int4]:6 somedata[int4]:4 text[varchar]:1 bar[int4]:1 COMMIT 16556834 BEGIN 16556835 table "replication_example": INSERT: id[int4]:7 somedata[int4]:5 text[varchar]:1 bar[int4]:(null) table "replication_example": INSERT: id[int4]:8 somedata[int4]:5 somenum[varchar]:2 bar[int4]:(null) table "pg_temp_74943": INSERT: id[int4]:4 somedata[int4]:3 somenum[int4]:1 bar[int4]:(null) table "pg_temp_74943": INSERT: id[int4]:5 somedata[int4]:3 somenum[int4]:1 bar[int4]:1 table "pg_temp_74943": INSERT: id[int4]:6 somedata[int4]:4 somenum[int4]:1 bar[int4]:1 table "pg_temp_74943": INSERT: id[int4]:7 somedata[int4]:5 somenum[int4]:1 bar[int4]:(null) table "pg_temp_74943": INSERT: id[int4]:8 somedata[int4]:5 somenum[int4]:2 bar[int4]:(null) table "replication_example": INSERT: id[int4]:9 somedata[int4]:5 somenum[int4]:3 bar[int4]:(null) COMMIT 16556835 As you can see above we can decode WAL in the presence of nearly all forms of DDL. The plugin that outputted these changes is supposed to be added to contrib and is fairly small and uncomplicated. An interesting piece of information might be that in the very preliminary benchmarking I have done on this even the textual decoding could keep up with a full tilt pgbench -c16 -j16 -M prepared on my (somewhat larger) workstation. The wal space overhead was less than 1% between two freshly initdb'ed clusters, comparing wal_level=hot_standby with =logical. With a custom pgbench script I can saturate the decoding to the effect that it lags a second or so, but once I write out the data in a binary format it can keep up again. The biggest overhead is currently the more slowly increasing Global/RecentXmin, but that can be greatly improved by logging xl_running_xact's more than just every checkpoint. A short overview over the patches in this series: * Add minimal binary heap implementation Abhijit submitted a nicer version of this, the plan is to rebase ontop of that once people are happy with the interface. (unchanged) * Add support for a generic wal reading facility dubbed XLogReader There's some discussion about whats the best way to implement this in a separate CF topic. (unchanged) * Add simple xlogdump tool Very nice for debugging, couldn't have developed this without. Obviously not a prerequisite for comitting this feature but still pretty worthy. (quite a bit updated, still bad build infrastructure) * Add a new RELFILENODE syscache to fetch a pg_class entry via (reltablespace, relfilenode) Relatively simple, somewhat contentious due to some uniqueness issues. Would very much welcome input from somebody with syscache experience on this. It was previously suggested to write something like attoptcache.c for this, but to me that seems to be code-duplication. We can go that route though. (unchanged) * Add a new relmapper.c function RelationMapFilenodeToOid that acts as a reverse of RelationMapOidToFilenode Simple. I don't even think its contentious... Just wasn't needed before. (unchanged) * Add a new function pg_relation_by_filenode to lookup up a relation given the tablespace and the filenode OIDs Just a nice to have thing for debugging, not a prerequisite for the feature. (unchanged) * Introduce InvalidCommandId and declare that to be the new maximum for CommandCounterIncrement Uncomplicated and I hope uncontentious. (new) *Store the number of subtransactions in xl_running_xacts separately fromtoplevel xids Increases the size of xl_running_xacts by 4bytes in the worst case, decreases it in some others. Improves the efficiency of some HS operations. Should be ok? (new) * Adjust all *Satisfies routines to take a HeapTuple instead of a HeapTupleHeader Not sure if people will complain about this? Its rather simple due to the fact that the HeapTupleSatisfiesVisibility wrapper already took a HeapTupleHeader as parameter. (new) * Allow walsender's to connect to a specific database This has been requested by others. I think we need to work on the external interface a bit, should be ok otherwise. (new) * Introduce wal decoding via catalog timetravel This is the meat of the feature. I think this is going in a good direction, still needs some work, but architectural review can really start now. (more later) (heavily changed) * Add a simple decoding module in contrib named 'test_decoding' The much requested example contrib module. (new) * Introduce pg_receivellog, the pg_receivexlog equivalent for logical changes Debugging tool to receive changes and write them to a file. Needs some more options and probably shouldn't live inside pg_basebackup's directory. (new) * design document v2.3 and snapshot building design doc v0.2 (unchanged) There remains quite a bit to be done but I think the state of the patch has improved quite a bit. The biggest thing now is to get input about the user facing parts so we can get some aggreement there. Todo: * testing infrastructure (isolationtester) * persistence/spilling to disk of built snapshots, longrunning transactions * user docs * more frequent lowering of xmins * more docs about the internals * support for user declared catalog tables * actual exporting of initial pg_export snapshots after INIT_LOGICAL_REPLICATION * own shared memory segment instead of piggybacking on walsender's * nicer interface between snapbuild.c, reorderbuffer.c, decode.c and the outside. * more frequent xl_running_xid's so xmin can be upped more frequently Please comment! Happy and tired, Andres --Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
В списке pgsql-hackers по дате отправления: