logical changeset generation v5

From: Andres Freund
Subject: logical changeset generation v5
Msg-id: 20130614224817.GA19641@awork2.anarazel.de
Responses: Re: changeset generation v5-01 - Patches & git tree (Andres Freund <andres@2ndquadrant.com>)
           Re: logical changeset generation v5 (Andres Freund <andres@2ndquadrant.com>)
List: pgsql-hackers

Hi!

I am rather pleased to announce the next version of the changeset
extraction patchset. Thanks to help from a large number of people, I
think we are slowly getting to the point where it is committable.

Since the last submitted version
(20121115002746.GA7692@awork2.anarazel.de) a large number of fixes and
the results of a good amount of review have been added to the tree. All
bugs known to me have been fixed.

Fixes include:
* synchronous replication support
* don't peg the xmin for user tables, only for catalog ones
* arbitrarily large transaction support by spilling large transactions to disk
* spill snapshots to disk, so we can restart without waiting for a new snapshot to be built
* don't read all WAL from the establishment of a logical slot
* tests via the SQL interface to changeset extraction

The todo list includes:
* morph the "logical slot" interface into being "replication slots" that can also be used by streaming replication
* move some more code from snapbuild.c to decode.c to remove a largely duplicated switch
* do some more header/comment cleanup & clarification
* move pg_receivellog into its own directory in src/bin or contrib/.
* user/developer level documentation

The patch series currently has two interfaces to logical decoding. One,
which is primarily useful for pg_regress style tests and playing around,
is SQL based; the other uses a walsender replication connection.

A quick demonstration of the SQL interface (server needs to be started
with wal_level = logical and max_logical_slots > 0):
=# CREATE EXTENSION test_logical_decoding;
=# SELECT * FROM init_logical_replication('regression_slot', 'test_decoding');
    slotname     | xlog_position
-----------------+---------------
 regression_slot | 0/17D5908
(1 row)

=# CREATE TABLE foo(id serial primary key, data text);

=# INSERT INTO foo(data) VALUES(1);

=# UPDATE foo SET id = -id, data = ':'||data;

=# DELETE FROM foo;

=# DROP TABLE foo;

=# SELECT * FROM start_logical_replication('regression_slot', 'now', 'hide-xids', '0');
 location  | xid |                                      data
-----------+-----+--------------------------------------------------------------------------------
 0/17D59B8 | 695 | BEGIN
 0/17D59B8 | 695 | COMMIT
 0/17E8B58 | 696 | BEGIN
 0/17E8B58 | 696 | table "foo": INSERT: id[int4]:1 data[text]:1
 0/17E8B58 | 696 | COMMIT
 0/17E8CA8 | 697 | BEGIN
 0/17E8CA8 | 697 | table "foo": UPDATE: old-pkey: id[int4]:1 new-tuple:id[int4]:-1 data[text]::1
 0/17E8CA8 | 697 | COMMIT
 0/17E8E50 | 698 | BEGIN
 0/17E8E50 | 698 | table "foo": DELETE:id[int4]:-1
 0/17E8E50 | 698 | COMMIT
 0/17E9058 | 699 | BEGIN
 0/17E9058 | 699 | COMMIT
(13 rows)

=# SELECT * FROM pg_stat_logical_decoding ;
    slot_name    |    plugin     | database | active | xmin | restart_decoding_lsn
-----------------+---------------+----------+--------+------+----------------------
 regression_slot | test_decoding |    12042 | f      |  695 | 0/17D58D0
(1 row)

=# SELECT * FROM stop_logical_replication('regression_slot');
 stop_logical_replication
--------------------------
                        0
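
A consumer of the SQL interface has to pick the 'data' column apart again. The following Python sketch parses the text format exactly as it appears in the demonstration above. It is only an illustration: the format belongs to the demo plugin and may change, the names parse_data and Change are made up here, and the sketch assumes that column values contain no whitespace.

```python
import re
from typing import NamedTuple

# One column as printed in the demo output, e.g. "id[int4]:-1".
# Assumption: values contain no whitespace (not guaranteed by the plugin).
COLUMN_RE = re.compile(r'(\w+)\[(\w+)\]:(\S*)')
# One change line, e.g. 'table "foo": INSERT: id[int4]:1 data[text]:1'.
CHANGE_RE = re.compile(r'^table "([^"]+)": (INSERT|UPDATE|DELETE):\s*(.*)$')


class Change(NamedTuple):
    table: str
    action: str
    columns: list  # (name, type, value) tuples; old and new values are not separated


def parse_data(line):
    """Classify one 'data' value as 'BEGIN', 'COMMIT' or a Change."""
    line = line.strip()
    if line in ("BEGIN", "COMMIT"):
        return line
    match = CHANGE_RE.match(line)
    if match is None:
        raise ValueError(f"unrecognised line: {line!r}")
    table, action, rest = match.groups()
    return Change(table, action, COLUMN_RE.findall(rest))


if __name__ == "__main__":
    print(parse_data('table "foo": INSERT: id[int4]:1 data[text]:1'))
```

For anything beyond tests, writing an output plugin that emits a format designed for machines is likely a better choice than parsing this text.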

The walsender interface has the same calls:
INIT_LOGICAL_REPLICATION 'slot' 'plugin';
START_LOGICAL_REPLICATION 'slot' restart_lsn [(option value)*];
STOP_LOGICAL_REPLICATION 'slot';

The only difference is that START_LOGICAL_REPLICATION can stream changes
and it can support synchronous replication.
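
The restart_lsn argument and the xlog_position/location values in the SQL demonstration are WAL positions in PostgreSQL's usual notation: two hexadecimal numbers separated by a slash, the high and low 32 bits of a 64-bit position. A client that tracks how far it has consumed the stream has to compare them numerically rather than as strings. A minimal Python sketch (the helper names parse_lsn and format_lsn are made up for this example):

```python
def parse_lsn(text):
    """Convert 'hi/lo' hex notation, e.g. '0/17D5908', to a 64-bit integer."""
    hi, lo = text.strip().split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def format_lsn(value):
    """Inverse of parse_lsn: render a 64-bit position as 'hi/lo' in hex."""
    return f"{value >> 32:X}/{value & 0xFFFFFFFF:X}"


if __name__ == "__main__":
    start = parse_lsn("0/17D5908")   # position returned when the slot was created
    first = parse_lsn("0/17D59B8")   # location of the first decoded transaction
    print(first - start, "bytes of WAL between the two positions")
    # String comparison gives the wrong order once the digit counts differ.
    print("0/FFFFFFF" < "0/10000000", parse_lsn("0/FFFFFFF") < parse_lsn("0/10000000"))
```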

The output seen in the 'data' column is produced by a so-called 'output
plugin', which users of the facility can write to suit their needs. A
plugin is written by implementing five functions in the shared object
that's passed to init_logical_replication() above:
* pg_decode_init (optional)
* pg_decode_begin_txn
* pg_decode_change
* pg_decode_commit_txn
* pg_decode_cleanup (optional)

The most interesting function, pg_decode_change, gets passed a structure
containing old/new versions of the row, the 'struct Relation' belonging
to it and meta-information about the transaction.

The output plugin can rely on syscache lookups et al. to decode the
changed tuple in whatever fashion it wants.
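
The order in which the callbacks fire can be modelled in a few lines of Python. This is a toy model, not the C interface: the real callbacks are C functions whose exact signatures are defined in the patch, and the classes Txn, Change and DemoPlugin as well as the driver decode() are invented for this sketch. When exactly init and cleanup run is also an assumption here.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Change:
    table: str
    action: str                  # "INSERT", "UPDATE" or "DELETE"
    old: Optional[dict] = None   # old row version (or key), if any
    new: Optional[dict] = None   # new row version, if any


@dataclass
class Txn:
    xid: int
    changes: list = field(default_factory=list)


class DemoPlugin:
    """Method names mirror the pg_decode_* callbacks listed above."""

    def __init__(self):
        self.out = []

    def init(self, options):             # optional callback
        self.options = dict(options)

    def begin_txn(self, txn):
        self.out.append("BEGIN")

    def change(self, txn, change):
        self.out.append(f'table "{change.table}": {change.action}')

    def commit_txn(self, txn):
        self.out.append("COMMIT")

    def cleanup(self):                   # optional callback
        self.options = None


def decode(plugin, txns, options=None):
    """Call the plugin once per transaction: begin, each change, commit."""
    init = getattr(plugin, "init", None)
    if init is not None:
        init(options or {})
    for txn in txns:
        plugin.begin_txn(txn)
        for change in txn.changes:
            plugin.change(txn, change)
        plugin.commit_txn(txn)
    cleanup = getattr(plugin, "cleanup", None)
    if cleanup is not None:
        cleanup()
    return plugin.out


if __name__ == "__main__":
    insert = Change("foo", "INSERT", new={"id": 1, "data": "1"})
    for line in decode(DemoPlugin(), [Txn(695), Txn(696, [insert])]):
        print(line)
```

A transaction without changes still produces a BEGIN/COMMIT pair in this model, which matches xids 695 and 699 in the demonstration output.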

I'd like to invite reviewers to first look at:
* the output plugin interface
* the walsender/SRF interface
* patch 12 which contains most of the code

When reading the code, the information flow during decoding might be
interesting:
---------------
          +---------------+
          | XLogReader    |
          +---------------+
                  |
            XLOG Records
                  |
                  v
          +---------------+
          | decode.c      |
          +---------------+
             |       |
             |       |
             v       |
  +---------------+  |
  | snapbuild.c   |  HeapTupleData
  +---------------+  |
             |       |
  catalog snapshots  |
             |       |
             v       v
          +---------------+
          |reorderbuffer.c|
          +---------------+
                 |
        HeapTuple & Metadata
                 |
                 v
          +---------------+
          | Output Plugin |
          +---------------+
                 |
         Whatever you want
                 |
                 v
          +---------------+
          | Output Handler|
          |               |
          |WalSnd or SRF  |
          +---------------+
---------------
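
The role of the reorderbuffer.c stage can be pictured with a small Python model: changes from concurrent transactions arrive interleaved in WAL order, are collected per xid, and are handed on as one unit when the commit record is seen. This is a deliberately simplified sketch under that assumption. The function reorder() and the record tuples are invented for the example, and it ignores snapshots, subtransactions and spilling large transactions to disk.

```python
from collections import defaultdict


def reorder(records):
    """Group interleaved changes by transaction.

    records: iterable of (xid, kind, payload) in WAL order, where kind is
    'change', 'commit' or 'abort'.
    Yields (xid, [payloads]) for each committed transaction, in commit order.
    """
    pending = defaultdict(list)
    for xid, kind, payload in records:
        if kind == "change":
            pending[xid].append(payload)
        elif kind == "commit":
            yield xid, pending.pop(xid, [])
        elif kind == "abort":
            pending.pop(xid, None)   # changes of aborted transactions are dropped
        else:
            raise ValueError(f"unknown record kind: {kind!r}")


if __name__ == "__main__":
    wal = [
        (1, "change", "a1"),
        (2, "change", "b1"),
        (1, "change", "a2"),
        (2, "commit", None),
        (1, "commit", None),
    ]
    for xid, changes in reorder(wal):
        print(xid, changes)
```

In this model the output plugin never sees a partial or aborted transaction, which is why the demonstration output consists of complete BEGIN ... COMMIT groups.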


Overview of the attached patches:
0001: indirect toast tuples; required, but submitted independently
0002: functions for testing; not required
0003: (tablespace, filenode) syscache; required
0004: RelationMapFilenodeToOid; required, simple
0005: pg_relation_by_filenode() function; not required, but useful
0006: Introduce InvalidCommandId; required, simple
0007: Adjust Satisfies* interface; required, mechanical
0008: Allow walsender to attach to a database; required, needs review
0009: New GetOldestXmin() parameter; required, pretty boring
0010: Log xl_running_xact regularly in the bgwriter; required
0011: make fsync_fname() public; required, needs to be in a different file
0012: Relcache support for a Relation's primary key; required
0013: Actual changeset extraction; required
0014: Output plugin demo; not required (except for testing), but useful
0015: Add pg_receivellog program; not required, but useful
0016: Add test_logical_decoding extension; not required, but contains the tests for the feature. Uses 0014
0017: Snapshot building docs; not required

Greetings,

Andres Freund

-- 
 Andres Freund                     http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


