Re: JSON decoding plugin

From: Merlin Moncure
Subject: Re: JSON decoding plugin
Date:
Msg-id: CAHyXU0zE-kcofYtnoQqu7i2odH+L8zhfQz9-Ozbdga4NUXgBJg@mail.gmail.com
In response to: JSON decoding plugin (Euler Taveira <euler@timbira.com.br>)
Responses: Re: JSON decoding plugin
List: pgsql-hackers
On Mon, Dec 9, 2013 at 7:03 AM, Euler Taveira <euler@timbira.com.br> wrote:
> Hi,
>
> A few months ago, it was proposed [1] that it would be interesting to
> have a JSON output plugin for logical decoding. Here it is.
>
> Each transaction is a JSON object that can contain xid (optional),
> timestamp (optional), and a change array. Each element of the change
> array is a decoded command and can contain: kind (I/U/D), schema
> (optional), table, columnnames, columntypes (optional), columnvalues,
> and oldkeys (only for U/D). columnnames, columntypes, and columnvalues
> are arrays. oldkeys is an object that contains the following arrays:
> keynames, keytypes (optional), and keyvalues.
>
> The JSON objects are serialized if you are decoding a series of
> transactions. Here is an output example:
>
> {
>         "xid": 702,
>         "change": [
>                 {
>                         "kind": "insert",
>                         "schema": "public",
>                         "table": "foo",
>                         "columnnames": ["a", "b", "c"],
>                         "columntypes": ["int4", "int4", "text"],
>                         "columnvalues": [1, 2, "test"]
>                 }
>                 ,{
>                         "kind": "update",
>                         "schema": "public",
>                         "table": "foo",
>                         "columnnames": ["a", "b", "c"],
>                         "columntypes": ["int4", "int4", "text"],
>                         "columnvalues": [1, 2, "test2"],
>                         "oldkeys": {
>                                 "keynames": ["a", "b"],
>                                 "keytypes": ["int4", "int4"],
>                                 "keyvalues": [1, 2]
>                         }
>                 }
>         ]
> }
> {
>         "xid": 703,
>         "change": [
>                 {
>                         "kind": "update",
>                         "schema": "public",
>                         "table": "foo",
>                         "columnnames": ["a", "b", "c"],
>                         "columntypes": ["int4", "int4", "text"],
>                         "columnvalues": [1, 3, "test2"],
>                         "oldkeys": {
>                                 "keynames": ["a", "b"],
>                                 "keytypes": ["int4", "int4"],
>                                 "keyvalues": [1, 2]
>                         }
>                 }
>         ]
> }
> {
>         "xid": 704,
>         "change": [
>                 {
>                         "kind": "delete",
>                         "schema": "public",
>                         "table": "foo",
>                         "oldkeys": {
>                                 "keynames": ["a", "b"],
>                                 "keytypes": ["int4", "int4"],
>                                 "keyvalues": [1, 3]
>                         }
>                 }
>         ]
> }
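Since the transactions are emitted as concatenated top-level objects rather than as one JSON array, a consumer has to decode the stream incrementally. A minimal consumer-side sketch in Python (hypothetical client code, not part of the plugin itself; `iter_transactions` is an illustrative name):

```python
import json

def iter_transactions(stream_text):
    """Yield each top-level JSON object from a stream of
    concatenated (non-array) transaction objects."""
    decoder = json.JSONDecoder()
    idx, n = 0, len(stream_text)
    while idx < n:
        # Skip whitespace between consecutive objects.
        while idx < n and stream_text[idx].isspace():
            idx += 1
        if idx >= n:
            break
        obj, idx = decoder.raw_decode(stream_text, idx)
        yield obj

sample = '{"xid": 702, "change": []}\n{"xid": 703, "change": []}'
xids = [tx["xid"] for tx in iter_transactions(sample)]
# xids is [702, 703]
```

`json.JSONDecoder.raw_decode` is used because a plain `json.loads` would reject the trailing objects after the first one.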
>
>
> Some data types were adapted to conform to the JSON spec. NaN and
> Infinity are not valid JSON symbols, so they are represented as null
> (as some JSON implementations do). Due to the simplicity of the JSON
> data types, I represent the vast majority of Postgres data types as
> strings (however, I admit that we could mimic the json data type's
> conversion rules).
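To illustrate the point above: strictly spec-conforming JSON encoders refuse NaN and Infinity, which is why substituting null is the interoperable choice. A small Python demonstration (illustrative only, unrelated to the plugin's C code):

```python
import json

# With allow_nan=False, Python's encoder enforces the JSON spec and
# rejects NaN/Infinity; falling back to null mirrors the plugin's choice.
try:
    emitted = json.dumps(float("nan"), allow_nan=False)
except ValueError:
    emitted = json.dumps(None)  # -> "null"
```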
>
> The oldkeys treatment follows what was defined by commit [2]. It uses:
>
> (i) primary key (default behavior);
> (ii) unique index (if REPLICA IDENTITY USING INDEX is defined for table);
> (iii) full tuple (if REPLICA IDENTITY FULL is defined for table);
> (iv) nothing means an error (if REPLICA IDENTITY NOTHING is defined for
> table).
>
> TOAST columns get special treatment for UPDATEs. If a tuple that
> contains a TOAST field is updated, the TOAST field is included iff it
> was changed too; unchanged TOAST fields are omitted from the columns*
> arrays. This means less overhead while transmitting, processing, and
> applying changes.
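A consequence of omitting unchanged TOAST fields is that a consumer must merge each UPDATE into its existing copy of the row rather than replace the row wholesale. A sketch of that merge, assuming the change format shown in the example output above (`apply_update` is a hypothetical helper name):

```python
def apply_update(row, change):
    """Apply an UPDATE change to an in-memory row dict. Columns absent
    from columnnames/columnvalues (e.g. unchanged TOAST fields) keep
    their previous values."""
    merged = dict(row)
    for name, value in zip(change["columnnames"], change["columnvalues"]):
        merged[name] = value
    return merged

old = {"a": 1, "b": 2, "big_toast_col": "<huge value>"}
change = {"kind": "update",
          "columnnames": ["a", "b"],
          "columnvalues": [1, 3]}
new = apply_update(old, change)
# big_toast_col survives even though it was not transmitted
```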
>
> By design, (i) output plugin doesn't know about aborted transactions and
> (ii) subtransactions are reordered into a toplevel transaction and only
> the committed pieces are passed to the plugin.
>
> You can test it by running the regression tests (e.g. 'make test') or
> by using the following steps:
>
> postgresql.conf:
> wal_level = logical
> max_wal_senders = 2
> max_logical_slots = 2
>
> start collecting WAL records:
>
> $ pg_recvlogical --slot=foo -d euler -f /dev/stdout
> --plugin=json_decoding_plugin --init
>
> [execute some transactions]
>
> start printing decoded transactions:
>
> $ pg_recvlogical --slot=foo -d euler -f /dev/stdout --start
>
> stop collecting WAL records:
>
> $ pg_recvlogical --slot=foo -d euler -f /dev/stdout --stop
>
>
> Comments?

This is pretty neat. A couple of minor questions:
*) Aren't you *en*coding data into json, not the other way around (decoding?)
*) Consider generating a long bytea instead of explicitly writing a
32kB SQL file into the patch.
*) You've built your own json serializer here.  Maybe some code can be
shared with the json type?
*) Consider removing 'plugin' from the name of the plugin, e.g.
--plugin=json_decoding.

merlin


