Обсуждение: [GENERAL] pglogical vs. built-in logical replication in pg-10

Поиск
Список
Период
Сортировка

[GENERAL] pglogical vs. built-in logical replication in pg-10

От
Andreas Joseph Krogh
Дата:
Hi.
 
1. Why should one prefer built-in logical replication in pg-10 to pglogical, does it do anything pglogical doesn't?
It seems pglogical is more feature-rich...
2. As I understand built-in logical replication in pg-10 doesn't support large-objects, which we use a lot. Does pglogical replicate large objects? I cannot find any notes about large-objects under "Limitations and Restrictions": https://www.2ndquadrant.com/en/resources/pglogical/pglogical-docs/
 
--
Andreas Joseph Krogh
CTO / Partner - Visena AS
Mobile: +47 909 56 963
Вложения

Re: [GENERAL] pglogical vs. built-in logical replication in pg-10

От
Achilleas Mantzios
Дата:
On 22/06/2017 11:21, Andreas Joseph Krogh wrote:
Hi.
 
1. Why should one prefer built-in logical replication in pg-10 to pglogical, does it do anything pglogical doesn't?
It seems pglogical is more feature-rich...
2. As I understand built-in logical replication in pg-10 doesn't support large-objects, which we use a lot. Does pglogical replicate large objects? I cannot find any notes about large-objects under "Limitations and Restrictions": https://www.2ndquadrant.com/en/resources/pglogical/pglogical-docs/
You may do a simple test, create a table with a largeobject and try to read the logical stream, if it cannot represent the lo_import, lo_open, lowrite, lo_close (and I 'd bet they can't be encoded) then neither pglogical (being based on the same logical decoding technology) will support them.
 
--
Andreas Joseph Krogh
CTO / Partner - Visena AS
Mobile: +47 909 56 963


-- 
Achilleas Mantzios
IT DEV Lead
IT DEPT
Dynacom Tankers Mgmt
Вложения

Re: [GENERAL] pglogical vs. built-in logical replication in pg-10

От
Andreas Joseph Krogh
Дата:
På torsdag 22. juni 2017 kl. 11:43:02, skrev Achilleas Mantzios <achill@matrix.gatewaynet.com>:
On 22/06/2017 11:21, Andreas Joseph Krogh wrote:
Hi.
 
1. Why should one prefer built-in logical replication in pg-10 to pglogical, does it do anything pglogical doesn't?
It seems pglogical is more feature-rich...
2. As I understand built-in logical replication in pg-10 doesn't support large-objects, which we use a lot. Does pglogical replicate large objects? I cannot find any notes about large-objects under "Limitations and Restrictions": https://www.2ndquadrant.com/en/resources/pglogical/pglogical-docs/
You may do a simple test, create a table with a largeobject and try to read the logical stream, if it cannot represent the lo_import, lo_open, lowrite, lo_close (and I 'd bet they can't be encoded) then neither pglogical (being based on the same logical decoding technology) will support them
 
 
The point of email-lists like this is that one may share knowledge so one doesn't have to test everything one self, and can build on knowledge from others. I'm looking for an answer from someone who's not betting, but knows.
 
Thanks.
 
--
Andreas Joseph Krogh
CTO / Partner - Visena AS
Mobile: +47 909 56 963
 
Вложения

Re: [GENERAL] pglogical vs. built-in logical replication in pg-10

От
Achilleas Mantzios
Дата:
On 22/06/2017 13:38, Andreas Joseph Krogh wrote:
På torsdag 22. juni 2017 kl. 11:43:02, skrev Achilleas Mantzios <achill@matrix.gatewaynet.com>:
On 22/06/2017 11:21, Andreas Joseph Krogh wrote:
Hi.
 
1. Why should one prefer built-in logical replication in pg-10 to pglogical, does it do anything pglogical doesn't?
It seems pglogical is more feature-rich...
2. As I understand built-in logical replication in pg-10 doesn't support large-objects, which we use a lot. Does pglogical replicate large objects? I cannot find any notes about large-objects under "Limitations and Restrictions": https://www.2ndquadrant.com/en/resources/pglogical/pglogical-docs/
You may do a simple test, create a table with a largeobject and try to read the logical stream, if it cannot represent the lo_import, lo_open, lowrite, lo_close (and I 'd bet they can't be encoded) then neither pglogical (being based on the same logical decoding technology) will support them
 
 
The point of email-lists like this is that one may share knowledge so one doesn't have to test everything one self, and can build on knowledge from others. I'm looking for an answer from someone who's not betting, but knows.
I gave you enough knowledge already. Here's some more :
- go and install 10
- create a table containing one col with type oid (large object) and one bytea
- follow the simple setup here : https://www.postgresql.org/docs/10/static/logicaldecoding-example.html
- insert a row
- Do again : SELECT * FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL);

Do you see any of your oid image data in the output? Do you see any of the bytea ? (the answer here in 9.5 is  "no"/"yes").
If in 10.0 is still the case, then you should think about moving to bytea.

 
Thanks.
 
--
Andreas Joseph Krogh
CTO / Partner - Visena AS
Mobile: +47 909 56 963
 


-- 
Achilleas Mantzios
IT DEV Lead
IT DEPT
Dynacom Tankers Mgmt
Вложения

Re: [GENERAL] pglogical vs. built-in logical replication in pg-10

От
Andreas Joseph Krogh
Дата:
På torsdag 22. juni 2017 kl. 15:25:20, skrev Achilleas Mantzios <achill@matrix.gatewaynet.com>:
On 22/06/2017 13:38, Andreas Joseph Krogh wrote:
På torsdag 22. juni 2017 kl. 11:43:02, skrev Achilleas Mantzios <achill@matrix.gatewaynet.com>:
On 22/06/2017 11:21, Andreas Joseph Krogh wrote:
Hi.
 
1. Why should one prefer built-in logical replication in pg-10 to pglogical, does it do anything pglogical doesn't?
It seems pglogical is more feature-rich...
2. As I understand built-in logical replication in pg-10 doesn't support large-objects, which we use a lot. Does pglogical replicate large objects? I cannot find any notes about large-objects under "Limitations and Restrictions": https://www.2ndquadrant.com/en/resources/pglogical/pglogical-docs/
You may do a simple test, create a table with a largeobject and try to read the logical stream, if it cannot represent the lo_import, lo_open, lowrite, lo_close (and I 'd bet they can't be encoded) then neither pglogical (being based on the same logical decoding technology) will support them
 
 
The point of email-lists like this is that one may share knowledge so one doesn't have to test everything one self, and can build on knowledge from others. I'm looking for an answer from someone who's not betting, but knows.
I gave you enough knowledge already. Here's some more :
- go and install 10
- create a table containing one col with type oid (large object) and one bytea
- follow the simple setup here : https://www.postgresql.org/docs/10/static/logicaldecoding-example.html
- insert a row
- Do again : SELECT * FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL);

Do you see any of your oid image data in the output? Do you see any of the bytea ? (the answer here in 9.5 is  "no"/"yes").
If in 10.0 is still the case, then you should think about moving to bytea.
 
Hm, it turns out it's not quite that simple...
 
Test-case:
 
create table drus(id bigint primary key, lo oid, data bytea);
SELECT * FROM pg_create_logical_replication_slot('my_slot', 'test_decoding');
INSERT INTO drus (id, lo, data) values(1, lo_import('/tmp/faktura_200007.pdf'), decode('AAAEEE', 'hex'));

select * from drus;
┌────┬─────────┬──────────┐
│ id │   lo    │   data   │
├────┼─────────┼──────────┤
│  1 │ 2873269 │ \xaaaeee │
└────┴─────────┴──────────┘



SELECT * FROM pg_logical_slot_get_changes('my_slot', NULL, NULL);
┌────────────┬──────┬────────────────────────────────────────────────────────────────────────────────┐
│    lsn     │ xid  │                                      data                                      │
├────────────┼──────┼────────────────────────────────────────────────────────────────────────────────┤
│ B/E585B858 │ 9391 │ BEGIN 9391                                                                     │
│ B/E586BE78 │ 9391 │ table public.drus: INSERT: id[bigint]:1 lo[oid]:2873269 data[bytea]:'\xaaaeee' │
│ B/E586BF80 │ 9391 │ COMMIT 9391                                                                    │
└────────────┴──────┴────────────────────────────────────────────────────────────────────────────────┘
(3 rows)


 
So far so good, the oid-value (2873269) is apparently in the change-set, but...
 
Set up publication:
CREATE PUBLICATION bolle FOR ALL TABLES;
CREATE PUBLICATION


=== ON REPLICA ===
 
# create table on replica:
create table drus(id bigint primary key, lo oid, data bytea);
 
# create subscription:
CREATE SUBSCRIPTION mysub CONNECTION 'host=localhost port=5433 user=andreak dbname=fisk' PUBLICATION bolle;
NOTICE:  created replication slot "mysub" on publisher
CREATE SUBSCRIPTION
2017-06-22 16:38:34.740 CEST [18718] LOG:  logical replication apply worker for subscription "mysub" has started
2017-06-22 16:38:34.747 CEST [18720] LOG:  logical replication table synchronization worker for subscription "mysub", table "drus" has started
2017-06-22 16:38:35.746 CEST [18720] LOG:  logical replication table synchronization worker for subscription "mysub", table "drus" has finished


Looks good:
select * from drus;
┌────┬─────────┬──────────┐
│ id │   lo    │   data   │
├────┼─────────┼──────────┤
│  1 │ 2873269 │ \xaaaeee │
└────┴─────────┴──────────┘
(1 row)


 
...until :
 
SELECT lo_export(drus.lo, '/tmp/faktura.pdf') from drus where id = 1;
2017-06-22 16:40:04.967 CEST [18657] ERROR:  large object 2873269 does not exist
2017-06-22 16:40:04.967 CEST [18657] STATEMENT:  SELECT lo_export(drus.lo, '/tmp/faktura.pdf') from drus where id = 1;
ERROR:  large object 2873269 does not exist


So, the OID-value is replicated but pg_largeobject is empty:
 
select * from pg_largeobject;
┌──────┬────────┬──────┐
│ loid │ pageno │ data │
├──────┼────────┼──────┤
└──────┴────────┴──────┘
(0 rows)


 
Once again having pg_largeobject as a system-catalog prevents LOs from working smoothly. Neither replication nor having LOs on a different tablespace (by moving pg_largeobject) works.
 
I wish PG in some future version will address these quirks so one can operate on LOs more smoothly.
 
--
Andreas Joseph Krogh
CTO / Partner - Visena AS
Mobile: +47 909 56 963
 
Вложения

Re: [GENERAL] pglogical vs. built-in logical replication in pg-10

От
Achilleas Mantzios
Дата:
On 22/06/2017 17:46, Andreas Joseph Krogh wrote:
På torsdag 22. juni 2017 kl. 15:25:20, skrev Achilleas Mantzios <achill@matrix.gatewaynet.com>:
On 22/06/2017 13:38, Andreas Joseph Krogh wrote:
På torsdag 22. juni 2017 kl. 11:43:02, skrev Achilleas Mantzios <achill@matrix.gatewaynet.com>:
On 22/06/2017 11:21, Andreas Joseph Krogh wrote:
Hi.
 
1. Why should one prefer built-in logical replication in pg-10 to pglogical, does it do anything pglogical doesn't?
It seems pglogical is more feature-rich...
2. As I understand built-in logical replication in pg-10 doesn't support large-objects, which we use a lot. Does pglogical replicate large objects? I cannot find any notes about large-objects under "Limitations and Restrictions": https://www.2ndquadrant.com/en/resources/pglogical/pglogical-docs/
You may do a simple test, create a table with a largeobject and try to read the logical stream, if it cannot represent the lo_import, lo_open, lowrite, lo_close (and I 'd bet they can't be encoded) then neither pglogical (being based on the same logical decoding technology) will support them
 
 
The point of email-lists like this is that one may share knowledge so one doesn't have to test everything one self, and can build on knowledge from others. I'm looking for an answer from someone who's not betting, but knows.
I gave you enough knowledge already. Here's some more :
- go and install 10
- create a table containing one col with type oid (large object) and one bytea
- follow the simple setup here : https://www.postgresql.org/docs/10/static/logicaldecoding-example.html
- insert a row
- Do again : SELECT * FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL);

Do you see any of your oid image data in the output? Do you see any of the bytea ? (the answer here in 9.5 is  "no"/"yes").
If in 10.0 is still the case, then you should think about moving to bytea.
 
Hm, it turns out it's not quite that simple...
 
Test-case:
 
create table drus(id bigint primary key, lo oid, data bytea);
SELECT * FROM pg_create_logical_replication_slot('my_slot', 'test_decoding');
INSERT INTO drus (id, lo, data) values(1, lo_import('/tmp/faktura_200007.pdf'), decode('AAAEEE', 'hex'));

select * from drus;
┌────┬─────────┬──────────┐
│ id │   lo    │   data   │
├────┼─────────┼──────────┤
│  1 │ 2873269 │ \xaaaeee │
└────┴─────────┴──────────┘



SELECT * FROM pg_logical_slot_get_changes('my_slot', NULL, NULL);
┌────────────┬──────┬────────────────────────────────────────────────────────────────────────────────┐
│    lsn     │ xid  │                                      data                                      │
├────────────┼──────┼────────────────────────────────────────────────────────────────────────────────┤
│ B/E585B858 │ 9391 │ BEGIN 9391                                                                     │
│ B/E586BE78 │ 9391 │ table public.drus: INSERT: id[bigint]:1 lo[oid]:2873269 data[bytea]:'\xaaaeee' │
│ B/E586BF80 │ 9391 │ COMMIT 9391                                                                    │
└────────────┴──────┴────────────────────────────────────────────────────────────────────────────────┘
(3 rows)


 
So far so good, the oid-value (2873269) is apparently in the change-set, but...
 
If the data itself of the LO are not there then this is not so good.
Set up publication:
CREATE PUBLICATION bolle FOR ALL TABLES;
CREATE PUBLICATION


=== ON REPLICA ===
 
# create table on replica:
create table drus(id bigint primary key, lo oid, data bytea);
 
# create subscription:
CREATE SUBSCRIPTION mysub CONNECTION 'host=localhost port=5433 user=andreak dbname=fisk' PUBLICATION bolle;
NOTICE:  created replication slot "mysub" on publisher
CREATE SUBSCRIPTION
2017-06-22 16:38:34.740 CEST [18718] LOG:  logical replication apply worker for subscription "mysub" has started
2017-06-22 16:38:34.747 CEST [18720] LOG:  logical replication table synchronization worker for subscription "mysub", table "drus" has started
2017-06-22 16:38:35.746 CEST [18720] LOG:  logical replication table synchronization worker for subscription "mysub", table "drus" has finished


Looks good:
select * from drus;
┌────┬─────────┬──────────┐
│ id │   lo    │   data   │
├────┼─────────┼──────────┤
│  1 │ 2873269 │ \xaaaeee │
└────┴─────────┴──────────┘
(1 row)


 
...until :
 
SELECT lo_export(drus.lo, '/tmp/faktura.pdf') from drus where id = 1;
2017-06-22 16:40:04.967 CEST [18657] ERROR:  large object 2873269 does not exist
2017-06-22 16:40:04.967 CEST [18657] STATEMENT:  SELECT lo_export(drus.lo, '/tmp/faktura.pdf') from drus where id = 1;
ERROR:  large object 2873269 does not exist


So, the OID-value is replicated but pg_largeobject is empty:
 
select * from pg_largeobject;
┌──────┬────────┬──────┐
│ loid │ pageno │ data │
├──────┼────────┼──────┤
└──────┴────────┴──────┘
(0 rows)


 
Once again having pg_largeobject as a system-catalog prevents LOs from working smoothly. Neither replication nor having LOs on a different tablespace (by moving pg_largeobject) works.
 
I think logical decoding was designed for supporting DML SQL commands (i.e. a finite set of commands) and not specific functions (lo_*) which by nature can be arbitrary, infinite and version specific.
I wish PG in some future version will address these quirks so one can operate on LOs more smoothly.
What's so better in LO's VS bytea? You do a lot updates on the binary data, changing only parts of it?
 
--
Andreas Joseph Krogh
CTO / Partner - Visena AS
Mobile: +47 909 56 963
 


-- 
Achilleas Mantzios
IT DEV Lead
IT DEPT
Dynacom Tankers Mgmt
Вложения

Re: [GENERAL] pglogical vs. built-in logical replication in pg-10

От
Peter Eisentraut
Дата:
On 6/22/17 04:21, Andreas Joseph Krogh wrote:
> 1. Why should one prefer built-in logical replication in pg-10 to
> pglogical, does it do anything pglogical doesn't?
> It seems pglogical is more feature-rich...

You are right that pglogical has more functionality.  Much of that
functionality can be expected to trickle into core eventually.

One advantage of the in-core feature is that the initial table
synchronization can be parallelized, which can make the initial setup
faster and more robust.  pglogical will probably support that too at
some point once PG10 is out.

--
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [GENERAL] pglogical vs. built-in logical replication in pg-10

От
Andres Freund
Дата:
On 2017-06-22 12:43:02 +0300, Achilleas Mantzios wrote:
> On 22/06/2017 11:21, Andreas Joseph Krogh wrote:
> > Hi.
> > 1. Why should one prefer built-in logical replication in pg-10 to pglogical, does it do anything pglogical doesn't?
> > It seems pglogical is more feature-rich...
> > 2. As I understand built-in logical replication in pg-10 doesn't support
> > large-objects, which we use a lot. Does pglogical replicate large
> > objects? I cannot find any notes about large-objects under "Limitations
> > and Restrictions":
> > https://www.2ndquadrant.com/en/resources/pglogical/pglogical-docs/

> You may do a simple test, create a table with a largeobject and try to read
> the logical stream, if it cannot represent the lo_import, lo_open, lowrite,
> lo_close (and I 'd bet they can't be encoded) then neither pglogical (being
> based on the same logical decoding technology) will support them.

There's nothing fundamental preventing us supporting large objects , but
indeed logical decoding at the moment doesn't support it.  The technical
bits of extracting changes should actually be easy, it's a bit more
difficult to decide how to represent it to output plugins...

Greetings,

Andres Freund


Re: [GENERAL] pglogical vs. built-in logical replication in pg-10

От
Andres Freund
Дата:
On 2017-06-22 18:10:40 +0300, Achilleas Mantzios wrote:
> > Once again having pg_largeobject as a system-catalog prevents LOs
> > from working smoothly. Neither replication nor having LOs on a
> > different tablespace (by moving pg_largeobject) works.

> I think logical decoding was designed for supporting DML SQL commands
> (i.e. a finite set of commands) and not specific functions (lo_*)
> which by nature can be arbitrary, infinite and version specific.

That's not really the reason. The first reason its currently unsupported
is that LOs are stored in a system catalog, and currently all system
catalogs are excluded from the change stream.  The second problem is how
exactly to represent the changes - we can't represent it as the whole LO
being changed, as that'd increase the volume of WAL and replicated
writes dramatically.  Thus we need to invent an API that can represent
creation, deletion, and writes to arbitrary offsets, for output plugins.


> > I wish PG in some future version will address these quirks so one can operate on LOs more smoothly.

You're welcome to help...


Greetings,

Andres Freund


Re: [GENERAL] pglogical vs. built-in logical replication in pg-10

От
Andreas Joseph Krogh
Дата:
På torsdag 22. juni 2017 kl. 19:30:49, skrev Andres Freund <andres@anarazel.de>:
On 2017-06-22 18:10:40 +0300, Achilleas Mantzios wrote:
> > Once again having pg_largeobject as a system-catalog prevents LOs
> > from working smoothly. Neither replication nor having LOs on a
> > different tablespace (by moving pg_largeobject) works.

> I think logical decoding was designed for supporting DML SQL commands
> (i.e. a finite set of commands) and not specific functions (lo_*)
> which by nature can be arbitrary, infinite and version specific.

That's not really the reason. The first reason its currently unsupported
is that LOs are stored in a system catalog, and currently all system
catalogs are excluded from the change stream.  The second problem is how
exactly to represent the changes - we can't represent it as the whole LO
being changed, as that'd increase the volume of WAL and replicated
writes dramatically.  Thus we need to invent an API that can represent
creation, deletion, and writes to arbitrary offsets, for output plugins.
 
pg_largeobject being a system catalog is the cause of much pain as I've found out. It also prevents moving it to a separate tablespace and maintaining pg_upgrade compatibility.
 
The first initiative would (possibly) be coming up with a new model for storing LOs, not involving system-catalogs. But, pg_largeobject doesn't seem all that magic and appears to contain "logical-decoding compatible" column-types:
 
\d pg_largeobject
         Table "pg_catalog.pg_largeobject"
┌────────┬─────────┬───────────┬──────────┬─────────┐
│ Column │  Type   │ Collation │ Nullable │ Default │
├────────┼─────────┼───────────┼──────────┼─────────┤
│ loid   │ oid     │           │ not null │         │
│ pageno │ integer │           │ not null │         │
│ data   │ bytea   │           │ not null │         │
└────────┴─────────┴───────────┴──────────┴─────────┘


If this was a regular table there would be nothing preventing it from being replicated successfully using logical decoding, correct?
 
 
> > I wish PG in some future version will address these quirks so one can operate on LOs more smoothly.

You're welcome to help...
 
Every time issues arise regarding LOs there seems to be little interest to improve matters, and if it doesn't itch....
 
--
Andreas Joseph Krogh
CTO / Partner - Visena AS
Mobile: +47 909 56 963
 
Вложения

Re: [GENERAL] pglogical vs. built-in logical replication in pg-10

От
Andreas Joseph Krogh
Дата:
På torsdag 22. juni 2017 kl. 17:10:40, skrev Achilleas Mantzios <achill@matrix.gatewaynet.com>:
[snip]
What's so better in LO's VS bytea? You do a lot updates on the binary data, changing only parts of it?
 
BYTEA really sucks when dealing with large objects and streaming to clients (JDBC). The only solution (I've found) which doesn't explode in memory-usage is using LOs (OIDs). Note that we're dealing with multi-gigabytes objects, which need to be "transactional safe" (hence stored in the DB).
 
--
Andreas Joseph Krogh
CTO / Partner - Visena AS
Mobile: +47 909 56 963
 
Вложения

Re: [GENERAL] pglogical vs. built-in logical replication in pg-10

От
Achilleas Mantzios
Дата:
On 22/06/2017 20:30, Andres Freund wrote:
> On 2017-06-22 18:10:40 +0300, Achilleas Mantzios wrote:
>>> Once again having pg_largeobject as a system-catalog prevents LOs
>>> from working smoothly. Neither replication nor having LOs on a
>>> different tablespace (by moving pg_largeobject) works.
>> I think logical decoding was designed for supporting DML SQL commands
>> (i.e. a finite set of commands) and not specific functions (lo_*)
>> which by nature can be arbitrary, infinite and version specific.
> That's not really the reason. The first reason its currently unsupported
> is that LOs are stored in a system catalog, and currently all system
> catalogs are excluded from the change stream.  The second problem is how
> exactly to represent the changes - we can't represent it as the whole LO
> being changed, as that'd increase the volume of WAL and replicated
> writes dramatically.  Thus we need to invent an API that can represent
> creation, deletion, and writes to arbitrary offsets, for output plugins.

Thanx for the insight.

>
>>> I wish PG in some future version will address these quirks so one can operate on LOs more smoothly.
> You're welcome to help...
>
>
> Greetings,
>
> Andres Freund
>
>

--
Achilleas Mantzios
IT DEV Lead
IT DEPT
Dynacom Tankers Mgmt