pgbench on pglogical: bandwidth usage

Поиск
Список
Период
Сортировка
От Fabio Pardi
Тема pgbench on pglogical: bandwidth usage
Дата
Msg-id 08bdbb29-6e85-d07f-787f-5fe2e544831a@portavita.eu
обсуждение исходный текст
Список pgsql-general
Hi,

I'm researchingon pglogical,performing tests to see the impact on the network traffic, in comparisonwith streaming
replication.

I configured one provider and one subscriber both running on a VM, both on Postgres 9.6.3 and latest pglogical 2.2.0. 

Forcomparison, Ialso have one master and one hot standby, running on the same VM and on the same Postgres.

I performed 2 different kind of tests. One using pgbench and another manually generating data. But there is something I
donot understand when it comes to draw conclusions.Therefore I would appreciate yourhelp.
 


=Test 1=

I create a schema/table called:

my_replicated_schema.my_first_replicated_table

I ask to provider to add it to the replication set using:

SELECT pglogical.replication_set_add_all_tables('default', ARRAY['my_replicated_schema']);

Then I generate 50 million +1 records on the table:

insert into my_replicated_schema.my_first_replicated_table values (generate_series(0,50000000));'


At this stage I can check how much traffic was generated to pass the records to the other node.

Logical replication costs:

         RX bytes:10271325 (9.7 MiB)  TX bytes:2465344586 (2.2 GiB)



While streaming replication:

         RX bytes:23058328 (21.9 MiB)  TX bytes:7502525286 (6.9 GiB)




Conclusion:

Pglogical is more convenient.

Nice.



= Test 2=

Same configuration used to test pglogical during test 1.Accidentally, Idid not change the replication set. So the
replicationset was still using: 'my_replicated_schema.my_first_replicated_table'
 

Pgbench instead writes to 'public'schema.


I theninitialize pgbench tables on the provider, using:

pgbench -i 

which results in:

 \dt public.
             List of relations
 Schema |       Name       | Type  | Owner
--------+------------------+-------+-------
 public | pgbench_accounts | table | postgres
 public | pgbench_branches | table | postgres
 public | pgbench_history  | table | postgres
 public | pgbench_tellers  | table | postgres
(4 rows)


I then run pgbench on the provider:


pgbench -c 3 -P 10  -r -T 200 -R 1000


And with big surprise, Ican thensee that an amount of traffic was generated:

          RX bytes:35500329 (33.8 MiB)  TX bytes:66376459 (63.3 MiB)

This is very strange to me. Running similar tests, where on the provider I manually push data on a table which is not
inthe replication set, no traffic was generated.
 

There must be an explanation for what is happening, and I think is more related to pgbench than pglogical, but i cannot
findit. Do you have pointers?
 



= Test 3 =

Provider is replicating the schema public, and subscriber receiving it.

I then run pgbenchsame way as above:

pgbench -c 3 -P 10  -r -T 200 -R 1000


But I get results which are contradicting test 1.

Pglogical replication uses:


         RX bytes:69783187 (66.5 MiB)  TX bytes:371664060 (354.4 MiB)


While streaming replication:


          RX bytes:86286353 (82.2 MiB)  TX bytes:281693815 (268.6 MiB)


Here, one would say that streaming replication is cheaper..


Also I cannot explain why is that. Do you?



Side notes:

- All tests ran over multiple times, producing identical results

- I also ran a test similar to 'Test1' but updating results (instead of running 'INSERT'), which as expected gave same
outcomeas 'Test 1'
 

- At every run Idestroy 'pgbench' database and recreate it, in order to start as clean as possible.

- As cross check, Im also checking that traffic wise,the numbers that appear on the provider are specular on the
subscriber

- Here Ireport the exact commands I ran in order to reproduce the test beds for pglogical:


Test 1 and 2:

Provider called 'dbfabio':

PGDATA=pgbench
createdb pgbench
pglog_db=pgbench
psql $pglog_db  -c "CREATE extension pglogical;"
psql $pglog_db  -c "CREATE schema my_replicated_schema"
psql $pglog_db  -c "CREATE table my_replicated_schema.my_first_replicated_table (i int primary key)"
psql $pglog_db  -c "SELECT pglogical.create_node(node_name := 'provider.$pglog_db', dsn := 'host=dbfabio port=5432
dbname=$pglog_db');"
psql $pglog_db  -c "SELECT pglogical.replication_set_add_all_tables('default', ARRAY['my_replicated_schema']);"


Subscriber called 'dbfabio2':

PGDATA=pgbench
createdb pgbench
pglog_db=pgbench
psql $pglog_db -c "CREATE extension pglogical;"
psql $pglog_db -c "SELECT pglogical.create_node(node_name := 'subscriber.$pglog_db', dsn := 'host=dbfabio2 port=5432
dbname=$pglog_db');"
psql $pglog_db -c "SELECT pglogical.create_subscription(subscription_name := 'subscription_to_dbfabio_$pglog_db',
synchronize_structure :=true, provider_dsn := 'host=dbfabio port=5432 dbname=$pglog_db');"
 



Test 3:

Provider:

PGDATA=pgbench
createdb pgbench
pglog_db=pgbench
pgbench -i 

# Now a small hack, since pglogical only accepts tables who have a primary key. pgbench_historical does not have it,
outof the box: (maybe here there is some room for an improvement to propose for pgbench code? what do you think?)
 

psql $pglog_db -c "ALTER TABLE pgbench_history ADD COLUMN id SERIAL PRIMARY KEY;"
psql $pglog_db -c "CREATEextension pglogical;"
psql $pglog_db -c "SELECT pglogical.create_node(node_name := 'provider.$pglog_db', dsn := 'host=dbfabio port=5432
dbname=$pglog_db');"
psql $pglog_db -c "SELECT pglogical.replication_set_add_all_tables('default', ARRAY['public']);"


Subscriber:

Same as test 1



Streaming replication setup looks like:

Master:

wal_level = hot_standby

Standby:

hot_standby = on

+ recovery.conf



any help is appreciated.


regards,

fabio pardi



В списке pgsql-general по дате отправления:

Предыдущее
От: Abhinav Singh
Дата:
Сообщение: Replication using PGLogical
Следующее
От: Bruno Lavoie
Дата:
Сообщение: Re: Failed rpm package signature checks with reposync