Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions

From: Alexey Kondratov
Subject: Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions
Date:
Msg-id: 40c38758-04b5-74f4-c963-cf300f9e5dff@postgrespro.ru
In response to: Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Responses: Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
List: pgsql-hackers
Hi Tomas,

> This new version is mostly just a rebase to current master (or almost,
> because 2pc decoding only applies to 29180e5d78 due to minor bitrot),
> but it also addresses the new stuff committed since last version (most
> importantly decoding of TRUNCATE). It also fixes a bug in WAL-logging of
> subxact assignments, where the assignment was included in records with
> XID=0, essentially failing to track the subxact properly.

I started reviewing your patch about a month ago and tried to do an 
in-depth review, since I am very interested in this patch too. The new 
version does not apply to master at 29180e5d78 directly, but everything 
is fine after applying the 2PC patch first. Anyway, I guess this may 
complicate further testing and review, since any potential reviewer has 
to take both patches into account at once. The previous version applied 
to master and worked fine for me on its own (except for a few 
patch-specific issues, which I try to explain below).


Patch review
========

First of all, I want to say thank you for such a huge amount of work. 
Here are some problems which I have found and hopefully fixed with my 
additional patch (please find it attached; it should apply on top of the 
last commit of your newest patch version):

1) The most important issue is that your TAP tests were broken: the 
"WITH (streaming=true)" option was missing from the CREATE SUBSCRIPTION 
statement. As a result, the spilling mechanism was being tested rather 
than streaming.
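
Just for reference, the statement should look roughly like this (the 
subscription, connection and publication names here are hypothetical; 
only the streaming option matters):

CREATE SUBSCRIPTION tap_sub
    CONNECTION 'host=localhost dbname=postgres'
    PUBLICATION tap_pub
    WITH (streaming = true);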

2) After fixing the tests, the first one with simple streaming 
immediately fails because of a segmentation fault in the logical 
replication worker. It happens because the worker tries to call 
stream_cleanup_files inside stream_open_file at stream start while nxids 
is zero; the counter then goes negative and everything crashes. Something 
similar may happen with the xids array, so I added two checks there.

3) The next problem is much more critical and concerns the historic MVCC 
visibility rules. Previously, walsender started decoding a transaction at 
commit, so we were able to resolve all xmin, xmax and combo cids to 
cmin/cmax, build the tuplecids hash and so on, but now we start doing all 
of this on the fly.

Thus, a rather difficult situation arises: HeapTupleSatisfiesHistoricMVCC 
tries to validate catalog tuples which are currently in the future 
relative to the decoder's position inside the transaction. For example, 
we may want to resolve cmin/cmax of a tuple which was created with cid 3 
and deleted with cid 5 while we are currently at cid 4, so our tuplecids 
hash is not yet complete enough to handle such a case.
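
A schematic example of such a transaction (the table name is made up and 
the command ids are illustrative, not exact):

BEGIN;
INSERT INTO tab VALUES (1);                      -- early cid
ALTER TABLE tab ADD COLUMN extra int;            -- catalog tuples created here
INSERT INTO tab VALUES (2, 2);                   -- decoder may be streaming at this cid
ALTER TABLE tab ALTER COLUMN extra TYPE bigint;  -- the same catalog tuples get deleted at a later cid
COMMIT;

While the middle INSERT is being decoded on the fly, the catalog tuple 
deleted by the second ALTER is still in the decoder's future, so its cmax 
cannot be resolved yet.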

I have updated the HeapTupleSatisfiesHistoricMVCC visibility rules with 
two additional checks:

/*
 * If we accidentally see a tuple from our own transaction but cannot
 * resolve its cmin, it is probably from the future, so drop it.
 */
if (!resolved)
    return false;
and

/*
 * If we accidentally see a tuple from our own transaction but cannot
 * resolve its cmax, or its cmax is InvalidCommandId, it is probably
 * still valid, so accept it.
 */
if (!resolved || cmax == InvalidCommandId)
    return true;

4) There was a problem with marking the top-level transaction as having 
catalog changes when one of its subtransactions has them. It caused a 
problem with DDL statements issued right after a subtransaction start 
(savepoint): data from the new columns was not replicated.
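
A minimal example of the failing pattern (table and column names are 
made up):

BEGIN;
INSERT INTO tab VALUES (1);
SAVEPOINT s1;
ALTER TABLE tab ADD COLUMN extra int;   -- the catalog change happens inside the subxact
INSERT INTO tab VALUES (2, 42);         -- the value of the new column must be replicated too
RELEASE SAVEPOINT s1;
COMMIT;

Unless the subxact's catalog changes are propagated to the top-level 
transaction, the decoder does not treat it as catalog-modifying and the 
new column's data is lost.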

5) A similar issue exists with sending the schema. You send the schema 
only once per sub/transaction (IIRC), while we have to update it on every 
catalog change: invalidation execution, snapshot rebuild, addition of new 
tuple cids. So I ended up adding an is_schema_send flag to 
ReorderBufferTXN, since it is easy to set inside the reorder buffer and 
to read in the output plugin. Probably, we have to choose a better place 
for this flag.
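
For example, sending the schema only once per transaction is not enough 
in a case like this (names are again made up):

BEGIN;
ALTER TABLE tab ADD COLUMN c1 int;
INSERT INTO tab (c1) VALUES (1);         -- the schema including c1 is sent before this change
ALTER TABLE tab ADD COLUMN c2 int;       -- second catalog change in the same transaction
INSERT INTO tab (c1, c2) VALUES (2, 3);  -- requires the schema to be sent again
COMMIT;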

6) To better cover all these tricky cases I added a new TAP test, 
014_stream_tough_ddl.pl, which consists of a really tough combination of 
DDL, DML, savepoints and ROLLBACK/RELEASE in a single transaction.
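
To give an idea of the pattern the test exercises (this is only a sketch, 
not the test itself; assume t1 already exists with a single int column):

BEGIN;
INSERT INTO t1 VALUES (1);
SAVEPOINT s1;
ALTER TABLE t1 ADD COLUMN b int;
INSERT INTO t1 VALUES (2, 2);
SAVEPOINT s2;
ALTER TABLE t1 DROP COLUMN b;
INSERT INTO t1 VALUES (3);
ROLLBACK TO SAVEPOINT s2;   -- undo the DROP COLUMN and the last INSERT
INSERT INTO t1 VALUES (4, 4);
RELEASE SAVEPOINT s1;
COMMIT;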

I marked all my fixes and every questionable place with a comment and a 
"TOCHECK:" label for easy search. Removing pretty much any of these fixes 
makes the tests fail due to a segmentation fault or a replication 
mismatch. Though I mostly read and tested the old version of the patch, 
after a quick look it seems that all these fixes are applicable to the 
new version as well.


Performance
========

I have also performed a series of performance tests and found that the 
patch adds a huge overhead in the case of a large transaction consisting 
of many small rows, e.g.:

CREATE TABLE large_test (num1 bigint, num2 double precision, num3 double precision);

EXPLAIN (ANALYZE, BUFFERS) INSERT INTO large_test (num1, num2, num3)
SELECT round(random()*10), random(), random()*142
FROM generate_series(1, 1000000) s(i);

Execution Time: 2407.709 ms
Total Time: 11494.238 ms (00:11.494)

With synchronous_standby_names set and logical_work_mem = 64MB it takes 
up to 5x longer, while without the patch it is only about 2x. Thus, the 
logical replication part is approximately 4x slower for similar 
transactions.

However, the case of a large transaction consisting of a small number of 
large rows is handled much better:

CREATE TABLE large_text (t TEXT);

EXPLAIN (ANALYZE, BUFFERS) INSERT INTO large_text
SELECT (SELECT string_agg('x', ',') FROM generate_series(1, 1000000))
FROM generate_series(1, 125);

Execution Time: 3545.642 ms
Total Time: 7678.617 ms (00:07.679)

It is around the same 2x as without the patch. In case anyone is 
interested, I have also added flame graphs of the walsender and the 
logical replication worker taken while processing the first (numeric) 
transaction.


Regards

-- 
Alexey Kondratov

Postgres Professional https://www.postgrespro.com
Russian Postgres Company

