Discussion: BUG #17055: Logical replication worker crashes when applying update of row that does not exist in target partition

The following bug has been logged on the website:

Bug reference:      17055
Logged by:          Sergey Bernikov
Email address:      sbernikov@gmail.com
PostgreSQL version: 13.3
Operating system:   Ubuntu 18.04.4
Description:

This error happens when the logical replication target is a partitioned table.

Steps to reproduce:
1. in source DB: create table and add to publication
    create table test_replication (
      id int not null,
      value varchar(100),
      primary key (id)
    );
create publication test_publication for table test_replication;

2. in target DB: create partitioned table and start replication
    create table test_replication (
      id int not null,
      value varchar(100),
      primary key (id)
    ) partition by range (id);
    create table test_replication_p_1 partition of test_replication
       for values from (0) to (10);
    create table test_replication_p_2 partition of test_replication
       for values from (10) to (20);
 
    create subscription test_subscription CONNECTION '...' publication
test_publication;
 
3. in source DB: insert data
    insert into test_replication(id, value) values (1, 'a1');

4. in target DB: delete data
    delete from test_replication where id = 1;

5. in source DB: update data
    update test_replication set value = 'a2' where id = 1;

Result: the logical replication worker on the target server crashes, with
varying error messages:

Fragment of log output:
2021-06-07 19:51:13.480 +06 [82800] DEBUG:  logical replication did not find
row for update in replication target relation "test_replication_p_1"
2021-06-07 19:51:13.491 +06 [82800] ERROR:  could not access status of
transaction 1226709024
2021-06-07 19:51:13.491 +06 [82800] DETAIL:  Could not open file
"pg_subtrans/491E": No such file or directory.
2021-06-07 19:51:13.493 +06 [82797] DEBUG:  starting logical replication
worker for subscription "test_subscription"
2021-06-07 19:51:13.493 +06 [80587] DEBUG:  registering background worker
"logical replication worker for subscription 781420"
2021-06-07 19:51:13.493 +06 [80587] DEBUG:  starting background worker
process "logical replication worker for subscription 781420"
2021-06-07 19:51:13.501 +06 [28356] LOG:  logical replication apply worker
for subscription "test_subscription" has started
2021-06-07 19:51:13.501 +06 [28356] DEBUG:  connecting to publisher using
connection string "....."
2021-06-07 19:51:13.508 +06 [80587] DEBUG:  unregistering background worker
"logical replication worker for subscription 781420"
2021-06-07 19:51:13.508 +06 [80587] LOG:  background worker "logical
replication worker" (PID 82800) exited with exit code 1
2021-06-07 19:51:13.563 +06 [28356] DEBUG:  logical replication did not find
row for update in replication target relation "test_replication_p_1"
2021-06-07 19:52:31.575 +06 [80587] LOG:  background worker "logical
replication worker" (PID 28356) was terminated by signal 11: Segmentation
fault
2021-06-07 19:52:31.575 +06 [80587] LOG:  terminating any other active
server processes


PG Bug reporting form <noreply@postgresql.org> writes:
> This error happens when the logical replication target is a partitioned table.

Thanks for the report!  This one reproduces on HEAD as well as v13.

Also, people should be aware that this one causes the logrep worker
to crash repeatedly, which will fill your disk with core files
quite quickly.

            regards, tom lane



On Fri, Jun 11, 2021 at 9:17 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> PG Bug reporting form <noreply@postgresql.org> writes:
> > This error happens when the logical replication target is a partitioned table.
>
> Thanks for the report!  This one reproduces on HEAD as well as v13.

+1, confirmed here too.

> Also, people should be aware that this one causes the logrep worker
> to crash repeatedly, which will fill your disk with core files
> quite quickly.

Thanks for this tip.

The problem is that apply_handle_tuple_routing() marches on as if it
could go ahead and update a tuple that it was just told doesn't exist.

Attached fixes the problem for me.  Not sure if adding a test case makes sense.

-- 
Amit Langote
EDB: http://www.enterprisedb.com

Attachments
On Fri, Jun 11, 2021 at 02:56:58PM +0900, Amit Langote wrote:
> Attached fixes the problem for me.  Not sure if adding a test case makes sense.

It seems to me that it would make sense to have some coverage.  We've
caught issues in the past in this area, particularly with buildfarm
animals running CLOBBER_CACHE_ALWAYS.
--
Michael

Attachments
On Fri, Jun 11, 2021 at 3:26 PM Michael Paquier <michael@paquier.xyz> wrote:
> On Fri, Jun 11, 2021 at 02:56:58PM +0900, Amit Langote wrote:
> > Attached fixes the problem for me.  Not sure if adding a test case makes sense.
>
> It seems to me that it would make sense to have some coverage.  We've
> caught issues in the past in this area, particularly with buildfarm
> animals running CLOBBER_CACHE_ALWAYS.

I thought about it for a minute when writing the patch I posted, but
wasn't really sure what the test case would look like.  The only
external proof that the case worked correctly, AFAICS, is the DEBUG1
message that's written to the log.  How do we add a test case with
that?  I know you mentioned an idea before to check/grep the log for
such cases.  Is that how?

-- 
Amit Langote
EDB: http://www.enterprisedb.com



Amit Langote <amitlangote09@gmail.com> writes:
> On Fri, Jun 11, 2021 at 3:26 PM Michael Paquier <michael@paquier.xyz> wrote:
>> It seems to me that it would make sense to have some coverage.  We've
>> caught issues in the past in this area, particularly with buildfarm
>> animals running CLOBBER_CACHE_ALWAYS.

> I thought about it for a minute when writing the patch I posted, but
> wasn't really sure what the test case would look like.  The only
> external proof that the case worked correctly, AFAICS, is the DEBUG1
> message that's written to the log.  How do we add a test case with
> that?  I know you mentioned an idea before to check/grep the log for
> such cases.  Is that how?

I see from the coverage report that *none* of the did-not-find-tuple
code paths in worker.c are exercised.  This does not seem OK.  I agree
that probably the only way to have a test case is to kick up the debug
level to DEBUG1 and grep the log to confirm that the message is there.
IIRC, we already have infrastructure for grepping the log, so this
shouldn't be that hard.  Will work on it today.

(Meanwhile, thanks for the diagnosis and patch!)

            regards, tom lane



I wrote:
> I see from the coverage report that *none* of the did-not-find-tuple
> code paths in worker.c are exercised.  This does not seem OK.  I agree
> that probably the only way to have a test case is to kick up the debug
> level to DEBUG1 and grep the log to confirm that the message is there.
> IIRC, we already have infrastructure for grepping the log, so this
> shouldn't be that hard.  Will work on it today.

Done now.  It was a highly worthwhile exercise, too, because I stumbled
over two *other* bugs in this code while I was at it.

BTW, I intentionally reworded the DEBUG messages to make sure we could
tell the partitioned case apart from the non-partitioned one.

            regards, tom lane



On Fri, Jun 11, 2021 at 04:15:20PM -0400, Tom Lane wrote:
> Done now.  It was a highly worthwhile exercise, too, because I stumbled
> over two *other* bugs in this code while I was at it.
>
> BTW, I intentionally reworded the DEBUG messages to make sure we could
> tell the partitioned case apart from the non-partitioned one.

I have two comments about the TAP portion of ab55d74.

+$node_subscriber->append_conf('postgresql.conf', "log_min_messages = debug1");
+$node_subscriber->reload;
Wouldn't it be better to make sure that the change is effective before
moving on to the end of the test?

+my $logfile = slurp_file($node_subscriber->logfile());
+ok( $logfile =~
+     qr/logical replication did not find row to be updated in replication target relation "tab_full_pk"/,
+   'update target row is missing');
I think that it would be better here to grab the log location before
running the queries that will be matched with the logs, then pass down
the location to slurp_file().
--
Michael

Attachments
Michael Paquier <michael@paquier.xyz> writes:
> +$node_subscriber->append_conf('postgresql.conf', "log_min_messages = debug1");
> +$node_subscriber->reload;
> Wouldn't it be better to make sure that the change is effective before
> moving on to the end of the test?

AFAICS, none of the other callers of sub reload are doing anything
special to wait for it to take effect.  Not sure why this one
would need more protection.  If we do need a fix for that, likely
sub reload itself had better do it.

> +my $logfile = slurp_file($node_subscriber->logfile());
> +ok( $logfile =~
> +     qr/logical replication did not find row to be updated in replication target relation "tab_full_pk"/,
> +   'update target row is missing');
> I think that it would be better here to grab the log location before
> running the queries that will be matched with the logs, then pass down
> the location to slurp_file().

This coding is also stolen verbatim from other test scripts.
What is your concern about it exactly?  It's not like the
tests have such varying amounts of log output that an unexpected
log rotation would be likely.  (Does the default TAP setup even
allow log rotation at all?  I doubt it.)

            regards, tom lane



On Fri, Jun 11, 2021 at 08:31:30PM -0400, Tom Lane wrote:
> Michael Paquier <michael@paquier.xyz> writes:
> AFAICS, none of the other callers of sub reload are doing anything
> special to wait for it to take effect.  Not sure why this one
> would need more protection.  If we do need a fix for that, likely
> sub reload itself had better do it.

Perhaps.

>> +my $logfile = slurp_file($node_subscriber->logfile());
>> +ok( $logfile =~
>> +     qr/logical replication did not find row to be updated in replication target relation "tab_full_pk"/,
>> +   'update target row is missing');
>> I think that it would be better here to grab the log location before
>> running the queries that will be matched with the logs, then pass down
>> the location to slurp_file().
>
> This coding is also stolen verbatim from other test scripts.
> What is your concern about it exactly?

My main concern is future changes in this test suite that could cause
more queries to generate logs that overlap with this sequence of
tests, falsifying what this test is checking.

> It's not like the
> tests have such varying amounts of log output that an unexpected
> log rotation would be likely.  (Does the default TAP setup even
> allow log rotation at all?  I doubt it.)

The default TAP configuration does not use log rotation, because it
complicates the test code: wait phases become necessary to make
sure that the logs have rotated.  That's also more costly than just
getting the log position, running the queries, and slurp()'ing the
file from the previous position.
--
Michael

Attachments
Michael Paquier <michael@paquier.xyz> writes:
> On Fri, Jun 11, 2021 at 08:31:30PM -0400, Tom Lane wrote:
>> AFAICS, none of the other callers of sub reload are doing anything
>> special to wait for it to take effect.  Not sure why this one
>> would need more protection.  If we do need a fix for that, likely
>> sub reload itself had better do it.

> Perhaps.

I looked around a bit, and I think that we don't need to sweat
about this.  Certainly we've not identified any buildfarm failures
related to reload not happening fast enough.  I think the sequence
of events is:

* pg_ctl will deliver the SIGHUP signal to the postmaster before
returning.

* Although surely the postmaster might not process that signal
instantly, we can expect that it will do so before it accepts
another connection request.  Since the tests cannot detect
whether the reload has happened without issuing a new command
(which, in all these tests, involves launching psql or pgbench
or pg_basebackup or what-have-you, which needs a new connection),
the reload will appear to have happened so far as the postmaster
itself is concerned, and therefore also in any new backend it
spawns.

* This does seem to leave in question whether a pre-existing
background process such as a logrep worker will get the word fast
enough.  However, I think the same principle applies there.  The
postmaster will have turned around and signaled the worker process
while it processes the SIGHUP for itself.  While that again doesn't
ensure instantaneous update of the worker's state, we can expect
that it will notice the SIGHUP before noticing any newly-arrived
replication data, which is Good Enough.  This argument does assume
that we know the worker is idle when we issue the reload ... but
we already waited for it to be idle.

Perhaps the buildfarm will prove me wrong, but I think we don't
have to do anything.  And if we did want to do something, what
would it be?  There is no feedback mechanism to let us observe
when the worker processed the signal.

>> This coding is also stolen verbatim from other test scripts.
>> What is your concern about it exactly?

> My main concern is future changes in this test suite that could cause
> more queries to generate logs that overlap with this sequence of
> tests, falsifying what this test is checking.

Oh, I see, you wonder if there could already be entries like these
in the file.  But since we only turned the log level up for the
duration of the specific test, it doesn't seem like a problem.
(I did think of this to the extent of making sure the second test
group in 013_partition.pl looks for different messages than the
first group does.)

            regards, tom lane



On Fri, Jun 11, 2021 at 09:14:01PM -0400, Tom Lane wrote:
> Perhaps the buildfarm will prove me wrong, but I think we don't
> have to do anything.  And if we did want to do something, what
> would it be?  There is no feedback mechanism to let us observe
> when the worker processed the signal.

I am fine to do nothing here unless the buildfarm says something.

> Oh, I see, you wonder if there could already be entries like these
> in the file.  But since we only turned the log level up for the
> duration of the specific test, it doesn't seem like a problem.

That's not a problem, now.

> (I did think of this to the extent of making sure the second test
> group in 013_partition.pl looks for different messages than the
> first group does.)

Still, the current style is not a pattern I think we should spread
around, as new tests tend to be written by copy-paste and one could
easily mix up queries with pattern checks.  The practice I'd
like to encourage is something like the attached, which makes clear
that there is no risk of fetching incorrect logs.
--
Michael

Attachments
On Sat, Jun 12, 2021 at 5:15 Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I wrote:
> > I see from the coverage report that *none* of the did-not-find-tuple
> > code paths in worker.c are exercised.  This does not seem OK.  I agree
> > that probably the only way to have a test case is to kick up the debug
> > level to DEBUG1 and grep the log to confirm that the message is there.
> > IIRC, we already have infrastructure for grepping the log, so this
> > shouldn't be that hard.  Will work on it today.
>
> Done now.  It was a highly worthwhile exercise, too, because I stumbled
> over two *other* bugs in this code while I was at it.

Oh, thanks for taking care of that.

> BTW, I intentionally reworded the DEBUG messages to make sure we could
> tell the partitioned case apart from the non-partitioned one.

That looks useful.

--
Amit Langote
EDB: http://www.enterprisedb.com

Michael Paquier <michael@paquier.xyz> writes:
> Still, the current style is not a pattern I think we should spread
> around, as a lot of copy-paste gets done in new tests added and one
> would easily mix up queries with pattern checks.  The practice I'd
> like to encourage is something like the attached, to make clear that
> there is no risk to fetch incorrect logs.

If you prefer.  However, I'd note that this patch as it stands
sort of breaks the argument I just made, as now you are interrogating
the postmaster's state after sending SIGHUP and before doing anything
that will wait for that to take effect.  I think you should move the
location-capture commands down by at least one safe_psql call.

            regards, tom lane



On Fri, Jun 11, 2021 at 09:37:05PM -0400, Tom Lane wrote:
> If you prefer.  However, I'd note that this patch as it stands
> sort of breaks the argument I just made, as now you are interrogating
> the postmaster's state after sending SIGHUP and before doing anything
> that will wait for that to take effect.  I think you should move the
> location-capture commands down by at least one safe_psql call.

Ah, right.  That would be after "DELETE FROM tab_full_pk" in
001_rep_changes.pl and after "DELETE FROM tab1" in 001_rep_changes.pl
run on the subscribers, then.  Would you prefer doing the change?  If
you don't think that's worth it, I won't fight hard on it, but
while we're dealing with it..
--
Michael

Attachments
Michael Paquier <michael@paquier.xyz> writes:
> Ah, right.  That would be after "DELETE FROM tab_full_pk" in
> 001_rep_changes.pl and after "DELETE FROM tab1" in 001_rep_changes.pl
> run on the subscribers, then.  Would you prefer doing the change?

No, go right ahead.

            regards, tom lane



On Fri, Jun 11, 2021 at 09:51:13PM -0400, Tom Lane wrote:
> No, go right ahead.

Okay, thanks.  Done this way for all three tests, with explanations of
the how and the why.
--
Michael

Attachments