Обсуждение: Simplify code building the LR conflict messages

Поиск

Список

Период

Сортировка

Simplify code building the LR conflict messages

От

Peter Smith

Дата:

28 ноября, 04:34:14

While reviewing another conflict-related thread, I noticed that the
existing conflict messages are currently getting built using
unexpected logic.

e.g
----------
if (tuple_value.len > 0)
{
  appendStringInfoString(&tuple_value, "; ");
  appendStringInfo(&tuple_value, _("existing local row %s"),
                   desc);
}
else
{
  appendStringInfo(&tuple_value, _("Existing local row %s"),
                   desc);
}
----------

I couldn't think of a reason why the "; " string needed to be
separated from the rest of the message like that. And when you combine
the strings, the logic easily collapses into a single statement with
less code and greater readability.

There were several code fragments like this. Here is a patch to modify them.

Thoughts?

======
Kind Regards,
Peter Smith.
Fujitsu Australia

Вложения

v1-0001-Simplify-the-code-building-the-conflict-messages.patch

Re: Simplify code building the LR conflict messages

От

Tom Lane

Дата:

28 ноября, 05:49:18

Peter Smith <smithpb2250@gmail.com> writes:
> I couldn't think of a reason why the "; " string needed to be
> separated from the rest of the message like that. And when you combine
> the strings, the logic easily collapses into a single statement with
> less code and greater readability.

... and, probably, less ability of the compiler to verify that the
variadic arguments match the format string.  I think you've taken
this a bit too far.

            regards, tom lane

Re: Simplify code building the LR conflict messages

От

Peter Smith

Дата:

28 ноября, 06:49:10

On Fri, Nov 28, 2025 at 1:49 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Peter Smith <smithpb2250@gmail.com> writes:
> > I couldn't think of a reason why the "; " string needed to be
> > separated from the rest of the message like that. And when you combine
> > the strings, the logic easily collapses into a single statement with
> > less code and greater readability.
>
> ... and, probably, less ability of the compiler to verify that the
> variadic arguments match the format string.  I think you've taken
> this a bit too far.
>

Hi Tom,

Thank you for the feedback. Could you please clarify which aspect is
of concern so I can address it properly:

* Is the concern about having a StringInfo format string that starts
with a semicolon? If so, I noticed there are already multiple similar
examples in the codebase (see: `grep -r . -e 'StringInfo(.*";.*%'`),
so I understood this pattern to be acceptable.

* Or is it the use of the ternary operator to select the format
string? If that's the issue, please note that my patch only introduced
the ternary operator for the first two code fragments. The third
fragment already uses ternaries in the same way on master, so I
understood that to be an established pattern as well.

I'd like to make sure I understand your concern correctly so I can
revise the patch appropriately.

======
Kind Regards,
Peter Smith
Fujitsu Australia

Re: Simplify code building the LR conflict messages

От

Tom Lane

Дата:

28 ноября, 07:27:53

Peter Smith <smithpb2250@gmail.com> writes:
> On Fri, Nov 28, 2025 at 1:49 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> ... and, probably, less ability of the compiler to verify that the
>> variadic arguments match the format string.  I think you've taken
>> this a bit too far.

> * Or is it the use of the ternary operator to select the format
> string? If that's the issue, please note that my patch only introduced
> the ternary operator for the first two code fragments. The third
> fragment already uses ternaries in the same way on master, so I
> understood that to be an established pattern as well.

Sadly, you were copying bad code that we need to fix.

> I'd like to make sure I understand your concern correctly so I can
> revise the patch appropriately.

My concern is that we follow a coding style that the compiler can
check.  Most C compilers can only verify that printf-like format
strings match the actual arguments if the format string is a constant.
So if I fat-finger the format string in this example:

--- a/src/backend/replication/logical/conflict.c
+++ b/src/backend/replication/logical/conflict.c
@@ -383,7 +383,7 @@ build_tuple_value_details(EState *estate, ResultRelInfo *relinfo,
            if (tuple_value.len > 0)
            {
                appendStringInfoString(&tuple_value, "; ");
-               appendStringInfo(&tuple_value, _("existing local row %s"),
+               appendStringInfo(&tuple_value, _("existing local row %d"),
                                 desc);
            }

I'll hear about it:

conflict.c: In function 'build_tuple_value_details':
conflict.c:386:38: warning: format '%d' expects argument of type 'int', but argument 3 has type 'char *' [-Wformat=]
     appendStringInfo(&tuple_value, _("existing local row %d"),
                                      ^~~~~~~~~~~~~~~~~~~~~~~
conflict.c:386:36: note: in expansion of macro '_'
     appendStringInfo(&tuple_value, _("existing local row %d"),
                                    ^

But I don't trust the compiler to see through a ternary expression
and check (both!) format strings against the actuals.

In a quick test, the gcc version I have handy does seem to be able to
do that, but I don't think we should rely on that.  Format strings
are something that is way too easy to get wrong, and most of us just
expect the compiler to cross-check them for us, so coding patterns
that might silently disable such compiler warnings are best avoided.

(There's also some magic going on here to allow the compiler to see
through gettext(), but it's quite magic.  We don't need to assume
multiple layers of magic will work reliably.)

            regards, tom lane

Re: Simplify code building the LR conflict messages

От

Peter Smith

Дата:

28 ноября, 08:14:49

On Fri, Nov 28, 2025 at 3:27 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Peter Smith <smithpb2250@gmail.com> writes:
> > On Fri, Nov 28, 2025 at 1:49 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >> ... and, probably, less ability of the compiler to verify that the
> >> variadic arguments match the format string.  I think you've taken
> >> this a bit too far.
>
> > * Or is it the use of the ternary operator to select the format
> > string? If that's the issue, please note that my patch only introduced
> > the ternary operator for the first two code fragments. The third
> > fragment already uses ternaries in the same way on master, so I
> > understood that to be an established pattern as well.
>
> Sadly, you were copying bad code that we need to fix.
>
> > I'd like to make sure I understand your concern correctly so I can
> > revise the patch appropriately.
>
> My concern is that we follow a coding style that the compiler can
> check.  Most C compilers can only verify that printf-like format
> strings match the actual arguments if the format string is a constant.
> So if I fat-finger the format string in this example:
>
> --- a/src/backend/replication/logical/conflict.c
> +++ b/src/backend/replication/logical/conflict.c
> @@ -383,7 +383,7 @@ build_tuple_value_details(EState *estate, ResultRelInfo *relinfo,
>             if (tuple_value.len > 0)
>             {
>                 appendStringInfoString(&tuple_value, "; ");
> -               appendStringInfo(&tuple_value, _("existing local row %s"),
> +               appendStringInfo(&tuple_value, _("existing local row %d"),
>                                  desc);
>             }
>
> I'll hear about it:
>
> conflict.c: In function 'build_tuple_value_details':
> conflict.c:386:38: warning: format '%d' expects argument of type 'int', but argument 3 has type 'char *' [-Wformat=]
>      appendStringInfo(&tuple_value, _("existing local row %d"),
>                                       ^~~~~~~~~~~~~~~~~~~~~~~
> conflict.c:386:36: note: in expansion of macro '_'
>      appendStringInfo(&tuple_value, _("existing local row %d"),
>                                     ^
>
> But I don't trust the compiler to see through a ternary expression
> and check (both!) format strings against the actuals.
>
> In a quick test, the gcc version I have handy does seem to be able to
> do that, but I don't think we should rely on that.  Format strings
> are something that is way too easy to get wrong, and most of us just
> expect the compiler to cross-check them for us, so coding patterns
> that might silently disable such compiler warnings are best avoided.
>
> (There's also some magic going on here to allow the compiler to see
> through gettext(), but it's quite magic.  We don't need to assume
> multiple layers of magic will work reliably.)
>

Got it. Thanks for the detailed reason.

Here is a revised patch v2 that removes all ternaries from the format
string decision making (including the ones that were already in
master).

======
Kind Regards,
Peter Smith.
Fujitsu Australia

Вложения

v2-0001-Simplify-format-strings-and-remove-ternaries.patch

Re: Simplify code building the LR conflict messages

От

Álvaro Herrera

Дата:

28 ноября, 16:08:33

Hello,

So, what we're doing here is to append further row-identifying details
to an errdetail string that already contains some explanation of the
problem.  That is, we have something like 

DETAIL:  The row to be updated was deleted.

and then we add whatever this function produces, after a newline.  I
don't remember any other place in our code that does such a thing.  The
result would look something like

DETAIL:  The row to be updated was deleted.
Key:  (1, 2); remote row (1,2,3); replica identity full (1,2,3).

or something like that, where each of the parts can begin with uppercase
or not, with a semicolon or not, depending on whether there are any
previous parts.

I wonder if it would make more sense to move this identifying
information to a CONTEXT line instead, where we won't have to care about
the uppercase.

We still have to worry about the semicolon though.  Maybe we should make
that explicit in the string but given control to the translator by doing
something like

  /* translator: first %s is either empty or a translated list
     separator, second %s is a row descriptor */
  appendStringInfo(&buf, "%sremote row %s", empty ? "" :
                   /* translator: this is used as a list separator */
                   _("; "), remote_row_description);

-- 
Álvaro Herrera        Breisgau, Deutschland  —  https://www.EnterpriseDB.com/
"Debido a que la velocidad de la luz es mucho mayor que la del sonido,
 algunas personas nos parecen brillantes un minuto antes
 de escuchar las pelotudeces que dicen." (Roberto Fontanarrosa)

Re: Simplify code building the LR conflict messages

От

Amit Kapila

Дата:

29 ноября, 10:12:27

On Fri, Nov 28, 2025 at 6:38 PM Álvaro Herrera <alvherre@kurilemu.de> wrote:
>
> So, what we're doing here is to append further row-identifying details
> to an errdetail string that already contains some explanation of the
> problem.  That is, we have something like
>
> DETAIL:  The row to be updated was deleted.
>
> and then we add whatever this function produces, after a newline.  I
> don't remember any other place in our code that does such a thing.  The
> result would look something like
>
> DETAIL:  The row to be updated was deleted.
> Key:  (1, 2); remote row (1,2,3); replica identity full (1,2,3).
>
> or something like that, where each of the parts can begin with uppercase
> or not, with a semicolon or not, depending on whether there are any
> previous parts.
>
> I wonder if it would make more sense to move this identifying
> information to a CONTEXT line instead, where we won't have to care about
> the uppercase.
>

As per my understanding, we typically display the information in
context via errcontext() in error context callback functions which may
not be immediately available at the location where error happens. For
example, while applying the remote changes, the error can happen
anywhere in the heap or below layer, but we add additional apply
related information (like which apply_message was being processed when
error occurred) via apply_error_callback. So, it is not clear in the
case at hand whether we should display the information which is
available at error_site via context message.

--
With Regards,
Amit Kapila.

Re: Simplify code building the LR conflict messages

От

Tom Lane

Дата:

29 ноября, 20:30:45

=?utf-8?Q?=C3=81lvaro?= Herrera <alvherre@kurilemu.de> writes:
> So, what we're doing here is to append further row-identifying details
> to an errdetail string that already contains some explanation of the
> problem.  That is, we have something like
> DETAIL:  The row to be updated was deleted.
> and then we add whatever this function produces, after a newline.  I
> don't remember any other place in our code that does such a thing.  The
> result would look something like
> DETAIL:  The row to be updated was deleted.
> Key:  (1, 2); remote row (1,2,3); replica identity full (1,2,3).
> or something like that, where each of the parts can begin with uppercase
> or not, with a semicolon or not, depending on whether there are any
> previous parts.

I looked into the postmaster log for subscription/t/035_conflicts.pl
to find actual examples of what this code produces, and now I think
it's got worse problems than whether the C code style is good.
The output is frankly a mess, and IMO it violates our message style
guidelines in multiple ways.  Here's a sampling:

2025-11-29 11:43:03.782 EST logical replication apply worker[659868] ERROR:  conflict detected on relation
"public.conf_tab":conflict=multiple_unique_conflicts 
2025-11-29 11:43:03.782 EST logical replication apply worker[659868] DETAIL:  Key already exists in unique index
"conf_tab_pkey",modified in transaction 770. 
    Key (a)=(2); existing local row (2, 2, 2); remote row (2, 3, 4).
    Key already exists in unique index "conf_tab_b_key", modified in transaction 770.
    Key (b)=(3); existing local row (3, 3, 3); remote row (2, 3, 4).
    Key already exists in unique index "conf_tab_c_key", modified in transaction 770.
    Key (c)=(4); existing local row (4, 4, 4); remote row (2, 3, 4).

2025-11-29 11:43:04.654 EST logical replication apply worker[659907] LOG:  conflict detected on relation
"public.conf_tab":conflict=update_missing 
2025-11-29 11:43:04.654 EST logical replication apply worker[659907] DETAIL:  Could not find the row to be updated.
    Remote row (6, 7, 8); replica identity (a)=(5).

2025-11-29 11:43:05.266 EST logical replication apply worker[659942] LOG:  conflict detected on relation "public.tab":
conflict=delete_origin_differs
2025-11-29 11:43:05.266 EST logical replication apply worker[659942] DETAIL:  Deleting the row that was modified
locallyin transaction 790 at 2025-11-29 11:43:05.262807-05. 
    Existing local row (1, 3); replica identity (a)=(1).

Where to begin?

* "conflict detected" is already presuming way too much context for
someone who's looking into the postmaster log.  Here we can guess
what it's about because somebody configured this test with %b in
log_line_prefix.  But without the "logical replication apply worker"
cue, are users really going to know what these messages are about?

* The business about "conflict=some_kind_of_keyword" is certainly not
per our style guidelines, and I wonder how it translates.  (Checks
code ... looks like we don't even try.)

* The DETAIL messages are not complete sentences or even
approximations to them, and frankly they are just about unintelligible.

* I follow "Key already exists in unique index "foo" but not "modified
in transaction 770" --- what was modified, which side's XID is this,
and why do we think an XID will be of any help whatever?  The run-on
clause is very poor English in any case.  "Deleting the row that was
modified locally in transaction 790 at 2025-11-29 11:43:05.262807-05."
is better in that a transaction timestamp is much more likely to be
useful than an XID, and it's marginally better English, but it's still
not a complete sentence and it reads like context not detail.

* What am I to make of this:
"Key (a)=(55); existing local row (55, 2, 3); remote row (55, 2, 3)."
From context I guess it's trying to show me the entire rows not just
the duplicated key value, but they're not labeled adequately.
Besides, I wonder how well this output scales to non-toy tables where
the rows are kilobytes or megabytes long.  Index keys can be safely
assumed to not be enormously long, but it doesn't follow that it's
sane to emit the entire row.  Much less several of them in one
error message.

* The three examples I show above look like the detail messages were
written by three different people who were not talking to each other.
The initial kind-of-a-sentence part is totally different style in
each case.  And why do some of the outputs put the labeled key first
and others put it last?

So I think this area desperately needs significant editorial
attention, as well as some fundamental rethinking of just what
information we should show.  Perhaps using errcontext would help,
but I'm not sure.  I think a large part of the problem stems from
trying to cram multiple error conditions into one ereport ... is it
possible to avoid that?

            regards, tom lane

Re: Simplify code building the LR conflict messages

От

Tom Lane

Дата:

29 ноября, 21:21:00

I wrote:
> So I think this area desperately needs significant editorial
> attention, as well as some fundamental rethinking of just what
> information we should show.  Perhaps using errcontext would help,
> but I'm not sure.  I think a large part of the problem stems from
> trying to cram multiple error conditions into one ereport ... is it
> possible to avoid that?

After a little bit of thought, here's a sketch of a straw-man idea.

1. The longstanding output for unique constraint violations is like

ERROR:  duplicate key value violates unique constraint "foo_f1_f2_key"
DETAIL:  Key (f1, f2)=(1, 2) already exists.

We could do worse than to use exactly that output (perhaps even
sharing code) and add errcontext like

CONTEXT:  replicated INSERT of row with replica identity (f1, f2)=(1, 2) in table "foo"

We should drop the full-row output for the reason I gave previously,
and I do not see the value of the XID printout either.  This would
also serve (after s/INSERT/UPDATE/) for unique violations during
replicated updates.

2. For no-matching-row in UPDATE or DELETE, perhaps

ERROR:  row to be updated[deleted] does not exist
CONTEXT:  replicated UPDATE[DELETE] of row with replica identity (f1, f2)=(1, 2) in table "foo"

3. I don't understand what delete_origin_differs means or why
it's an error condition, so I won't dare to propose new text
for that.  But the new text should make the reason clear,
and I think the same errcontext still works.

4. We need to make this a separate ereport for each problematic row.
Without having looked at the code, I suspect the reason it works the
way it does now is that the process will exit after ereport(ERROR).
I don't want to completely redesign how ereport works, but maybe
we could change replication apply so that the per-row reports are
emitted as ereport(LOG), and then when we get to a place where we
want to quit, end with

ERROR:  stopping replication because of previously-logged errors

which would make the consequences clearer anyway.

            regards, tom lane

Re: Simplify code building the LR conflict messages

От

Amit Kapila

Дата:

01 декабря, 09:13:16

On Sat, Nov 29, 2025 at 11:51 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> After a little bit of thought, here's a sketch of a straw-man idea.
>
> 1. The longstanding output for unique constraint violations is like
>
> ERROR:  duplicate key value violates unique constraint "foo_f1_f2_key"
> DETAIL:  Key (f1, f2)=(1, 2) already exists.
>
> We could do worse than to use exactly that output (perhaps even
> sharing code) and add errcontext like
>
> CONTEXT:  replicated INSERT of row with replica identity (f1, f2)=(1, 2) in table "foo"
>
> We should drop the full-row output for the reason I gave previously,
> and I do not see the value of the XID printout either.
>

The reason for displaying in this style is that, in conflicts, users
may want to define their custom resolution strategies based on
conflict_type. Sometimes they need to resolve conflicts manually as
well. To make an informed decision on which version of the row is
"correct," the human reviewer needs full context.

Example: Imagine a table storing employee data.
Node A updates John Smith’s salary from $100k to $110k.
Node B simultaneously updates John Smith’s job title from "Senior Dev"
to "Lead Dev".
If the conflict report only showed:
Conflict on PK 123. Column 'Salary' differs. Column 'Job Title' differs.

The reviewer doesn't have enough information. However, if the report
shows the full rows:

Node A Version: ID:123, Name: John Smith, Salary: $110k, Title: Senior
Dev, Dept: Engineering, LastUpdatedBy: PayrollSys
Node B Version: ID:123, Name: John Smith, Salary: $100k, Title: Lead
Dev, Dept: Engineering, LastUpdatedBy: HR_Manager

Seeing the full row allows the reviewer to see who made the change and
what else is in the row. They might decide that the HR Manager's
promotion (Title change) should take precedence, or they might realize
they need to merge the two changes manually because both are valid.
Without the full row data (like LastUpdatedBy or Dept), this decision
is impossible.

Another case where we need row data is when a value in Column A might
only be valid based on the value in Column B within the same row. For
example, CHECK Constraints: Imagine a constraint where IF status =
'Active' THEN termination_date MUST BE NULL. If a conflict arises
involving these columns, you need to see both columns simultaneously
to understand why a specific resolution might violate database rules.

We do display the entire row as DETAIL for CHECK constraints even without apply.
CREATE TABLE products (
    product_no integer,
    name text,
    price numeric CHECK (price > 0),
    discounted_price numeric CHECK (discounted_price > 0),
    CHECK (price > discounted_price)
);

postgres=# insert into products values (1, 'pen', 10, 20);
ERROR:  new row for relation "products" violates check constraint
"products_check"
DETAIL:  Failing row contains (1, pen, 10, 20).

Also, as per the docs [1], "The large column values are truncated to
64 bytes." while displaying conflict information which is same as what
we do while displaying the row during CHECK constraint violation.

Additionally, we checked some other systems where they display the
entire row information [2] (See, "The format of a uniqueness conflict
record is").

Currently, we display the information for multiple_unique_conflicts,
when track_commit_timestamp is on as follows:
ERROR:  conflict detected on relation "public.conf_tab":
conflict=multiple_unique_conflicts
DETAIL:  Key already exists in unique index "conf_tab_pkey", modified
locally in transaction 764 at 2025-12-01 08:59:50.789608+05:30.
        Key (a)=(2); existing local row (2, 2, 2); remote row (2, 3, 4).
        Key already exists in unique index "conf_tab_b_key", modified
locally in transaction 764 at 2025-12-01 08:59:50.789608+05:30.
        Key (b)=(3); existing local row (3, 3, 3); remote row (2, 3, 4).
        Key already exists in unique index "conf_tab_c_key", modified
locally in transaction 764 at 2025-12-01 08:59:50.789608+05:30.
        Key (c)=(4); existing local row (4, 4, 4); remote row (2, 3, 4).
CONTEXT:  processing remote data for replication origin "pg_16394"
during message type "INSERT" for replication target relation
"public.conf_tab" in transaction 759, finished at 0/017A7380

The idea to split it as per your suggestion, and assuming we agree
that additional row details are required for conflict/resolution based
on my above explanation.

LOG:  conflict (multiple_unique_conflicts) detected on relation
"public.conf_tab"
DETAIL:  Key already exists in unique index "conf_tab_pkey", modified
locally in transaction 764 at 2025-12-01 08:59:50.789608+05:30.
        Key (a)=(2); existing local row (2, 2, 2); remote row (2, 3, 4).
LOG:  conflict (multiple_unique_conflicts) detected on relation
"public.conf_tab"
DETAIL:  Key already exists in unique index "conf_tab_b_key", modified
locally in transaction 764 at 2025-12-01 08:59:50.789608+05:30.
        Key (b)=(3); existing local row (3, 3, 3); remote row (2, 3, 4).
…
ERROR:  stopping replication because of previously-logged errors
CONTEXT:  processing remote data for replication origin "pg_16394"
during message type "INSERT" for replication target relation
"public.conf_tab" in transaction 759, finished at 0/017A7380

OR

DETAIL:  key (a)=(2) already exists in unique index "conf_tab_pkey",
modified locally in transaction 764 at 2025-12-01
08:59:50.789608+05:30
         existing local row (2, 2, 2)
         remote row (2, 3, 4)

This is based on the below errdetail, we use somewhere else in the code.

Example:
ERROR:  cannot drop table parent because other objects depend on it
DETAIL:  constraint child1_parent_id_fkey on table child1 depends on
table parent
constraint child2_parent_id_fkey on table child2 depends on table parent
view parent_view depends on table parent
view parent_view2 depends on table parent
HINT:  Use DROP ... CASCADE to drop the dependent objects too.

Steps:
CREATE TABLE parent ( id SERIAL PRIMARY KEY,    name TEXT);
CREATE TABLE child1 (  id SERIAL PRIMARY KEY,    parent_id INT
REFERENCES parent(id));
CREATE TABLE child2 (  id SERIAL PRIMARY KEY,    parent_id INT
REFERENCES parent(id));
CREATE VIEW parent_view AS SELECT * FROM parent;

CREATE VIEW parent_view2 AS SELECT id, name FROM parent;
CREATE SEQUENCE parent_seq OWNED BY parent.id;
DROP TABLE parent;

>
> 2. For no-matching-row in UPDATE or DELETE, perhaps
>
> ERROR:  row to be updated[deleted] does not exist
> CONTEXT:  replicated UPDATE[DELETE] of row with replica identity (f1, f2)=(1, 2) in table "foo"
>
> 3. I don't understand what delete_origin_differs means or why
> it's an error condition, so I won't dare to propose new text
> for that.  But the new text should make the reason clear,
> and I think the same errcontext still works.
>

You can see the information on delete_origin_differs and other
conflicts in the docs [3]. We can discuss/decide on the message of
other conflict types, once we decide multiple_unique_conflicts.

[1] - https://www.postgresql.org/docs/devel/logical-replication-conflicts.html
[2] - https://docs.oracle.com/database/timesten-18.1/TTREP/conflict.htm#TTREP634
[3] - https://www.postgresql.org/docs/devel/logical-replication-conflicts.html

--
With Regards,
Amit Kapila.

Re: Simplify code building the LR conflict messages

От

Álvaro Herrera

Дата:

03 декабря, 17:41:09

On 2025-Dec-01, Amit Kapila wrote:

> The reason for displaying in this style is that, in conflicts, users
> may want to define their custom resolution strategies based on
> conflict_type. Sometimes they need to resolve conflicts manually as
> well. To make an informed decision on which version of the row is
> "correct," the human reviewer needs full context.

I think it's odd that conflict resolution depends on log entries.  I
think it would be much more valuable if conflict reporting would save
the details of the conflict to some kind of conflict logging table.
How exactly are we expecting that users would bring the data from the
log file to a database row, when they are to be merged?  What happens if
there are bytea columns in the table?

-- 
Álvaro Herrera         PostgreSQL Developer  —  https://www.EnterpriseDB.com/

Re: Simplify code building the LR conflict messages

От

Mihail Nikalayeu

Дата:

03 декабря, 21:04:02

Hello, everyone!

On Wed, Dec 3, 2025 at 3:41 PM Álvaro Herrera <alvherre@kurilemu.de> wrote:
> I think it's odd that conflict resolution depends on log entries.  I
> think it would be much more valuable if conflict reporting would save
> the details of the conflict to some kind of conflict logging table.
> How exactly are we expecting that users would bring the data from the
> log file to a database row, when they are to be merged?  What happens if
> there are bytea columns in the table?

Such a proposal exists already at [0].

Also, sorry for being noisy and a little-bit offtopic but I think it
is a good moment to highlight the fact that such messages are also not
so useful because sometimes they are just clearly incorrect...
You may find details and reproducers at [1] and [2].

Best regards,
Mikhail.

[0]:
https://www.postgresql.org/message-id/flat/CAA4eK1%2B44b3vd_OWfiaVNtjf5Njb5cek09pmKRmttBByeg0NoA%40mail.gmail.com#e74b1b89039c21e64342a7d337128edb
[1]: https://www.postgresql.org/message-id/flat/CADzfLwXZVmbo11tFS_G2i+6TfFVwHU4VUUSeoqb+8UQfuoJs8A@mail.gmail.com
[2]:
https://www.postgresql.org/message-id/flat/CADzfLwVj9ogzrf2P_H9Xb1z8vLEzBemBxp1nC9wCg4KaOFbvmw%40mail.gmail.com#41375b3243460a14bfab55bfead9011e

Re: Simplify code building the LR conflict messages

От

Amit Kapila

Дата:

04 декабря, 09:27:21

On Wed, Dec 3, 2025 at 8:11 PM Álvaro Herrera <alvherre@kurilemu.de> wrote:
>
> On 2025-Dec-01, Amit Kapila wrote:
>
> > The reason for displaying in this style is that, in conflicts, users
> > may want to define their custom resolution strategies based on
> > conflict_type. Sometimes they need to resolve conflicts manually as
> > well. To make an informed decision on which version of the row is
> > "correct," the human reviewer needs full context.
>
> I think it's odd that conflict resolution depends on log entries.  I
> think it would be much more valuable if conflict reporting would save
> the details of the conflict to some kind of conflict logging table.
>

Yes, that is more valuable, and as mentioned by Mihail, we are already
discussing the same. However, that has its own challenges, like it
needs some form of manual management by the user to clean up the
conflict_history table. Also, one can say that because table insert is
a heavy operation and more prone to failures, it is better to LOG it.
Yes, the user would require some form of script to extract the
information from the LOG, so both methods have their pros and cons. I
think we will need a configurable knob to let the user decide which
way they prefer to log the conflict information.

--
With Regards,
Amit Kapila.

Re: Simplify code building the LR conflict messages

От

Amit Kapila

Дата:

04 декабря, 09:28:47

On Wed, Dec 3, 2025 at 11:34 PM Mihail Nikalayeu
<mihailnikalayeu@gmail.com> wrote:
>
> Hello, everyone!
>
> On Wed, Dec 3, 2025 at 3:41 PM Álvaro Herrera <alvherre@kurilemu.de> wrote:
> > I think it's odd that conflict resolution depends on log entries.  I
> > think it would be much more valuable if conflict reporting would save
> > the details of the conflict to some kind of conflict logging table.
> > How exactly are we expecting that users would bring the data from the
> > log file to a database row, when they are to be merged?  What happens if
> > there are bytea columns in the table?
>
> Such a proposal exists already at [0].
>

Thanks for providing the information.

> Also, sorry for being noisy and a little-bit offtopic but I think it
> is a good moment to highlight the fact that such messages are also not
> so useful because sometimes they are just clearly incorrect...
> You may find details and reproducers at [1] and [2].
>

Yes, I am aware of those, but IIRC, those are due to the traditional
design choice of using DirtySnapshot during the apply-phase of logical
replication. We need to discuss that separately and come to a
conclusion.

--
With Regards,
Amit Kapila.

Re: Simplify code building the LR conflict messages

От

Masahiko Sawada

Дата:

25 декабря, 00:09:54

On Sun, Nov 30, 2025 at 10:13 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Sat, Nov 29, 2025 at 11:51 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >
> > After a little bit of thought, here's a sketch of a straw-man idea.
> >
> > 1. The longstanding output for unique constraint violations is like
> >
> > ERROR:  duplicate key value violates unique constraint "foo_f1_f2_key"
> > DETAIL:  Key (f1, f2)=(1, 2) already exists.
> >
> > We could do worse than to use exactly that output (perhaps even
> > sharing code) and add errcontext like
> >
> > CONTEXT:  replicated INSERT of row with replica identity (f1, f2)=(1, 2) in table "foo"
> >
> > We should drop the full-row output for the reason I gave previously,
> > and I do not see the value of the XID printout either.
> >
>
> The reason for displaying in this style is that, in conflicts, users
> may want to define their custom resolution strategies based on
> conflict_type. Sometimes they need to resolve conflicts manually as
> well. To make an informed decision on which version of the row is
> "correct," the human reviewer needs full context.
>
> Example: Imagine a table storing employee data.
> Node A updates John Smith’s salary from $100k to $110k.
> Node B simultaneously updates John Smith’s job title from "Senior Dev"
> to "Lead Dev".
> If the conflict report only showed:
> Conflict on PK 123. Column 'Salary' differs. Column 'Job Title' differs.
>
> The reviewer doesn't have enough information. However, if the report
> shows the full rows:
>
> Node A Version: ID:123, Name: John Smith, Salary: $110k, Title: Senior
> Dev, Dept: Engineering, LastUpdatedBy: PayrollSys
> Node B Version: ID:123, Name: John Smith, Salary: $100k, Title: Lead
> Dev, Dept: Engineering, LastUpdatedBy: HR_Manager
>
> Seeing the full row allows the reviewer to see who made the change and
> what else is in the row. They might decide that the HR Manager's
> promotion (Title change) should take precedence, or they might realize
> they need to merge the two changes manually because both are valid.
> Without the full row data (like LastUpdatedBy or Dept), this decision
> is impossible.
>
> Another case where we need row data is when a value in Column A might
> only be valid based on the value in Column B within the same row. For
> example, CHECK Constraints: Imagine a constraint where IF status =
> 'Active' THEN termination_date MUST BE NULL. If a conflict arises
> involving these columns, you need to see both columns simultaneously
> to understand why a specific resolution might violate database rules.
>
> We do display the entire row as DETAIL for CHECK constraints even without apply.
> CREATE TABLE products (
>     product_no integer,
>     name text,
>     price numeric CHECK (price > 0),
>     discounted_price numeric CHECK (discounted_price > 0),
>     CHECK (price > discounted_price)
> );
>
> postgres=# insert into products values (1, 'pen', 10, 20);
> ERROR:  new row for relation "products" violates check constraint
> "products_check"
> DETAIL:  Failing row contains (1, pen, 10, 20).
>
> Also, as per the docs [1], "The large column values are truncated to
> 64 bytes." while displaying conflict information which is same as what
> we do while displaying the row during CHECK constraint violation.
>
> Additionally, we checked some other systems where they display the
> entire row information [2] (See, "The format of a uniqueness conflict
> record is").

I find there is a value in showing the whole before and after row
image for better investigation. Users might want to write the row
images to a separate file to avoid the server logs from being bloated
and 64 bytes limit might not be enough in practice. It might also be a
good idea to show only the different column data of remote and local
rows. There is room for improvements and the conflict history table
patch would be a good start for that anyway.

Regarding the proposed message style:

> The idea to split it as per your suggestion, and assuming we agree
> that additional row details are required for conflict/resolution based
> on my above explanation.
>
> LOG:  conflict (multiple_unique_conflicts) detected on relation
> "public.conf_tab"
> DETAIL:  Key already exists in unique index "conf_tab_pkey", modified
> locally in transaction 764 at 2025-12-01 08:59:50.789608+05:30.
>         Key (a)=(2); existing local row (2, 2, 2); remote row (2, 3, 4).
> LOG:  conflict (multiple_unique_conflicts) detected on relation
> "public.conf_tab"
> DETAIL:  Key already exists in unique index "conf_tab_b_key", modified
> locally in transaction 764 at 2025-12-01 08:59:50.789608+05:30.
>         Key (b)=(3); existing local row (3, 3, 3); remote row (2, 3, 4).
> …
> ERROR:  stopping replication because of previously-logged errors
> CONTEXT:  processing remote data for replication origin "pg_16394"
> during message type "INSERT" for replication target relation
> "public.conf_tab" in transaction 759, finished at 0/017A7380
>

The DETAIL message still seems to violate our message style guideline.
For instance, "Key (a)=(2); existing local row (2, 2, 2); remote row
(2, 3, 4)." is not a complete sentence.

I personally find that the current approach (i.e., having multiple
detail lines in DETAIL) is more understandable since it's clear for
users why the apply worker exited. With the proposed approach, after
checking the ERROR log, the user would need to collect all relevant
(potentially straggled) LOG messages from the server logs.

If we report multiple_unique_conflicts conflicts by writing a
combination of multiple LOGs and one ERROR, probably we might want to
approach this style also for other conflict types for better
consistency.

The conflict report for multiple_unique_conflicts looks redundant to
me as it shows the whole remote row image multiple times even though
it should be the same.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Обсуждение: Simplify code building the LR conflict messages

Вложения

Вложения