Обсуждение: Simplify code building the LR conflict messages
While reviewing another conflict-related thread, I noticed that the
existing conflict messages are currently getting built using
unexpected logic.
e.g
----------
if (tuple_value.len > 0)
{
appendStringInfoString(&tuple_value, "; ");
appendStringInfo(&tuple_value, _("existing local row %s"),
desc);
}
else
{
appendStringInfo(&tuple_value, _("Existing local row %s"),
desc);
}
----------
I couldn't think of a reason why the "; " string needed to be
separated from the rest of the message like that. And when you combine
the strings, the logic easily collapses into a single statement with
less code and greater readability.
There were several code fragments like this. Here is a patch to modify them.
Thoughts?
======
Kind Regards,
Peter Smith.
Fujitsu Australia
Вложения
Peter Smith <smithpb2250@gmail.com> writes:
> I couldn't think of a reason why the "; " string needed to be
> separated from the rest of the message like that. And when you combine
> the strings, the logic easily collapses into a single statement with
> less code and greater readability.
... and, probably, less ability of the compiler to verify that the
variadic arguments match the format string. I think you've taken
this a bit too far.
regards, tom lane
On Fri, Nov 28, 2025 at 1:49 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > > Peter Smith <smithpb2250@gmail.com> writes: > > I couldn't think of a reason why the "; " string needed to be > > separated from the rest of the message like that. And when you combine > > the strings, the logic easily collapses into a single statement with > > less code and greater readability. > > ... and, probably, less ability of the compiler to verify that the > variadic arguments match the format string. I think you've taken > this a bit too far. > Hi Tom, Thank you for the feedback. Could you please clarify which aspect is of concern so I can address it properly: * Is the concern about having a StringInfo format string that starts with a semicolon? If so, I noticed there are already multiple similar examples in the codebase (see: `grep -r . -e 'StringInfo(.*";.*%'`), so I understood this pattern to be acceptable. * Or is it the use of the ternary operator to select the format string? If that's the issue, please note that my patch only introduced the ternary operator for the first two code fragments. The third fragment already uses ternaries in the same way on master, so I understood that to be an established pattern as well. I'd like to make sure I understand your concern correctly so I can revise the patch appropriately. ====== Kind Regards, Peter Smith Fujitsu Australia
Peter Smith <smithpb2250@gmail.com> writes:
> On Fri, Nov 28, 2025 at 1:49 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> ... and, probably, less ability of the compiler to verify that the
>> variadic arguments match the format string. I think you've taken
>> this a bit too far.
> * Or is it the use of the ternary operator to select the format
> string? If that's the issue, please note that my patch only introduced
> the ternary operator for the first two code fragments. The third
> fragment already uses ternaries in the same way on master, so I
> understood that to be an established pattern as well.
Sadly, you were copying bad code that we need to fix.
> I'd like to make sure I understand your concern correctly so I can
> revise the patch appropriately.
My concern is that we follow a coding style that the compiler can
check. Most C compilers can only verify that printf-like format
strings match the actual arguments if the format string is a constant.
So if I fat-finger the format string in this example:
--- a/src/backend/replication/logical/conflict.c
+++ b/src/backend/replication/logical/conflict.c
@@ -383,7 +383,7 @@ build_tuple_value_details(EState *estate, ResultRelInfo *relinfo,
if (tuple_value.len > 0)
{
appendStringInfoString(&tuple_value, "; ");
- appendStringInfo(&tuple_value, _("existing local row %s"),
+ appendStringInfo(&tuple_value, _("existing local row %d"),
desc);
}
I'll hear about it:
conflict.c: In function 'build_tuple_value_details':
conflict.c:386:38: warning: format '%d' expects argument of type 'int', but argument 3 has type 'char *' [-Wformat=]
appendStringInfo(&tuple_value, _("existing local row %d"),
^~~~~~~~~~~~~~~~~~~~~~~
conflict.c:386:36: note: in expansion of macro '_'
appendStringInfo(&tuple_value, _("existing local row %d"),
^
But I don't trust the compiler to see through a ternary expression
and check (both!) format strings against the actuals.
In a quick test, the gcc version I have handy does seem to be able to
do that, but I don't think we should rely on that. Format strings
are something that is way too easy to get wrong, and most of us just
expect the compiler to cross-check them for us, so coding patterns
that might silently disable such compiler warnings are best avoided.
(There's also some magic going on here to allow the compiler to see
through gettext(), but it's quite magic. We don't need to assume
multiple layers of magic will work reliably.)
regards, tom lane
On Fri, Nov 28, 2025 at 3:27 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Peter Smith <smithpb2250@gmail.com> writes:
> > On Fri, Nov 28, 2025 at 1:49 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >> ... and, probably, less ability of the compiler to verify that the
> >> variadic arguments match the format string. I think you've taken
> >> this a bit too far.
>
> > * Or is it the use of the ternary operator to select the format
> > string? If that's the issue, please note that my patch only introduced
> > the ternary operator for the first two code fragments. The third
> > fragment already uses ternaries in the same way on master, so I
> > understood that to be an established pattern as well.
>
> Sadly, you were copying bad code that we need to fix.
>
> > I'd like to make sure I understand your concern correctly so I can
> > revise the patch appropriately.
>
> My concern is that we follow a coding style that the compiler can
> check. Most C compilers can only verify that printf-like format
> strings match the actual arguments if the format string is a constant.
> So if I fat-finger the format string in this example:
>
> --- a/src/backend/replication/logical/conflict.c
> +++ b/src/backend/replication/logical/conflict.c
> @@ -383,7 +383,7 @@ build_tuple_value_details(EState *estate, ResultRelInfo *relinfo,
> if (tuple_value.len > 0)
> {
> appendStringInfoString(&tuple_value, "; ");
> - appendStringInfo(&tuple_value, _("existing local row %s"),
> + appendStringInfo(&tuple_value, _("existing local row %d"),
> desc);
> }
>
> I'll hear about it:
>
> conflict.c: In function 'build_tuple_value_details':
> conflict.c:386:38: warning: format '%d' expects argument of type 'int', but argument 3 has type 'char *' [-Wformat=]
> appendStringInfo(&tuple_value, _("existing local row %d"),
> ^~~~~~~~~~~~~~~~~~~~~~~
> conflict.c:386:36: note: in expansion of macro '_'
> appendStringInfo(&tuple_value, _("existing local row %d"),
> ^
>
> But I don't trust the compiler to see through a ternary expression
> and check (both!) format strings against the actuals.
>
> In a quick test, the gcc version I have handy does seem to be able to
> do that, but I don't think we should rely on that. Format strings
> are something that is way too easy to get wrong, and most of us just
> expect the compiler to cross-check them for us, so coding patterns
> that might silently disable such compiler warnings are best avoided.
>
> (There's also some magic going on here to allow the compiler to see
> through gettext(), but it's quite magic. We don't need to assume
> multiple layers of magic will work reliably.)
>
Got it. Thanks for the detailed reason.
Here is a revised patch v2 that removes all ternaries from the format
string decision making (including the ones that were already in
master).
======
Kind Regards,
Peter Smith.
Fujitsu Australia
Вложения
Hello,
So, what we're doing here is to append further row-identifying details
to an errdetail string that already contains some explanation of the
problem. That is, we have something like
DETAIL: The row to be updated was deleted.
and then we add whatever this function produces, after a newline. I
don't remember any other place in our code that does such a thing. The
result would look something like
DETAIL: The row to be updated was deleted.
Key: (1, 2); remote row (1,2,3); replica identity full (1,2,3).
or something like that, where each of the parts can begin with uppercase
or not, with a semicolon or not, depending on whether there are any
previous parts.
I wonder if it would make more sense to move this identifying
information to a CONTEXT line instead, where we won't have to care about
the uppercase.
We still have to worry about the semicolon though. Maybe we should make
that explicit in the string but given control to the translator by doing
something like
/* translator: first %s is either empty or a translated list
separator, second %s is a row descriptor */
appendStringInfo(&buf, "%sremote row %s", empty ? "" :
/* translator: this is used as a list separator */
_("; "), remote_row_description);
--
Álvaro Herrera Breisgau, Deutschland — https://www.EnterpriseDB.com/
"Debido a que la velocidad de la luz es mucho mayor que la del sonido,
algunas personas nos parecen brillantes un minuto antes
de escuchar las pelotudeces que dicen." (Roberto Fontanarrosa)
On Fri, Nov 28, 2025 at 6:38 PM Álvaro Herrera <alvherre@kurilemu.de> wrote: > > So, what we're doing here is to append further row-identifying details > to an errdetail string that already contains some explanation of the > problem. That is, we have something like > > DETAIL: The row to be updated was deleted. > > and then we add whatever this function produces, after a newline. I > don't remember any other place in our code that does such a thing. The > result would look something like > > DETAIL: The row to be updated was deleted. > Key: (1, 2); remote row (1,2,3); replica identity full (1,2,3). > > or something like that, where each of the parts can begin with uppercase > or not, with a semicolon or not, depending on whether there are any > previous parts. > > I wonder if it would make more sense to move this identifying > information to a CONTEXT line instead, where we won't have to care about > the uppercase. > As per my understanding, we typically display the information in context via errcontext() in error context callback functions which may not be immediately available at the location where error happens. For example, while applying the remote changes, the error can happen anywhere in the heap or below layer, but we add additional apply related information (like which apply_message was being processed when error occurred) via apply_error_callback. So, it is not clear in the case at hand whether we should display the information which is available at error_site via context message. -- With Regards, Amit Kapila.
=?utf-8?Q?=C3=81lvaro?= Herrera <alvherre@kurilemu.de> writes:
> So, what we're doing here is to append further row-identifying details
> to an errdetail string that already contains some explanation of the
> problem. That is, we have something like
> DETAIL: The row to be updated was deleted.
> and then we add whatever this function produces, after a newline. I
> don't remember any other place in our code that does such a thing. The
> result would look something like
> DETAIL: The row to be updated was deleted.
> Key: (1, 2); remote row (1,2,3); replica identity full (1,2,3).
> or something like that, where each of the parts can begin with uppercase
> or not, with a semicolon or not, depending on whether there are any
> previous parts.
I looked into the postmaster log for subscription/t/035_conflicts.pl
to find actual examples of what this code produces, and now I think
it's got worse problems than whether the C code style is good.
The output is frankly a mess, and IMO it violates our message style
guidelines in multiple ways. Here's a sampling:
2025-11-29 11:43:03.782 EST logical replication apply worker[659868] ERROR: conflict detected on relation
"public.conf_tab":conflict=multiple_unique_conflicts
2025-11-29 11:43:03.782 EST logical replication apply worker[659868] DETAIL: Key already exists in unique index
"conf_tab_pkey",modified in transaction 770.
Key (a)=(2); existing local row (2, 2, 2); remote row (2, 3, 4).
Key already exists in unique index "conf_tab_b_key", modified in transaction 770.
Key (b)=(3); existing local row (3, 3, 3); remote row (2, 3, 4).
Key already exists in unique index "conf_tab_c_key", modified in transaction 770.
Key (c)=(4); existing local row (4, 4, 4); remote row (2, 3, 4).
2025-11-29 11:43:04.654 EST logical replication apply worker[659907] LOG: conflict detected on relation
"public.conf_tab":conflict=update_missing
2025-11-29 11:43:04.654 EST logical replication apply worker[659907] DETAIL: Could not find the row to be updated.
Remote row (6, 7, 8); replica identity (a)=(5).
2025-11-29 11:43:05.266 EST logical replication apply worker[659942] LOG: conflict detected on relation "public.tab":
conflict=delete_origin_differs
2025-11-29 11:43:05.266 EST logical replication apply worker[659942] DETAIL: Deleting the row that was modified
locallyin transaction 790 at 2025-11-29 11:43:05.262807-05.
Existing local row (1, 3); replica identity (a)=(1).
Where to begin?
* "conflict detected" is already presuming way too much context for
someone who's looking into the postmaster log. Here we can guess
what it's about because somebody configured this test with %b in
log_line_prefix. But without the "logical replication apply worker"
cue, are users really going to know what these messages are about?
* The business about "conflict=some_kind_of_keyword" is certainly not
per our style guidelines, and I wonder how it translates. (Checks
code ... looks like we don't even try.)
* The DETAIL messages are not complete sentences or even
approximations to them, and frankly they are just about unintelligible.
* I follow "Key already exists in unique index "foo" but not "modified
in transaction 770" --- what was modified, which side's XID is this,
and why do we think an XID will be of any help whatever? The run-on
clause is very poor English in any case. "Deleting the row that was
modified locally in transaction 790 at 2025-11-29 11:43:05.262807-05."
is better in that a transaction timestamp is much more likely to be
useful than an XID, and it's marginally better English, but it's still
not a complete sentence and it reads like context not detail.
* What am I to make of this:
"Key (a)=(55); existing local row (55, 2, 3); remote row (55, 2, 3)."
From context I guess it's trying to show me the entire rows not just
the duplicated key value, but they're not labeled adequately.
Besides, I wonder how well this output scales to non-toy tables where
the rows are kilobytes or megabytes long. Index keys can be safely
assumed to not be enormously long, but it doesn't follow that it's
sane to emit the entire row. Much less several of them in one
error message.
* The three examples I show above look like the detail messages were
written by three different people who were not talking to each other.
The initial kind-of-a-sentence part is totally different style in
each case. And why do some of the outputs put the labeled key first
and others put it last?
So I think this area desperately needs significant editorial
attention, as well as some fundamental rethinking of just what
information we should show. Perhaps using errcontext would help,
but I'm not sure. I think a large part of the problem stems from
trying to cram multiple error conditions into one ereport ... is it
possible to avoid that?
regards, tom lane
I wrote:
> So I think this area desperately needs significant editorial
> attention, as well as some fundamental rethinking of just what
> information we should show. Perhaps using errcontext would help,
> but I'm not sure. I think a large part of the problem stems from
> trying to cram multiple error conditions into one ereport ... is it
> possible to avoid that?
After a little bit of thought, here's a sketch of a straw-man idea.
1. The longstanding output for unique constraint violations is like
ERROR: duplicate key value violates unique constraint "foo_f1_f2_key"
DETAIL: Key (f1, f2)=(1, 2) already exists.
We could do worse than to use exactly that output (perhaps even
sharing code) and add errcontext like
CONTEXT: replicated INSERT of row with replica identity (f1, f2)=(1, 2) in table "foo"
We should drop the full-row output for the reason I gave previously,
and I do not see the value of the XID printout either. This would
also serve (after s/INSERT/UPDATE/) for unique violations during
replicated updates.
2. For no-matching-row in UPDATE or DELETE, perhaps
ERROR: row to be updated[deleted] does not exist
CONTEXT: replicated UPDATE[DELETE] of row with replica identity (f1, f2)=(1, 2) in table "foo"
3. I don't understand what delete_origin_differs means or why
it's an error condition, so I won't dare to propose new text
for that. But the new text should make the reason clear,
and I think the same errcontext still works.
4. We need to make this a separate ereport for each problematic row.
Without having looked at the code, I suspect the reason it works the
way it does now is that the process will exit after ereport(ERROR).
I don't want to completely redesign how ereport works, but maybe
we could change replication apply so that the per-row reports are
emitted as ereport(LOG), and then when we get to a place where we
want to quit, end with
ERROR: stopping replication because of previously-logged errors
which would make the consequences clearer anyway.
regards, tom lane
On Sat, Nov 29, 2025 at 11:51 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> After a little bit of thought, here's a sketch of a straw-man idea.
>
> 1. The longstanding output for unique constraint violations is like
>
> ERROR: duplicate key value violates unique constraint "foo_f1_f2_key"
> DETAIL: Key (f1, f2)=(1, 2) already exists.
>
> We could do worse than to use exactly that output (perhaps even
> sharing code) and add errcontext like
>
> CONTEXT: replicated INSERT of row with replica identity (f1, f2)=(1, 2) in table "foo"
>
> We should drop the full-row output for the reason I gave previously,
> and I do not see the value of the XID printout either.
>
The reason for displaying in this style is that, in conflicts, users
may want to define their custom resolution strategies based on
conflict_type. Sometimes they need to resolve conflicts manually as
well. To make an informed decision on which version of the row is
"correct," the human reviewer needs full context.
Example: Imagine a table storing employee data.
Node A updates John Smith’s salary from $100k to $110k.
Node B simultaneously updates John Smith’s job title from "Senior Dev"
to "Lead Dev".
If the conflict report only showed:
Conflict on PK 123. Column 'Salary' differs. Column 'Job Title' differs.
The reviewer doesn't have enough information. However, if the report
shows the full rows:
Node A Version: ID:123, Name: John Smith, Salary: $110k, Title: Senior
Dev, Dept: Engineering, LastUpdatedBy: PayrollSys
Node B Version: ID:123, Name: John Smith, Salary: $100k, Title: Lead
Dev, Dept: Engineering, LastUpdatedBy: HR_Manager
Seeing the full row allows the reviewer to see who made the change and
what else is in the row. They might decide that the HR Manager's
promotion (Title change) should take precedence, or they might realize
they need to merge the two changes manually because both are valid.
Without the full row data (like LastUpdatedBy or Dept), this decision
is impossible.
Another case where we need row data is when a value in Column A might
only be valid based on the value in Column B within the same row. For
example, CHECK Constraints: Imagine a constraint where IF status =
'Active' THEN termination_date MUST BE NULL. If a conflict arises
involving these columns, you need to see both columns simultaneously
to understand why a specific resolution might violate database rules.
We do display the entire row as DETAIL for CHECK constraints even without apply.
CREATE TABLE products (
product_no integer,
name text,
price numeric CHECK (price > 0),
discounted_price numeric CHECK (discounted_price > 0),
CHECK (price > discounted_price)
);
postgres=# insert into products values (1, 'pen', 10, 20);
ERROR: new row for relation "products" violates check constraint
"products_check"
DETAIL: Failing row contains (1, pen, 10, 20).
Also, as per the docs [1], "The large column values are truncated to
64 bytes." while displaying conflict information which is same as what
we do while displaying the row during CHECK constraint violation.
Additionally, we checked some other systems where they display the
entire row information [2] (See, "The format of a uniqueness conflict
record is").
Currently, we display the information for multiple_unique_conflicts,
when track_commit_timestamp is on as follows:
ERROR: conflict detected on relation "public.conf_tab":
conflict=multiple_unique_conflicts
DETAIL: Key already exists in unique index "conf_tab_pkey", modified
locally in transaction 764 at 2025-12-01 08:59:50.789608+05:30.
Key (a)=(2); existing local row (2, 2, 2); remote row (2, 3, 4).
Key already exists in unique index "conf_tab_b_key", modified
locally in transaction 764 at 2025-12-01 08:59:50.789608+05:30.
Key (b)=(3); existing local row (3, 3, 3); remote row (2, 3, 4).
Key already exists in unique index "conf_tab_c_key", modified
locally in transaction 764 at 2025-12-01 08:59:50.789608+05:30.
Key (c)=(4); existing local row (4, 4, 4); remote row (2, 3, 4).
CONTEXT: processing remote data for replication origin "pg_16394"
during message type "INSERT" for replication target relation
"public.conf_tab" in transaction 759, finished at 0/017A7380
The idea to split it as per your suggestion, and assuming we agree
that additional row details are required for conflict/resolution based
on my above explanation.
LOG: conflict (multiple_unique_conflicts) detected on relation
"public.conf_tab"
DETAIL: Key already exists in unique index "conf_tab_pkey", modified
locally in transaction 764 at 2025-12-01 08:59:50.789608+05:30.
Key (a)=(2); existing local row (2, 2, 2); remote row (2, 3, 4).
LOG: conflict (multiple_unique_conflicts) detected on relation
"public.conf_tab"
DETAIL: Key already exists in unique index "conf_tab_b_key", modified
locally in transaction 764 at 2025-12-01 08:59:50.789608+05:30.
Key (b)=(3); existing local row (3, 3, 3); remote row (2, 3, 4).
…
ERROR: stopping replication because of previously-logged errors
CONTEXT: processing remote data for replication origin "pg_16394"
during message type "INSERT" for replication target relation
"public.conf_tab" in transaction 759, finished at 0/017A7380
OR
DETAIL: key (a)=(2) already exists in unique index "conf_tab_pkey",
modified locally in transaction 764 at 2025-12-01
08:59:50.789608+05:30
existing local row (2, 2, 2)
remote row (2, 3, 4)
This is based on the below errdetail, we use somewhere else in the code.
Example:
ERROR: cannot drop table parent because other objects depend on it
DETAIL: constraint child1_parent_id_fkey on table child1 depends on
table parent
constraint child2_parent_id_fkey on table child2 depends on table parent
view parent_view depends on table parent
view parent_view2 depends on table parent
HINT: Use DROP ... CASCADE to drop the dependent objects too.
Steps:
CREATE TABLE parent ( id SERIAL PRIMARY KEY, name TEXT);
CREATE TABLE child1 ( id SERIAL PRIMARY KEY, parent_id INT
REFERENCES parent(id));
CREATE TABLE child2 ( id SERIAL PRIMARY KEY, parent_id INT
REFERENCES parent(id));
CREATE VIEW parent_view AS SELECT * FROM parent;
CREATE VIEW parent_view2 AS SELECT id, name FROM parent;
CREATE SEQUENCE parent_seq OWNED BY parent.id;
DROP TABLE parent;
>
> 2. For no-matching-row in UPDATE or DELETE, perhaps
>
> ERROR: row to be updated[deleted] does not exist
> CONTEXT: replicated UPDATE[DELETE] of row with replica identity (f1, f2)=(1, 2) in table "foo"
>
> 3. I don't understand what delete_origin_differs means or why
> it's an error condition, so I won't dare to propose new text
> for that. But the new text should make the reason clear,
> and I think the same errcontext still works.
>
You can see the information on delete_origin_differs and other
conflicts in the docs [3]. We can discuss/decide on the message of
other conflict types, once we decide multiple_unique_conflicts.
[1] - https://www.postgresql.org/docs/devel/logical-replication-conflicts.html
[2] - https://docs.oracle.com/database/timesten-18.1/TTREP/conflict.htm#TTREP634
[3] - https://www.postgresql.org/docs/devel/logical-replication-conflicts.html
--
With Regards,
Amit Kapila.
On 2025-Dec-01, Amit Kapila wrote: > The reason for displaying in this style is that, in conflicts, users > may want to define their custom resolution strategies based on > conflict_type. Sometimes they need to resolve conflicts manually as > well. To make an informed decision on which version of the row is > "correct," the human reviewer needs full context. I think it's odd that conflict resolution depends on log entries. I think it would be much more valuable if conflict reporting would save the details of the conflict to some kind of conflict logging table. How exactly are we expecting that users would bring the data from the log file to a database row, when they are to be merged? What happens if there are bytea columns in the table? -- Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/
Hello, everyone! On Wed, Dec 3, 2025 at 3:41 PM Álvaro Herrera <alvherre@kurilemu.de> wrote: > I think it's odd that conflict resolution depends on log entries. I > think it would be much more valuable if conflict reporting would save > the details of the conflict to some kind of conflict logging table. > How exactly are we expecting that users would bring the data from the > log file to a database row, when they are to be merged? What happens if > there are bytea columns in the table? Such a proposal exists already at [0]. Also, sorry for being noisy and a little-bit offtopic but I think it is a good moment to highlight the fact that such messages are also not so useful because sometimes they are just clearly incorrect... You may find details and reproducers at [1] and [2]. Best regards, Mikhail. [0]: https://www.postgresql.org/message-id/flat/CAA4eK1%2B44b3vd_OWfiaVNtjf5Njb5cek09pmKRmttBByeg0NoA%40mail.gmail.com#e74b1b89039c21e64342a7d337128edb [1]: https://www.postgresql.org/message-id/flat/CADzfLwXZVmbo11tFS_G2i+6TfFVwHU4VUUSeoqb+8UQfuoJs8A@mail.gmail.com [2]: https://www.postgresql.org/message-id/flat/CADzfLwVj9ogzrf2P_H9Xb1z8vLEzBemBxp1nC9wCg4KaOFbvmw%40mail.gmail.com#41375b3243460a14bfab55bfead9011e