Re: Logical replication is missing block of rows when sending initial sync?

Поиск
Список
Период
Сортировка
От Tomas Vondra
Тема Re: Logical replication is missing block of rows when sending initial sync?
Дата
Msg-id 2a638962-7a62-abd6-d9a4-3c322ec2a06d@enterprisedb.com
обсуждение исходный текст
Ответ на Re: Logical replication is missing block of rows when sending initial sync?  (hubert depesz lubaczewski <depesz@depesz.com>)
Ответы Re: Logical replication is missing block of rows when sending initial sync?  (hubert depesz lubaczewski <depesz@depesz.com>)
Список pgsql-bugs

On 11/8/23 15:02, hubert depesz lubaczewski wrote:
>> OK. found another case (maybe two).
> 
> OK, the other case is real. Different database. Same scenario.
> 
> Errors (3 because we have 3 missing fkeys):
> 
> #v+
> ALTER TABLE ONLY c44s16539.assignment_overrides ADD CONSTRAINT fk_rails_58f8ee369b FOREIGN KEY (assignment_id)
REFERENCESc44s16539.assignments(id);
 
> ERROR:  insert or update on table "assignment_overrides" violates foreign key constraint "fk_rails_58f8ee369b"
> DETAIL:  Key (assignment_id)=(248083) is not present in table "assignments".
> 
> ALTER TABLE ONLY c44s16539.post_policies ADD CONSTRAINT fk_rails_cf2d119863 FOREIGN KEY (assignment_id) REFERENCES
c44s16539.assignments(id);
> ERROR:  insert or update on table "post_policies" violates foreign key constraint "fk_rails_cf2d119863"
> DETAIL:  Key (assignment_id)=(248083) is not present in table "assignments".
> 
> ALTER TABLE ONLY c44s16539.submissions ADD CONSTRAINT fk_rails_61cac0823d FOREIGN KEY (assignment_id) REFERENCES
c44s16539.assignments(id);
> ERROR:  insert or update on table "submissions" violates foreign key constraint "fk_rails_61cac0823d"
> DETAIL:  Key (assignment_id)=(248083) is not present in table "assignments".
> #v-
> 
> In all cases the problem is missing c44s16539.assignments with is = 248083.
> 
> This time it looks that only one row is missing:
> 
> #v+
> Pg14# select max(id) from c44s16539.assignments where id < 248083;
>   max
> ────────
>  248082
> (1 row)
> 
> Pg14# select min(id) from c44s16539.assignments where id > 248083;
>   min
> ────────
>  248084
> (1 row)
> #v-
> 
> Schema is complicated. 76 columns. On pg12 side one column is dropped.
> 22 indexes, 9 fkeys from this table (assignments) to others, ~ 35 fkeys
> pointing to this table (but only 3 have rows that reference the
> missing record.
> 
> Some informationa bout the record:
> 
> #v+
> Pg12# select ctid, xmin, cmin, xmax, cmax, id, created_at, updated_at from c44s16539.assignments where id = 248083
\gx
> ─[ RECORD 1 ]──────────────────────────
> ctid       │ (24192,1)
> xmin       │ 472291499
> cmin       │ 12
> xmax       │ 472291499
> cmax       │ 12
> id         │ 248083
> created_at │ 2023-11-07 15:50:05.506077
> updated_at │ 2023-11-07 15:51:28.738893
> #v-
> 
> logs for when it was added to replication:
> 
> #v+
> 2023-11-07 15:47:27.898 UTC,"postgres","dbname",2333,"[local]",654a5c0f.91d,39,"ALTER PUBLICATION",2023-11-07
15:47:27UTC,6/0,0,LOG,00000,"duration: 1.476 ms",,,,,,,,,"psql"
 
> 2023-11-07 15:47:27.899 UTC,"postgres","dbname",2333,"[local]",654a5c0f.91d,40,"idle",2023-11-07 15:47:27
UTC,6/1209097693,0,LOG,00000,"statement:ALTER PUBLICATION focal14 ADD TABLE ONLY
c247s1094.messages_2022_50;",,,,,,,,,"psql"
> 2023-11-07 15:47:27.900 UTC,"postgres","dbname",2333,"[local]",654a5c0f.91d,41,"ALTER PUBLICATION",2023-11-07
15:47:27UTC,6/0,0,LOG,00000,"duration: 1.493 ms",,,,,,,,,"psql"
 
> 2023-11-07 15:47:27.900 UTC,"postgres","dbname",2333,"[local]",654a5c0f.91d,42,"idle",2023-11-07 15:47:27
UTC,6/1209097694,0,LOG,00000,"statement:ALTER PUBLICATION focal14 ADD TABLE ONLY
c44s16539.assignments;",,,,,,,,,"psql"
> 2023-11-07 15:47:27.902 UTC,"postgres","dbname",2333,"[local]",654a5c0f.91d,43,"ALTER PUBLICATION",2023-11-07
15:47:27UTC,6/0,0,LOG,00000,"duration: 1.936 ms",,,,,,,,,"psql"
 
> 2023-11-07 15:47:27.902 UTC,"postgres","dbname",2333,"[local]",654a5c0f.91d,44,"idle",2023-11-07 15:47:27
UTC,6/1209097695,0,LOG,00000,"statement:ALTER PUBLICATION focal14 ADD TABLE ONLY
c39s5131.messages_2023_20;",,,,,,,,,"psql"
> 2023-11-07 15:47:27.903 UTC,"postgres","dbname",2333,"[local]",654a5c0f.91d,45,"ALTER PUBLICATION",2023-11-07
15:47:27UTC,6/0,0,LOG,00000,"duration: 1.292 ms",,,,,,,,,"psql"
 
> #v-
> 
> It is interesting that the time when the row was inserted (created_at) is
> always "around" the time that the table was added to publication...
> 
> Just like with the other case, I can leave it in current state for
> a day, so if you'd need more information from it, let me know.
> 

I think it'd be interesting to look at the WAL using pg_waldump, and see
how it relates to the LSN used for the tablesync slot.

I'll try reproducing this locally over the weekend. Should I use the
test_1030.sh script that you shared a week ago, or do I need to do
something more?

One thing that puzzles me is that IIUC the script doesn't generate any
changes while adding the tables to the subscription? I probably should
have paid more attention to the details, instead of assuming there are
concurrent changes - which seemed like there might be some race, or
something. But this seems like there are no changes and the tablesync
just fails to copy some subset of it. That's *really* strange.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



В списке pgsql-bugs по дате отправления:

Предыдущее
От: Alexander Korotkov
Дата:
Сообщение: Re: BUG #18187: Unexpected error: "variable not found in subplan target lists" triggered by JOIN
Следующее
От: PG Bug reporting form
Дата:
Сообщение: BUG #18188: wrong varnullingrels