Re: Logical replication is missing block of rows when sending initial sync?

From: hubert depesz lubaczewski
Subject: Re: Logical replication is missing block of rows when sending initial sync?
Msg-id: ZUt9FOueePOl2ZUp@depesz.com
In reply to: Re: Logical replication is missing block of rows when sending initial sync?  (hubert depesz lubaczewski <depesz@depesz.com>)
Responses: Re: Logical replication is missing block of rows when sending initial sync?  (hubert depesz lubaczewski <depesz@depesz.com>)
List: pgsql-bugs
On Fri, Nov 03, 2023 at 01:22:42PM +0100, hubert depesz lubaczewski wrote:
> On Fri, Nov 03, 2023 at 01:20:22PM +0100, Tomas Vondra wrote:
> > I think it'd be interesting to know:
> > 
> > 1) Commit LSN for the missing rows (for the xmin).
> > 
> > 2) Are the other changes for these transactions that *got* replicated
> > correctly?
> 
> How would I know?
> 
> > 3) LSNs used for the tablesync slot, catchup etc. I believe those are in
> > the server log.
> 
> I will provide more logs on the next problem.

OK, I found another case (maybe two).

First, error:
ALTER TABLE ONLY c203s15531.conversation_message_participants
    ADD CONSTRAINT fk_rails_992a8b9e13 FOREIGN KEY (conversation_message_id)
    REFERENCES c203s15531.conversation_messages(id);

ERROR:  insert or update on table "conversation_message_participants" violates foreign key constraint "fk_rails_992a8b9e13"
DETAIL:  Key (conversation_message_id)=(431568) is not present in table "conversation_messages".

With forced seq scans (set enable_indexscan = false; set enable_bitmapscan = false;
set enable_indexonlyscan = false;) I got:

(12/bionic):

#v+
=# select count(*) from c203s15531.conversation_messages where id = 431568;
 count
───────
     1
(1 row)

=# select count(*) from c203s15531.conversation_message_participants where conversation_message_id=431568;
 count
───────
     2
(1 row)
#v-

(14/focal):

#v+
=# select count(*) from c203s15531.conversation_messages where id = 431568;
 count
───────
     0
(1 row)

=# select count(*) from c203s15531.conversation_message_participants where conversation_message_id=431568;
 count
───────
     2
(1 row)
#v-

So it looks like some rows are missing on the pg14 side. Let's see the missing range:

#v+
Pg14# select max(id) from conversation_messages where id < 431568;
  max
────────
 431567
(1 row)

Pg14# select min(id) from conversation_messages where id > 431568;
  min
────────
 431572
(1 row)
#v-

So, on pg14 the block of ids from 431568 to 431571 is missing. On pg12, though:

#v+
Pg12# select count(*) from conversation_messages where id between 431568 and 431571;
 count
───────
     4
(1 row)
Pg12# select ctid, xmin, cmin, xmax, cmax, id, created_at from conversation_messages where id between 431568 and 431571;
    ctid    │    xmin    │ cmin │    xmax    │ cmax │   id   │         created_at
────────────┼────────────┼──────┼────────────┼──────┼────────┼────────────────────────────
 (29604,10) │ 3095841913 │    1 │ 3095841995 │    1 │ 431568 │ 2023-11-07 15:12:38.712504
 (29604,11) │ 3095848690 │    2 │ 3095848745 │    2 │ 431569 │ 2023-11-07 15:12:49.5695
 (29604,12) │ 3095884949 │    2 │ 3095885033 │    2 │ 431570 │ 2023-11-07 15:13:37.511025
 (29604,13) │ 3095917833 │    2 │ 3095917890 │    2 │ 431571 │ 2023-11-07 15:14:40.058367
(4 rows)
#v-
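
Regarding the commit LSN for the missing rows (question 1 above): I don't know of a
plain-SQL way to get the LSN itself - that would mean running pg_waldump over the
publisher's WAL for those xids. As a rough sketch, something like this would at least
show the commit times for those xmins, but only if track_commit_timestamp is enabled
on the publisher (most likely it is not here):

#v+
-- sketch only: needs track_commit_timestamp = on on the publisher (pg12/bionic),
-- and the commit timestamp data for these xids still being retained
select id, xmin, pg_xact_commit_timestamp(xmin) as committed_at
from c203s15531.conversation_messages
where id between 431568 and 431571;
#v-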

In the logs on bionic/pg12 I found when this particular table was added to the publication:

#v+
2023-11-07 15:10:19.023 UTC,"postgres","dbname",7350,"[local]",654a535a.1cb6,29,"idle",2023-11-07 15:10:18 UTC,30/865502097,0,LOG,00000,"statement: ALTER PUBLICATION focal14 ADD TABLE ONLY c203s15745.a_g_c_r_2023_10;",,,,,,,,,"psql"
2023-11-07 15:10:19.026 UTC,"postgres","dbname",7350,"[local]",654a535a.1cb6,30,"idle",2023-11-07 15:10:18 UTC,30/865502098,0,LOG,00000,"statement: ALTER PUBLICATION focal14 ADD TABLE ONLY c203s15311.q_q;",,,,,,,,,"psql"
2023-11-07 15:10:19.032 UTC,"postgres","dbname",7350,"[local]",654a535a.1cb6,31,"idle",2023-11-07 15:10:18 UTC,30/865502099,0,LOG,00000,"statement: ALTER PUBLICATION focal14 ADD TABLE ONLY c203s16352.m_2023_06;",,,,,,,,,"psql"
2023-11-07 15:10:19.035 UTC,"postgres","dbname",7350,"[local]",654a535a.1cb6,32,"idle",2023-11-07 15:10:18 UTC,30/865502100,0,LOG,00000,"statement: ALTER PUBLICATION focal14 ADD TABLE ONLY c203s15531.conversation_messages;",,,,,,,,,"psql"
2023-11-07 15:10:19.039 UTC,"postgres","dbname",7350,"[local]",654a535a.1cb6,33,"idle",2023-11-07 15:10:18 UTC,30/865502101,0,LOG,00000,"statement: ALTER PUBLICATION focal14 ADD TABLE ONLY c203s15479.q_s_e_2023_10;",,,,,,,,,"psql"
2023-11-07 15:10:19.043 UTC,"postgres","dbname",7350,"[local]",654a535a.1cb6,34,"idle",2023-11-07 15:10:18 UTC,30/865502102,0,LOG,00000,"statement: ALTER PUBLICATION focal14 ADD TABLE ONLY c203s17062.wiki_pages;",,,,,,,,,"psql"
2023-11-07 15:10:19.046 UTC,"postgres","dbname",7350,"[local]",654a535a.1cb6,35,"idle",2023-11-07 15:10:18 UTC,30/865502103,0,LOG,00000,"statement: ALTER PUBLICATION focal14 ADD TABLE ONLY c203s15841.a_g_c_r_2023_10;",,,,,,,,,"psql"
#v-
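
As for the LSNs used for the tablesync slot (question 3 above) - while the focal setup
is still alive, they should also be visible from the catalogs, roughly like this (just a
sketch: the slot-name pattern is the usual tablesync naming, and the second query has to
be run on the focal/pg14 subscriber):

#v+
-- on the publisher (pg12/bionic): tablesync slots, if any still exist, and their LSNs
select slot_name, active, restart_lsn, confirmed_flush_lsn
from pg_replication_slots
where slot_name ~ '^pg_[0-9]+_sync_';

-- on the subscriber (pg14/focal): per-table sync state ('r' = ready) and the LSN recorded for it
select srsubstate, srsublsn
from pg_subscription_rel
where srrelid = 'c203s15531.conversation_messages'::regclass;
#v-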

Since it was previously asked, here is the schema:

#v+
        Column         │            Type             │ Collation │ Nullable │                      Default
───────────────────────┼─────────────────────────────┼───────────┼──────────┼────────────────────────────────────────────────────
 id                    │ bigint                      │           │ not null │ nextval('conversation_messages_id_seq'::regclass)
 ..................... │ bigint                      │           │          │
 ..................... │ bigint                      │           │          │
 created_at            │ timestamp without time zone │           │          │
 ..................... │ boolean                     │           │          │
 ..................... │ text                        │           │          │
 ..................... │ text                        │           │          │
 ..................... │ character varying(255)      │           │          │
 ..................... │ character varying(255)      │           │          │
 ..................... │ bigint                      │           │          │
 ..................... │ character varying(255)      │           │          │
 ..................... │ bigint                      │           │          │
 ..................... │ character varying(255)      │           │          │
 ..................... │ text                        │           │          │
 ..................... │ boolean                     │           │          │
 ..................... │ boolean                     │           │          │
 ..................... │ text                        │           │          │
Indexes:
    "conversation_messages_pkey" PRIMARY KEY, btree (id), tablespace "data1"
    "index_conversation_messages_on_...." btree (.........), tablespace "data1"
    "index_conversation_messages_on_......_and_created_at" btree (..........., created_at), tablespace "data1"
Foreign-key constraints:
    "fk_rails_c3c322d882" FOREIGN KEY (conversation_id) REFERENCES conversations(id)
Referenced by:
    TABLE "conversation_message_participants" CONSTRAINT "fk_rails_992a8b9e13" FOREIGN KEY (conversation_message_id)
REFERENCESconversation_messages(id)
 
    TABLE "conversation_batches" CONSTRAINT "fk_rails_d421fc74f4" FOREIGN KEY (root_conversation_message_id) REFERENCES
conversation_messages(id)
Publications:
    "cdc"
    "focal14"
Triggers:
    guard_excessive_deletes AFTER DELETE ON conversation_messages REFERENCING OLD TABLE AS oldtbl FOR EACH STATEMENT EXECUTE FUNCTION guard_excessive_updates()
    guard_excessive_updates AFTER UPDATE ON conversation_messages REFERENCING OLD TABLE AS oldtbl FOR EACH STATEMENT EXECUTE FUNCTION guard_excessive_updates()
#v-

There are no dropped attributes.
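
For reference, that can be double-checked with something like this (just a sketch; any
dropped column would show up with attisdropped = true):

#v+
select attnum, attname, atttypid::regtype, attisdropped
from pg_attribute
where attrelid = 'c203s15531.conversation_messages'::regclass
  and attnum > 0
order by attnum;
#v-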

Stats:

#v+
Pg12# select * from pg_stat_user_tables  where relid = 'c203s15531.conversation_messages'::regclass \gx
─[ RECORD 1 ]───────┬──────────────────────────────
relid               │ 166420
schemaname          │ c203s15531
relname             │ conversation_messages
seq_scan            │ 14
seq_tup_read        │ 2595888
idx_scan            │ 16059568
idx_tup_fetch       │ 27426090
n_tup_ins           │ 372696
n_tup_upd           │ 38
n_tup_del           │ 0
n_tup_hot_upd       │ 0
n_live_tup          │ 432630
n_dead_tup          │ 0
n_mod_since_analyze │ 7034
last_vacuum         │ [null]
last_autovacuum     │ 2023-08-10 18:06:11.655573+00
last_analyze        │ [null]
last_autoanalyze    │ 2023-11-01 18:56:52.868395+00
vacuum_count        │ 0
autovacuum_count    │ 5
analyze_count       │ 0
autoanalyze_count   │ 91
#v-

I can leave it as is for, let's say, a day. So if there are any queries/data you'd like
me to run/collect, let me know; afterwards I will have to rebuild the focal subscriber
from scratch.

For whatever it's worth, we're adding tables to the publication using:

#v+
with base as (
    select n.nspname, c.relname, c.relpages
    from pg_class c
        join pg_namespace n on c.relnamespace = n.oid
        left outer join pg_catalog.pg_publication_tables pt
            on pt.pubname = 'focal14' and pt.schemaname = n.nspname and pt.tablename = c.relname
    where c.relkind in ('p', 'r') and n.nspname ~ 'schemas|we|care|about' and pt.pubname is null
)
select format('ALTER PUBLICATION focal14 ADD TABLE ONLY %I.%I;', nspname, relname)
from base
order by relpages desc
limit 40 \gexec
#v-

Best regards,

depesz


