RE: Logical replication timeout
От | Hayato Kuroda (Fujitsu) |
---|---|
Тема | RE: Logical replication timeout |
Дата | |
Msg-id | OSCPR01MB14966B646506E0C9B81B3A4CFF5022@OSCPR01MB14966.jpnprd01.prod.outlook.com обсуждение исходный текст |
Ответ на | Logical replication timeout (RECHTÉ Marc <marc.rechte@meteo.fr>) |
Список | pgsql-hackers |
Dear Shlok, > > Thanks for sharing the analysis. > > I tested the patch on my machine as well and it has worse performance > for me as well. > I came up with an alternate approach. In this approach we keep track > of wal segment the transaction is part of. This helps to iterate > through only required files during clean up. > > On my machine, I am running the testcase provided by you in [1]. It is > generating ~1.9 million spill files. For me the transaction completed > in 56sec. > Cleanup (deletion of spill files) took around following time: > With HEAD : ~ 5min > With latest patch (attached here) : ~2min > > Can you test if this improves performance for you? I'm also not sure the performance, but I can post my comments. I'm not sure your patch can properly handle the list operations. ``` + oldcontext = MemoryContextSwitchTo(rb->context); + txn->walsgmts = lappend(txn->walsgmts, curOpenSegNo); + MemoryContextSwitchTo(oldcontext); + ``` IIUC lappend() accepts a point of a Datum, but here a normal value is passed. Should we define a new struct which represents a node of list and append it after it is palloc()'d? Or your code is enough for some reasons? ``` /* iterate over all possible filenames, and delete them */ - for (cur = first; cur <= last; cur++) + foreach(cell, txn->walsgmts) { + XLogSegNo curr_segno = (XLogSegNo) lfirst(cell); char path[MAXPGPATH]; - ReorderBufferSerializedPath(path, MyReplicationSlot, txn->xid, cur); + ReorderBufferSerializedPath(path, MyReplicationSlot, txn->xid, curr_segno); if (unlink(path) != 0 && errno != ENOENT) ereport(ERROR, (errcode_for_file_access(), errmsg("could not remove file \"%s\": %m", path))); } + + if(txn->walsgmts != NIL) + { + pfree(txn->walsgmts); + txn->walsgmts = NIL; + } ``` If above comment is accepted, I feel you can use foreach_delete_current(). ======= Also, even when we optimize the truncation of files, there is a possibility that replication is timed out. Can you also create a patch which implements [1]? [1]: https://www.postgresql.org/message-id/CAExHW5s2_T9mULDQRKsdV72wpnA%2BNLT63cX51b51QQVEV4sG5g%40mail.gmail.com Best regards, Hayato Kuroda FUJITSU LIMITED
В списке pgsql-hackers по дате отправления: