Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions
From: Amit Kapila
Subject: Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions
Date:
Msg-id: CAA4eK1+Kn2yxHo5ZZ+A8jkkmGu8MP+Ab0ab7jtiz3gjGxWxGOw@mail.gmail.com
In reply to: Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions (Dilip Kumar <dilipbalaut@gmail.com>)
Responses: Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions (Dilip Kumar <dilipbalaut@gmail.com>)
List: pgsql-hackers
On Mon, Jul 6, 2020 at 11:44 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Mon, Jul 6, 2020 at 11:31 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > > > 10. I have got the below failure once. I have not investigated this
> > > > in detail as the patch is still under progress. See, if you have any
> > > > idea?
> > > > #   Failed test 'check extra columns contain local defaults'
> > > > #   at t/013_stream_subxact_ddl_abort.pl line 81.
> > > > #          got: '2|0'
> > > > #     expected: '1000|500'
> > > > # Looks like you failed 1 test of 2.
> > > > make[2]: *** [check] Error 1
> > > > make[1]: *** [check-subscription-recurse] Error 2
> > > > make[1]: *** Waiting for unfinished jobs....
> > > > make: *** [check-world-src/test-recurse] Error 2
> > >
> > > I also got the failure once and after that it did not reproduce. I
> > > have executed it multiple times but it did not reproduce again. Are
> > > you able to reproduce it consistently?
> >
> > No, I am also not able to reproduce it consistently, but I think this
> > can fail if a subscriber sends the replay_location before actually
> > replaying the changes. First, I thought that the extra send_feedback we
> > have in apply_handle_stream_commit might have caused this, but I guess
> > that can't happen because we need the commit-time location for that,
> > and we are storing the same at the end of apply_handle_stream_commit
> > after applying all messages. I am not sure what is going on here. I
> > think we somehow need to reproduce this, or some variant of this test,
> > consistently to find the root cause.
>
> I think it appeared for the first time for me, so it may have been
> induced by changes in the last few versions of the patch. I have
> noticed that almost 50% of the time I am able to reproduce it after a
> clean build, so I can trace back the version from which it started
> appearing; that way it will be easy to narrow down.

One more comment:

ReorderBufferLargestTopTXN
{
..
    dlist_foreach(iter, &rb->toplevel_by_lsn)
    {
        ReorderBufferTXN *txn;
+       Size    size = 0;
+       Size    largest_size = 0;

        txn = dlist_container(ReorderBufferTXN, node, iter.cur);

-       /* if the current transaction is larger, remember it */
-       if ((!largest) || (txn->size > largest->size))
+       /*
+        * If this transaction have some incomplete changes then only consider
+        * the size upto last complete lsn.
+        */
+       if (rbtxn_has_incomplete_tuple(txn))
+           size = txn->complete_size;
+       else
+           size = txn->total_size;
+
+       /* If the current transaction is larger then remember it. */
+       if ((largest != NULL || size > largest_size) && size > 0)

Here, largest_size is a local variable inside the loop which is
initialized to 0 in each iteration, and that will lead to picking each
next txn as the largest. This seems wrong to me.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
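[Editor's illustration, not part of the original mail: a simplified, self-contained sketch of the fix the review comment implies. The `SimpleTXN` struct and `largest_top_txn` function are stand-ins invented here; they are not the real `ReorderBufferTXN`/`ReorderBufferLargestTopTXN` from reorderbuffer.c. The point is that `largest_size` must be declared outside the loop so it persists across iterations.]

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for ReorderBufferTXN (illustration only). */
typedef struct SimpleTXN
{
    size_t  total_size;     /* size of all queued changes */
    size_t  complete_size;  /* size up to the last complete lsn */
    bool    has_incomplete; /* stand-in for rbtxn_has_incomplete_tuple() */
} SimpleTXN;

/*
 * Return the transaction with the largest usable size, counting only
 * the completed portion of transactions that still have incomplete
 * changes.  Crucially, largest_size is hoisted OUT of the loop; if it
 * were reset to 0 on every iteration (as in the reviewed hunk), every
 * transaction with size > 0 would look "largest" in turn.
 */
static SimpleTXN *
largest_top_txn(SimpleTXN *txns, int ntxns)
{
    SimpleTXN  *largest = NULL;
    size_t      largest_size = 0;   /* persists across iterations */

    for (int i = 0; i < ntxns; i++)
    {
        SimpleTXN  *txn = &txns[i];
        size_t      size;

        /* Only count changes up to the last complete lsn, if needed. */
        if (txn->has_incomplete)
            size = txn->complete_size;
        else
            size = txn->total_size;

        if (size > largest_size)
        {
            largest = txn;
            largest_size = size;
        }
    }
    return largest;
}
```

With three transactions of usable sizes 100, 200 (incomplete, total 500), and 300, the function picks the third; the buggy per-iteration reset would instead have overwritten `largest` on every iteration with nonzero size.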