Discussion: Forget close an open relation in ReorderBufferProcessTXN()
The RelationIdGetRelation() comment says:

> Caller should eventually decrement count. (Usually,
> that happens by calling RelationClose().)

However, it doesn't do it in ReorderBufferProcessTXN(). I think we should close it; here is a patch that fixes it. Thoughts?

--
Regards,
Japin Li.
ChengDu WenWu Information Technology Co.,Ltd.

diff --git a/src/backend/replication/logical/reorderbuffer.c b/src/backend/replication/logical/reorderbuffer.c
index 52d06285a2..aac6ffc602 100644
--- a/src/backend/replication/logical/reorderbuffer.c
+++ b/src/backend/replication/logical/reorderbuffer.c
@@ -2261,7 +2261,10 @@ ReorderBufferProcessTXN(ReorderBuffer *rb, ReorderBufferTXN *txn,
 						elog(ERROR, "could not open relation with OID %u", relid);
 
 					if (!RelationIsLogicallyLogged(relation))
+					{
+						RelationClose(relation);
 						continue;
+					}
 
 					relations[nrelations++] = relation;
 				}
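For context, a minimal sketch of the refcount contract the relcache comment describes (an illustrative fragment, where relid stands for any relation OID): RelationIdGetRelation() increments the relcache entry's reference count and takes no lock, so every successful call must eventually be paired with RelationClose().

    Relation    relation = RelationIdGetRelation(relid);   /* increments refcount, no lock */

    if (RelationIsValid(relation))
    {
        /* ... use the relcache entry ... */
        RelationClose(relation);    /* decrements refcount */
    }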
On Thu, Apr 15, 2021 at 4:00 PM Japin Li <japinli@hotmail.com> wrote: > > The RelationIdGetRelation() comment says: > > > Caller should eventually decrement count. (Usually, > > that happens by calling RelationClose().) > > However, it doesn't do it in ReorderBufferProcessTXN(). > I think we should close it, here is a patch that fixes it. Thoughts? > +1. Your fix looks correct to me but can we test it in some way? -- With Regards, Amit Kapila.
Amit Kapila <amit.kapila16@gmail.com> writes: > On Thu, Apr 15, 2021 at 4:00 PM Japin Li <japinli@hotmail.com> wrote: >> >> The RelationIdGetRelation() comment says: >> > Caller should eventually decrement count. (Usually, > that happens by calling RelationClose().) >> >> However, it doesn't do it in ReorderBufferProcessTXN(). >> I think we should close it, here is a patch that fixes it. Thoughts? >> > +1. Your fix looks correct to me but can we test it in some way? I think this code has a bigger problem: it should not be using RelationIdGetRelation and RelationClose directly. 99.44% of the backend goes through relation_open or one of the other relation.c wrappers, so why doesn't this? Possibly the answer is "it copied the equally misguided code in pgoutput.c". A quick grep shows nothing else doing it this way. regards, tom lane
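As context for Tom's point, a minimal sketch of the two calling conventions being contrasted (an illustrative fragment; the signatures are those of relation.c and relcache.c):

    /* The usual backend pattern: open with a lock, close releasing it. */
    Relation    rel = relation_open(relid, AccessShareLock);
    /* ... work with rel ... */
    relation_close(rel, AccessShareLock);

    /*
     * The bare relcache API used by reorderbuffer.c and pgoutput.c: no
     * heavyweight lock, only a reference count the caller must release.
     */
    Relation    rel2 = RelationIdGetRelation(relid);
    if (RelationIsValid(rel2))
        RelationClose(rel2);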
On Thu, Apr 15, 2021 at 10:56 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Amit Kapila <amit.kapila16@gmail.com> writes:
> > On Thu, Apr 15, 2021 at 4:00 PM Japin Li <japinli@hotmail.com> wrote:
> >>
> >> The RelationIdGetRelation() comment says:
> >>
> >> > Caller should eventually decrement count. (Usually,
> >> > that happens by calling RelationClose().)
> >>
> >> However, it doesn't do it in ReorderBufferProcessTXN().
> >> I think we should close it; here is a patch that fixes it. Thoughts?
> >>
> > +1. Your fix looks correct to me but can we test it in some way?
>
> I think this code has a bigger problem: it should not be using
> RelationIdGetRelation and RelationClose directly. 99.44% of
> the backend goes through relation_open or one of the other
> relation.c wrappers, so why doesn't this?
>

I think it is because relation_open expects the caller either to have a lock on the relation or to not use the 'NoLock' lockmode. AFAIU, we don't need to acquire a lock on the relation while decoding changes from WAL, because decoding uses a historic snapshot to build the relcache entry, and all the later changes to the rel are absorbed while decoding WAL.

I think it is also important to *not* acquire any lock on the relation, as otherwise it can lead to some sort of deadlock or infinite wait in the decoding process. Consider an operation like Truncate (or the user acquiring an exclusive lock on the relation in some other way, say via the Lock command), which takes an exclusive lock on the relation: it won't get replicated in synchronous mode (when synchronous_standby_names is configured). The truncate operation will wait for the transaction to be replicated to the subscriber, and the decoding process will wait for the Truncate operation to finish.

> Possibly the answer is "it copied the equally misguided code
> in pgoutput.c".
>

I think it is following what is done during decoding; otherwise it would run into the problems described above. We are already discussing one similar problem [1], where pgoutput unintentionally acquired a lock on an index and led to a sort of deadlock.

If the above understanding is correct, I think we might want to improve the comments in this area.

[1] - https://www.postgresql.org/message-id/OS0PR01MB6113C2499C7DC70EE55ADB82FB759%40OS0PR01MB6113.jpnprd01.prod.outlook.com

--
With Regards,
Amit Kapila.
Hi,

On 2021-04-16 08:42:40 +0530, Amit Kapila wrote:
> I think it is because relation_open expects the caller either to have a
> lock on the relation or to not use the 'NoLock' lockmode. AFAIU, we don't
> need to acquire a lock on the relation while decoding changes from WAL,
> because decoding uses a historic snapshot to build the relcache entry,
> and all the later changes to the rel are absorbed while decoding WAL.

Right.

> I think it is also important to *not* acquire any lock on the relation,
> as otherwise it can lead to some sort of deadlock or infinite wait in
> the decoding process. Consider an operation like Truncate (or the user
> acquiring an exclusive lock on the relation in some other way, say via
> the Lock command), which takes an exclusive lock on the relation: it
> won't get replicated in synchronous mode (when synchronous_standby_names
> is configured). The truncate operation will wait for the transaction to
> be replicated to the subscriber, and the decoding process will wait for
> the Truncate operation to finish.

However, this cannot really be relied upon for catalog tables. An output function might acquire locks or such. But for those we do not need to decode contents...

This made me take a brief look at pgoutput.c - maybe I am missing something, but how is the following not a memory leak?

static void
maybe_send_schema(LogicalDecodingContext *ctx,
                  ReorderBufferTXN *txn, ReorderBufferChange *change,
                  Relation relation, RelationSyncEntry *relentry)
{
...
    /* Map must live as long as the session does. */
    oldctx = MemoryContextSwitchTo(CacheMemoryContext);
    relentry->map = convert_tuples_by_name(CreateTupleDescCopy(indesc),
                                           CreateTupleDescCopy(outdesc));
    MemoryContextSwitchTo(oldctx);
    send_relation_and_attrs(ancestor, xid, ctx);
    RelationClose(ancestor);

If - and that's common - convert_tuples_by_name() won't have to do anything, the copied tuple descs will be permanently leaked.

Greetings,

Andres Freund
On Fri, Apr 16, 2021 at 11:24 PM Andres Freund <andres@anarazel.de> wrote:
>
> > I think it is also important to *not* acquire any lock on the relation,
> > as otherwise it can lead to some sort of deadlock or infinite wait in
> > the decoding process. Consider an operation like Truncate (or the user
> > acquiring an exclusive lock on the relation in some other way, say via
> > the Lock command), which takes an exclusive lock on the relation: it
> > won't get replicated in synchronous mode (when synchronous_standby_names
> > is configured). The truncate operation will wait for the transaction to
> > be replicated to the subscriber, and the decoding process will wait for
> > the Truncate operation to finish.
>
> However, this cannot really be relied upon for catalog tables. An output
> function might acquire locks or such. But for those we do not need to
> decode contents...
>

True. If we don't need to decode contents, then we won't have problems of the above kind.

> This made me take a brief look at pgoutput.c - maybe I am missing
> something, but how is the following not a memory leak?
>
> static void
> maybe_send_schema(LogicalDecodingContext *ctx,
>                   ReorderBufferTXN *txn, ReorderBufferChange *change,
>                   Relation relation, RelationSyncEntry *relentry)
> {
> ...
>     /* Map must live as long as the session does. */
>     oldctx = MemoryContextSwitchTo(CacheMemoryContext);
>     relentry->map = convert_tuples_by_name(CreateTupleDescCopy(indesc),
>                                            CreateTupleDescCopy(outdesc));
>     MemoryContextSwitchTo(oldctx);
>     send_relation_and_attrs(ancestor, xid, ctx);
>     RelationClose(ancestor);
>
> If - and that's common - convert_tuples_by_name() won't have to do
> anything, the copied tuple descs will be permanently leaked.
>

I also think this is a permanent leak. I think we need to free all the memory associated with this map on the invalidation of this particular relsync entry (basically in rel_sync_cache_relation_cb).

--
With Regards,
Amit Kapila.
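To make that concrete, a minimal sketch of the idea (hypothetical code, not the patch that was eventually posted; the field names are those of TupleConversionMap in tupconvert.h):

    /* Hypothetical addition to rel_sync_cache_relation_cb() in pgoutput.c */
    if (entry->map != NULL)
    {
        /* free_conversion_map() does not free the descriptors it references */
        FreeTupleDesc(entry->map->indesc);
        FreeTupleDesc(entry->map->outdesc);
        free_conversion_map(entry->map);
    }
    entry->map = NULL;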
On Thu, Apr 15, 2021 at 4:53 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Thu, Apr 15, 2021 at 4:00 PM Japin Li <japinli@hotmail.com> wrote:
> >
> > The RelationIdGetRelation() comment says:
> >
> > > Caller should eventually decrement count. (Usually,
> > > that happens by calling RelationClose().)
> >
> > However, it doesn't do it in ReorderBufferProcessTXN().
> > I think we should close it; here is a patch that fixes it. Thoughts?
> >
> +1. Your fix looks correct to me but can we test it in some way?
>

I have tried to find a test but am not able to find one. I have tried with a foreign table, but we don't log truncate for it; see ExecuteTruncate, which checks that it only logs the relids for which RelationIsLogicallyLogged holds. If that is the case, it is not clear to me how we can ever hit this condition. Have you tried to find a test?

--
With Regards,
Amit Kapila.
On Fri, Apr 16, 2021 at 11:24 PM Andres Freund <andres@anarazel.de> wrote:
>
> > I think it is also important to *not* acquire any lock on the relation,
> > as otherwise it can lead to some sort of deadlock or infinite wait in
> > the decoding process. Consider an operation like Truncate (or the user
> > acquiring an exclusive lock on the relation in some other way, say via
> > the Lock command), which takes an exclusive lock on the relation: it
> > won't get replicated in synchronous mode (when synchronous_standby_names
> > is configured). The truncate operation will wait for the transaction to
> > be replicated to the subscriber, and the decoding process will wait for
> > the Truncate operation to finish.
>
> However, this cannot really be relied upon for catalog tables. An output
> function might acquire locks or such. But for those we do not need to
> decode contents...
>

I see that if we define a user_catalog_table (create table t1_cat(c1 int) WITH(user_catalog_table = true);), we are able to decode operations like (insert, truncate) on such a table. What do you mean by "But for those we do not need to decode contents"?

--
With Regards,
Amit Kapila.
On Sat, 17 Apr 2021 at 14:09, Amit Kapila <amit.kapila16@gmail.com> wrote:
> On Thu, Apr 15, 2021 at 4:53 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>>
>> On Thu, Apr 15, 2021 at 4:00 PM Japin Li <japinli@hotmail.com> wrote:
>> >
>> > The RelationIdGetRelation() comment says:
>> >
>> > > Caller should eventually decrement count. (Usually,
>> > > that happens by calling RelationClose().)
>> >
>> > However, it doesn't do it in ReorderBufferProcessTXN().
>> > I think we should close it; here is a patch that fixes it. Thoughts?
>> >
>>
>> +1. Your fix looks correct to me but can we test it in some way?
>>
>
> I have tried to find a test but am not able to find one. I have tried
> with a foreign table, but we don't log truncate for it; see
> ExecuteTruncate, which checks that it only logs the relids for which
> RelationIsLogicallyLogged holds. If that is the case, it is not clear
> to me how we can ever hit this condition. Have you tried to find a test?

I also don't find a test for this. It was introduced in commit 5dfd1e5a6696, written by Simon Riggs, Marco Nenciarini and Peter Eisentraut. Maybe they can explain when we can enter this condition?

--
Regards,
Japin Li.
ChengDu WenWu Information Technology Co.,Ltd.
On Sat, Apr 17, 2021 at 12:05 PM Japin Li <japinli@hotmail.com> wrote: > > On Sat, 17 Apr 2021 at 14:09, Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Thu, Apr 15, 2021 at 4:53 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > >> > >> On Thu, Apr 15, 2021 at 4:00 PM Japin Li <japinli@hotmail.com> wrote: > >> > > >> > The RelationIdGetRelation() comment says: > >> > > >> > > Caller should eventually decrement count. (Usually, > >> > > that happens by calling RelationClose().) > >> > > >> > However, it doesn't do it in ReorderBufferProcessTXN(). > >> > I think we should close it, here is a patch that fixes it. Thoughts? > >> > > >> > >> +1. Your fix looks correct to me but can we test it in some way? > >> > > > > I have tried to find a test but not able to find one. I have tried > > with a foreign table but we don't log truncate for it, see > > ExecuteTruncate. It has a check that it will log for relids where > > RelationIsLogicallyLogged. If that is the case, it is not clear to me > > how we can ever hit this condition? Have you tried to find the test? > > I also don't find a test for this. It is introduced in 5dfd1e5a6696, > wrote by Simon Riggs, Marco Nenciarini and Peter Eisentraut. Maybe they > can explain when we can enter this condition? > My guess is that this has been copied from the code a few lines above to handle insert/update/delete where it is required to handle some DDL ops like Alter Table but I think we don't need it here (for Truncate op). If that understanding turns out to be true then we should either have an Assert for this or an elog message. -- With Regards, Amit Kapila.
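To illustrate the suggestion, a hypothetical sketch (not the patch that was later posted) of what the TRUNCATE branch in ReorderBufferProcessTXN() could look like with an Assert instead of the silent skip:

    relation = RelationIdGetRelation(relid);

    if (!RelationIsValid(relation))
        elog(ERROR, "could not open relation with OID %u", relid);

    /*
     * ExecuteTruncate() only WAL-logs the relids for which
     * RelationIsLogicallyLogged() holds, so this should always be true.
     */
    Assert(RelationIsLogicallyLogged(relation));

    relations[nrelations++] = relation;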
On Sat, Apr 17, 2021 at 12:01 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Fri, Apr 16, 2021 at 11:24 PM Andres Freund <andres@anarazel.de> wrote:
> >
> > > I think it is also important to *not* acquire any lock on the relation,
> > > as otherwise it can lead to some sort of deadlock or infinite wait in
> > > the decoding process. Consider an operation like Truncate (or the user
> > > acquiring an exclusive lock on the relation in some other way, say via
> > > the Lock command), which takes an exclusive lock on the relation: it
> > > won't get replicated in synchronous mode (when synchronous_standby_names
> > > is configured). The truncate operation will wait for the transaction to
> > > be replicated to the subscriber, and the decoding process will wait for
> > > the Truncate operation to finish.
> >
> > However, this cannot really be relied upon for catalog tables. An output
> > function might acquire locks or such. But for those we do not need to
> > decode contents...
> >
>
> I see that if we define a user_catalog_table (create table t1_cat(c1
> int) WITH(user_catalog_table = true);), we are able to decode
> operations like (insert, truncate) on such a table. What do you mean
> by "But for those we do not need to decode contents"?
>

I think we are allowed to decode the operations on user catalog tables because we are using RelationIsLogicallyLogged() instead of RelationIsAccessibleInLogicalDecoding() in ReorderBufferProcessTXN(). Based on this discussion, I think we should not be allowing decoding of operations on user catalog tables, so we should use RelationIsAccessibleInLogicalDecoding to skip such ops in ReorderBufferProcessTXN(). Am I missing something?

Can you please clarify?

--
With Regards,
Amit Kapila.
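A sketch of what that would look like (simplified and untested; the condition is the point, not the exact code): skip catalog-like tables, including user_catalog_table relations, in ReorderBufferProcessTXN()'s change handling, the same way plain system catalogs are skipped.

    if (!RelationIsLogicallyLogged(relation) ||
        RelationIsAccessibleInLogicalDecoding(relation))
    {
        /* don't decode changes to system or user catalog tables */
        RelationClose(relation);
        continue;
    }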
On Sat, Apr 17, 2021 at 1:30 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> On Fri, Apr 16, 2021 at 11:24 PM Andres Freund <andres@anarazel.de> wrote:
> > This made me take a brief look at pgoutput.c - maybe I am missing
> > something, but how is the following not a memory leak?
> >
> > static void
> > maybe_send_schema(LogicalDecodingContext *ctx,
> >                   ReorderBufferTXN *txn, ReorderBufferChange *change,
> >                   Relation relation, RelationSyncEntry *relentry)
> > {
> > ...
> >     /* Map must live as long as the session does. */
> >     oldctx = MemoryContextSwitchTo(CacheMemoryContext);
> >     relentry->map = convert_tuples_by_name(CreateTupleDescCopy(indesc),
> >                                            CreateTupleDescCopy(outdesc));
> >     MemoryContextSwitchTo(oldctx);
> >     send_relation_and_attrs(ancestor, xid, ctx);
> >     RelationClose(ancestor);
> >
> > If - and that's common - convert_tuples_by_name() won't have to do
> > anything, the copied tuple descs will be permanently leaked.
> >
>
> I also think this is a permanent leak. I think we need to free all the
> memory associated with this map on the invalidation of this particular
> relsync entry (basically in rel_sync_cache_relation_cb).

I agree there's a problem here.

Back in:

https://www.postgresql.org/message-id/CA%2BHiwqEeU19iQgjN6HF1HTPU0L5%2BJxyS5CmxaOVGNXBAfUY06Q%40mail.gmail.com

I had proposed to move the map creation from maybe_send_schema() to get_rel_sync_entry(), mainly because the latter is where I realized it belongs, though a bit too late. Attached is the part of the patch for this particular issue. It also takes care to release the copied TupleDescs if the map is found to be unnecessary, thus preventing leaking into CacheMemoryContext.

--
Amit Langote
EDB: http://www.enterprisedb.com
Attachments
From: osumi.takamichi@fujitsu.com
On Saturday, April 17, 2021 4:13 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> On Sat, Apr 17, 2021 at 12:05 PM Japin Li <japinli@hotmail.com> wrote:
> >
> > On Sat, 17 Apr 2021 at 14:09, Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > On Thu, Apr 15, 2021 at 4:53 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >>
> > >> On Thu, Apr 15, 2021 at 4:00 PM Japin Li <japinli@hotmail.com> wrote:
> > >> >
> > >> > The RelationIdGetRelation() comment says:
> > >> >
> > >> > > Caller should eventually decrement count. (Usually, that
> > >> > > happens by calling RelationClose().)
> > >> >
> > >> > However, it doesn't do it in ReorderBufferProcessTXN().
> > >> > I think we should close it; here is a patch that fixes it. Thoughts?
> > >> >
> > >>
> > >> +1. Your fix looks correct to me but can we test it in some way?
> > >>
> > >
> > > I have tried to find a test but am not able to find one. I have tried
> > > with a foreign table, but we don't log truncate for it; see
> > > ExecuteTruncate, which checks that it only logs the relids for which
> > > RelationIsLogicallyLogged holds. If that is the case, it is not clear
> > > to me how we can ever hit this condition. Have you tried to find a test?
> >
> > I also don't find a test for this. It was introduced in commit 5dfd1e5a6696,
> > written by Simon Riggs, Marco Nenciarini and Peter Eisentraut. Maybe
> > they can explain when we can enter this condition?
>
> My guess is that this has been copied from the code a few lines above to
> handle insert/update/delete where it is required to handle some DDL ops like
> Alter Table but I think we don't need it here (for Truncate op). If that
> understanding turns out to be true then we should either have an Assert for
> this or an elog message.

In this thread, we are discussing 3 topics below...

(1) necessity of the check for REORDER_BUFFER_CHANGE_TRUNCATE in ReorderBufferProcessTXN()
(2) whether we disallow decoding of operations on user catalog tables or not
(3) memory leak in maybe_send_schema() (patch already provided)

Let's address those one by one.

In terms of (1), which was close to the motivation of this thread, first of all, I traced the truncate processing, and I think the check is done on the truncate command side as well. I preferred an Assert rather than a never-called elog, but it's OK to choose elog if someone has a strong opinion on it. Attached is the patch for this.

Best Regards,
Takamichi Osumi
Attachments
On Fri, 23 Apr 2021 at 22:33, osumi.takamichi@fujitsu.com <osumi.takamichi@fujitsu.com> wrote:
> On Saturday, April 17, 2021 4:13 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>> On Sat, Apr 17, 2021 at 12:05 PM Japin Li <japinli@hotmail.com> wrote:
>> >
>> > On Sat, 17 Apr 2021 at 14:09, Amit Kapila <amit.kapila16@gmail.com> wrote:
>> > > On Thu, Apr 15, 2021 at 4:53 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>> > >>
>> > >> On Thu, Apr 15, 2021 at 4:00 PM Japin Li <japinli@hotmail.com> wrote:
>> > >> >
>> > >> > The RelationIdGetRelation() comment says:
>> > >> >
>> > >> > > Caller should eventually decrement count. (Usually, that
>> > >> > > happens by calling RelationClose().)
>> > >> >
>> > >> > However, it doesn't do it in ReorderBufferProcessTXN().
>> > >> > I think we should close it; here is a patch that fixes it. Thoughts?
>> > >> >
>> > >>
>> > >> +1. Your fix looks correct to me but can we test it in some way?
>> > >>
>> > >
>> > > I have tried to find a test but am not able to find one. I have tried
>> > > with a foreign table, but we don't log truncate for it; see
>> > > ExecuteTruncate, which checks that it only logs the relids for which
>> > > RelationIsLogicallyLogged holds. If that is the case, it is not clear
>> > > to me how we can ever hit this condition. Have you tried to find a test?
>> >
>> > I also don't find a test for this. It was introduced in commit 5dfd1e5a6696,
>> > written by Simon Riggs, Marco Nenciarini and Peter Eisentraut. Maybe
>> > they can explain when we can enter this condition?
>>
>> My guess is that this has been copied from the code a few lines above to
>> handle insert/update/delete where it is required to handle some DDL ops like
>> Alter Table but I think we don't need it here (for Truncate op). If that
>> understanding turns out to be true then we should either have an Assert for
>> this or an elog message.
>
> In this thread, we are discussing 3 topics below...
>
> (1) necessity of the check for REORDER_BUFFER_CHANGE_TRUNCATE in ReorderBufferProcessTXN()
> (2) whether we disallow decoding of operations on user catalog tables or not
> (3) memory leak in maybe_send_schema() (patch already provided)
>
> Let's address those one by one.
> In terms of (1), which was close to the motivation of this thread,
> first of all, I traced the truncate processing, and I think the check
> is done on the truncate command side as well. I preferred an Assert
> rather than a never-called elog, but it's OK to choose elog if someone
> has a strong opinion on it. Attached is the patch for this.
>

+1, make check-world passed.

--
Regards,
Japin Li.
ChengDu WenWu Information Technology Co.,Ltd.
On Fri, Apr 23, 2021 at 8:03 PM osumi.takamichi@fujitsu.com <osumi.takamichi@fujitsu.com> wrote: > > On Saturday, April 17, 2021 4:13 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > I also don't find a test for this. It is introduced in 5dfd1e5a6696, > > > wrote by Simon Riggs, Marco Nenciarini and Peter Eisentraut. Maybe > > > they can explain when we can enter this condition? > > > > My guess is that this has been copied from the code a few lines above to > > handle insert/update/delete where it is required to handle some DDL ops like > > Alter Table but I think we don't need it here (for Truncate op). If that > > understanding turns out to be true then we should either have an Assert for > > this or an elog message. > In this thread, we are discussing 3 topics below... > > (1) necessity of the check for REORDER_BUFFER_CHANGE_TRUNCATE in ReorderBufferProcessTXN() > (2) discussion of whether we disallow decoding of operations on user catalog tables or not > (3) memory leak of maybe_send_schema() (patch already provided) > > Let's address those one by one. > In terms of (1), which was close to the motivation of this thread, > I think (1) and (2) are related because if we need (2) then the check removed by (1) needs to be replaced with another check. So, I am not sure how to make this decision. -- With Regards, Amit Kapila.
From: osumi.takamichi@fujitsu.com
On Tuesday, April 20, 2021 12:07 PM Amit Langote <amitlangote09@gmail.com> wrote:
> On Sat, Apr 17, 2021 at 1:30 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > On Fri, Apr 16, 2021 at 11:24 PM Andres Freund <andres@anarazel.de> wrote:
> > > This made me take a brief look at pgoutput.c - maybe I am missing
> > > something, but how is the following not a memory leak?
> > >
> > > static void
> > > maybe_send_schema(LogicalDecodingContext *ctx,
> > >                   ReorderBufferTXN *txn, ReorderBufferChange *change,
> > >                   Relation relation, RelationSyncEntry *relentry)
> > > {
> > > ...
> > >     /* Map must live as long as the session does. */
> > >     oldctx = MemoryContextSwitchTo(CacheMemoryContext);
> > >     relentry->map = convert_tuples_by_name(CreateTupleDescCopy(indesc),
> > >                                            CreateTupleDescCopy(outdesc));
> > >     MemoryContextSwitchTo(oldctx);
> > >     send_relation_and_attrs(ancestor, xid, ctx);
> > >     RelationClose(ancestor);
> > >
> > > If - and that's common - convert_tuples_by_name() won't have to do
> > > anything, the copied tuple descs will be permanently leaked.
> > >
> >
> > I also think this is a permanent leak. I think we need to free all the
> > memory associated with this map on the invalidation of this particular
> > relsync entry (basically in rel_sync_cache_relation_cb).
>
> I agree there's a problem here.
>
> Back in:
>
> https://www.postgresql.org/message-id/CA%2BHiwqEeU19iQgjN6HF1HTPU0L5%2BJxyS5CmxaOVGNXBAfUY06Q%40mail.gmail.com
>
> I had proposed to move the map creation from maybe_send_schema() to
> get_rel_sync_entry(), mainly because the latter is where I realized it
> belongs, though a bit too late. Attached is the part of the patch
> for this particular issue. It also takes care to release the copied TupleDescs
> if the map is found to be unnecessary, thus preventing leaking into
> CacheMemoryContext.

Thank you for sharing the patch. Your patch looks correct to me; make check-world passed with it. Also, I agree with the idea of placing the processing that sets the map in get_rel_sync_entry.

One thing I'd like to ask about is a more advanced way to confirm that the memory leak is solved, beyond just running make check-world.

I used OSS HEAD and valgrind, expecting that I could see a function stack containing a call of CreateTupleDescCopy from both pgoutput_change and pgoutput_truncate as a memory-leak report in the valgrind logs, and that these reports would disappear after applying the patch.

But I cannot find the pair of pgoutput functions and CreateTupleDescCopy in one report when I use OSS HEAD, which means I need more advanced testing to check whether the memory leak of CreateTupleDescCopy is addressed. I collected the logs from the regression tests at src/test/subscription, so they should exercise the code paths of interest.

Could someone give me advice about how to confirm that the memory leak is solved?

Best Regards,
Takamichi Osumi
From: osumi.takamichi@fujitsu.com
On Monday, April 26, 2021 2:05 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> On Fri, Apr 23, 2021 at 8:03 PM osumi.takamichi@fujitsu.com
> <osumi.takamichi@fujitsu.com> wrote:
> >
> > On Saturday, April 17, 2021 4:13 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > > I also don't find a test for this. It was introduced in commit
> > > > 5dfd1e5a6696, written by Simon Riggs, Marco Nenciarini and Peter
> > > > Eisentraut. Maybe they can explain when we can enter this condition?
> > >
> > > My guess is that this has been copied from the code a few lines
> > > above to handle insert/update/delete where it is required to handle
> > > some DDL ops like Alter Table but I think we don't need it here (for
> > > Truncate op). If that understanding turns out to be true then we
> > > should either have an Assert for this or an elog message.
> >
> > In this thread, we are discussing 3 topics below...
> >
> > (1) necessity of the check for REORDER_BUFFER_CHANGE_TRUNCATE in
> > ReorderBufferProcessTXN()
> > (2) whether we disallow decoding of operations on user
> > catalog tables or not
> > (3) memory leak in maybe_send_schema() (patch already provided)
> >
> > Let's address those one by one.
> > In terms of (1), which was close to the motivation of this thread,
>
> I think (1) and (2) are related because if we need (2) then the check removed
> by (1) needs to be replaced with another check. So, I am not sure how to
> make this decision.

Yeah, you are right.

On Monday, April 19, 2021 9:23 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> On Sat, Apr 17, 2021 at 12:01 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > On Fri, Apr 16, 2021 at 11:24 PM Andres Freund <andres@anarazel.de> wrote:
> > >
> > > > I think it is also important to *not* acquire any lock on the relation,
> > > > as otherwise it can lead to some sort of deadlock or infinite wait in
> > > > the decoding process. Consider an operation like Truncate (or the user
> > > > acquiring an exclusive lock on the relation in some other way, say via
> > > > the Lock command), which takes an exclusive lock on the relation: it
> > > > won't get replicated in synchronous mode (when synchronous_standby_names
> > > > is configured). The truncate operation will wait for the transaction to
> > > > be replicated to the subscriber, and the decoding process will wait for
> > > > the Truncate operation to finish.
> > >
> > > However, this cannot really be relied upon for catalog tables. An
> > > output function might acquire locks or such. But for those we do not
> > > need to decode contents...
> > >
> >
> > I see that if we define a user_catalog_table (create table t1_cat(c1
> > int) WITH(user_catalog_table = true);), we are able to decode
> > operations like (insert, truncate) on such a table. What do you mean
> > by "But for those we do not need to decode contents"?
> >
> I think we are allowed to decode the operations on user catalog tables
> because we are using RelationIsLogicallyLogged() instead of
> RelationIsAccessibleInLogicalDecoding() in ReorderBufferProcessTXN().
> Based on this discussion, I think we should not be allowing decoding of
> operations on user catalog tables, so we should use
> RelationIsAccessibleInLogicalDecoding to skip such ops in
> ReorderBufferProcessTXN(). Am I missing something?
>
> Can you please clarify?

I don't understand that point, either.

I read the context where the user_catalog_table was introduced [1]. There, I couldn't find any discussion of whether we should skip decoding operations on that kind of table or not. Accordingly, I suppose we simply never settled the question.

What surprised me a bit is that decoding operations on catalog tables was apparently being considered, as in [2], at the time, although I cannot find any concrete description of such use cases in that thread.

Anyway, I feel that disallowing decoding of operations on user catalog tables doesn't spoil the feature's purpose, so I'm OK with doing so. What do you think?

[1] - https://www.postgresql.org/message-id/flat/20130914204913.GA4071%40awork2.anarazel.de
Note that in this discussion, user_catalog_table was renamed from treat_as_catalog_table partway through the thread; searching for both names might save you some time when looking through it.

[2] - https://www.postgresql.org/message-id/CA%2BTgmobhDCHuckL_86wRDWJ31Gw3Y1HrQ4yUKEn7U1_hTbeVqQ%40mail.gmail.com

Best Regards,
Takamichi Osumi
On Wed, Apr 28, 2021 at 5:36 PM osumi.takamichi@fujitsu.com <osumi.takamichi@fujitsu.com> wrote:
>
> On Monday, April 26, 2021 2:05 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > On Fri, Apr 23, 2021 at 8:03 PM osumi.takamichi@fujitsu.com
> > <osumi.takamichi@fujitsu.com> wrote:
> > I think we are allowed to decode the operations on user catalog tables
> > because we are using RelationIsLogicallyLogged() instead of
> > RelationIsAccessibleInLogicalDecoding() in ReorderBufferProcessTXN().
> > Based on this discussion, I think we should not be allowing decoding of
> > operations on user catalog tables, so we should use
> > RelationIsAccessibleInLogicalDecoding to skip such ops in
> > ReorderBufferProcessTXN(). Am I missing something?
> >
> > Can you please clarify?
>
> I don't understand that point, either.
>
> I read the context where the user_catalog_table was introduced [1]. There, I
> couldn't find any discussion of whether we should skip decoding operations on
> that kind of table or not. Accordingly, I suppose we simply never settled the
> question.
>
> What surprised me a bit is that decoding operations on catalog tables was
> apparently being considered, as in [2], at the time, although I cannot find
> any concrete description of such use cases in that thread.
>
> Anyway, I feel that disallowing decoding of operations on user catalog tables
> doesn't spoil the feature's purpose, so I'm OK with doing so. What do you think?
>

I am not so sure about it because I think we don't have any example of user_catalog_tables in the core code. This is the reason I was kind of looking towards Andres to clarify this. Right now, if the user performs TRUNCATE on a user_catalog_table in synchronous mode, it will hang if the decoding plugin takes even a share lock on that table. The main reason is that we allow decoding of the TRUNCATE operation for user_catalog_tables. I think even if we want to allow decoding of other operations on user_catalog_table, the decoding of TRUNCATE should be prohibited, but maybe we shouldn't allow decoding of any operation on such tables, as we don't do it for system catalog tables.

--
With Regards,
Amit Kapila.
Takamichi-san,

On Tue, Apr 27, 2021 at 9:37 PM osumi.takamichi@fujitsu.com <osumi.takamichi@fujitsu.com> wrote:
> On Tuesday, April 20, 2021 12:07 PM Amit Langote <amitlangote09@gmail.com> wrote:
> > On Sat, Apr 17, 2021 at 1:30 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > On Fri, Apr 16, 2021 at 11:24 PM Andres Freund <andres@anarazel.de> wrote:
> > > > This made me take a brief look at pgoutput.c - maybe I am missing
> > > > something, but how is the following not a memory leak?
> > > >
> > > > static void
> > > > maybe_send_schema(LogicalDecodingContext *ctx,
> > > >                   ReorderBufferTXN *txn, ReorderBufferChange *change,
> > > >                   Relation relation, RelationSyncEntry *relentry)
> > > > {
> > > > ...
> > > >     /* Map must live as long as the session does. */
> > > >     oldctx = MemoryContextSwitchTo(CacheMemoryContext);
> > > >     relentry->map = convert_tuples_by_name(CreateTupleDescCopy(indesc),
> > > >                                            CreateTupleDescCopy(outdesc));
> > > >     MemoryContextSwitchTo(oldctx);
> > > >     send_relation_and_attrs(ancestor, xid, ctx);
> > > >     RelationClose(ancestor);
> > > >
> > > > If - and that's common - convert_tuples_by_name() won't have to do
> > > > anything, the copied tuple descs will be permanently leaked.
> > >
> > > I also think this is a permanent leak. I think we need to free all the
> > > memory associated with this map on the invalidation of this particular
> > > relsync entry (basically in rel_sync_cache_relation_cb).
> >
> > I agree there's a problem here.
> >
> > Back in:
> >
> > https://www.postgresql.org/message-id/CA%2BHiwqEeU19iQgjN6HF1HTPU0L5%2BJxyS5CmxaOVGNXBAfUY06Q%40mail.gmail.com
> >
> > I had proposed to move the map creation from maybe_send_schema() to
> > get_rel_sync_entry(), mainly because the latter is where I realized it
> > belongs, though a bit too late. Attached is the part of the patch
> > for this particular issue. It also takes care to release the copied TupleDescs
> > if the map is found to be unnecessary, thus preventing leaking into
> > CacheMemoryContext.
>
> Thank you for sharing the patch.
> Your patch looks correct to me; make check-world passed with it. Also, I agree
> with the idea of placing the processing that sets the map in get_rel_sync_entry.

Thanks for checking.

> One thing I'd like to ask about is a more advanced way to confirm
> that the memory leak is solved, beyond just running make check-world.
>
> I used OSS HEAD and valgrind, expecting that I could see a function stack
> containing a call of CreateTupleDescCopy from both pgoutput_change and
> pgoutput_truncate as a memory-leak report in the valgrind logs, and that
> these reports would disappear after applying the patch.
>
> But I cannot find the pair of pgoutput functions and CreateTupleDescCopy
> in one report when I use OSS HEAD, which means I need more advanced testing
> to check whether the memory leak of CreateTupleDescCopy is addressed.
> I collected the logs from the regression tests at src/test/subscription,
> so they should exercise the code paths of interest.
>
> Could someone give me advice about how to confirm that the memory leak is solved?

I have not used valgrind or other testing methods to check this. To me, it's clear just from looking at the code that the TupleDescs returned by CreateTupleDescCopy() ought to be freed when convert_tuples_by_name() determines that no map is necessary, because there is then no need to keep those TupleDesc copies around. Failing that, those copies end up in CacheMemoryContext with nothing pointing to them, hence the leak.

Actually, since maybe_send_schema() doesn't execute this code if schema_sent is found to have been set by earlier calls, the leak in question should occur only once in most tests.

--
Amit Langote
EDB: http://www.enterprisedb.com
From: osumi.takamichi@fujitsu.com
On Thursday, April 29, 2021 2:31 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> I am not so sure about it because I think we don't have any example of
> user_catalog_tables in the core code. This is the reason I was kind of looking
> towards Andres to clarify this. Right now, if the user performs TRUNCATE on
> a user_catalog_table in synchronous mode, it will hang if the
> decoding plugin takes even a share lock on that table. The main reason is that we allow
> decoding of the TRUNCATE operation for user_catalog_tables. I think even if we
> want to allow decoding of other operations on user_catalog_table, the
> decoding of TRUNCATE should be prohibited, but maybe we shouldn't allow
> decoding of any operation on such tables, as we don't do it for system catalog
> tables.

I tried the following scenarios to reproduce this.

Scenario 1:
(1) set up 1 publisher and 1 subscriber
(2) create a table with user_catalog_table = true on the pub
(3) insert some data into this table
(4) create a publication for the table on the pub
(5) create a table with user_catalog_table = true on the sub
(6) create a subscription on the sub
(7) add synchronous_standby_names to the publisher's configuration and restart the pub
(8) have one session hold a lock on the user_catalog_table on the pub in access share mode
(9) have another session truncate the user_catalog_table on the pub

Here, it keeps waiting, but I'm not sure this is the scenario described above, since this deadlock is caused by (8)'s lock.

Scenario 2:
(1) set up 1 publisher and 1 subscriber
(2) create a table with user_catalog_table = true on the pub
(3) insert some data into this table
(4) create a publication for the table on the pub
(5) create a table with user_catalog_table = true on the sub
(6) create a subscription on the sub
(7) add synchronous_standby_names to the publisher's configuration and restart the pub
(8) have a session truncate the user_catalog_table on the pub

Scenario 2 was successful.

Are these the scenarios you have in mind? If not, please let me know the missing steps. I would like to reproduce the scenario and write a patch to fix this.

Best Regards,
Takamichi Osumi
On Thu, May 13, 2021 at 11:15 AM osumi.takamichi@fujitsu.com <osumi.takamichi@fujitsu.com> wrote:
>
> On Thursday, April 29, 2021 2:31 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > I am not so sure about it because I think we don't have any example of
> > user_catalog_tables in the core code. This is the reason I was kind of looking
> > towards Andres to clarify this. Right now, if the user performs TRUNCATE on
> > a user_catalog_table in synchronous mode, it will hang if the
> > decoding plugin takes even a share lock on that table. The main reason is that we allow
> > decoding of the TRUNCATE operation for user_catalog_tables. I think even if we
> > want to allow decoding of other operations on user_catalog_table, the
> > decoding of TRUNCATE should be prohibited, but maybe we shouldn't allow
> > decoding of any operation on such tables, as we don't do it for system catalog
> > tables.
>
> I tried the following scenarios to reproduce this.
>
> Scenario 1:
> (1) set up 1 publisher and 1 subscriber
> (2) create a table with user_catalog_table = true on the pub
> (3) insert some data into this table
> (4) create a publication for the table on the pub
> (5) create a table with user_catalog_table = true on the sub
> (6) create a subscription on the sub
> (7) add synchronous_standby_names to the publisher's configuration and restart the pub
> (8) have one session hold a lock on the user_catalog_table on the pub in access share mode
> (9) have another session truncate the user_catalog_table on the pub
>
> Here, it keeps waiting, but I'm not sure this is the scenario described above,
> since this deadlock is caused by (8)'s lock.
>

This is a lock time-out scenario, not a deadlock.

> Scenario 2:
> (1) set up 1 publisher and 1 subscriber
> (2) create a table with user_catalog_table = true on the pub
> (3) insert some data into this table
> (4) create a publication for the table on the pub
> (5) create a table with user_catalog_table = true on the sub
> (6) create a subscription on the sub
> (7) add synchronous_standby_names to the publisher's configuration and restart the pub
> (8) have a session truncate the user_catalog_table on the pub
>
> Scenario 2 was successful.
>

Yeah, because pgoutput, or for that matter even test_decoding, doesn't acquire a lock on user catalog tables.

> Are these the scenarios you have in mind?
> If not, please let me know the missing steps.
> I would like to reproduce the scenario and write a patch to fix this.
>

I don't think we can reproduce it with core plugins as they don't lock user catalog tables. We either need to write a minimal decoding plugin where we acquire a lock (maybe a share lock) on the user catalog table, or hack test_decoding/pgoutput to take such a lock.

--
With Regards,
Amit Kapila.
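A minimal sketch of such a hack (hypothetical, for reproduction only): make the plugin's change callback take a share lock on user catalog tables, which should then collide with the AccessExclusiveLock held by the TRUNCATE that is waiting for synchronous replication.

    /*
     * Hypothetical tweak inside a decoding plugin's change callback, e.g.
     * test_decoding's pg_decode_change(), purely to reproduce the hang.
     */
    if (RelationIsUsedAsCatalogTable(relation))
    {
        /* blocks if TRUNCATE already holds AccessExclusiveLock */
        LockRelationOid(RelationGetRelid(relation), AccessShareLock);
        UnlockRelationOid(RelationGetRelid(relation), AccessShareLock);
    }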
On Tue, Apr 20, 2021 at 8:36 AM Amit Langote <amitlangote09@gmail.com> wrote:
>
> On Sat, Apr 17, 2021 at 1:30 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > On Fri, Apr 16, 2021 at 11:24 PM Andres Freund <andres@anarazel.de> wrote:
> > > This made me take a brief look at pgoutput.c - maybe I am missing
> > > something, but how is the following not a memory leak?
> > >
> > > static void
> > > maybe_send_schema(LogicalDecodingContext *ctx,
> > >                   ReorderBufferTXN *txn, ReorderBufferChange *change,
> > >                   Relation relation, RelationSyncEntry *relentry)
> > > {
> > > ...
> > >     /* Map must live as long as the session does. */
> > >     oldctx = MemoryContextSwitchTo(CacheMemoryContext);
> > >     relentry->map = convert_tuples_by_name(CreateTupleDescCopy(indesc),
> > >                                            CreateTupleDescCopy(outdesc));
> > >     MemoryContextSwitchTo(oldctx);
> > >     send_relation_and_attrs(ancestor, xid, ctx);
> > >     RelationClose(ancestor);
> > >
> > > If - and that's common - convert_tuples_by_name() won't have to do
> > > anything, the copied tuple descs will be permanently leaked.
> >
> > I also think this is a permanent leak. I think we need to free all the
> > memory associated with this map on the invalidation of this particular
> > relsync entry (basically in rel_sync_cache_relation_cb).
>
> I agree there's a problem here.
>
> Back in:
>
> https://www.postgresql.org/message-id/CA%2BHiwqEeU19iQgjN6HF1HTPU0L5%2BJxyS5CmxaOVGNXBAfUY06Q%40mail.gmail.com
>
> I had proposed to move the map creation from maybe_send_schema() to
> get_rel_sync_entry(), mainly because the latter is where I realized it
> belongs, though a bit too late.
>

It seems that in get_rel_sync_entry(), the map will only be rebuilt when there is an invalidation of the publication relation entry. Don't we need to rebuild it after any DDL on the relation itself? I haven't tried this with a test, so I might be missing something. Also, don't we need to free the entire map, as suggested by me?

--
With Regards,
Amit Kapila.
From: osumi.takamichi@fujitsu.com
On Thursday, May 13, 2021 7:43 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> On Tue, Apr 20, 2021 at 8:36 AM Amit Langote <amitlangote09@gmail.com> wrote:
> > Back in:
> > https://www.postgresql.org/message-id/CA%2BHiwqEeU19iQgjN6HF1HTPU0L5%2BJxyS5CmxaOVGNXBAfUY06Q%40mail.gmail.com
> >
> > I had proposed to move the map creation from maybe_send_schema() to
> > get_rel_sync_entry(), mainly because the latter is where I realized it
> > belongs, though a bit too late.
>
> It seems that in get_rel_sync_entry(), the map will only be rebuilt when
> there is an invalidation of the publication relation entry. Don't we need
> to rebuild it after any DDL on the relation itself? I haven't tried this
> with a test, so I might be missing something.

Yeah, the patch not only tries to address the memory leak but also changes the timing (condition) for calling convert_tuples_by_name. This is because the patch placed that call under the !entry->replicate_valid condition in get_rel_sync_entry, whereas OSS HEAD calls it based on RelationSyncEntry's schema_sent in maybe_send_schema.

The two flags (replicate_valid and schema_sent) are reset at different times. InvalidateSystemCaches resets both flags, but schema_sent is also reset by LocalExecuteInvalidationMessage, while replicate_valid is reset by CallSyscacheCallbacks.

IIUC, InvalidateSystemCaches, which applies to both flags, is called when a transaction starts, via AtStart_Cache, and when a table lock is taken via LockRelationOid, etc. Accordingly, I think we can notice changes after any DDL on the relation. But as for the different timing, we need to know the impact of the change accurately. LocalExecuteInvalidationMessage is called from functions in reorderbuffer (e.g. ReorderBufferImmediateInvalidation, ReorderBufferExecuteInvalidations). It seems to me that changing the condition as the patch does reduces the chance of the reorderbuffer proactively resetting the flag that leads to rebuilding the map.

Langote-san, could you please explain this perspective?

Best Regards,
Takamichi Osumi
On Thu, May 13, 2021 at 7:43 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> On Tue, Apr 20, 2021 at 8:36 AM Amit Langote <amitlangote09@gmail.com> wrote:
> > On Sat, Apr 17, 2021 at 1:30 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > On Fri, Apr 16, 2021 at 11:24 PM Andres Freund <andres@anarazel.de> wrote:
> > > > This made me take a brief look at pgoutput.c - maybe I am missing
> > > > something, but how is the following not a memory leak?
> > > >
> > > > static void
> > > > maybe_send_schema(LogicalDecodingContext *ctx,
> > > >                   ReorderBufferTXN *txn, ReorderBufferChange *change,
> > > >                   Relation relation, RelationSyncEntry *relentry)
> > > > {
> > > > ...
> > > >     /* Map must live as long as the session does. */
> > > >     oldctx = MemoryContextSwitchTo(CacheMemoryContext);
> > > >     relentry->map = convert_tuples_by_name(CreateTupleDescCopy(indesc),
> > > >                                            CreateTupleDescCopy(outdesc));
> > > >     MemoryContextSwitchTo(oldctx);
> > > >     send_relation_and_attrs(ancestor, xid, ctx);
> > > >     RelationClose(ancestor);
> > > >
> > > > If - and that's common - convert_tuples_by_name() won't have to do
> > > > anything, the copied tuple descs will be permanently leaked.
> > >
> > > I also think this is a permanent leak. I think we need to free all the
> > > memory associated with this map on the invalidation of this particular
> > > relsync entry (basically in rel_sync_cache_relation_cb).
> >
> > I agree there's a problem here.
> >
> > Back in:
> >
> > https://www.postgresql.org/message-id/CA%2BHiwqEeU19iQgjN6HF1HTPU0L5%2BJxyS5CmxaOVGNXBAfUY06Q%40mail.gmail.com
> >
> > I had proposed to move the map creation from maybe_send_schema() to
> > get_rel_sync_entry(), mainly because the latter is where I realized it
> > belongs, though a bit too late.
>
> It seems that in get_rel_sync_entry(), the map will only be rebuilt when
> there is an invalidation of the publication relation entry. Don't we need
> to rebuild it after any DDL on the relation itself? I haven't tried this
> with a test, so I might be missing something.

That's a good point; I didn't really think that through.

So, rel_sync_cache_relation_cb(), which gets called when the published table's relcache is invalidated, only resets schema_sent but not replicate_valid. The latter, as you said, is reset by rel_sync_cache_publication_cb() when a pg_publication syscache invalidation occurs. So with the patch, it's possible for the map not to be recreated even when it should be, for example if DDL changes the table's TupleDesc.

I have put the map-creation code back into maybe_send_schema() in the attached updated patch, updated some comments related to the map, and added a test case that would fail with the previous patch (due to moving the map-creation code into get_rel_sync_entry(), that is) but succeeds with the updated patch.

> Also, don't we need to free the entire map, as suggested by me?

Yes, I had missed that too. Addressed in the updated patch.

--
Amit Langote
EDB: http://www.enterprisedb.com
Attachments
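The gist of the updated approach, as a sketch assembled from the code quoted in this thread (illustrative only, not the patch text): keep map creation in maybe_send_schema(), but free the copied TupleDescs immediately when no conversion turns out to be needed, so nothing is leaked into CacheMemoryContext.

    /*
     * Sketch of the map handling in maybe_send_schema(); relentry, relation,
     * ancestor and oldctx come from the surrounding function, as in the code
     * quoted earlier in the thread.
     */
    TupleDesc   indesc = RelationGetDescr(relation);
    TupleDesc   outdesc = RelationGetDescr(ancestor);

    /* Map must live as long as the decoding session, so copy the
     * descriptors into CacheMemoryContext. */
    oldctx = MemoryContextSwitchTo(CacheMemoryContext);
    indesc = CreateTupleDescCopy(indesc);
    outdesc = CreateTupleDescCopy(outdesc);
    relentry->map = convert_tuples_by_name(indesc, outdesc);
    if (relentry->map == NULL)
    {
        /* No conversion needed: free the copies instead of leaking them. */
        FreeTupleDesc(indesc);
        FreeTupleDesc(outdesc);
    }
    MemoryContextSwitchTo(oldctx);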
Takamichi-san,

On Fri, May 14, 2021 at 11:19 AM osumi.takamichi@fujitsu.com <osumi.takamichi@fujitsu.com> wrote:
> On Thursday, May 13, 2021 7:43 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > On Tue, Apr 20, 2021 at 8:36 AM Amit Langote <amitlangote09@gmail.com> wrote:
> > > Back in:
> > > https://www.postgresql.org/message-id/CA%2BHiwqEeU19iQgjN6HF1HTPU0L5%2BJxyS5CmxaOVGNXBAfUY06Q%40mail.gmail.com
> > >
> > > I had proposed to move the map creation from maybe_send_schema() to
> > > get_rel_sync_entry(), mainly because the latter is where I realized it
> > > belongs, though a bit too late.
> >
> > It seems that in get_rel_sync_entry(), the map will only be rebuilt when
> > there is an invalidation of the publication relation entry. Don't we need
> > to rebuild it after any DDL on the relation itself? I haven't tried this
> > with a test, so I might be missing something.
>
> Yeah, the patch not only tries to address the memory leak but also changes
> the timing (condition) for calling convert_tuples_by_name. This is because
> the patch placed that call under the !entry->replicate_valid condition in
> get_rel_sync_entry, whereas OSS HEAD calls it based on RelationSyncEntry's
> schema_sent in maybe_send_schema.
>
> The two flags (replicate_valid and schema_sent) are reset at different times.
> InvalidateSystemCaches resets both flags, but schema_sent is also reset by
> LocalExecuteInvalidationMessage, while replicate_valid is reset by
> CallSyscacheCallbacks.
>
> IIUC, InvalidateSystemCaches, which applies to both flags, is called
> when a transaction starts, via AtStart_Cache, and when a table lock is taken
> via LockRelationOid, etc. Accordingly, I think we can notice changes after
> any DDL on the relation. But as for the different timing, we need to know
> the impact of the change accurately. LocalExecuteInvalidationMessage is
> called from functions in reorderbuffer (e.g. ReorderBufferImmediateInvalidation,
> ReorderBufferExecuteInvalidations). It seems to me that changing the condition
> as the patch does reduces the chance of the reorderbuffer proactively
> resetting the flag that leads to rebuilding the map.
>
> Langote-san, could you please explain this perspective?

Please check the reply I just sent. In summary, moving map-creation into get_rel_sync_entry() was not correct.

--
Amit Langote
EDB: http://www.enterprisedb.com
From: osumi.takamichi@fujitsu.com
On Thursday, May 13, 2021 7:21 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> On Thu, May 13, 2021 at 11:15 AM osumi.takamichi@fujitsu.com
> <osumi.takamichi@fujitsu.com> wrote:
> >
> > I tried the following scenarios to reproduce this.
> > Scenario 2:
> > (1) set up 1 publisher and 1 subscriber
> > (2) create a table with user_catalog_table = true on the pub
> > (3) insert some data into this table
> > (4) create a publication for the table on the pub
> > (5) create a table with user_catalog_table = true on the sub
> > (6) create a subscription on the sub
> > (7) add synchronous_standby_names to the publisher's configuration and
> > restart the pub
> > (8) have a session truncate the user_catalog_table on the pub
> >
> > Scenario 2 was successful.
>
> Yeah, because pgoutput, or for that matter even test_decoding, doesn't
> acquire a lock on user catalog tables.
>
> > Are these the scenarios you have in mind? If not, please let me know
> > the missing steps.
> > I would like to reproduce the scenario and write a patch to fix this.
>
> I don't think we can reproduce it with core plugins as they don't lock user
> catalog tables.

OK. My current understanding of how the deadlock happens is below.

1. A TRUNCATE command is performed on the user_catalog_table.
2. The TRUNCATE command locks the table and its indexes with ACCESS EXCLUSIVE LOCK.
3. The TRUNCATE waits for the subscriber's synchronization when synchronous_standby_names is set.
4. Here, the walsender stops *if* it tries to acquire a lock on the user_catalog_table, because the table it wants to look at is already locked by the TRUNCATE.

If this is right, we need to go back to a somewhat higher-level discussion, since whether we should hack any plugin to simulate the deadlock caused by a user_catalog_table reference with locking depends on the assumption that the plugin takes a lock on the user_catalog_table. In other words, if the plugin does read-only access to that type of table with no lock (by RelationIdGetRelation, for example?), the deadlock concern disappears and we don't need to add anything on the plugin side, IIUC.

Here, we haven't gotten any response about whether an output plugin takes (or should take) a lock on the user_catalog_table, but I would like to reach a consensus on this point before the implementation.

By the way, Amit-san already mentioned that the main reason for this is that we allow decoding of the TRUNCATE operation for user_catalog_table in synchronous mode, and provided the choices in a past email [1]:
(1) disallow decoding of the TRUNCATE operation for user_catalog_tables
(2) disallow decoding of any operation for user_catalog_tables, like system catalog tables

Yet, I'm not sure either option solves the deadlock concern completely. If an application takes an ACCESS EXCLUSIVE lock by the LOCK command (not by TRUNCATE!) on the user_catalog_table in a transaction, and the plugin tries to take a lock on it, I think the deadlock happens. Of course, having a consensus that the plugin takes no lock at all would remove this concern, though.

Like this, I'd like to discuss those two items in question together at first:
* whether the plugin should take a lock on the user_catalog_table
* the range of decoding related to the user_catalog_table

To me, taking no lock on the user_catalog_table from the plugin is fine, because we have the historic snapshot mechanism, which doesn't produce a deadlock in any combination, even when an application executes a LOCK command for ACCESS EXCLUSIVE. In addition, I agree with the idea that we don't decode any operation on user_catalog_table, for better alignment with the usual system catalogs.

Thoughts?

[1] - https://www.postgresql.org/message-id/CAA4eK1LP8xTysPEMEBYAZ%3D6KoMWfjyf0gzF-9Bp%3DSgVFvYQDVw%40mail.gmail.com

Best Regards,
Takamichi Osumi
On Fri, May 14, 2021 at 12:44 PM Amit Langote <amitlangote09@gmail.com> wrote:
>
> On Thu, May 13, 2021 at 7:43 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> > Also, don't we need to free the entire map, as suggested by me?
>
> Yes, I had missed that too. Addressed in the updated patch.
>

+	relentry->map = convert_tuples_by_name(indesc, outdesc);
+	if (relentry->map == NULL)
+	{
+		/* Map not necessary, so free the TupleDescs too. */
+		FreeTupleDesc(indesc);
+		FreeTupleDesc(outdesc);
+	}

I think the patch frees these descriptors when the map is NULL but not otherwise, because free_conversion_map won't free these descriptors.

BTW, have you tried this patch in back branches? I think we should backpatch this fix.

--
With Regards,
Amit Kapila.
On Fri, May 14, 2021 at 2:20 PM osumi.takamichi@fujitsu.com <osumi.takamichi@fujitsu.com> wrote:
>
> On Thursday, May 13, 2021 7:21 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > I don't think we can reproduce it with core plugins as they don't lock user
> > catalog tables.
>
> OK. My current understanding of how the deadlock happens is below.
>
> 1. A TRUNCATE command is performed on the user_catalog_table.
> 2. The TRUNCATE command locks the table and its indexes with ACCESS EXCLUSIVE LOCK.
> 3. The TRUNCATE waits for the subscriber's synchronization
> when synchronous_standby_names is set.
> 4. Here, the walsender stops *if* it tries to acquire a lock on the user_catalog_table,
> because the table it wants to look at is already locked by the TRUNCATE.
>
> If this is right,
>

Yeah, the above steps are correct, so if we take a lock on the user_catalog_table while the walsender is processing the WAL, it would lead to a problem.

> we need to go back to a somewhat higher-level discussion,
> since whether we should hack any plugin to simulate the deadlock caused by a
> user_catalog_table reference with locking depends on the assumption that the
> plugin takes a lock on the user_catalog_table. In other words, if the plugin
> does read-only access to that type of table with no lock (by
> RelationIdGetRelation, for example?), the deadlock concern disappears and we
> don't need to add anything on the plugin side, IIUC.
>

True, if the plugin doesn't acquire any lock on the user_catalog_table, then it is fine, but we don't prohibit plugins from acquiring locks on user_catalog_tables. This is similar to system catalogs; the plugins and decoding code do acquire locks on those.

> Here, we haven't gotten any response about whether an output plugin takes
> (or should take) a lock on the user_catalog_table, but I would like to reach
> a consensus on this point before the implementation.
>
> By the way, Amit-san already mentioned that the main reason for this
> is that we allow decoding of the TRUNCATE operation for user_catalog_table
> in synchronous mode, and provided the choices in a past email [1]:
> (1) disallow decoding of the TRUNCATE operation for user_catalog_tables
> (2) disallow decoding of any operation for user_catalog_tables, like system catalog tables
>
> Yet, I'm not sure either option solves the deadlock concern completely.
> If an application takes an ACCESS EXCLUSIVE lock by the LOCK command (not by TRUNCATE!)
> on the user_catalog_table in a transaction, and the plugin tries to take a lock on it,
> I think the deadlock happens. Of course, having a consensus that the plugin takes
> no lock at all would remove this concern, though.
>

This is true for system catalogs as well. See the similar report [1].

> Like this, I'd like to discuss those two items in question together at first:
> * whether the plugin should take a lock on the user_catalog_table
> * the range of decoding related to the user_catalog_table
>
> To me, taking no lock on the user_catalog_table from the plugin is fine
>

We allow taking locks on system catalogs, so why prohibit user_catalog_tables? However, I agree that if we want plugins to acquire locks on user_catalog_tables, then we should either prohibit decoding of such relations or do something else to avoid deadlock hazards.

[1] - https://www.postgresql.org/message-id/CALDaNm1UB%3D%3DgL9Poad4ETjfcyGdJBphWEzEZocodnBd--kJpVw%40mail.gmail.com

--
With Regards,
Amit Kapila.
On Mon, May 17, 2021 at 6:13 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > On Fri, May 14, 2021 at 12:44 PM Amit Langote <amitlangote09@gmail.com> wrote: > > On Thu, May 13, 2021 at 7:43 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > Also, don't we need to free the > > > entire map as suggested by me? > > > > Yes, I had missed that too. Addressed in the updated patch. > > > > + relentry->map = convert_tuples_by_name(indesc, outdesc); > + if (relentry->map == NULL) > + { > + /* Map not necessary, so free the TupleDescs too. */ > + FreeTupleDesc(indesc); > + FreeTupleDesc(outdesc); > + } > > I think the patch frees these descriptors when the map is NULL but not > otherwise because free_conversion_map won't free these descriptors. You're right. I have fixed that by making the callback free the TupleDescs explicitly. > BTW, have you tried this patch in back branches because I think we > should backpatch this fix? I have created a version of the patch for v13, the only older release that has this code, and can see that tests, including the newly added one, pass. Both patches are attached. -- Amit Langote EDB: http://www.enterprisedb.com
Attachments
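The callback-side cleanup described above would look roughly like this (a sketch only; field names follow RelationSyncEntry and TupleConversionMap, and the exact committed hunk may differ):

    /* In the relcache invalidation callback, rel_sync_cache_relation_cb(). */
    if (entry->map != NULL)
    {
        /*
         * free_conversion_map() does not free the TupleDescs that were
         * copied when the map was built, so release them explicitly.
         */
        FreeTupleDesc(entry->map->indesc);
        FreeTupleDesc(entry->map->outdesc);
        free_conversion_map(entry->map);
        entry->map = NULL;
    }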
On Monday, May 17, 2021 6:52 PM Amit Langote <amitlangote09@gmail.com> wrote: > On Mon, May 17, 2021 at 6:13 PM Amit Kapila <amit.kapila16@gmail.com> > wrote: > > On Fri, May 14, 2021 at 12:44 PM Amit Langote > <amitlangote09@gmail.com> wrote: > > > On Thu, May 13, 2021 at 7:43 PM Amit Kapila <amit.kapila16@gmail.com> > wrote: > > > > > > > Also, don't we need to free the > > > > entire map as suggested by me? > > > > > > Yes, I had missed that too. Addressed in the updated patch. > > > > > > > + relentry->map = convert_tuples_by_name(indesc, outdesc); > > + if (relentry->map == NULL) > > + { > > + /* Map not necessary, so free the TupleDescs too. */ > > + FreeTupleDesc(indesc); FreeTupleDesc(outdesc); } > > > > I think the patch frees these descriptors when the map is NULL but not > > otherwise because free_conversion_map won't free these descriptors. > > You're right. I have fixed that by making the callback free the TupleDescs > explicitly. This fix looks correct. Also, the regression tests (RT) of v3 didn't fail. Further, I've checked the newly added tests. As you said, with v1's contents the test of v2 failed, but we are fine with v2 and v3, which proves that the DDL is adjusted in the right way. > > BTW, have you tried this patch in back branches because I think we > > should backpatch this fix? > > I have created a version of the patch for v13, the only older release that has > this code, and can see that tests, including the newly added one, pass. > > Both patches are attached. The patch for PG13 can be applied to it cleanly and the RT succeeded. I have few really minor comments on your comments in the patch. (1) schema_sent's comment @@ -94,7 +94,8 @@ typedef struct RelationSyncEntry /* * Did we send the schema? If ancestor relid is set, its schema must also - * have been sent for this to be true. + * have been sent and the map to convert the relation's tuples into the + * ancestor's format created before this can be set to be true. */ bool schema_sent; List *streamed_txns; /* streamed toplevel transactions with this I suggest to insert a comma between 'created' and 'before' because the sentence is a bit long and confusing. Or, I thought another comment idea for this part, because the original one doesn't care about the cycle of the reset. "To avoid repetition to send the schema, this is set true after its first transmission. Reset when any change of the relation definition is possible. If ancestor relid is set, its schema must have also been sent while the map to convert the relation's tuples into the ancestor's format created, before this flag is set true." (2) comment in rel_sync_cache_relation_cb() @@ -1190,13 +1208,25 @@ rel_sync_cache_relation_cb(Datum arg, Oid relid) HASH_FIND, NULL); /* - * Reset schema sent status as the relation definition may have changed. + * Reset schema sent status as the relation definition may have changed, + * also freeing any objects that depended on the earlier definition. How about divide this sentence into two sentences like "Reset schema sent status as the relation definition may have changed. Also, free any objects that depended on the earlier definition." Best Regards, Takamichi Osumi
On Mon, May 17, 2021 at 9:45 PM osumi.takamichi@fujitsu.com <osumi.takamichi@fujitsu.com> wrote: > On Monday, May 17, 2021 6:52 PM Amit Langote <amitlangote09@gmail.com> wrote: > > Both patches are attached. > The patch for PG13 can be applied to it cleanly and the RT succeeded. > > I have few really minor comments on your comments in the patch. > > (1) schema_sent's comment > > @@ -94,7 +94,8 @@ typedef struct RelationSyncEntry > > /* > * Did we send the schema? If ancestor relid is set, its schema must also > - * have been sent for this to be true. > + * have been sent and the map to convert the relation's tuples into the > + * ancestor's format created before this can be set to be true. > */ > bool schema_sent; > List *streamed_txns; /* streamed toplevel transactions with this > > > I suggest to insert a comma between 'created' and 'before' > because the sentence is a bit long and confusing. > > Or, I thought another comment idea for this part, > because the original one doesn't care about the cycle of the reset. > > "To avoid repetition to send the schema, this is set true after its first transmission. > Reset when any change of the relation definition is possible. If ancestor relid is set, > its schema must have also been sent while the map to convert the relation's tuples into > the ancestor's format created, before this flag is set true." > > (2) comment in rel_sync_cache_relation_cb() > > @@ -1190,13 +1208,25 @@ rel_sync_cache_relation_cb(Datum arg, Oid relid) > HASH_FIND, NULL); > > /* > - * Reset schema sent status as the relation definition may have changed. > + * Reset schema sent status as the relation definition may have changed, > + * also freeing any objects that depended on the earlier definition. > > How about divide this sentence into two sentences like > "Reset schema sent status as the relation definition may have changed. > Also, free any objects that depended on the earlier definition." Thanks for reading it over. I have revised comments in a way that hopefully addresses your concerns. While doing so, it occurred to me (maybe not for the first time) that we are *unnecessarily* doing send_relation_and_attrs() for a relation if the changes will be published using an ancestor's schema. In that case, sending only the ancestor's schema suffices AFAICS. Changing the code that way doesn't break any tests. I propose that we fix that too. Updated patches attached. I've added a commit message to both patches. -- Amit Langote EDB: http://www.enterprisedb.com
Attachments
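As a rough illustration of that last point (a sketch of the idea, not the actual patch; it assumes the shape of maybe_send_schema() in pgoutput.c): when publish_as_relid points at an ancestor, only the ancestor's schema needs to go out. Note that this is also a spot where the RelationIdGetRelation()/RelationClose() pairing that started this thread matters.

    if (relentry->publish_as_relid != RelationGetRelid(relation))
    {
        Relation    ancestor = RelationIdGetRelation(relentry->publish_as_relid);

        /* The subscriber only ever sees rows in the ancestor's format. */
        send_relation_and_attrs(ancestor, xid, ctx);
        RelationClose(ancestor);    /* pair the reference-count decrement */
    }
    else
        send_relation_and_attrs(relation, xid, ctx);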
On Monday, May 17, 2021 6:45 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > On Fri, May 14, 2021 at 2:20 PM osumi.takamichi@fujitsu.com > <osumi.takamichi@fujitsu.com> wrote: > > > > On Thursday, May 13, 2021 7:21 PM Amit Kapila > <amit.kapila16@gmail.com> wrote: > > > I don't think we can reproduce it with core plugins as they don't > > > lock user catalog tables. > > OK. My current understanding about how the deadlock happens is below. > > > > 1. TRUNCATE command is performed on user_catalog_table. > > 2. TRUNCATE command locks the table and index with ACCESS > EXCLUSIVE LOCK. > > 3. TRUNCATE waits for the subscriber's synchronization > > when synchronous_standby_names is set. > > 4. Here, the walsender stops, *if* it tries to acquire a lock on the > user_catalog_table > > because the table where it wants to see is locked by the > TRUNCATE already. > > > > If this is right, > > > > Yeah, the above steps are correct, so if we take a lock on user_catalog_table > when the walsender is processing the WAL, it would lead to a problem. > > > we need to go back to a little bit higher level discussion, since > > whether we should hack any plugin to simulate the deadlock caused by > > user_catalog_table reference with locking depends on the assumption if > the plugin takes a lock on the user_catalog_table or not. > > In other words, if the plugin does read only access to that type of > > table with no lock (by RelationIdGetRelation for example ?), the > > deadlock concern disappears and we don't need to add anything to plugin > sides, IIUC. > > > > True, if the plugin doesn't acquire any lock on user_catalog_table, then it is > fine but we don't prohibit plugins from acquiring locks on user_catalog_tables. > This is similar to system catalogs: plugins and the decoding code do acquire > locks on those. Thanks for sharing this. I'll take the idea that a plugin can take a lock on a user_catalog_table into account. > > Here, we haven't gotten any response about whether output plugin takes > > (should take) the lock on the user_catalog_table. But, I would like to > > make a consensus about this point before the implementation. > > > > By the way, Amit-san already mentioned the main reason of this is that > > we allow decoding of TRUNCATE operation for user_catalog_table in > synchronous mode. > > The choices are provided by Amit-san already in the past email in [1]. > > (1) disallow decoding of TRUNCATE operation for user_catalog_tables > > (2) disallow decoding of any operation for user_catalog_tables like > > system catalog tables > > > > Yet, I'm not sure if either option solves the deadlock concern completely. > > If application takes an ACCESS EXCLUSIVE lock by LOCK command (not > by > > TRUNCATE !) on the user_catalog_table in a transaction, and if the > > plugin tries to take a lock on it, I think the deadlock happens. Of > > course, having a consensus that the plugin takes no lock at all would > remove this concern, though. > > > > This is true for system catalogs as well. See the similar report [1]. > > > Like this, I'd like to discuss those two items in question together at first. > > * the plugin should take a lock on user_catalog_table or not > > * the range of decoding related to user_catalog_table > > > > To me, taking no lock on the user_catalog_table from plugin is fine > > > > We allow taking locks on system catalogs, so why prohibit > user_catalog_tables? 
However, I agree that if we want plugins to acquire the > lock on user_catalog_tables then we should either prohibit decoding of such > relations or do something else to avoid deadlock hazards. OK. Although we have not concluded the range of logical decoding of user_catalog_table (like we should exclude TRUNCATE command only or all operations on that type of table), I'm worried that disallowing the logical decoding of user_catalog_table produces the deadlock still. It's because disabling it by itself does not affect the lock taken by TRUNCATE command. What I have in mind is an example below. (1) plugin (e.g. pgoutput) is designed to take a lock on user_catalog_table. (2) logical replication is set up in synchronous mode. (3) TRUNCATE command takes an access exclusive lock on the user_catalog_table. (4) This time, we don't do anything for the TRUNCATE decoding. (5) the plugin tries to take a lock on the truncated table but, it can't due to the lock by TRUNCATE command. I was not sure that the place where the plugin takes the lock is in truncate_cb or somewhere else not directly related to decoding of the user_catalog_table itself, so I might be wrong. However, in this case, the solution would be not disabling the decoding of user_catalog_table but prohibiting TRUNCATE command on user_catalog_table in synchronous_mode. If this is true, I need to extend an output plugin and simulate the deadlock first and remove it by fixing the TRUNCATE side. Thoughts ? Best Regards, Takamichi Osumi
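To picture the skip being imagined here (purely hypothetical; nothing like this exists in pgoutput today), filtering user catalogs out of the truncate processing could look like the following, where relations/nrelations and relids/nrelids are assumed to match pgoutput_truncate()'s locals:

    for (i = 0; i < nrelations; i++)
    {
        Relation    rel = relations[i];

        if (RelationIsUsedAsCatalogTable(rel))
            continue;           /* never replicate a user catalog table */

        relids[nrelids++] = RelationGetRelid(rel);
    }

    /* Send the truncate message only if anything is left to replicate. */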
On Tue, May 18, 2021 at 1:29 PM osumi.takamichi@fujitsu.com <osumi.takamichi@fujitsu.com> wrote: > > On Monday, May 17, 2021 6:45 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > We allow taking locks on system catalogs, so why prohibit > > user_catalog_tables? However, I agree that if we want plugins to acquire the > > lock on user_catalog_tables then we should either prohibit decoding of such > > relations or do something else to avoid deadlock hazards. > OK. > > Although we have not concluded the range of logical decoding of user_catalog_table > (like we should exclude TRUNCATE command only or all operations on that type of table), > I'm worried that disallowing the logical decoding of user_catalog_table produces > the deadlock still. It's because disabling it by itself does not affect the > lock taken by TRUNCATE command. What I have in mind is an example below. > > (1) plugin (e.g. pgoutput) is designed to take a lock on user_catalog_table. > (2) logical replication is set up in synchronous mode. > (3) TRUNCATE command takes an access exclusive lock on the user_catalog_table. > (4) This time, we don't do anything for the TRUNCATE decoding. > (5) the plugin tries to take a lock on the truncated table > but, it can't due to the lock by TRUNCATE command. > If you skip decoding of truncate then we won't invoke the plugin API, so step 5 will be skipped. > I was not sure that the place where the plugin takes the lock is in truncate_cb > or somewhere else not directly related to decoding of the user_catalog_table itself, > so I might be wrong. However, in this case, > the solution would be not disabling the decoding of user_catalog_table > but prohibiting TRUNCATE command on user_catalog_table in synchronous_mode. > If this is true, I need to extend an output plugin and simulate the deadlock first > and remove it by fixing the TRUNCATE side. Thoughts ? > I suggest not spending too much time reproducing this because it is quite clear that it will lead to a deadlock if the plugin acquires a lock on a user_catalog_table and we allow decoding of truncate. But if you want to see how that happens, you can try as well. -- With Regards, Amit Kapila.
On Tue, May 18, 2021 at 5:29 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Tue, May 18, 2021 at 1:29 PM osumi.takamichi@fujitsu.com > <osumi.takamichi@fujitsu.com> wrote: > > > > On Monday, May 17, 2021 6:45 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > We allow taking locks on system catalogs, so why prohibit > > > user_catalog_tables? However, I agree that if we want plugins to acquire the > > > lock on user_catalog_tables then we should either prohibit decoding of such > > > relations or do something else to avoid deadlock hazards. > > OK. > > > > Although we have not concluded the range of logical decoding of user_catalog_table > > (like we should exclude TRUNCATE command only or all operations on that type of table), > > I'm worried that disallowing the logical decoding of user_catalog_table produces > > the deadlock still. It's because disabling it by itself does not affect the > > lock taken by TRUNCATE command. What I have in mind is an example below. > > > > (1) plugin (e.g. pgoutput) is designed to take a lock on user_catalog_table. > > (2) logical replication is set up in synchronous mode. > > (3) TRUNCATE command takes an access exclusive lock on the user_catalog_table. > > (4) This time, we don't do anything for the TRUNCATE decoding. > > (5) the plugin tries to take a lock on the truncated table > > but, it can't due to the lock by TRUNCATE command. > > > > If you skip decoding of truncate then we won't invoke the plugin API, so > step 5 will be skipped. > I think you were right here: even if we skip step 4, the plugin might take a lock on the user_catalog_table for something else. I am not sure but I think we should prohibit truncate on user_catalog_tables as we prohibit truncate on system catalog tables (see below [1]) if we want plugins to lock them; otherwise, as you said, it might lead to deadlock. For that matter, I think we should check all other operations where we can take an exclusive lock on a [user]_catalog_table, say the CLUSTER command, and compare their behavior on system catalog tables. [1] postgres=# truncate pg_class; ERROR: permission denied: "pg_class" is a system catalog postgres=# cluster pg_class; ERROR: there is no previously clustered index for table "pg_class" -- With Regards, Amit Kapila.
On Wed, May 19, 2021 at 7:59 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Tue, May 18, 2021 at 5:29 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > On Tue, May 18, 2021 at 1:29 PM osumi.takamichi@fujitsu.com > > <osumi.takamichi@fujitsu.com> wrote: > > > > > > On Monday, May 17, 2021 6:45 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > > > We allow taking locks on system catalogs, so why prohibit > > > > user_catalog_tables? However, I agree that if we want plugins to acquire the > > > > lock on user_catalog_tables then we should either prohibit decoding of such > > > > relations or do something else to avoid deadlock hazards. > > > OK. > > > > > > Although we have not concluded the range of logical decoding of user_catalog_table > > > (like we should exclude TRUNCATE command only or all operations on that type of table), > > > I'm worried that disallowing the logical decoding of user_catalog_table produces > > > the deadlock still. It's because disabling it by itself does not affect the > > > lock taken by TRUNCATE command. What I have in mind is an example below. > > > > > > (1) plugin (e.g. pgoutput) is designed to take a lock on user_catalog_table. > > > (2) logical replication is set up in synchronous mode. > > > (3) TRUNCATE command takes an access exclusive lock on the user_catalog_table. > > > (4) This time, we don't do anything for the TRUNCATE decoding. > > > (5) the plugin tries to take a lock on the truncated table > > > but, it can't due to the lock by TRUNCATE command. > > > > > > > If you skip decoding of truncate then we won't invoke the plugin API, so > > step 5 will be skipped. > > > > I think you were right here: even if we skip step 4, the plugin might take > a lock on the user_catalog_table for something else. I am not sure but I > think we should prohibit truncate on user_catalog_tables as we > prohibit truncate on system catalog tables (see below [1]) if we want > plugins to lock them; otherwise, as you said, it might lead to deadlock. > For that matter, I think we should check all other operations > where we can take an exclusive lock on a [user]_catalog_table, say > the CLUSTER command, and compare their behavior on system catalog > tables. > > [1] > postgres=# truncate pg_class; > ERROR: permission denied: "pg_class" is a system catalog > postgres=# cluster pg_class; > ERROR: there is no previously clustered index for table "pg_class" > Please ignore the cluster command, as we need to use 'using index' with that command to make it successful. I just wanted to show the truncate command behavior that you asked about. -- With Regards, Amit Kapila.
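If the TRUNCATE-side prohibition floated above were implemented, it could mirror the existing system-catalog check in truncate_check_rel() in tablecmds.c; the following is only an illustration of the idea, not a committed change (the error code is a guess):

    /* Hypothetical addition: treat user_catalog_table like a catalog. */
    if (RelationIsUsedAsCatalogTable(rel))
        ereport(ERROR,
                (errcode(ERRCODE_WRONG_OBJECT_TYPE),
                 errmsg("cannot truncate \"%s\" because it is a user catalog table",
                        RelationGetRelationName(rel))));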
On Wednesday, May 19, 2021 11:33 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > On Wed, May 19, 2021 at 7:59 AM Amit Kapila <amit.kapila16@gmail.com> > wrote: > > > > On Tue, May 18, 2021 at 5:29 PM Amit Kapila <amit.kapila16@gmail.com> > wrote: > > > > > > On Tue, May 18, 2021 at 1:29 PM osumi.takamichi@fujitsu.com > > > <osumi.takamichi@fujitsu.com> wrote: > > > > > > > > On Monday, May 17, 2021 6:45 PM Amit Kapila > <amit.kapila16@gmail.com> wrote: > > > > > > > > > > We allow taking locks on system catalogs, so why prohibit > > > > > user_catalog_tables? However, I agree that if we want plugins to > > > > > acquire the lock on user_catalog_tables then we should either > > > > > prohibit decoding of such relations or do something else to avoid > deadlock hazards. > > > > OK. > > > > > > > > Although we have not concluded the range of logical decoding of > > > > user_catalog_table (like we should exclude TRUNCATE command only > > > > or all operations on that type of table), I'm worried that > > > > disallowing the logical decoding of user_catalog_table produces > > > > the deadlock still. It's because disabling it by itself does not affect the > lock taken by TRUNCATE command. What I have in mind is an example > below. > > > > > > > > (1) plugin (e.g. pgoutput) is designed to take a lock on > user_catalog_table. > > > > (2) logical replication is set up in synchronous mode. > > > > (3) TRUNCATE command takes an access exclusive lock on the > user_catalog_table. > > > > (4) This time, we don't do anything for the TRUNCATE decoding. > > > > (5) the plugin tries to take a lock on the truncated table > > > > but, it can't due to the lock by TRUNCATE command. > > > > > > > > > > If you skip decoding of truncate then we won't invoke the plugin API, so > > > step 5 will be skipped. > > > > > > > I think you were right here: even if we skip step 4, the plugin might take > > a lock on the user_catalog_table for something else. Yes, we can't know the exact place where the user wants to use the feature of user_catalog_table. Even if we imagine that the user skips the truncate decoding (I imagined continuing and skipping a case in REORDER_BUFFER_CHANGE_TRUNCATE of pgoutput), it's possible that the user accesses it somewhere else for a different purpose, with a lock. > I am not sure but I > > think we should prohibit truncate on user_catalog_tables as we > > prohibit truncate on system catalog tables (see below [1]) if we want > > plugins to lock them; otherwise, as you said, it might lead to deadlock. > > For that matter, I think we should check all other operations > > where we can take an exclusive lock on a [user]_catalog_table, say > > the CLUSTER command, and compare their behavior on system catalog > > tables. > > > > [1] > > postgres=# truncate pg_class; > > ERROR: permission denied: "pg_class" is a system catalog postgres=# > > cluster pg_class; > > ERROR: there is no previously clustered index for table "pg_class" > > > > Please ignore the cluster command, as we need to use 'using index' with that > command to make it successful. I just wanted to show the truncate command > behavior that you asked about. Thank you so much for clarifying the direction. I agree with changing the TRUNCATE side. I'll make a patch based on this. Best Regards, Takamichi Osumi
On Wed, May 19, 2021 at 8:28 AM osumi.takamichi@fujitsu.com <osumi.takamichi@fujitsu.com> wrote: > > On Wednesday, May 19, 2021 11:33 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Wed, May 19, 2021 at 7:59 AM Amit Kapila <amit.kapila16@gmail.com> > > wrote: > > > > > > On Tue, May 18, 2021 at 5:29 PM Amit Kapila <amit.kapila16@gmail.com> > > wrote: > > > > > > > > On Tue, May 18, 2021 at 1:29 PM osumi.takamichi@fujitsu.com > > > > <osumi.takamichi@fujitsu.com> wrote: > > > > > > > > > > On Monday, May 17, 2021 6:45 PM Amit Kapila > > <amit.kapila16@gmail.com> wrote: > > > > > > > > > > > > We allow taking locks on system catalogs, so why prohibit > > > > > > user_catalog_tables? However, I agree that if we want plugins to > > > > > > acquire the lock on user_catalog_tables then we should either > > > > > > prohibit decoding of such relations or do something else to avoid > > deadlock hazards. > > > > > OK. > > > > > > > > > > Although we have not concluded the range of logical decoding of > > > > > user_catalog_table (like we should exclude TRUNCATE command only > > > > > or all operations on that type of table), I'm worried that > > > > > disallowing the logical decoding of user_catalog_table produces > > > > > the deadlock still. It's because disabling it by itself does not affect the > > lock taken by TRUNCATE command. What I have in mind is an example > > below. > > > > > > > > > > (1) plugin (e.g. pgoutput) is designed to take a lock on > > user_catalog_table. > > > > > (2) logical replication is set up in synchronous mode. > > > > > (3) TRUNCATE command takes an access exclusive lock on the > > user_catalog_table. > > > > > (4) This time, we don't do anything for the TRUNCATE decoding. > > > > > (5) the plugin tries to take a lock on the truncated table > > > > > but, it can't due to the lock by TRUNCATE command. > > > > > > > > > > > > > If you skip decoding of truncate then we won't invoke the plugin API, so > > > > step 5 will be skipped. > > > > > > > > > > I think you were right here: even if we skip step 4, the plugin might take > > > a lock on the user_catalog_table for something else. > Yes, we can't know the exact place where the user wants to use the feature > of user_catalog_table. Even if we imagine that the user skips > the truncate decoding (I imagined continuing and skipping a case in > REORDER_BUFFER_CHANGE_TRUNCATE of pgoutput), > it's possible that the user accesses it somewhere else for a different purpose, with a lock. > > > > I am not sure but I > > > think we should prohibit truncate on user_catalog_tables as we > > > prohibit truncate on system catalog tables (see below [1]) if we want > > > plugins to lock them; otherwise, as you said, it might lead to deadlock. > > > For that matter, I think we should check all other operations > > > where we can take an exclusive lock on a [user]_catalog_table, say > > > the CLUSTER command, and compare their behavior on system catalog > > > tables. > > > > > > [1] > > > postgres=# truncate pg_class; > > > ERROR: permission denied: "pg_class" is a system catalog postgres=# > > > cluster pg_class; > > > ERROR: there is no previously clustered index for table "pg_class" > > > > > > > Please ignore the cluster command, as we need to use 'using index' with that > > command to make it successful. I just wanted to show the truncate command > > behavior that you asked about. > Thank you so much for clarifying the direction. > I agree with changing the TRUNCATE side. > I'll make a patch based on this. 
Isn't it a better idea to start a new thread where you can summarize whatever we have discussed here about user_catalog_tables? We might get opinions from others about the behavior change you are proposing. -- With Regards, Amit Kapila.
On Wednesday, May 19, 2021 1:52 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > I am not sure but I > > > > think we should prohibit truncate on user_catalog_tables as we > > > > prohibit truncate on system catalog tables (see below [1]) if we > > > > want plugins to lock them; otherwise, as you said, it might lead to > deadlock. > > > > For that matter, I think we should check all other operations > > > > where we can take an exclusive lock on a [user]_catalog_table, say > > > > the CLUSTER command, and compare their behavior on system > > > > catalog tables. > > > > > > > > [1] > > > > postgres=# truncate pg_class; > > > > ERROR: permission denied: "pg_class" is a system catalog > > > > postgres=# cluster pg_class; > > > > ERROR: there is no previously clustered index for table "pg_class" > > > > > > > > > > Please ignore the cluster command, as we need to use 'using index' > > > with that command to make it successful. I just wanted to show the > > > truncate command behavior that you asked about. > > Thank you so much for clarifying the direction. > > I agree with changing the TRUNCATE side. > > I'll make a patch based on this. > > > > Isn't it a better idea to start a new thread where you can summarize whatever > we have discussed here about user_catalog_tables? We might get opinions > from others about the behavior change you are proposing. You are right. So, I've started a new thread with the patch for this. Best Regards, Takamichi Osumi
On Tuesday, May 18, 2021 3:30 PM Amit Langote <amitlangote09@gmail.com> wrote: > While doing so, it occurred to me (maybe not for the first time) that we are > *unnecessarily* doing send_relation_and_attrs() for a relation if the changes > will be published using an ancestor's schema. In that case, sending only the > ancestor's schema suffices AFAICS. Changing the code that way doesn't > break any tests. I propose that we fix that too. I've analyzed this new change's validity. My conclusion is that it has no bad impact, which means your additional fix is acceptable. I think this addition blurs the purpose of the patch a bit, though. With the patch's removal of that send_relation_and_attrs() call, we don't send the pair of LOGICAL_REP_MSG_TYPE('Y') and LOGICAL_REP_MSG_RELATION('R') messages to the subscriber when we use an ancestor. Therefore, with the patch, the type and relation for maybe_send_schema()'s argument 'relation' are no longer registered or updated on the subscriber in the case where the ancestor's schema is used. However, both pgoutput_change() and pgoutput_truncate() check which OIDs to send to the subscriber for any operation. Accordingly, that information for the argument 'relation' isn't used on the subscriber in that case, and we are fine. I'll comment on other minor things in another email. Best Regards, Takamichi Osumi
On Tuesday, May 18, 2021 3:30 PM Amit Langote <amitlangote09@gmail.com> wrote: > On Mon, May 17, 2021 at 9:45 PM osumi.takamichi@fujitsu.com > <osumi.takamichi@fujitsu.com> wrote: > > On Monday, May 17, 2021 6:52 PM Amit Langote > <amitlangote09@gmail.com> wrote: > > > Both patches are attached. > > The patch for PG13 can be applied to it cleanly and the RT succeeded. > > > > I have few really minor comments on your comments in the patch. > > > > (1) schema_sent's comment > > > > @@ -94,7 +94,8 @@ typedef struct RelationSyncEntry > > > > /* > > * Did we send the schema? If ancestor relid is set, its schema > must also > > - * have been sent for this to be true. > > + * have been sent and the map to convert the relation's tuples into > the > > + * ancestor's format created before this can be set to be true. > > */ > > bool schema_sent; > > List *streamed_txns; /* streamed toplevel > transactions with this > > > > > > I suggest to insert a comma between 'created' and 'before' > > because the sentence is a bit long and confusing. > > > > Or, I thought another comment idea for this part, because the original > > one doesn't care about the cycle of the reset. > > > > "To avoid repetition to send the schema, this is set true after its first > transmission. > > Reset when any change of the relation definition is possible. If > > ancestor relid is set, its schema must have also been sent while the > > map to convert the relation's tuples into the ancestor's format created, > before this flag is set true." > > > > (2) comment in rel_sync_cache_relation_cb() > > > > @@ -1190,13 +1208,25 @@ rel_sync_cache_relation_cb(Datum arg, Oid > relid) > > > > HASH_FIND, NULL); > > > > /* > > - * Reset schema sent status as the relation definition may have > changed. > > + * Reset schema sent status as the relation definition may have > changed, > > + * also freeing any objects that depended on the earlier definition. > > > > How about divide this sentence into two sentences like "Reset schema > > sent status as the relation definition may have changed. > > Also, free any objects that depended on the earlier definition." > > Thanks for reading it over. I have revised comments in a way that hopefully > addresses your concerns. Thank you for your fix. The patches look good to me. Just in case, I'll report that the two patches passed the regression tests as expected and, from my side, there are no more suggestions. Those are ready for committer, I think. Best Regards, Takamichi Osumi
On Thu, May 20, 2021 at 5:59 PM osumi.takamichi@fujitsu.com <osumi.takamichi@fujitsu.com> wrote: > On Tuesday, May 18, 2021 3:30 PM Amit Langote <amitlangote09@gmail.com> wrote: > > While doing so, it occurred to me (maybe not for the first time) that we are > > *unnecessarily* doing send_relation_and_attrs() for a relation if the changes > > will be published using an ancestor's schema. In that case, sending only the > > ancestor's schema suffices AFAICS. Changing the code that way doesn't > > break any tests. I propose that we fix that too. > I've analyzed this new change's validity. > My conclusion is that it has no bad impact, > which means your additional fix is acceptable. > I think this addition blurs the purpose of the patch a bit, though. Okay, I've extracted that change into 0002. > With the patch's removal of that send_relation_and_attrs() call, > we don't send the pair of LOGICAL_REP_MSG_TYPE('Y') and > LOGICAL_REP_MSG_RELATION('R') messages to the subscriber > when we use an ancestor. Therefore, with the patch, > the type and relation for maybe_send_schema()'s argument 'relation' > are no longer registered or updated on the subscriber > in the case where the ancestor's schema is used. > However, both pgoutput_change() and pgoutput_truncate() > check which OIDs to send to the subscriber for any operation. > Accordingly, that information for the argument 'relation' > isn't used on the subscriber in that case, and we are fine. Thanks for checking that. Here are updated/divided patches. -- Amit Langote EDB: http://www.enterprisedb.com
Attachments
On Thursday, May 20, 2021 9:59 PM Amit Langote <amitlangote09@gmail.com> wrote: > Here are updated/divided patches. Thanks for your updates. But, I've detected segmentation faults caused by the patch, which can happen during 100_bugs.pl in src/test/subscription. This happens more than one in ten times. This problem would be a timing issue and has been introduced by v3 already. I used v5 for HEAD also and reproduced this failure, while OSS HEAD doesn't reproduce this, even when I executed 100_bugs.pl 200 times in a tight loop. I aligned the commit id 4f586fe2 for all check. Below logs are ones I got from v3. * The message of the failure during TAP test. # Postmaster PID for node "twoways" is 5015 Waiting for replication conn testsub's replay_lsn to pass pg_current_wal_lsn() on twoways # poll_query_until timed out executing this query: # SELECT pg_current_wal_lsn() <= replay_lsn AND state = 'streaming' FROM pg_catalog.pg_stat_replication WHERE application_name= 'testsub'; # expecting this output: # t # last actual query output: # # with stderr: # psql: error: connection to server on socket "/tmp/cs8dhFOtZZ/.s.PGSQL.59345" failed: No such file or directory # Is the server running locally and accepting connections on that socket? timed out waiting for catchup at t/100_bugs.pl line 148. The failure produces core file and its back trace is below. My first guess of the cause is that between the timing to get an entry from hash_search() in get_rel_sync_entry() and to set the map by convert_tuples_by_name() in maybe_send_schema(), we had invalidation message, which tries to free unset descs in the entry ? * core file backtrace Core was generated by `postgres: twoways: walsender k5user [local] START_REPLICATION '. Program terminated with signal 11, Segmentation fault. 
#0 0x00007f93b38b8c2b in rel_sync_cache_relation_cb (arg=0, relid=16388) at pgoutput.c:1225 1225 FreeTupleDesc(entry->map->indesc); Missing separate debuginfos, use: debuginfo-install libgcc-4.8.5-44.el7.x86_64 libicu-50.2-4.el7_7.x86_64 libstdc++-4.8.5-44.el7.x86_64 (gdb) bt #0 0x00007f93b38b8c2b in rel_sync_cache_relation_cb (arg=0, relid=16388) at pgoutput.c:1225 #1 0x0000000000ae21f0 in LocalExecuteInvalidationMessage (msg=0x21d1de8) at inval.c:601 #2 0x00000000008dbd6e in ReorderBufferExecuteInvalidations (nmsgs=4, msgs=0x21d1db8) at reorderbuffer.c:3232 #3 0x00000000008da70a in ReorderBufferProcessTXN (rb=0x21d1a40, txn=0x2210b58, commit_lsn=25569096, snapshot_now=0x21d1e10,command_id=1, streaming=false) at reorderbuffer.c:2294 #4 0x00000000008dae56 in ReorderBufferReplay (txn=0x2210b58, rb=0x21d1a40, xid=748, commit_lsn=25569096, end_lsn=25569216,commit_time=674891531661619, origin_id=0, origin_lsn=0) at reorderbuffer.c:2591 #5 0x00000000008daede in ReorderBufferCommit (rb=0x21d1a40, xid=748, commit_lsn=25569096, end_lsn=25569216, commit_time=674891531661619,origin_id=0, origin_lsn=0) at reorderbuffer.c:2615 #6 0x00000000008cae06 in DecodeCommit (ctx=0x21e6880, buf=0x7fffb9383db0, parsed=0x7fffb9383c10, xid=748, two_phase=false)at decode.c:744 #7 0x00000000008ca1ed in DecodeXactOp (ctx=0x21e6880, buf=0x7fffb9383db0) at decode.c:278 #8 0x00000000008c9e76 in LogicalDecodingProcessRecord (ctx=0x21e6880, record=0x21e6c80) at decode.c:142 #9 0x0000000000901fcc in XLogSendLogical () at walsender.c:2876 #10 0x0000000000901327 in WalSndLoop (send_data=0x901f30 <XLogSendLogical>) at walsender.c:2306 #11 0x00000000008ffd5f in StartLogicalReplication (cmd=0x219aff8) at walsender.c:1206 #12 0x00000000009006ae in exec_replication_command ( cmd_string=0x2123c20 "START_REPLICATION SLOT \"pg_16400_sync_16392_6964617299612181363\" LOGICAL 0/182D058 (proto_version'2', publication_names '\"testpub\"')") at walsender.c:1646 #13 0x000000000096ffae in PostgresMain (argc=1, argv=0x7fffb93840d0, dbname=0x214ef18 "d1", username=0x214eef8 "k5user")at postgres.c:4482 I'll update when I get more information. Best Regards, Takamichi Osumi
On Friday, May 21, 2021 3:55 PM I wrote: > On Thursday, May 20, 2021 9:59 PM Amit Langote > <amitlangote09@gmail.com> wrote: > > Here are updated/divided patches. > Thanks for your updates. > > But, I've detected segmentation faults caused by the patch, which can > happen during 100_bugs.pl in src/test/subscription. > This happens more than one in ten times. > > This problem would be a timing issue and has been introduced by v3 already. > I used v5 for HEAD also and reproduced this failure, while OSS HEAD doesn't > reproduce this, even when I executed 100_bugs.pl 200 times in a tight loop. > I aligned the commit id 4f586fe2 for all check. Below logs are ones I got from v3. > > * The message of the failure during TAP test. > > # Postmaster PID for node "twoways" is 5015 Waiting for replication conn > testsub's replay_lsn to pass pg_current_wal_lsn() on twoways # > poll_query_until timed out executing this query: > # SELECT pg_current_wal_lsn() <= replay_lsn AND state = 'streaming' > FROM pg_catalog.pg_stat_replication WHERE application_name = 'testsub'; > # expecting this output: > # t > # last actual query output: > # > # with stderr: > # psql: error: connection to server on socket > "/tmp/cs8dhFOtZZ/.s.PGSQL.59345" failed: No such file or directory > # Is the server running locally and accepting connections on that > socket? > timed out waiting for catchup at t/100_bugs.pl line 148. > > > The failure produces core file and its back trace is below. > My first guess of the cause is that between the timing to get an entry from > hash_search() in get_rel_sync_entry() and to set the map by > convert_tuples_by_name() in maybe_send_schema(), we had invalidation > message, which tries to free unset descs in the entry ? Sorry, this guess was not accurate at all. Please ignore it, because entry->map must be set before the descs get freed at all. Sorry for the noise. Best Regards, Takamichi Osumi
On Fri, May 21, 2021 at 3:55 PM osumi.takamichi@fujitsu.com <osumi.takamichi@fujitsu.com> wrote: > On Thursday, May 20, 2021 9:59 PM Amit Langote <amitlangote09@gmail.com> wrote: > > Here are updated/divided patches. > Thanks for your updates. > > But, I've detected segmentation faults caused by the patch, > which can happen during 100_bugs.pl in src/test/subscription. > This happens more than one in ten times. > > This problem would be a timing issue and has been introduced by v3 already. > I used v5 for HEAD also and reproduced this failure, while > OSS HEAD doesn't reproduce this, even when I executed 100_bugs.pl 200 times in a tight loop. > I aligned the commit id 4f586fe2 for all check. Below logs are ones I got from v3. > > My first guess of the cause is that between the timing to get an entry from hash_search() in get_rel_sync_entry() > and to set the map by convert_tuples_by_name() in maybe_send_schema(), we had invalidation message, > which tries to free unset descs in the entry ? Hmm, maybe get_rel_sync_entry() should explicitly set map to NULL when first initializing an entry. It's possible that, without doing so, the map remains set to a garbage value, which causes the invalidation callback that runs into such a partially initialized entry to segfault upon trying to dereference that garbage pointer. I've tried that in the attached v6 patches. Please check. -- Amit Langote EDB: http://www.enterprisedb.com
Attachments
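The v6 fix itself is small enough to sketch inline (fields per the RelationSyncEntry initialization shown earlier in the thread; only the last assignment is new):

    /* Entry was just created by hash_search(..., HASH_ENTER, &found). */
    if (!found)
    {
        entry->schema_sent = false;
        entry->streamed_txns = NIL;
        entry->replicate_valid = false;
        entry->pubactions.pubinsert = entry->pubactions.pubupdate =
            entry->pubactions.pubdelete = entry->pubactions.pubtruncate = false;
        entry->publish_as_relid = InvalidOid;
        entry->map = NULL;      /* dynahash does not zero new entries, so
                                 * without this the invalidation callback
                                 * could free a garbage pointer */
    }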
On Friday, May 21, 2021 4:43 PM Amit Langote <amitlangote09@gmail.com> wrote: > On Fri, May 21, 2021 at 3:55 PM osumi.takamichi@fujitsu.com > <osumi.takamichi@fujitsu.com> wrote: > > But, I've detected segmentation faults caused by the patch, which can > > happen during 100_bugs.pl in src/test/subscription. > > This happens more than one in ten times. > > > > This problem would be a timing issue and has been introduced by v3 > already. > > I used v5 for HEAD also and reproduced this failure, while OSS HEAD > > doesn't reproduce this, even when I executed 100_bugs.pl 200 times in a > tight loop. > > I aligned the commit id 4f586fe2 for all check. Below logs are ones I got from > v3. > > > > My first guess of the cause is that between the timing to get an entry > > from hash_search() in get_rel_sync_entry() and to set the map by > > convert_tuples_by_name() in maybe_send_schema(), we had invalidation > message, which tries to free unset descs in the entry ? > > Hmm, maybe get_rel_sync_entry() should explicitly set map to NULL when first > initializing an entry. It's possible that, without doing so, the map remains set > to a garbage value, which causes the invalidation callback that runs into such a > partially initialized entry to segfault upon trying to dereference that garbage > pointer. Just in case, I prepared a new PG and did a check to have get_rel_sync_entry() print the map pointer's initial value with elog. Here, when I executed 100_bugs.pl, I got some garbage below. * The change I did: @@ -1011,6 +1011,7 @@ get_rel_sync_entry(PGOutputData *data, Oid relid) entry->pubactions.pubinsert = entry->pubactions.pubupdate = entry->pubactions.pubdelete = entry->pubactions.pubtruncate = false; entry->publish_as_relid = InvalidOid; + elog(LOG, "**> the pointer's default value : %p", entry->map); } * Grep result of all logs from 100_bugs.pl 2021-05-21 09:05:56.132 UTC [29122] sub1 LOG: **> the pointer's default value : (nil) 2021-05-21 09:06:11.556 UTC [30198] testsub1 LOG: **> the pointer's default value : (nil) 2021-05-21 09:06:11.561 UTC [30200] pg_16389_sync_16384_6964667281140237667 LOG: **> the pointer's default value : 0x7f7f7f7f7f7f7f7f 2021-05-21 09:06:11.567 UTC [30191] testsub2 LOG: **> the pointer's default value : (nil) 2021-05-21 09:06:11.570 UTC [30194] pg_16387_sync_16384_6964667292923737489 LOG: **> the pointer's default value : 0x7f7f7f7f7f7f7f7f 2021-05-21 09:06:02.513 UTC [29809] testsub LOG: **> the pointer's default value : (nil) 2021-05-21 09:06:02.557 UTC [29809] testsub LOG: **> the pointer's default value : (nil) So, your solution is right, I think. > I've tried that in the attached v6 patches. Please check. With this fix, I don't get the failure. I executed 100_bugs.pl 100 times in a loop and didn't face that problem. Again, I conducted one make check-world for each combination * use OSS HEAD or PG13 * apply only the first patch or both patches Those all passed. Best Regards, Takamichi Osumi
On Friday, May 21, 2021 9:45 PM I wrote: > On Friday, May 21, 2021 4:43 PM Amit Langote <amitlangote09@gmail.com> > wrote: > > On Fri, May 21, 2021 at 3:55 PM osumi.takamichi@fujitsu.com > > <osumi.takamichi@fujitsu.com> wrote: > > > But, I've detected segmentation faults caused by the patch, which > > > can happen during 100_bugs.pl in src/test/subscription. > > > > Hmm, maybe get_rel_sync_entry() should explicitly set map to NULL when > > first initializing an entry. It's possible that, without doing so, the > > map remains set to a garbage value, which causes the invalidation > > callback that runs into such a partially initialized entry to segfault > > upon trying to dereference that garbage pointer. > Just in case, I prepared a new PG and > did a check to have get_rel_sync_entry() print the map pointer's initial value with elog. > Here, when I executed 100_bugs.pl, I got some garbage below. > > * The change I did: > @@ -1011,6 +1011,7 @@ get_rel_sync_entry(PGOutputData *data, Oid relid) > entry->pubactions.pubinsert = > entry->pubactions.pubupdate = > entry->pubactions.pubdelete = > entry->pubactions.pubtruncate = false; > entry->publish_as_relid = InvalidOid; > + elog(LOG, "**> the pointer's default value : %p", > + entry->map); > } > (snip) > > So, your solution is right, I think. This was a bit indirect, though. I've also checked the core file from v3's failure and printed the entry to get more confidence. Sorry for the indirect way of verifying the solution earlier. $1 = {relid = 16388, schema_sent = false, streamed_txns = 0x0, replicate_valid = false, pubactions = {pubinsert = false, pubupdate = false, pubdelete = false, pubtruncate = false}, publish_as_relid = 16388, map = 0x7f7f7f7f7f7f7f7f} Yes, the process tried to free garbage. Now, we are convinced that we have addressed the problem. That's it! Best Regards, Takamichi Osumi
On Sat, May 22, 2021 at 11:00 AM osumi.takamichi@fujitsu.com <osumi.takamichi@fujitsu.com> wrote: > On Friday, May 21, 2021 9:45 PM I wrote: > > On Friday, May 21, 2021 4:43 PM Amit Langote <amitlangote09@gmail.com> > > wrote: > > > On Fri, May 21, 2021 at 3:55 PM osumi.takamichi@fujitsu.com > > > <osumi.takamichi@fujitsu.com> wrote: > > > > But, I've detected segmentation faults caused by the patch, which > > > > can happen during 100_bugs.pl in src/test/subscription. > > > > > > Hmm, maybe get_rel_sync_entry() should explicitly set map to NULL when > > > first initializing an entry. It's possible that, without doing so, the > > > map remains set to a garbage value, which causes the invalidation > > > callback that runs into such a partially initialized entry to segfault > > > upon trying to dereference that garbage pointer. > > Just in case, I prepared a new PG and > > did a check to have get_rel_sync_entry() print the map pointer's initial value with elog. > > Here, when I executed 100_bugs.pl, I got some garbage below. > > > > * The change I did: > > @@ -1011,6 +1011,7 @@ get_rel_sync_entry(PGOutputData *data, Oid relid) > > entry->pubactions.pubinsert = > > entry->pubactions.pubupdate = > > entry->pubactions.pubdelete = > > entry->pubactions.pubtruncate = false; > > entry->publish_as_relid = InvalidOid; > > + elog(LOG, "**> the pointer's default value : %p", > > + entry->map); > > } > > > (snip) > > > > So, your solution is right, I think. > This was a bit indirect, though. > I've also checked the core file from v3's failure and printed the entry > to get more confidence. Sorry for the indirect way of verifying the solution earlier. > > $1 = {relid = 16388, schema_sent = false, streamed_txns = 0x0, replicate_valid = false, pubactions = {pubinsert = false, pubupdate = false, pubdelete = false, pubtruncate = false}, publish_as_relid = 16388, > map = 0x7f7f7f7f7f7f7f7f} > > Yes, the process tried to free garbage. > Now, we are convinced that we have addressed the problem. That's it! Thanks for confirming that. -- Amit Langote EDB: http://www.enterprisedb.com
On Saturday, May 22, 2021 11:58 AM Amit Langote <amitlangote09@gmail.com> wrote: > On Sat, May 22, 2021 at 11:00 AM osumi.takamichi@fujitsu.com > <osumi.takamichi@fujitsu.com> wrote: > > I've checked the core file of v3's failure core and printed the entry > > to get more confidence. Sorry for inappropriate measure to verify the > solution. > > > > $1 = {relid = 16388, schema_sent = false, streamed_txns = 0x0, > replicate_valid = false, pubactions = {pubinsert = false, pubupdate = false, > pubdelete = false, pubtruncate = false}, publish_as_relid = 16388, > > map = 0x7f7f7f7f7f7f7f7f} > > > > Yes, the process tried to free garbage. > > Now, we are convinced that we have addressed the problem. That's it ! > > Thanks for confirming that. Langote-san, I need to report another issue. When I execute make check-world with v6 additionally, I've gotten another failure. I get this about once in 20 times of make check-world with v6. The test ended with stderr outputs below. NOTICE: database "regression" does not exist, skipping make[2]: *** [check] Error 1 make[1]: *** [check-isolation-recurse] Error 2 make[1]: *** Waiting for unfinished jobs.... make: *** [check-world-src/test-recurse] Error 2 make: *** Waiting for unfinished jobs.... And, I had ./src/test/isolation/output_iso/regression.diffs and regression.out, which told me below. test detach-partition-concurrently-1 ... ok 705 ms test detach-partition-concurrently-2 ... ok 260 ms test detach-partition-concurrently-3 ... FAILED 618 ms test detach-partition-concurrently-4 ... ok 1384 ms The diffs file showed me below. diff -U3 /home/k5user/new_disk/repro_fail_v6/src/test/isolation/expected/detach-partition-concurrently-3.out /home/k5user/new_disk/repro_fail_v6/src/test/isolation/output_iso/results/detach-partition-concurrently-3.out --- /home/k5user/new_disk/repro_fail_v6/src/test/isolation/expected/detach-partition-concurrently-3.out 2021-05-24 01:22:22.381488295+0000 +++ /home/k5user/new_disk/repro_fail_v6/src/test/isolation/output_iso/results/detach-partition-concurrently-3.out 2021-05-2402:47:08.292488295 +0000 @@ -190,7 +190,7 @@ t step s2detach: <... completed> -error in steps s1cancel s2detach: ERROR: canceling statement due to user request +ERROR: canceling statement due to user request step s2detach2: ALTER TABLE d3_listp DETACH PARTITION d3_listp2 CONCURRENTLY; ERROR: partition "d3_listp1" already pending detach in partitioned table "public.d3_listp" step s1c: COMMIT; I'm not sure if this is related to the patch or we already have this from OSS HEAD yet. FYI: the steps I did are 1 - clone PG(I used f5024d8) 2 - git am the 2 patches for HEAD * HEAD-v6-0001-pgoutput-fix-memory-management-of-RelationSyncEnt.patch * HEAD-v6-0002-pgoutput-don-t-send-leaf-partition-schema-when-pu.patch 3 - configure with --enable-cassert --enable-debug --enable-tap-tests --with-icu CFLAGS=-O0 --prefix=/where/you/wanna/put/PG 4 - make -j2 2> make.log # did not get stderr output. 5 - make check-world -j8 2> make_check_world.log (after this I've conducted another tight loop test by repeating make check-world and got the error) Best Regards, Takamichi Osumi
On Mon, May 24, 2021 at 12:16 PM osumi.takamichi@fujitsu.com <osumi.takamichi@fujitsu.com> wrote: > On Saturday, May 22, 2021 11:58 AM Amit Langote <amitlangote09@gmail.com> wrote: > > On Sat, May 22, 2021 at 11:00 AM osumi.takamichi@fujitsu.com > > <osumi.takamichi@fujitsu.com> wrote: > > > I've checked the core file of v3's failure core and printed the entry > > > to get more confidence. Sorry for inappropriate measure to verify the > > solution. > > > > > > $1 = {relid = 16388, schema_sent = false, streamed_txns = 0x0, > > replicate_valid = false, pubactions = {pubinsert = false, pubupdate = false, > > pubdelete = false, pubtruncate = false}, publish_as_relid = 16388, > > > map = 0x7f7f7f7f7f7f7f7f} > > > > > > Yes, the process tried to free garbage. > > > Now, we are convinced that we have addressed the problem. That's it ! > > > > Thanks for confirming that. > Langote-san, I need to report another issue. Thanks for continued testing. > When I execute make check-world with v6 additionally, > I've gotten another failure. I get this about once in > 20 times of make check-world with v6. > > The test ended with stderr outputs below. > > NOTICE: database "regression" does not exist, skipping > make[2]: *** [check] Error 1 > make[1]: *** [check-isolation-recurse] Error 2 > make[1]: *** Waiting for unfinished jobs.... > make: *** [check-world-src/test-recurse] Error 2 > make: *** Waiting for unfinished jobs.... > > And, I had ./src/test/isolation/output_iso/regression.diffs and regression.out, > which told me below. > > test detach-partition-concurrently-1 ... ok 705 ms > test detach-partition-concurrently-2 ... ok 260 ms > test detach-partition-concurrently-3 ... FAILED 618 ms > test detach-partition-concurrently-4 ... ok 1384 ms > > The diffs file showed me below. > > diff -U3 /home/k5user/new_disk/repro_fail_v6/src/test/isolation/expected/detach-partition-concurrently-3.out /home/k5user/new_disk/repro_fail_v6/src/test/isolation/output_iso/results/detach-partition-concurrently-3.out > --- /home/k5user/new_disk/repro_fail_v6/src/test/isolation/expected/detach-partition-concurrently-3.out 2021-05-24 01:22:22.381488295+0000 > +++ /home/k5user/new_disk/repro_fail_v6/src/test/isolation/output_iso/results/detach-partition-concurrently-3.out 2021-05-24 02:47:08.292488295 +0000 > @@ -190,7 +190,7 @@ > > t > step s2detach: <... completed> > -error in steps s1cancel s2detach: ERROR: canceling statement due to user request > +ERROR: canceling statement due to user request > step s2detach2: ALTER TABLE d3_listp DETACH PARTITION d3_listp2 CONCURRENTLY; > ERROR: partition "d3_listp1" already pending detach in partitioned table "public.d3_listp" > step s1c: COMMIT; > > I'm not sure if this is related to the patch or we already have this from OSS HEAD yet. Hmm, I doubt it would be this patch's fault. Maybe we still have some unresolved issues with DETACH PARTITION CONCURRENTLY. I suggest you report this in a new thread preferably after you figure that it's reproducible. -- Amit Langote EDB: http://www.enterprisedb.com
On Monday, May 24, 2021 12:23 PM Amit Langote <amitlangote09@gmail.com> wrote: > On Mon, May 24, 2021 at 12:16 PM osumi.takamichi@fujitsu.com > <osumi.takamichi@fujitsu.com> wrote: > > When I execute make check-world with v6 additionally, I've gotten > > another failure. I get this about once in > > 20 times of make check-world with v6. > Hmm, I doubt it would be this patch's fault. Maybe we still have some > unresolved issues with DETACH PARTITION CONCURRENTLY. I suggest > you report this in a new thread preferably after you figure that it's > reproducible. OK, I'll do so when I get this with OSS HEAD. Best Regards, Takamichi Osumi
On Monday, May 24, 2021 12:57 PM I wrote: > On Monday, May 24, 2021 12:23 PM Amit Langote <amitlangote09@gmail.com> > wrote: > > On Mon, May 24, 2021 at 12:16 PM osumi.takamichi@fujitsu.com > > <osumi.takamichi@fujitsu.com> wrote: > > > When I execute make check-world with v6 additionally, I've gotten > > > another failure. I get this about once in > > > 20 times of make check-world with v6. > > Hmm, I doubt it would be this patch's fault. Maybe we still have some > > unresolved issues with DETACH PARTITION CONCURRENTLY. I suggest > you > > report this in a new thread preferably after you figure that it's > > reproducible. > OK, I'll do so when I get this with OSS HEAD. Just now, I've reported this on -hackers as a separate thread. This was not an issue with the patch. Also, I have no more suggested fixes for the patch set you shared. Best Regards, Takamichi Osumi
On Fri, May 21, 2021 at 1:12 PM Amit Langote <amitlangote09@gmail.com> wrote: > > Hmm, maybe get_rel_syn_entry() should explicitly set map to NULL when > first initializing an entry. It's possible that without doing so, the > map remains set to a garbage value, which causes the invalidation > callback that runs into such partially initialized entry to segfault > upon trying to deference that garbage pointer. > > I've tried that in the attached v6 patches. Please check. > v6-0001 ========= + send_relation_and_attrs(ancestor, xid, ctx); + /* Map must live as long as the session does. */ oldctx = MemoryContextSwitchTo(CacheMemoryContext); - relentry->map = convert_tuples_by_name(CreateTupleDescCopy(indesc), - CreateTupleDescCopy(outdesc)); + + /* + * Make copies of the TupleDescs that will live as long as the map + * does before putting into the map. + */ + indesc = CreateTupleDescCopy(indesc); + outdesc = CreateTupleDescCopy(outdesc); + relentry->map = convert_tuples_by_name(indesc, outdesc); + if (relentry->map == NULL) + { + /* Map not necessary, so free the TupleDescs too. */ + FreeTupleDesc(indesc); + FreeTupleDesc(outdesc); + } + MemoryContextSwitchTo(oldctx); - send_relation_and_attrs(ancestor, xid, ctx); Why do we need to move send_relation_and_attrs() call? I think it doesn't matter much either way but OTOH, in the existing code, if there is an error (say 'out of memory' or some other) while building the map, we won't send relation attrs whereas with your change we will unnecessarily send those in such a case. I feel there is no need to backpatch v6-0002. We can just make it a HEAD-only change as that doesn't cause any bug even though it is better not to send it. If we consider it as a HEAD-only change then probably we can register it for PG-15. What do you think? -- With Regards, Amit Kapila.
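The ordering concern is easiest to see schematically (build_map() here is just a stand-in for the map-building block quoted above, not a real function):

    /* Existing order, kept in the end: an ERROR (e.g. out of memory)
     * while building the map means no relation message was sent yet. */
    build_map(relentry);
    send_relation_and_attrs(ancestor, xid, ctx);

    /* v6's order, reverted in v7: a failure in map building would leave
     * a relation message already on the wire, sent for nothing. */
    send_relation_and_attrs(ancestor, xid, ctx);
    build_map(relentry);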
On Thu, May 27, 2021 at 3:36 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> On Fri, May 21, 2021 at 1:12 PM Amit Langote <amitlangote09@gmail.com> wrote:
> >
> > Hmm, maybe get_rel_sync_entry() should explicitly set map to NULL when
> > first initializing an entry. It's possible that without doing so, the
> > map remains set to a garbage value, which causes the invalidation
> > callback that runs into such a partially initialized entry to segfault
> > upon trying to dereference that garbage pointer.
> >
> > I've tried that in the attached v6 patches. Please check.
> >
>
> v6-0001
> =========
> + send_relation_and_attrs(ancestor, xid, ctx);
> +
>   /* Map must live as long as the session does. */
>   oldctx = MemoryContextSwitchTo(CacheMemoryContext);
> - relentry->map = convert_tuples_by_name(CreateTupleDescCopy(indesc),
> -                                        CreateTupleDescCopy(outdesc));
> +
> + /*
> +  * Make copies of the TupleDescs that will live as long as the map
> +  * does before putting into the map.
> +  */
> + indesc = CreateTupleDescCopy(indesc);
> + outdesc = CreateTupleDescCopy(outdesc);
> + relentry->map = convert_tuples_by_name(indesc, outdesc);
> + if (relentry->map == NULL)
> + {
> +     /* Map not necessary, so free the TupleDescs too. */
> +     FreeTupleDesc(indesc);
> +     FreeTupleDesc(outdesc);
> + }
> +
>   MemoryContextSwitchTo(oldctx);
> - send_relation_and_attrs(ancestor, xid, ctx);
>
> Why do we need to move the send_relation_and_attrs() call? I think it
> doesn't matter much either way, but OTOH, in the existing code, if
> there is an error (say 'out of memory' or some other) while building
> the map, we won't send relation attrs, whereas with your change we will
> unnecessarily send those in such a case.

That's a good point. I've reverted that change in the attached.

> I feel there is no need to backpatch v6-0002. We can just make it a
> HEAD-only change, as it doesn't cause any bug even though it is
> better not to send it. If we consider it a HEAD-only change, then
> we can probably register it for PG-15. What do you think?

Okay, I will see about creating a PG15 CF entry for 0002.

Please see attached v7-0001 with the part mentioned above fixed.

--
Amit Langote
EDB: http://www.enterprisedb.com
Attachments
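To make the initialization fix concrete, here is a sketch of the pattern discussed above for get_rel_sync_entry(); the field names follow the gdb dump earlier in the thread, and the exact set of assignments is meant to illustrate the approach rather than reproduce the committed hunk.

/*
 * Sketch of the fix: after HASH_ENTER, dynahash has initialized only
 * the key, so set every other field -- in particular "map" -- before
 * anything else (such as an invalidation callback) can see the entry.
 */
entry = (RelationSyncEntry *) hash_search(RelationSyncCache,
										  (void *) &relid,
										  HASH_ENTER, &found);
if (!found)
{
	entry->schema_sent = false;
	entry->streamed_txns = NIL;
	entry->replicate_valid = false;
	entry->pubactions.pubinsert = false;
	entry->pubactions.pubupdate = false;
	entry->pubactions.pubdelete = false;
	entry->pubactions.pubtruncate = false;
	entry->publish_as_relid = InvalidOid;
	entry->map = NULL;			/* the field previously left as garbage */
}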
On Mon, May 31, 2021 at 8:51 AM Amit Langote <amitlangote09@gmail.com> wrote:
>
> On Thu, May 27, 2021 at 3:36 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > On Fri, May 21, 2021 at 1:12 PM Amit Langote <amitlangote09@gmail.com> wrote:
> > >
> >
> > Why do we need to move the send_relation_and_attrs() call? I think it
> > doesn't matter much either way, but OTOH, in the existing code, if
> > there is an error (say 'out of memory' or some other) while building
> > the map, we won't send relation attrs, whereas with your change we will
> > unnecessarily send those in such a case.
>
> That's a good point. I've reverted that change in the attached.
>

Pushed.

> > I feel there is no need to backpatch v6-0002. We can just make it a
> > HEAD-only change, as it doesn't cause any bug even though it is
> > better not to send it. If we consider it a HEAD-only change, then
> > we can probably register it for PG-15. What do you think?
>
> Okay, I will see about creating a PG15 CF entry for 0002.
>

Thanks!

--
With Regards,
Amit Kapila.
On Tue, Jun 1, 2021 at 6:56 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> On Mon, May 31, 2021 at 8:51 AM Amit Langote <amitlangote09@gmail.com> wrote:
> > On Thu, May 27, 2021 at 3:36 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > Why do we need to move the send_relation_and_attrs() call? I think it
> > > doesn't matter much either way, but OTOH, in the existing code, if
> > > there is an error (say 'out of memory' or some other) while building
> > > the map, we won't send relation attrs, whereas with your change we will
> > > unnecessarily send those in such a case.
> >
> > That's a good point. I've reverted that change in the attached.
>
> Pushed.

Thank you.

--
Amit Langote
EDB: http://www.enterprisedb.com