RE: Fix for segfault in logical replication on master

Поиск
Список
Период
Сортировка
От osumi.takamichi@fujitsu.com
Тема RE: Fix for segfault in logical replication on master
Дата
Msg-id OSBPR01MB4888BA75D9B4CCCA0D78D55FED0E9@OSBPR01MB4888.jpnprd01.prod.outlook.com
обсуждение исходный текст
Ответ на Re: Fix for segfault in logical replication on master  (Mark Dilger <mark.dilger@enterprisedb.com>)
Ответы Re: Fix for segfault in logical replication on master  (Amit Kapila <amit.kapila16@gmail.com>)
Re: Fix for segfault in logical replication on master  (Mark Dilger <mark.dilger@enterprisedb.com>)
Список pgsql-hackers
On Thursday, June 17, 2021 2:43 PM Mark Dilger <mark.dilger@enterprisedb.com> wrote:
> > On Jun 16, 2021, at 10:19 PM, osumi.takamichi@fujitsu.com wrote:
> >
> > I started to analyze your report and
> > will reply after my idea to your modification is settled.
> 
> Thank you.
I'll share my first analysis.

> In commit e7eea52b2d, you introduced a new function,
> RelationGetIdentityKeyBitmap(), which uses some odd logic for determining
> if a relation has a replica identity index.  That code segfaults under certain
> conditions.  A test case to demonstrate that is attached.  Prior to patching
> the code, this new test gets stuck waiting for replication to finish, which never
> happens.  You have to break out of the test and check
> tmp_check/log/021_no_replica_identity_publisher.log.
> 
> I believe this bit of logic in src/backend/utils/cache/relcache.c:
> 
>       indexDesc = RelationIdGetRelation(relation->rd_replidindex);
>       for (i = 0; i < indexDesc->rd_index->indnatts; i++)
> 
> is unsafe without further checks, also attached.
You are absolutely right.
I checked the crash scenario and reproduced the core,
which has a null indexDesc. Also, rd_replidindex must be checked beforehand
as you included in your patch, because having an index does not necessarily
mean to have a replica identity index. As the proof of this, the oid of
rd_replidindex in the scenario is 0. OTOH, I've confirmed your new test
has passed with your fix.

Also, your test looks essentially minimum(suitable for the problem) to me.

* RelationGetIdentityKeyBitmap
+       /*
+        * Fall out if the description is not for an index, suggesting
+        * affairs have changed since we looked.  XXX Should we log a
+        * complaint here?
+        */
+       if (!indexDesc)
+               return NULL;
+       if (!indexDesc->rd_index)
+       {
+               RelationClose(indexDesc);
+               return NULL;
+       }
For the 1st check, isn't it better to use RelationIsValid() ?
I agree with having the check itself of course, though.

Additionally, In what kind of actual scenario, did you think that
we come to the part to "log a complaint" ?

I'm going to spend some time to analyze RelationGetIndexAttrBitmap next
to know if similar hazards can happen, because RelationGetIdentityKeyBitmap's logic
comes from the function mainly.


Best Regards,
    Takamichi Osumi


В списке pgsql-hackers по дате отправления:

Предыдущее
От: Fabien COELHO
Дата:
Сообщение: Re: pgbench bug candidate: negative "initial connection time"
Следующее
От: Amit Kapila
Дата:
Сообщение: Re: locking [user] catalog tables vs 2pc vs logical rep