Re: hyrax vs. RelationBuildPartitionDesc

Поиск
Список
Период
Сортировка
От Amit Langote
Тема Re: hyrax vs. RelationBuildPartitionDesc
Дата
Msg-id CA+HiwqEwrP0TettuL5bbwGH_hu3L=5am9pajy7FmASQYDBq4aw@mail.gmail.com
обсуждение исходный текст
Ответ на Re: hyrax vs. RelationBuildPartitionDesc  (Robert Haas <robertmhaas@gmail.com>)
Ответы Re: hyrax vs. RelationBuildPartitionDesc  (Amit Langote <amitlangote09@gmail.com>)
Список pgsql-hackers
On Tue, Jun 4, 2019 at 9:25 PM Robert Haas <robertmhaas@gmail.com> wrote:
> On Fri, May 17, 2019 at 4:36 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > Yeah, I did some additional testing that showed that it's pretty darn
> > hard to get the leak to amount to anything.  The test case that I
> > originally posted did many DDLs in a single transaction, and it
> > seems that that's actually essential to get a meaningful leak; as
> > soon as you exit the transaction the leaked contexts will be recovered
> > during sinval cleanup.
>
> My colleague Amul Sul rediscovered this same leak when he tried to
> attach lots of partitions to an existing partitioned table, all in the
> course of a single transaction.  This seems a little less artificial
> than Tom's original reproducer, which involved attaching and detaching
> the same partition repeatedly.
>
> Here is a patch that tries to fix this, along the lines I previously
> suggested; Amul reports that it does work for him.

Thanks for the patch.

I noticed a crash with one of the scenarios that I think the patch is
meant to address.  Let me describe the steps:

Attach gdb (or whatever) to session 1 in which we'll run a query that
will scan at least two partitions and set a breakpoint in
expand_partitioned_rtentry().  Run the query, so the breakpoint will
be hit.  Step through up to the start of the following loop in this
function:

    i = -1;
    while ((i = bms_next_member(live_parts, i)) >= 0)
    {
        Oid         childOID = partdesc->oids[i];
        Relation    childrel;
        RangeTblEntry *childrte;
        Index       childRTindex;
        RelOptInfo *childrelinfo;

        /* Open rel, acquiring required locks */
        childrel = table_open(childOID, lockmode);

Note that 'partdesc' in the above code is from the partition
directory.  Before stepping through into the loop, start another
session and attach a new partition.  On into the loop.  When the 1st
partition is opened, its locking will result in
RelationClearRelation() being called on the parent relation due to
partition being attached concurrently, which I observed, is actually
invoked a couple of times due to recursion.  Parent relation's
rd_oldpdcxt will be set in this process, which contains the
aforementioned partition descriptor.

Before moving the loop to process the 2nd partition, attach another
partition in session 2.  When the 2nd partition is opened,
RelationClearRelation() will run again and in one of its recursive
invocations, it destroys the rd_oldpdcxt that was set previously,
taking the partition directory's partition descriptor with it.
Anything that tries to access it later crashes.

I couldn't figure out what to blame here though -- the design of
rd_pdoldcxt or the fact that RelationClearRelation() is invoked
multiple times.  I will try to study it more closely tomorrow.

Thanks,
Amit



В списке pgsql-hackers по дате отправления:

Предыдущее
От: David Rowley
Дата:
Сообщение: Re: Confusing error message for REINDEX TABLE CONCURRENTLY
Следующее
От: Amit Kapila
Дата:
Сообщение: Re: Confusing comment for function ExecParallelEstimate