Re: hyrax vs. RelationBuildPartitionDesc

Поиск
Список
Период
Сортировка
От Amit Langote
Тема Re: hyrax vs. RelationBuildPartitionDesc
Дата
Msg-id CA+HiwqGZSJS+gB_6EdffMUxPuyJ-H8j9-HFiCWNR9+=Khi0qvw@mail.gmail.com
обсуждение исходный текст
Ответ на Re: hyrax vs. RelationBuildPartitionDesc  (Amit Langote <amitlangote09@gmail.com>)
Ответы Re: hyrax vs. RelationBuildPartitionDesc  (Amit Langote <amitlangote09@gmail.com>)
Re: hyrax vs. RelationBuildPartitionDesc  (Robert Haas <robertmhaas@gmail.com>)
Список pgsql-hackers
On Wed, Jun 5, 2019 at 8:03 PM Amit Langote <amitlangote09@gmail.com> wrote:
> I noticed a crash with one of the scenarios that I think the patch is
> meant to address.  Let me describe the steps:
>
> Attach gdb (or whatever) to session 1 in which we'll run a query that
> will scan at least two partitions and set a breakpoint in
> expand_partitioned_rtentry().  Run the query, so the breakpoint will
> be hit.  Step through up to the start of the following loop in this
> function:
>
>     i = -1;
>     while ((i = bms_next_member(live_parts, i)) >= 0)
>     {
>         Oid         childOID = partdesc->oids[i];
>         Relation    childrel;
>         RangeTblEntry *childrte;
>         Index       childRTindex;
>         RelOptInfo *childrelinfo;
>
>         /* Open rel, acquiring required locks */
>         childrel = table_open(childOID, lockmode);
>
> Note that 'partdesc' in the above code is from the partition
> directory.  Before stepping through into the loop, start another
> session and attach a new partition.  On into the loop.  When the 1st
> partition is opened, its locking will result in
> RelationClearRelation() being called on the parent relation due to
> partition being attached concurrently, which I observed, is actually
> invoked a couple of times due to recursion.  Parent relation's
> rd_oldpdcxt will be set in this process, which contains the
> aforementioned partition descriptor.
>
> Before moving the loop to process the 2nd partition, attach another
> partition in session 2.  When the 2nd partition is opened,
> RelationClearRelation() will run again and in one of its recursive
> invocations, it destroys the rd_oldpdcxt that was set previously,
> taking the partition directory's partition descriptor with it.
> Anything that tries to access it later crashes.
>
> I couldn't figure out what to blame here though -- the design of
> rd_pdoldcxt or the fact that RelationClearRelation() is invoked
> multiple times.  I will try to study it more closely tomorrow.

On further study, I concluded that it's indeed the multiple
invocations of RelationClearRelation() on the parent relation that
causes rd_pdoldcxt to be destroyed prematurely.  While it's
problematic that the session in which a new partition is attached to
the parent relation broadcasts 2 SI messages to invalidate the parent
relation [1], it's obviously better to fix how RelationClearRelation
manipulates rd_pdoldcxt so that duplicate SI messages are not harmful,
fixing the latter is hardly an option.

Attached is a patch that applies on top of Robert's pdoldcxt-v1.patch,
which seems to fix this issue for me.  In the rare case where inval
messages due to multiple concurrent attachments get processed in a
session holding a reference on a partitioned table, there will be
multiple "old" partition descriptors and corresponding "old" contexts.
All of the old contexts are chained together via context-re-parenting,
with the newest "old" context being accessible from the table's
rd_pdoldcxt.

Thanks,
Amit

[1]: inval.c: AddRelcacheInvalidationMessage() does try to prevent
duplicate messages from being emitted, but the logic to detect
duplicates doesn't work across CommandCounterIncrement().  There are
at two relcache inval requests in ATTACH PARTITION code path, emitted
by SetRelationHasSubclass and StorePartitionBound, resp., which are
separated by at least one CCI, so the 2nd request isn't detected as
the duplicate of the 1st.

Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Amit Langote
Дата:
Сообщение: Re: Should we warn against using too many partitions?
Следующее
От: Michael Paquier
Дата:
Сообщение: Re: pg_basebackup failure after setting default_table_access_methodoption