Re: hyrax vs. RelationBuildPartitionDesc

From Amit Langote
Subject Re: hyrax vs. RelationBuildPartitionDesc
Date
Msg-id 5a72697b-0738-0ce5-2531-88febf776bff@lab.ntt.co.jp
In reply to Re: hyrax vs. RelationBuildPartitionDesc  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: hyrax vs. RelationBuildPartitionDesc  (Amit Langote <amitlangote09@gmail.com>)
Re: hyrax vs. RelationBuildPartitionDesc  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
On 2019/04/15 2:38, Tom Lane wrote:
> Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> writes:
>> On 2019/04/10 15:42, Michael Paquier wrote:
>>> It seems to me that Tom's argument for handling relcache information
>>> by copying its contents is sensible.  That's not perfect, but it is
>>> consistent with what exists (note PublicationActions for a
>>> still-rather-simple example of a structure that gets copied from the
>>> relcache).
> 
>> I gave my vote for direct access to unchangeable relcache substructures
>> (TupleDesc, PartitionDesc, etc.), because they're accessed during the
>> planning of every user query and are fairly expensive to copy compared
>> to the list of indexes or the PublicationActions that you cite.  To
>> further my point a bit, I wonder why PublicationActions needs to be
>> copied out of the relcache at all?  Looking at its usage in
>> execReplication.c, it seems we can do fine without copying, because we
>> are holding both a lock and a relcache reference on the replication
>> target relation for the entirety of apply_handle_insert(), which means
>> that information can't change under us, either logically or physically.
> 
> So the point here is that that reasoning is faulty.  You *cannot* assume,
> no matter how strong a lock or how many pins you hold, that a relcache
> entry will not get rebuilt underneath you.  Cache flushes happen
> regardless.  And unless relcache.c takes special measures to prevent it,
> a rebuild will result in moving subsidiary data structures and thereby
> breaking any pointers you may have pointing into those data structures.
> 
> For certain subsidiary structures such as the relation tupdesc,
> we do take such special measures: that's what the "keep_xxx" dance in
> RelationClearRelation is.  However, that's expensive, both in cycles
> and maintenance effort: it requires having code that can decide equality
> of the subsidiary data structures, which we might well have no other use
> for, and which we certainly don't have strong tests for correctness of.
> It's also very error-prone for callers, because there isn't any good way
> to cross-check that code using a long-lived pointer to a subsidiary
> structure is holding a lock that's strong enough to guarantee non-mutation
> of that structure, or even that relcache.c provides any such guarantee
> at all.  (If our periodic attempts to reduce lock strength for assorted
> DDL operations don't scare the pants off you in this connection, you have
> not thought hard enough about it.)  So I think that even though we've
> largely gotten away with this approach so far, it's also a half-baked
> kluge that we should be looking to get rid of, not extend to yet more
> cases.

Thanks for the explanation.
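
(For the archives: the "keep_xxx" dance being referred to is roughly the
pattern below -- a paraphrased, simplified sketch of what
RelationClearRelation() in relcache.c does, not the exact code.)

#include "postgres.h"
#include "access/tupdesc.h"
#include "utils/rel.h"

/*
 * Paraphrased from RelationClearRelation(): "relation" is the existing
 * entry being rebuilt in place, "newrel" the freshly built replacement.
 */
static void
keep_xxx_sketch(Relation relation, Relation newrel)
{
    /* Deciding what to keep requires equality-checking code. */
    bool        keep_tupdesc = equalTupleDescs(relation->rd_att,
                                               newrel->rd_att);

#define SWAPFIELD(fldtype, fldname) \
    do { \
        fldtype _tmp = newrel->fldname; \
        newrel->fldname = relation->fldname; \
        relation->fldname = _tmp; \
    } while (0)

    /*
     * If the new tupdesc is logically identical to the old one, keep the
     * old physical copy so that existing pointers into it stay valid.
     */
    if (keep_tupdesc)
        SWAPFIELD(TupleDesc, rd_att);
}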

I understand that simply having a lock and a nonzero refcount on a
relation doesn't prevent its relcache entry from being rebuilt
underneath us by a concurrent cache flush.
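
For instance, coding like the following looks safe but isn't (a made-up
sketch: the function and the explicit flush point are invented, and I'm
assuming the PG 12-era headers):

#include "postgres.h"
#include "access/table.h"
#include "partitioning/partdesc.h"
#include "utils/inval.h"
#include "utils/rel.h"

static int
count_partitions_unsafely(Oid relid)
{
    Relation    rel = table_open(relid, AccessShareLock);
    PartitionDesc partdesc = RelationGetPartitionDesc(rel);
    int         nparts;

    /*
     * Anything here that can process sinval traffic -- a syscache
     * lookup, opening another relation, or an explicit call like the
     * one below -- may cause this relcache entry to be rebuilt.
     */
    AcceptInvalidationMessages();

    /*
     * Unless relcache.c took special measures to preserve it, partdesc
     * may now be dangling despite the lock and refcount we hold.
     */
    nparts = partdesc->nparts;

    table_close(rel, AccessShareLock);
    return nparts;
}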

I get that we want to get rid of the keep_* kludges in the long term,
but is it wrong to think, for example, that having keep_partdesc today
allows us to keep the pointer to rd_partdesc for as long as we're
holding the relation open, or holding a refcount on the whole relation
as with the PartitionDirectory mechanism?
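
To be concrete, here is a minimal sketch of how I read the
PartitionDirectory pattern, assuming the PG 12-era interface in
partitioning/partdesc.h (the function itself is made up for
illustration):

#include "postgres.h"
#include "access/table.h"
#include "partitioning/partdesc.h"

static void
use_stable_partdesc(Oid relid)
{
    PartitionDirectory pdir;
    Relation    rel;
    PartitionDesc partdesc;

    pdir = CreatePartitionDirectory(CurrentMemoryContext);

    rel = table_open(relid, AccessShareLock);

    /*
     * Unlike a bare RelationGetPartitionDesc() result, this pointer is
     * promised to stay valid until DestroyPartitionDirectory().
     */
    partdesc = PartitionDirectoryLookup(pdir, rel);

    table_close(rel, AccessShareLock);

    /*
     * Still safe here: the directory holds its own refcount on the
     * relation and remembers the PartitionDesc it first saw.
     */
    elog(DEBUG1, "relation has %d partitions", partdesc->nparts);

    DestroyPartitionDirectory(pdir);
}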

> To my mind there are only two trustworthy solutions to the problem of
> wanting time-extended usage of a relcache subsidiary data structure: one
> is to copy it, and the other is to reference-count it.  I think that going
> over to a reference-count-based approach for many of these structures
> might well be something we should do in future, maybe even the very near
> future.  In the mean time, though, I'm not really satisfied with inserting
> half-baked kluges, especially not ones that are different from our other
> half-baked kluges for similar purposes.  I think that's a path to creating
> hard-to-reproduce bugs.

+1 to the reference-count-based approach.

I've occasionally wondered why there are both the keep_tupdesc kludge
and refcounting for table TupleDescs.  I guess it's because *only* the
TupleTableSlot mechanism in the executor uses TupleDesc pinning (that
is, refcounting), while the rest of the sites depend on the shaky
guarantee provided by keep_tupdesc.
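
The pinning side looks roughly like this (a sketch using the existing
macros in access/tupdesc.h; the function is made up):

#include "postgres.h"
#include "access/table.h"
#include "access/tupdesc.h"
#include "utils/rel.h"

static void
hold_tupdesc_safely(Oid relid)
{
    Relation    rel = table_open(relid, AccessShareLock);
    TupleDesc   tupdesc = RelationGetDescr(rel);

    /* No-op for non-refcounted descriptors (tdrefcount == -1). */
    PinTupleDesc(tupdesc);

    table_close(rel, AccessShareLock);

    /*
     * Safe even though the relation is closed: the pin, registered
     * with CurrentResourceOwner, keeps the descriptor alive across
     * relcache rebuilds until the matching release.
     */
    elog(DEBUG1, "natts = %d", tupdesc->natts);

    ReleaseTupleDesc(tupdesc);
}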

Thanks,
Amit



