Re: why doesn't DestroyPartitionDirectory hash_destroy?

Поиск
Список
Период
Сортировка
От Amit Langote
Тема Re: why doesn't DestroyPartitionDirectory hash_destroy?
Дата
Msg-id f8634ded-43d2-c843-d48b-9720883cfa7b@lab.ntt.co.jp
обсуждение исходный текст
Ответ на Re: why doesn't DestroyPartitionDirectory hash_destroy?  (Robert Haas <robertmhaas@gmail.com>)
Список pgsql-hackers
On 2019/03/15 2:56, Robert Haas wrote:
> On Thu, Mar 14, 2019 at 1:16 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Actually, now that I've absorbed a bit more about 898e5e329,
>> I don't like very much about it at all.  I think having it
>> try to hang onto pointers into the relcache is a completely
>> wrongheaded design decision, and the right way for it to work
>> is to just copy the PartitionDescs out of the relcache so that
>> they're fully owned by the PartitionDirectory.  I don't see
>> a CopyPartitionDesc function anywhere (maybe it's named something
>> else?) but it doesn't look like it'd be hard to build; most
>> of the work is in partition_bounds_copy() which does exist already.
> 
> Yeah, we could do that.  I have to admit that I don't necessarily
> understand why trying to hang onto pointers into the relcache is a bad
> idea.  It is a bit complicated, but the savings in both memory and CPU
> time seem worth pursuing.  There are a lot of users who wish we scaled
> to a million partitions rather than a hundred, and just copying
> everything all over the place all the time won't get us closer to that
> goal.
> 
> More generally, I think get_relation_info() is becoming an
> increasingly nasty piece of work.  It copies more and more stuff
> instead of just pointing to it, which is necessary mostly because it
> closes the table instead of arranging to do that at the end of query
> planning.  If it did the opposite, the refcount held by the planner
> would make it unnecessary for the PartitionDirectory to hold one, and
> I bet we could also just point to a bunch of the other stuff in this
> function rather than copying that stuff, too.  As time goes by,
> relcache entries are getting more and more complex, and the optimizer
> wants to use more and more data from them for planning purposes, but,
> probably partly because of inertia, we're clinging to an old design
> where everything has to be copied.  Every time someone gets that
> wrong, and it's happened a number of times, we yell at them and tell
> them to copy more stuff instead of thinking up a design where stuff
> doesn't need to be copied.  I think that's a mistake.  You have
> previously disagreed with that position so you probably will now, too,
> but I still think it.

+1.

>> Also, at least so far as the planner's usage is concerned, claiming
>> that we're saving something by not copying is completely bogus,
>> because if we look into set_relation_partition_info, what do we
>> find but a partition_bounds_copy call.  That wouldn't be necessary
>> if we could rely on the PartitionDirectory to own the data structure.
>> (Maybe it's not necessary today.  But given what a house of cards
>> this is, I wouldn't propose ripping it out, just moving it into
>> the PartitionDirectory code.)
> 
> Ugh, I didn't notice the partition_bounds_copy() call in that
> function.  Oops.  Given the foregoing griping, you won't be surprised
> to hear that I'd rather just remove the copying step.  However, it
> sounds like we can do that no matter whether we stick with my design
> or switch to yours, because PartitionDirectory holds a relcache refcnt
> then the pointer will be stable, and if we deep-copy the whole data
> structure then the pointer will also be stable.  Prior to the commit
> at issue, we weren't doing either of those things, so that copy was
> needed until very recently.

One of the patches I've proposed in the "speed up planning with
partitions" thread [1] gets rid of the partition_bounds_copy() call,
because: 1) I too think it's unnecessary as the PartitionBoundInfo won't
change (logically or physically) as long as we've got some lock on the
table, which we do, 2) I've seen it become a significant bottleneck as the
number of partitions crosses thousands.

Thanks,
Amit

[1] https://commitfest.postgresql.org/22/1778/



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Amit Langote
Дата:
Сообщение: Re: why doesn't DestroyPartitionDirectory hash_destroy?
Следующее
От: Amit Kapila
Дата:
Сообщение: Re: WIP: Avoid creation of the free space map for small tables