Re: why doesn't DestroyPartitionDirectory hash_destroy?
От | Amit Langote |
---|---|
Тема | Re: why doesn't DestroyPartitionDirectory hash_destroy? |
Дата | |
Msg-id | f8634ded-43d2-c843-d48b-9720883cfa7b@lab.ntt.co.jp обсуждение исходный текст |
Ответ на | Re: why doesn't DestroyPartitionDirectory hash_destroy? (Robert Haas <robertmhaas@gmail.com>) |
Список | pgsql-hackers |
On 2019/03/15 2:56, Robert Haas wrote: > On Thu, Mar 14, 2019 at 1:16 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Actually, now that I've absorbed a bit more about 898e5e329, >> I don't like very much about it at all. I think having it >> try to hang onto pointers into the relcache is a completely >> wrongheaded design decision, and the right way for it to work >> is to just copy the PartitionDescs out of the relcache so that >> they're fully owned by the PartitionDirectory. I don't see >> a CopyPartitionDesc function anywhere (maybe it's named something >> else?) but it doesn't look like it'd be hard to build; most >> of the work is in partition_bounds_copy() which does exist already. > > Yeah, we could do that. I have to admit that I don't necessarily > understand why trying to hang onto pointers into the relcache is a bad > idea. It is a bit complicated, but the savings in both memory and CPU > time seem worth pursuing. There are a lot of users who wish we scaled > to a million partitions rather than a hundred, and just copying > everything all over the place all the time won't get us closer to that > goal. > > More generally, I think get_relation_info() is becoming an > increasingly nasty piece of work. It copies more and more stuff > instead of just pointing to it, which is necessary mostly because it > closes the table instead of arranging to do that at the end of query > planning. If it did the opposite, the refcount held by the planner > would make it unnecessary for the PartitionDirectory to hold one, and > I bet we could also just point to a bunch of the other stuff in this > function rather than copying that stuff, too. As time goes by, > relcache entries are getting more and more complex, and the optimizer > wants to use more and more data from them for planning purposes, but, > probably partly because of inertia, we're clinging to an old design > where everything has to be copied. Every time someone gets that > wrong, and it's happened a number of times, we yell at them and tell > them to copy more stuff instead of thinking up a design where stuff > doesn't need to be copied. I think that's a mistake. You have > previously disagreed with that position so you probably will now, too, > but I still think it. +1. >> Also, at least so far as the planner's usage is concerned, claiming >> that we're saving something by not copying is completely bogus, >> because if we look into set_relation_partition_info, what do we >> find but a partition_bounds_copy call. That wouldn't be necessary >> if we could rely on the PartitionDirectory to own the data structure. >> (Maybe it's not necessary today. But given what a house of cards >> this is, I wouldn't propose ripping it out, just moving it into >> the PartitionDirectory code.) > > Ugh, I didn't notice the partition_bounds_copy() call in that > function. Oops. Given the foregoing griping, you won't be surprised > to hear that I'd rather just remove the copying step. However, it > sounds like we can do that no matter whether we stick with my design > or switch to yours, because PartitionDirectory holds a relcache refcnt > then the pointer will be stable, and if we deep-copy the whole data > structure then the pointer will also be stable. Prior to the commit > at issue, we weren't doing either of those things, so that copy was > needed until very recently. One of the patches I've proposed in the "speed up planning with partitions" thread [1] gets rid of the partition_bounds_copy() call, because: 1) I too think it's unnecessary as the PartitionBoundInfo won't change (logically or physically) as long as we've got some lock on the table, which we do, 2) I've seen it become a significant bottleneck as the number of partitions crosses thousands. Thanks, Amit [1] https://commitfest.postgresql.org/22/1778/
В списке pgsql-hackers по дате отправления:
Предыдущее
От: Amit LangoteДата:
Сообщение: Re: why doesn't DestroyPartitionDirectory hash_destroy?
Следующее
От: Amit KapilaДата:
Сообщение: Re: WIP: Avoid creation of the free space map for small tables