Re: Optimizing ResouceOwner to speed up COPY
| From | Tomas Vondra |
|---|---|
| Subject | Re: Optimizing ResouceOwner to speed up COPY |
| Date | |
| Msg-id | c58fc2de-2fdb-4c65-9b81-5f68794859f6@vondra.me |
| In reply to | Re: Optimizing ResouceOwner to speed up COPY (Tom Lane <tgl@sss.pgh.pa.us>) |
| Replies | Re: Optimizing ResouceOwner to speed up COPY |
| List | pgsql-hackers |
On 10/16/25 20:12, Tom Lane wrote:
> Tomas Vondra <tomas@vondra.me> writes:
>> The reason is pretty simple - ResourceOwner tracks the resources in a
>> very simple hash table, with O(n^2) behavior with duplicates. This
>> happens with COPY, because COPY creates an array of 1000 tuple slots,
>> and each slot references the same tuple descriptor. And the descriptor
>> is added to ResourceOwner for each slot.
>> ...
>> There's an easy way to improve this by allowing a single hash entry to
>> represent multiple references to the same resource. The attached patch
>> adds a "count" to the ResourceElem, tracking how many times that
>> resource was added. So if you add 1000 tuple slots, the descriptor will
>> have just one ResourceElem entry with count=1000.
>
> Hmm. I don't love the 50% increase in sizeof(ResourceElem) ... maybe
> that's negligible, or maybe it isn't.

I agree the +50% is not great. My plan was to use a smaller data type, say uint8 or uint16, and reduce the overhead that way (while still massively reducing the number of entries). But I didn't realize there would always be extra padding, making this pointless.

Maybe we could make the struct __packed__, and put the counter last (perhaps as a uint16)? But I don't see any other packed structs, so I guess there's a reason not to have them.

I'm not sure whether this overhead is negligible. In cases with duplicates it actually saves a lot of memory, so it's a win. In cases without duplicates it'll use more memory, but I'm not sure how many resource owners there usually are, or how many entries they contain. That'll probably determine whether it's negligible. My intuition is that there wouldn't be many, and that the typical number of elements is fairly low (if not, why is the initial capacity set to 64/128?). But maybe my intuition is wrong.

> Can you find evidence of this change being helpful for anything
> except this specific scenario in COPY?
I went through the ResourceOwnerRemember() calls, looking for other cases that might create many duplicates, similar to the tuple descriptors, but I haven't found anything obvious. Other resources seem to be either naturally unique or limited to very few duplicates.

> Because we could probably find some way to avoid registering all the
> doppelganger slots, if that's the only culprit.

I'm not against other approaches. I didn't want to rework the foundation of the resource owner management, and this approach seemed simple, with minimal impact on cases without duplicates (except for the struct getting larger).

thanks

--
Tomas Vondra