Re: A performance regression issue with Memoize
From | Robert Haas |
---|---|
Subject | Re: A performance regression issue with Memoize |
Date | |
Msg-id | CA+TgmoY6C=PrWRbHsQqCMWoHWPuYoFLKfpnryTpn_1fEDOqJLw@mail.gmail.com |
In reply to | Re: A performance regression issue with Memoize (Tom Lane <tgl@sss.pgh.pa.us>) |
List | pgsql-hackers |
On Tue, Jul 29, 2025 at 12:57 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> David Rowley <dgrowleyml@gmail.com> writes:
> > For the record, I 100% agree that there will always be cases where
> > statistics are just unable to represent what is discovered at
> > run-time, so having some sort of ability to adapt at run-time seems
> > like a natural progression on the evolutionary chain. I just don't
> > know if it's the best or best next step to make. I suspect we might
> > be skipping a few steps from what we have now if we went there in
> > the near future. We don't yet have extended statistics for joins,
> > for example.
>
> Yeah. There is plenty remaining to be done towards collecting and
> applying traditional sorts of statistics. I worry about ideas
> such as run-time plan changes because we will have exactly zero
> ability to predict what'll happen if the executor starts doing
> that. Maybe it'll be great, but what do you do if it isn't?

Well, you already know that what you're doing isn't great. If the
currently-selected alternative is terrible, the other alternative
doesn't have to be that great to be a win.

I've thought about this mostly in the context of the decision between
a Nested Loop and a Hash Join. Subject to some conditions, these are
interchangeable: at any point you could decide that on the next
iteration you're going to put all the inner rows into a hash table
and just probe that. The "only" downside is that it could turn out
that, unluckily, the next iteration was also the last one that was
ever going to happen, and then the overhead to build the hash table
was wasted. If the Nested Loop is parameterized, the Hash Join
requires a complete scan of the inner side of the join, which
requires a different plan variant, and which is potentially quite
expensive.

But switching from a plain Nested Loop to Nested Loop + Memoize
wouldn't have that problem. You never have to make a complete scan of
the inner side. You can just decide to start caching some results for
individual parameter values whenever you want, and if it turns out
that they're never useful, you haven't lost nearly as much. So a
strategy like "start memoizing when we exceed the expected loop count
by 20x" might be viable. I'm not really sure, I haven't done the
experiments, but it seems to me that the downsides of this kind of
strategy switch might be pretty minimal even when things work out
anti-optimally.

--
Robert Haas
EDB: http://www.enterprisedb.com
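As a rough illustration of the trigger strategy described above, a
minimal standalone C sketch follows. This is not PostgreSQL executor
code and none of these names exist in the source tree; the struct,
function, and the 20x factor are all hypothetical stand-ins for how a
"start memoizing once the observed loop count far exceeds the planner's
estimate" heuristic could behave:

#include <stdbool.h>
#include <stdio.h>

/* Hypothetical per-node state for an adaptive Nested Loop. */
typedef struct AdaptiveNestLoopState
{
    double est_loops;       /* outer loop count the planner expected */
    double actual_loops;    /* outer loops observed so far at run time */
    double switch_factor;   /* e.g. 20.0: how far past the estimate we
                             * let things get before memoizing */
    bool   memoizing;       /* have we started caching inner results? */
} AdaptiveNestLoopState;

/*
 * Called once per outer row.  Once the estimate has proven badly wrong,
 * flip to caching inner-side results per parameter value, as Memoize
 * does.  Worst case, this was the last loop and the cache is wasted --
 * a far smaller loss than a mid-flight switch to Hash Join, which would
 * need a complete scan of the inner side.
 */
static bool
maybe_start_memoizing(AdaptiveNestLoopState *state)
{
    state->actual_loops += 1;

    if (!state->memoizing &&
        state->actual_loops > state->est_loops * state->switch_factor)
        state->memoizing = true;

    return state->memoizing;
}

int
main(void)
{
    AdaptiveNestLoopState state = {
        .est_loops = 10.0,      /* planner thought: ~10 outer rows */
        .actual_loops = 0.0,
        .switch_factor = 20.0,
        .memoizing = false,
    };

    /* Simulate 500 outer rows; the switch fires at loop 201. */
    for (int i = 1; i <= 500; i++)
    {
        bool was_memoizing = state.memoizing;

        if (maybe_start_memoizing(&state) && !was_memoizing)
            printf("would start memoizing at outer loop %d\n", i);
    }
    return 0;
}

The point of the sketch is the asymmetry Robert describes: the check is
cheap and monotonic (once tripped, it stays on), and the only cost of a
wrong trip is cache overhead for loops that never repeat a parameter
value, not a replanned scan.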