Re: Eager aggregation, take 3
От | Robert Haas |
---|---|
Тема | Re: Eager aggregation, take 3 |
Дата | |
Msg-id | CA+TgmoaycfwRGtVpzmgUbEibP5zyuNRZoJ4Sc0475tRW4rofoQ@mail.gmail.com обсуждение исходный текст |
Ответ на | Re: Eager aggregation, take 3 (Richard Guo <guofenglinux@gmail.com>) |
Ответы |
Re: Eager aggregation, take 3
|
Список | pgsql-hackers |
On Wed, Aug 6, 2025 at 3:52 AM Richard Guo <guofenglinux@gmail.com> wrote: > Looking at TPC-DS queries 4 and 11, a threshold of 10 is the minimum > needed to consider eager aggregation for them. The resulting plans > show nice performance improvements without any measurable increase in > planning time. So, I'm inclined to lower the threshold to 10 for now. > (Wondering whether we should make this threshold a GUC, so users can > adjust it based on their needs.) Like Matheus, I think a GUC is reasonable. A significant danger here appears to be the possibility of a performance cliff, where queries are optimized very different when the ratio is 9.99 vs. 10.01, say. It would be nice if there were some way to mitigate that danger, but at least a GUC avoids chaining the performance of the whole system to a hard-coded value. It might be worth considering whether there are heuristics other than the group size that could help here. Possibly that's just making things more complicated to no benefit. It seems to me, for example, that reducing 100 rows to 10 is quite different from reducing a million rows to 100,000. On the whole, the latter seems more likely to work out well, but it's tricky, because the effort expended per group can be arbitrarily high. I think we do want to let the cost model make most of the decisions, and just use this threshold to prune ideas that are obviously bad at an early stage. That said, it's worth thinking about how this interacts with the just-considered-one-eager-agg strategy. Does this threshold apply before or after that rule? For instance, consider AGG(FACT_TABLE JOIN DIMENSION_TABLE), like a count of orders grouped by customer name. Aggregating on the dimension table (in this case, the list of customers) is probably useless, but aggregating on the join column of the fact table has a good chance of being useful. If we consider only one of those strategies, we want it to be the right one. This threshold could be the thing that helps us to get it right. -- Robert Haas EDB: http://www.enterprisedb.com
В списке pgsql-hackers по дате отправления: