Re: Eager aggregation, take 3
От | Richard Guo |
---|---|
Тема | Re: Eager aggregation, take 3 |
Дата | |
Msg-id | CAMbWs4_VtGu18P-jWMXAp3Q+mzGDCaT8AhxQDyWv2_rUxsjv8A@mail.gmail.com обсуждение исходный текст |
Ответ на | Re: Eager aggregation, take 3 ("Matheus Alcantara" <matheusssilv97@gmail.com>) |
Ответы |
Re: Eager aggregation, take 3
|
Список | pgsql-hackers |
On Wed, Aug 6, 2025 at 10:44 PM Matheus Alcantara <matheusssilv97@gmail.com> wrote: > On Wed Aug 6, 2025 at 4:52 AM -03, Richard Guo wrote: > > * The has_internal_aggtranstype() check. > > > > To avoid potential memory blowout risks from large partial aggregation > > values, v18 avoids applying eager aggregation if any aggregate uses an > > INTERNAL transition type, as this typically indicates a large internal > > data structure (as in string_agg or array_agg). However, this also > > excludes aggregates like avg(numeric) and sum(numeric), which are > > actually safe to use with eager aggregation. > > > > What we really want to exclude are aggregate functions that can > > produce large transition values by accumulating or concatenating input > > rows. So I'm wondering if we could instead check the transfn_oid > > directly and explicitly exclude only F_ARRAY_AGG_TRANSFN and > > F_STRING_AGG_TRANSFN. We don't need to worry about json_agg, > > jsonb_agg, or xmlagg, since they don't support partial aggregation > > anyway. > I think it makes sense to me. I just wondering if we should follow an > "allow" or "don't-allow" strategy. I mean, instead of a list aggregate > functions that are not allowed we could list functions that are actually > allowed to use eager aggregation, so in this case we ensure that for the > functions that are enabled the eager aggregation can work properly. I ended up still checking for INTERNAL transition types, but explicitly excluded aggregates that use F_NUMERIC_AVG_ACCUM transition function, assuming that avg(numeric) and sum(numeric) are safe in this context. This might still be overly strict, but I prefer to be on the safe side for now. > > * The EAGER_AGG_MIN_GROUP_SIZE threshold > > > > This threshold defines the minimum average group size required to > > consider applying eager aggregation. It was previously set to 2, but > > in v18 it was increased to 20 to be cautious about planning overhead. > > This change was a snap decision though, without any profiling or data > > to back it. > > > > Looking at TPC-DS queries 4 and 11, a threshold of 10 is the minimum > > needed to consider eager aggregation for them. The resulting plans > > show nice performance improvements without any measurable increase in > > planning time. So, I'm inclined to lower the threshold to 10 for now. > > (Wondering whether we should make this threshold a GUC, so users can > > adjust it based on their needs.) > Having a GUC may sound like a good idea to me TBH. This threshold may > vary from workload to workload (?). I've made this threshold a GUC, with a default value of 8 (further benchmark testing showed that a value of 10 is still too strict for TPC-DS query 4). > > Any comments on these two changes? > It sounds like a good way to go for me, looking forward to the next > patch version to perform some other tests. OK. Here it is. Thanks Richard
Вложения
В списке pgsql-hackers по дате отправления: