overhead due to casting extra parameters with aggregates (over andover)
От | Tomas Vondra |
---|---|
Тема | overhead due to casting extra parameters with aggregates (over andover) |
Дата | |
Msg-id | 20190923155621.ijhq7vtpzdbu3hlx@development обсуждение исходный текст |
Ответы |
Re: overhead due to casting extra parameters with aggregates (over and over)
|
Список | pgsql-hackers |
Hi, I've been working on a custom aggregate, and I've ran into some fairly annoying overhead due to casting direct parameters over and over. I'm wondering if there's a way to eliminate this, somehow, without having to do an explicit cast. Imagine you have a simple aggregate: CREATE AGGREGATE tdigest_percentile(double precision, int, double precision[]) ( ... ); with two direct parameters (actually, I'm not sure that's the correct term, becuse this is not an ordered-set aggregate and [1] only talks about direct parameters in that context). Anyway, I'm talking about the extra parameters, after the 'double precision' value to aggregate. The last parameter is an array of values in [0.0,1.0], representing percentiles (similarly to what we do in percentile_cont). It's annoying to write literal values, so let's use CTE to generate the array: WITH perc AS (SELECT array_agg(i/100.0) AS p FROM generate_series(1,99) s(i)) SELECT SELECT tdigest_percentile(random(), 100, (SELECT p FROM perc)) FROM generate_series(1,10000000); which does work, but it's running for ~180 seconds. When used with an explicit array literal, it runs in ~1.6 second. SELECT tdigest_percentile(random(), 100, ARRAY[0.01, ..., 0.99])) FROM generate_series(1,10000000); After a while, I've realized that the issue is casting - the CTE produces numeric[] array, and we do the cast to double precision[] on every call to the state transition function (and we do ~10M of those). The cast is fairly expensive - much more expensive than the aggregate itself. The explicit literal ends up being the right type, so the whole query is much faster. And indeed, adding the explicit cast to the CTE query WITH perc AS (SELECT array_agg((i/100.0)::double precision) AS p FROM generate_series(1,99) s(i)) SELECT SELECT tdigest_percentile(random(), 100, (SELECT p FROM perc)) FROM generate_series(1,10000000); does the trick - the query is ~1.6s again. I wonder if there's a chance to detect and handle this without having to do the cast over and over? I'm thinking that's not quite possible, because the value is not actually guaranteed to be the same for all calls (even though it's the case for the example I've given). But maybe we could flag the parameter somehow, to make it more like the direct parameter (which is only evaluated once). I don't really need those extra parameters in the transition function at all, it's fine to just get it to the final function (and there should be far fewer calls to those). regards [1] https://www.postgresql.org/docs/current/sql-createaggregate.html -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
В списке pgsql-hackers по дате отправления: