Re: overhead due to casting extra parameters with aggregates (over and over)
From: Tomas Vondra
Msg-id: 20190923174447.lncaq7g5l244azia@development
In reply to: Re: overhead due to casting extra parameters with aggregates (over and over) (Tom Lane <tgl@sss.pgh.pa.us>)
List: pgsql-hackers
On Mon, Sep 23, 2019 at 12:53:36PM -0400, Tom Lane wrote:
>Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
>> I've been working on a custom aggregate, and I've run into some fairly
>> annoying overhead due to casting direct parameters over and over. I'm
>> wondering if there's a way to eliminate this, somehow, without having to
>> do an explicit cast.
>
>> Imagine you have a simple aggregate:
>
>> CREATE AGGREGATE tdigest_percentile(double precision, int, double precision[])
>> (
>>    ...
>> );
>
>> with two direct parameters (actually, I'm not sure that's the correct
>> term, because this is not an ordered-set aggregate and [1] only talks
>> about direct parameters in that context). Anyway, I'm talking about the
>> extra parameters, after the 'double precision' value to aggregate.
>
>But you're not telling the system that those are direct parameters,
>at least not if you mean that they can only legitimately have one value
>across the whole query. As-is, they're just more aggregated arguments,
>so we have to evaluate them again at each row.

Understood.

>It's fairly messy that the SQL spec ties direct arguments to ordered-set
>aggregates; you'd think there'd be some value in treating those features
>as orthogonal. I'm not sure how we could wedge them into the syntax
>otherwise, though :-(. You could perhaps convert your aggregate to
>an ordered-set aggregate, but then you'd be paying for a sort that
>you don't need, IIUC.

Yeah, having to do the sort (and keep all the data) is exactly what the
tdigest is meant to eliminate, so making it an ordered-set aggregate is
exactly the thing I don't want to do. Also, it disables parallel query,
which is another reason not to do that.

>> After a while, I've realized that the issue is casting - the CTE
>> produces a numeric[] array, and we do the cast to double precision[] on
>> every call to the state transition function (and we do ~10M of those).
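For context, a regular (non-ordered-set) aggregate with extra arguments might be declared along these lines. This is only a sketch of the shape such a definition takes; the `tdigest_add` and `tdigest_percentiles_final` support functions and the state type here are hypothetical, not the actual extension's definitions:

```sql
CREATE AGGREGATE tdigest_percentile(double precision, int, double precision[])
(
    sfunc     = tdigest_add,               -- state transition, called once per input row
    stype     = internal,                  -- opaque t-digest state
    finalfunc = tdigest_percentiles_final, -- computes the requested percentiles
    finalfunc_extra                        -- pass the extra argument types to the final function
);
```

Note that all three parameters are ordinary aggregated arguments here, which is the crux of the thread: the planner has no way to know the second and third are constant across the query, so their expressions are evaluated for every row.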
>The only reason that the CTE reference is cheap is that we understand
>that it's stable so we don't have to recompute it each time; otherwise
>you'd be moaning about that more than the cast. As you say, the short
>term workaround is to do the casting inside the sub-select. I think the
>long term fix is to generically avoid re-computing stable subexpressions.
>There was a patch for that a year or so ago but the author never finished
>it, AFAIR.

Hmmm, yeah. I'll dig through the archives, although it's not a very high
priority - it's more a thing that surprised/bugged me while working on
the custom aggregate.

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
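The short-term workaround Tom describes, moving the cast inside the sub-select so it is evaluated once rather than per row, could look roughly like this. The `data` table and the percentile-generating CTE are assumed for illustration; only the placement of the `::double precision[]` cast differs between the two queries:

```sql
-- Slow: ps is numeric[], so the cast to double precision[] is part of
-- the aggregate's argument expression and is re-evaluated for every row.
WITH percentiles AS (
    SELECT array_agg(i / 100.0) AS ps FROM generate_series(1, 99) AS s(i)
)
SELECT tdigest_percentile(d.v, 100, p.ps::double precision[])
FROM data d, percentiles p;

-- Workaround: cast once inside the CTE, so the aggregate receives an
-- already-cast double precision[] value on each call.
WITH percentiles AS (
    SELECT array_agg(i / 100.0)::double precision[] AS ps
    FROM generate_series(1, 99) AS s(i)
)
SELECT tdigest_percentile(d.v, 100, p.ps)
FROM data d, percentiles p;
```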