Обсуждение: Handling memory contexts in aggregate function invoking other built-in aggregate functions

Поиск
Список
Период
Сортировка

Handling memory contexts in aggregate function invoking other built-in aggregate functions

От
Matt Magoffin
Дата:
I am trying to see if I can write a C aggregate function that operates on numeric[] array columns, and essentially
performsaggregates on the array elements as if they were individual columns, resulting in an output numeric[] array
withthe aggregate element values. I do realise I can use unnest() and then the built-in aggregates like this: 

SELECT array_agg(v ORDER BY i) FROM (
  SELECT i, sum(v) AS v
  FROM (VALUES
      (ARRAY[1,2,3]::numeric[]),
      (ARRAY[2,3,4]::numeric[])
    ) n(nums), unnest(n.nums) WITH ORDINALITY AS p(v,i)
  GROUP BY i
) g;

 array_agg
-----------
 {3,5,7}

What I am doing is based on the aggs_for_vecs [1] extension so my goal is to have SQL like this:

SELECT vec_to_sum(nums) FROM (VALUES
    (ARRAY[1,2,3]::numeric[]),
    (ARRAY[2,3,4]::numeric[])
  ) n(nums);

I have the extension working with numerics by doing numeric calculations with the help of the functions defined in
numeric.h(numeric_add_opt_error() in this case) but for other aggregates like average or var_samp I was hoping to piggy
backoff all the support in numeric.c. The primary motivation for doing this as a C extension is because it has proved
toexecute much faster than the equivalent unnest() SQL. 

So far, I have been working on average support via the vec_to_mean() aggregate, and my aggregate's [2] transition
functionsets up a FunctionCallInfo for the numeric_avg_accum() [3] function and then loops over the input array
elements,calling numeric_avg_accum() and saving its result state object in my aggregate’s state. Before looping, I
switchthe memory context to the aggregate’s context, i.e. there is stuff like 

MemoryContext aggContext;
AggCheckCallContext(fcinfo, &aggContext);
old = MemoryContextSwitchTo(aggContext);
for (i = 0; i < arrayLength; i++) {
  // invoke numeric_avg_accum() for each array element, store result in my state
}
MemoryContextSwitchTo(old);

In my aggregate function's final function I set up a FunctionCallInfo for the numeric_avg() function and loop over all
thenumeric_avg_accum() state objects [4], saving their results as the final array returned from the vec_to_mean()
aggregate.

Overall, the approach seems like it should work, however I don’t believe I’m handling the memory contexts correctly.
Thefunction works sometimes, but crashes or returns incorrect results sometimes. I tried to debug what’s going on, but
Iam very new to working on a C extension for Postgres so all I surmise so far is that some of my saved state objects
appearto be released before they should be. That finally brings me to my questions: 

1. Is it even reasonable for me to try to do this approach?
2. Is there anything I should be doing differently with memory contexts, like creating a new one(s) for the calls to
numeric_avg_accum()?
3. Is there something else I’m doing wrong?

Thank you very much for your time. I hope I was clear, as I mentioned this is all quite new to me so my
assumptions/approach/terminologymight be off. 

Cheers,
Matt Magoffin


[1] https://github.com/pjungwir/aggs_for_vecs

[2] https://github.com/SolarNetwork/aggs_for_vecs/blob/feature/numeric-stats-agg/vec_to_mean_numeric.c

[3]
https://github.com/SolarNetwork/aggs_for_vecs/blob/7c2a5aad35a814dca6d9f5a35df1de6e4b98d149/vec_to_mean_numeric.c#L57-L58

[4]
https://github.com/SolarNetwork/aggs_for_vecs/blob/7c2a5aad35a814dca6d9f5a35df1de6e4b98d149/vec_to_mean_numeric.c#L117-L126




Re: Handling memory contexts in aggregate function invoking other built-in aggregate functions

От
Tom Lane
Дата:
Matt Magoffin <postgresql.org@msqr.us> writes:
> So far, I have been working on average support via the vec_to_mean() aggregate, and my aggregate's [2] transition
functionsets up a FunctionCallInfo for the numeric_avg_accum() [3] function and then loops over the input array
elements,calling numeric_avg_accum() and saving its result state object in my aggregate’s state. Before looping, I
switchthe memory context to the aggregate’s context, i.e. there is stuff like 

> MemoryContext aggContext;
> AggCheckCallContext(fcinfo, &aggContext);
> old = MemoryContextSwitchTo(aggContext);
> for (i = 0; i < arrayLength; i++) {
>   // invoke numeric_avg_accum() for each array element, store result in my state
> }
> MemoryContextSwitchTo(old);

Calling numeric_avg_accum in the agg_context is unnecessary, and possibly
counterproductive (it might leak memory in that context, since like all
other aggregates it assumes it's called in a short-lived context).
That doesn't seem to explain your crash though.

Are you testing in an --enable-cassert build?  If not, do that;
it might make the cause of the crashes more apparent, thanks to
CLOBBER_FREED_MEMORY and other debug support.

            regards, tom lane



Re: Handling memory contexts in aggregate function invoking other built-in aggregate functions

От
Tom Lane
Дата:
Matt Magoffin <postgresql.org@msqr.us> writes:
> So far, I have been working on average support via the vec_to_mean() aggregate, and my aggregate's [2] transition
functionsets up a FunctionCallInfo for the numeric_avg_accum() [3] function and then loops over the input array
elements,calling numeric_avg_accum() and saving its result state object in my aggregate’s state. Before looping, I
switchthe memory context to the aggregate’s context, i.e. there is stuff like 

> MemoryContext aggContext;
> AggCheckCallContext(fcinfo, &aggContext);
> old = MemoryContextSwitchTo(aggContext);
> for (i = 0; i < arrayLength; i++) {
>   // invoke numeric_avg_accum() for each array element, store result in my state
> }
> MemoryContextSwitchTo(old);

Calling numeric_avg_accum in the agg_context is unnecessary, and possibly
counterproductive (it might leak memory in that context, since like all
other aggregates it assumes it's called in a short-lived context).
That doesn't seem to explain your crash though.

Are you testing in an --enable-cassert build?  If not, do that;
it might make the cause of the crashes more apparent, thanks to
CLOBBER_FREED_MEMORY and other debug support.

            regards, tom lane



Re: Handling memory contexts in aggregate function invoking other built-in aggregate functions

От
Matt Magoffin
Дата:
On 5/12/2021, at 5:16 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

> Calling numeric_avg_accum in the agg_context is unnecessary, and possibly
> counterproductive (it might leak memory in that context, since like all
> other aggregates it assumes it's called in a short-lived context).


OK, thanks for that, I’ll remove the context switch before calling numeric_avg_accum and test more.

> Are you testing in an --enable-cassert build?  If not, do that;
> it might make the cause of the crashes more apparent, thanks to
> CLOBBER_FREED_MEMORY and other debug support.

I did build with --enable-cassert, and I did see the state argument pointer passed to numeric_avg_accum
 as 0x7f7f7f7f7f, so now I understand why that was thanks to seeing the information about what that means on the Dev
FAQ,thanks for that. 

So given you didn’t say I shouldn’t be trying to invoke these aggregate functions as I’m trying to, does that mean in
theorythere isn’t anything inappropriate about doing this as far as you know? 

Cheers,
Matt


Re: Handling memory contexts in aggregate function invoking other built-in aggregate functions

От
Matt Magoffin
Дата:
On 5/12/2021, at 5:16 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

> Calling numeric_avg_accum in the agg_context is unnecessary, and possibly
> counterproductive (it might leak memory in that context, since like all
> other aggregates it assumes it's called in a short-lived context).


OK, thanks for that, I’ll remove the context switch before calling numeric_avg_accum and test more.

> Are you testing in an --enable-cassert build?  If not, do that;
> it might make the cause of the crashes more apparent, thanks to
> CLOBBER_FREED_MEMORY and other debug support.

I did build with --enable-cassert, and I did see the state argument pointer passed to numeric_avg_accum
 as 0x7f7f7f7f7f, so now I understand why that was thanks to seeing the information about what that means on the Dev
FAQ,thanks for that. 

So given you didn’t say I shouldn’t be trying to invoke these aggregate functions as I’m trying to, does that mean in
theorythere isn’t anything inappropriate about doing this as far as you know? 

Cheers,
Matt


Re: Handling memory contexts in aggregate function invoking other built-in aggregate functions

От
Tom Lane
Дата:
Matt Magoffin <postgresql.org@msqr.us> writes:
> On 5/12/2021, at 5:16 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Are you testing in an --enable-cassert build?  If not, do that;
>> it might make the cause of the crashes more apparent, thanks to
>> CLOBBER_FREED_MEMORY and other debug support.

> I did build with --enable-cassert, and I did see the state argument pointer passed to numeric_avg_accum
>  as 0x7f7f7f7f7f, so now I understand why that was thanks to seeing the information about what that means on the Dev
FAQ,thanks for that. 

So that probably means that you weren't careful about allocating your
own state data in the long-lived context (agg_context), and so it
got freed between calls.

            regards, tom lane



Re: Handling memory contexts in aggregate function invoking other built-in aggregate functions

От
Matt Magoffin
Дата:
On 5/12/2021, at 9:04 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
So that probably means that you weren't careful about allocating your
own state data in the long-lived context (agg_context), and so it
got freed between calls.

It turns out I wasn’t careful about setting isnull on the passed in state argument. After I fixed that [1] everything appears to be running smoothly!

— m@