Re: PATCH: decreasing memory needlessly consumed by array_agg

Поиск
Список
Период
Сортировка
От Ali Akbar
Тема Re: PATCH: decreasing memory needlessly consumed by array_agg
Дата
Msg-id CACQjQLoYZeBBH3noRoRxafoMbxXpreBCe6JiyYN6iLTSm3Ep2A@mail.gmail.com
обсуждение исходный текст
Ответ на Re: PATCH: decreasing memory needlessly consumed by array_agg  (Ali Akbar <the.apaan@gmail.com>)
Ответы Re: PATCH: decreasing memory needlessly consumed by array_agg  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Список pgsql-hackers
In the CF, the status becomes "Needs Review". Let's continue our discussion of makeArrayResult* behavior if subcontext=false and release=true (more below):
2014-12-22 8:08 GMT+07:00 Ali Akbar <the.apaan@gmail.com>:

With this API, i think we should make it clear if we call initArrayResult with subcontext=false, we can't call makeArrayResult, but we must use makeMdArrayResult directly.

Or better, we can modify makeArrayResult to release according to astate->private_cxt:
@@ -4742,7 +4742,7 @@ makeArrayResult(ArrayBuildState *astate,
    dims[0] = astate->nelems;
    lbs[0] = 1;
 
-   return makeMdArrayResult(astate, ndims, dims, lbs, rcontext, true);
+   return makeMdArrayResult(astate, ndims, dims, lbs, rcontext, astate->private_cxt);

Or else we implement what you suggest below (more comments below):

Thinking about the 'release' flag a bit more - maybe we could do this
instead:

    if (release && astate->private_cxt)
        MemoryContextDelete(astate->mcontext);
    else if (release)
    {
        pfree(astate->dvalues);
        pfree(astate->dnulls);
        pfree(astate);
    }

i.e. either destroy the whole context if possible, and just free the
memory when using a shared memory context. But I'm afraid this would
penalize the shared memory context, because that's intended for cases
where all the build states coexist in parallel and then at some point
are all converted into a result and thrown away. Adding pfree() calls is
no improvement here, and just wastes cycles.

As per Tom's comment, i'm using "parent memory context" instead of "shared memory context" below.

In the future, if some code writer decided to use subcontext=false, to save memory in cases where there are many array accumulation, and the parent memory context is long-living, current code can cause memory leak. So i think we should implement your suggestion (pfreeing astate), and warn the implication in the API comment. The API user must choose between release=true, wasting cycles but preventing memory leak, or managing memory from the parent memory context.

In one possible use case, for efficiency maybe the caller will create a special parent memory context for all array accumulation. Then uses makeArrayResult* with release=false, and in the end releasing the parent memory context once for all.

As for the v6 patch:
- the patch applies cleanly to master
- make check is successfull
- memory benefit is still there
- performance benefit i think is negligible

Reviewing the code, found this:
@@ -573,7 +578,22 @@ array_agg_array_transfn(PG_FUNCTION_ARGS)
         elog(ERROR, "array_agg_array_transfn called in non-aggregate context");
     }
 
-    state = PG_ARGISNULL(0) ? NULL : (ArrayBuildStateArr *) PG_GETARG_POINTER(0);
+
+    if (PG_ARGISNULL(0))
+    {
+        Oid            element_type = get_element_type(arg1_typeid);
+
+        if (!OidIsValid(element_type))
+            ereport(ERROR,
+                    (errcode(ERRCODE_DATATYPE_MISMATCH),
+                     errmsg("data type %s is not an array type",
+                            format_type_be(arg1_typeid))));

digging more, it looks like those code required because accumArrayResultArr checks the element type:
        /* First time through --- initialize */
        Oid            element_type = get_element_type(array_type);

        if (!OidIsValid(element_type))
            ereport(ERROR,
                    (errcode(ERRCODE_DATATYPE_MISMATCH),
                     errmsg("data type %s is not an array type",
                            format_type_be(array_type))));
        astate = initArrayResultArr(array_type, element_type, rcontext, true);

I think it should be in initArrayResultArr, because it is an initialization-only check, shouldn't it?

Regards,
--
Ali Akbar

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Michael Paquier
Дата:
Сообщение: Re: Fillfactor for GIN indexes
Следующее
От: Ashutosh Bapat
Дата:
Сообщение: Re: Transactions involving multiple postgres foreign servers