2D array aggregation performance (array_agg for arrays)

From Dennis Runz
Subject 2D array aggregation performance (array_agg for arrays)
Date
Msg-id CALB1XpLXq9tKvGPS158KQ7c6rXwjgr8_56G8hCCbN3kU8h=xXA@mail.gmail.com
List pgsql-general
Hello Community,

I am working on a database extension for PostgreSQL (8.4+) that provides functions for the spectral graph theory of spatial/geometric graphs such as proteins. For this purpose we need to store and use huge multidimensional arrays in the database (the adjacency matrices of the graphs).

The performance-critical function here is the aggregation of one-dimensional arrays into a two-dimensional array,
e.g. {1,2} and {3,4} => {{1,2},{3,4}}, or more generally a set of arrays into an array of arrays.

The array_agg function performs well, but it only supports aggregating element types into arrays. For performance reasons, we need a similar function that can aggregate arrays as shown above. Other functions such as array_cat reallocate the array after each aggregation step, which doesn't scale.
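To make the scaling point concrete outside of PostgreSQL, here is a minimal standalone C sketch of the accumulate-then-build idea: rows are appended to one flat buffer that grows geometrically, and the 2D shape is only computed at the end. The names RowAccum, row_accum_append and row_accum_dims are hypothetical; they are not part of PostgreSQL or of the gist, and the sketch uses plain realloc rather than PostgreSQL memory contexts.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical aggregate state: all rows stored back to back in one buffer. */
typedef struct RowAccum {
    int    *values;   /* flat row-major storage */
    size_t  nrows;    /* number of rows appended so far */
    size_t  ncols;    /* length of every row, fixed by the first row */
    size_t  capacity; /* allocated size of values, counted in ints */
} RowAccum;

/* Transition step: append one 1-D row. Returns 0 on success, -1 on error. */
static int row_accum_append(RowAccum *acc, const int *row, size_t ncols)
{
    if (ncols == 0)
        return -1;
    if (acc->nrows == 0)
        acc->ncols = ncols;          /* first row defines the width */
    else if (ncols != acc->ncols)
        return -1;                   /* ragged input cannot form a 2-D array */

    size_t needed = (acc->nrows + 1) * acc->ncols;
    if (needed > acc->capacity) {
        /* Double the buffer so appends cost amortized O(row length),
         * instead of copying the whole result on every step. */
        size_t newcap = acc->capacity ? acc->capacity : 64;
        while (newcap < needed)
            newcap *= 2;
        int *tmp = realloc(acc->values, newcap * sizeof(int));
        if (tmp == NULL)
            return -1;
        acc->values = tmp;
        acc->capacity = newcap;
    }

    memcpy(acc->values + acc->nrows * acc->ncols, row, ncols * sizeof(int));
    acc->nrows++;
    return 0;
}

/* Final step: report the shape of the accumulated 2-D array. */
static void row_accum_dims(const RowAccum *acc, size_t dims[2])
{
    /* The shape must never describe more elements than were stored. */
    assert(acc->nrows * acc->ncols <= acc->capacity);
    dims[0] = acc->nrows;
    dims[1] = acc->ncols;
}
```

A zero-initialized RowAccum is a valid empty state. In a real aggregate the buffer would presumably live in the aggregate's memory context, and the final function would hand the flat values plus the two dimensions to the array constructor.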

I am now trying to implement an array_agg for array-of-arrays aggregation, using array_agg_transfn (-> hd_array_transfn) and array_agg_finalfn (-> hd_array_finalfn) from the Postgres 9.1 sources as a starting point.

This is what the current code looks like: https://gist.github.com/5b2b60a939bec8410382
I assume it is not sufficient to simply adapt the final function to create a 2D array? I tried this, but Postgres crashes with the following backtrace:

(gdb) bt
#0  pg_detoast_datum (datum=0x0) at fmgr.c:2233
#1  0x00ab9303 in construct_md_array (elems=0x220ffbb0, nulls=0x220ffcb8 "", ndims=2, dims=0xbf84c694, lbs=0xbf84c69c, elmtype=1007, elmlen=-1, elmbyval=0 '\000', elmalign=105 'i') at arrayfuncs.c:2936
#2  0x00ac0052 in makeMdArrayResult (astate=0x220ffb88, ndims=2, dims=0xbf84c694, lbs=0xbf84c69c, rcontext=0x220d8aa8, release=0 '\000') at arrayfuncs.c:4665
#3  0x0056c9d1 in hd_array_finalfn () from /usr/lib/postgresql/9.1/lib/hd_array.so
#4  0x009c4ffa in finalize_aggregate (aggstate=<optimized out>, peraggstate=0x220f9d58, pergroupstate=0x220f9e60, resultVal=0x220f9d38, resultIsNull=0x220f9d48 "") at nodeAgg.c:758
# ...

I am new to Postgres internals and Postgres programming and would greatly appreciate it if anyone could help me with this implementation problem.

We are using PostgreSQL 9.1, but the aggregate should eventually also run on 8.4.

Best Regards,
Dennis
