Re: Function array_agg(array)

Поиск
Список
Период
Сортировка
От Ali Akbar
Тема Re: Function array_agg(array)
Дата
Msg-id CACQjQLoGAS6uz6RZxQ=4vNYojZenkRpn28aBM=tcAL-gpHwwpQ@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Function array_agg(array)  (Pavel Stehule <pavel.stehule@gmail.com>)
Ответы Re: Function array_agg(array)
Список pgsql-hackers
Updated patch attached.

2014-10-22 20:51 GMT+07:00 Pavel Stehule <pavel.stehule@gmail.com>:
I agree with your proposal. I have a few comments to design:

1. patch doesn't hold documentation and regress tests, please append it.
 
i've added some regression tests in arrays.sql and aggregate.sql.

I've only found the documentation of array_agg. Where is the doc for array(subselect) defined?
 
2. this functionality (multidimensional aggregation) can be interesting more times, so maybe some interface like array builder should be preferred.
We already have accumArrayResult and makeArrayResult/makeMdArrayResult, haven't we?

Actually array_agg(anyarray) can be implemented by using accumArrayResult and makeMdArrayResult, but in the process we will deconstruct the array data and null bit-flags into ArrayBuildState->dvalues and dnulls. And we will reconstruct it through makeMdArrayResult. I think this way it's not efficient, so i decided to reimplement it and memcpy the array data and null flags as-is.

aha, so isn't better to fix a performance for accumArrayResult ?

Ok, i'll go this route. While reading the code of array(subselect) as you pointed below, i think the easiest way to implement support for both array_agg(anyarray) and array(subselect) is to change accumArrayResult so it operates both with scalar datum(s) and array datums, with performance optimization for the latter.

implemented it by modifying ArrayBuildState to ArrayBuildStateArray and ArrayBuildStateScalar (do you have any idea for better struct naming?)
 
In other places, i think it's clearer if we just use accumArrayResult and makeArrayResult/makeMdArrayResult as API for building (multidimensional) arrays.
 
3. array_agg was consistent with array(subselect), so it should be fixed too

postgres=# select array_agg(a) from test;
       array_agg      
-----------------------
 {{1,2,3,4},{1,2,3,4}}
(1 row)

postgres=# select array(select a from test);
ERROR:  could not find array type for data type integer[]

I'm pretty new in postgresql development. Can you point out where is array(subselect) implemented?

where you can start?

postgres=# explain select array(select a from test);
ERROR:  42704: could not find array type for data type integer[]
LOCATION:  exprType, nodeFuncs.c:116

look to code ... your magic keyword is a ARRAY_SUBLINK .. search in postgresql sources this keyword

attention: probably we don't would to allow arrays everywhere. 

I've changed the places where i think it's appropriate.
 
4. why you use a magic constant (64) there?

+         astate->abytes = 64 * (ndatabytes == 0 ? 1 : ndatabytes);
+         astate->aitems = 64 * nitems;

+             astate->nullbitmap = (bits8 *)
+                 repalloc(astate->nullbitmap, (astate->aitems + 7) / 8);

just follow the arbitrary size choosen in accumArrayResult (arrayfuncs.c):
astate->alen = 64; /* arbitrary starting array size */

it can be any number not too small and too big. Too small, and we will realloc shortly. Too big, we will end up wasting memory.

you can try to alloc 1KB instead as start -- it is used more times in Postgres. Then a overhead is max 1KB per agg call - what is acceptable.

You take this value from accumArrayResult, but it is targeted for shorted scalars - you should to expect so any array will be much larger. 

this patch allocates 1KB (1024 bytes) if the ndatabytes is < 512bytes. If it is larger, it allocates 4 * size. For nullbitmap, it allocates 4 * number of items in array.

Regards,
--
Ali Akbar
 
Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: edward745
Дата:
Сообщение: Re: Getting rid of "accept incoming network connections" prompts on OS X
Следующее
От: Kyotaro HORIGUCHI
Дата:
Сообщение: Re: alter user/role CURRENT_USER