array_accum aggregate

Поиск

Список

Период

Сортировка

От	Stephen Frost
Тема	array_accum aggregate
Дата	6 октября 2006 г. 20:34:59
Msg-id	20061006203453.GJ24675@kenobi.snowman.net обсуждение исходный текст
Ответы	Re: array_accum aggregate (Tom Lane <tgl@sss.pgh.pa.us>) Re: array_accum aggregate (Stephen Frost <sfrost@snowman.net>)
Список	pgsql-hackers

Дерево обсуждения

Greetings,

  The array_accum example aggregate in the user documentation works
  reasonably on small data sets but doesn't work too hot on large ones.
  http://www.postgresql.org/docs/8.1/static/xaggr.html

  Normally I wouldn't care particularly much but it turns out that PL/R
  uses arrays for quite a bit (eg: histograms and other statistics
  functions).  I've also heard other complaints about the performance of
  arrays, though I'm not sure if those were due to array_accum or
  something else.

  Long story short, I set out to build a faster array_accum.  Much to my
  suprise and delight, we already *had* one.  accumArrayResult() and
  makeArrayResult()/construct_md_array() appear to do a fantastic job.
  I've created a couple of 'glue' functions to expose these functions so
  they can be used in an aggregate.  I'm sure they could be improved
  upon and possibly made even smaller than they already are (90 lines
  total for both) but I'd like to throw out the idea of including them
  in core.  The aggregate created with them could also be considered for
  inclusion though I'm less concerned with that.  I don't expect general
  PostgreSQL users would have trouble creating the aggregate- I don't
  know that the average user would be able or willing to write the C
  functions.

  For comparison, the new functions run with:
  time psql -c "select aaccum(generate_series) from generate_series(1,1000000);" > /dev/null
      4.24s real     0.34s user     0.06s system

  Compared to:
  time psql -c "select array_accum(generate_series) from generate_series(1,1000000);" > /dev/null
  ...

  Well, it's still running and it's been over an hour.

  The main differences, as I see it, are: accumArrayResult() works in
  chunks of 64 elements, and uses repalloc().  array_accum uses
  array_set() which works on individual elements and uses
  palloc()/memcpy().  I appriciate that this is done because for most
  cases of array_set() it's not acceptable to modify the input and am
  not suggesting that be changed.  An alternative might be to modify
  array_set() to check if it is in an aggregate and change its behavior
  but adding the seperate functions seemed cleaner and much less
  intrusive to me.

  Please find the functions attached.

          Thanks,

            Stephen

Вложения

aaccum.c

В списке pgsql-hackers по дате отправления:

Предыдущее

От: Tom Lane
Дата: 06 октября 2006 г., 20:10:58
Сообщение: Re: Should there be a charcat?

Следующее

От: Peter Eisentraut
Дата: 06 октября 2006 г., 20:47:55
Сообщение: Re: 8.2 translation status?

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

array_accum aggregate

Вложения

Предыдущее

Следующее