array_accum aggregate

Поиск
Список
Период
Сортировка
От Stephen Frost
Тема array_accum aggregate
Дата
Msg-id 20061006203453.GJ24675@kenobi.snowman.net
обсуждение исходный текст
Ответы Re: array_accum aggregate  (Tom Lane <tgl@sss.pgh.pa.us>)
Re: array_accum aggregate  (Stephen Frost <sfrost@snowman.net>)
Список pgsql-hackers
Greetings,

  The array_accum example aggregate in the user documentation works
  reasonably on small data sets but doesn't work too hot on large ones.
  http://www.postgresql.org/docs/8.1/static/xaggr.html

  Normally I wouldn't care particularly much but it turns out that PL/R
  uses arrays for quite a bit (eg: histograms and other statistics
  functions).  I've also heard other complaints about the performance of
  arrays, though I'm not sure if those were due to array_accum or
  something else.

  Long story short, I set out to build a faster array_accum.  Much to my
  suprise and delight, we already *had* one.  accumArrayResult() and
  makeArrayResult()/construct_md_array() appear to do a fantastic job.
  I've created a couple of 'glue' functions to expose these functions so
  they can be used in an aggregate.  I'm sure they could be improved
  upon and possibly made even smaller than they already are (90 lines
  total for both) but I'd like to throw out the idea of including them
  in core.  The aggregate created with them could also be considered for
  inclusion though I'm less concerned with that.  I don't expect general
  PostgreSQL users would have trouble creating the aggregate- I don't
  know that the average user would be able or willing to write the C
  functions.

  For comparison, the new functions run with:
  time psql -c "select aaccum(generate_series) from generate_series(1,1000000);" > /dev/null
      4.24s real     0.34s user     0.06s system

  Compared to:
  time psql -c "select array_accum(generate_series) from generate_series(1,1000000);" > /dev/null
  ...

  Well, it's still running and it's been over an hour.

  The main differences, as I see it, are: accumArrayResult() works in
  chunks of 64 elements, and uses repalloc().  array_accum uses
  array_set() which works on individual elements and uses
  palloc()/memcpy().  I appriciate that this is done because for most
  cases of array_set() it's not acceptable to modify the input and am
  not suggesting that be changed.  An alternative might be to modify
  array_set() to check if it is in an aggregate and change its behavior
  but adding the seperate functions seemed cleaner and much less
  intrusive to me.

  Please find the functions attached.

          Thanks,

            Stephen

Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Tom Lane
Дата:
Сообщение: Re: Should there be a charcat?
Следующее
От: Peter Eisentraut
Дата:
Сообщение: Re: 8.2 translation status?