Обсуждение: Postgresql for a CEP app

Поиск
Список
Период
Сортировка

Postgresql for a CEP app

От
Leonardo Francalanci
Дата:
Hi,


I need to generate aggregates of data coming from a stream.

I could easily doing it inserting data coming from the stream into a table,
and then query it using something like:

select <my aggregation function> from atable group by <a column list>

The problem with this approach is that I would have to wait for the whole
stream to be finished before making the above query; since we're talking
about 20M+ rows, it would take some time for the query to finish.

What if I do something like:

select <my aggregation function> from my_fifo_function([...])
group by <a column list>

where my_fifo_function reads data from the stream and returns "rows" as
soon as they are available on the stream? This way I would get the reply
as soon as the stream has finished (assuming postgresql can keep up
with that). In other words, the query would be made even before the
stream has "started", and would last at least as long as the stream.

(of course, I don't need data from the stream to be saved in any way,
that's why I don't need to store it in any table).

Does postgresql read data from a function returning a SETOF row by row?
Or it waits the whole function to be finished (caching the whole resultset)
before starting to use the returned data? If it reads row by row I think it
could work...


Would that make sense?




Re: Postgresql for a CEP app

От
Raymond O'Donnell
Дата:
On 29/09/2010 10:52, Leonardo Francalanci wrote:
> Hi,
>
>
> I need to generate aggregates of data coming from a stream.
>
> I could easily doing it inserting data coming from the stream into a table,
> and then query it using something like:
>
> select<my aggregation function>  from atable group by<a column list>
>
> The problem with this approach is that I would have to wait for the whole
> stream to be finished before making the above query; since we're talking
> about 20M+ rows, it would take some time for the query to finish.
>
> What if I do something like:
>
> select<my aggregation function>  from my_fifo_function([...])
> group by<a column list>
>
> where my_fifo_function reads data from the stream and returns "rows" as
> soon as they are available on the stream? This way I would get the reply
> as soon as the stream has finished (assuming postgresql can keep up
> with that). In other words, the query would be made even before the
> stream has "started", and would last at least as long as the stream.
>
> (of course, I don't need data from the stream to be saved in any way,
> that's why I don't need to store it in any table).
>
> Does postgresql read data from a function returning a SETOF row by row?
> Or it waits the whole function to be finished (caching the whole resultset)
> before starting to use the returned data? If it reads row by row I think it
> could work...

In pl/pgsql at any rate, functions which return a set of rows build up
the entire result set in memory and then return the set in one go:

http://www.postgresql.org/docs/9.0/static/plpgsql-control-structures.html#PLPGSQL-STATEMENTS-RETURNING

...see the note at the end of this section.

Ray.


--
Raymond O'Donnell :: Galway :: Ireland
rod@iol.ie

Re: Postgresql for a CEP app

От
Leonardo Francalanci
Дата:
> In pl/pgsql at any rate, functions which return  a set of rows build up
> the entire result set in memory and then return the  set in one go:


Ok, then pl/pgsql and pl/python (which can't return SETOF) are ruled out.
(Thank you for pointing that out).
But pl/perl seems to do the trick:

"PL/Perl functions can also return sets of either scalar or composite types.
Usually you'll want to return rows one at a time, both to speed up startup
time and to keep from queueing up the entire result set in memory"


Any other issues anyone can think about?