Re: huge price database question..

Поиск
Список
Период
Сортировка
От Andy Colson
Тема Re: huge price database question..
Дата
Msg-id 4F6A085E.7060800@squeakycode.net
обсуждение исходный текст
Ответ на Re: huge price database question..  (Lee Hachadoorian <Lee.Hachadoorian+L@gmail.com>)
Ответы Re: huge price database question..  (Lee Hachadoorian <Lee.Hachadoorian+L@gmail.com>)
Список pgsql-general
On 3/21/2012 11:45 AM, Lee Hachadoorian wrote:
>
>
> On Tue, Mar 20, 2012 at 11:28 PM, Jim Green
> <student.northwestern@gmail.com <mailto:student.northwestern@gmail.com>>
> wrote:
>
>     On 20 March 2012 22:57, John R Pierce <pierce@hogranch.com
>     <mailto:pierce@hogranch.com>> wrote:
>
>      > avg() in the database is going to be a lot faster than copying
>     the data into
>      > memory for an application to process.
>
>     I see..
>
>
> As an example, I ran average on a 700,000 row table with 231 census
> variables reported by state. Running average on all 231 columns grouping
> by state inside Postgres beat running it by R by a factor of 130 NOT
> COUNTING an additional minute or so to pull the table from Postgres to
> R. To be fair, these numbers are not strictly comparable, because it's
> running on different hardware. But the setup is not atypical: Postgres
> is running on a heavy hitting server while R is running on my desktop.
>
> SELECT state, avg(col1), avg(col2), [...] avg(col231)
> FROM some_table
> GROUP BY state;
>
> 5741 ms
>
> aggregate(dfSomeTable, by = list(dfSomeTable$state), FUN = mean, na.rm =
> TRUE)
>
> 754746 ms
>
> --Lee

avg() might not be a good example though.  If you just want average,
great, but what about when you want to do lots of different stats'y
functions that PG does not have?  I'll bet R (not pl-R) can read a text
file (a csv probably) pretty fast.

If you were doing a really complex analysis I bet dumping a dataset out
to csv first and then running R scripts over it would be fast and useful.

 > aggregate(dfSomeTable, by = list(dfSomeTable$state), FUN = mean, na.rm =
 > TRUE)

Is this a Postgres stored proc (pl-R)?  Or is that R itself?  If its plR
then I wonder if its stepping through the recordset twice.  Depending on
how the function is written, I can see the function firing off a query,
PG collects the entire recordset, and hands it back to the function,
which then iterates thru it again.  (vs. having the function called for
each row as PG steps thru the recordset only once).

-Andy

В списке pgsql-general по дате отправления:

Предыдущее
От: Adrian Klaver
Дата:
Сообщение: Re: huge price database question..
Следующее
От: Steve Crawford
Дата:
Сообщение: Re: huge price database question..