Re: Strategy for doing number-crunching

From: Matthew Foster
Subject: Re: Strategy for doing number-crunching
Date:
Msg-id: CAP1ZYZFs59c+zM5XL_s=AQedjyDn18C0BLULmCtfarY+E6Yqpw@mail.gmail.com
In reply to: Re: Strategy for doing number-crunching  (Tom Lane <tgl@sss.pgh.pa.us>)
List: pgsql-novice


On Wed, Jan 4, 2012 at 3:11 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Matthew Foster <matthew.foster@noaa.gov> writes:
> > On Wed, Jan 4, 2012 at 10:48 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >> Matthew Foster <matthew.foster@noaa.gov> writes:
> >>> We have a database with approximately 130M rows, and we need to produce
> >>> statistics (e.g. mean, standard deviation, etc.) on the data.  Right now,
> >>> we're generating these stats via a single SELECT, and it is extremely
> >>> slow...like it can take hours to return results.
>
> >> What datatype are the columns being averaged?  If "numeric", consider
> >> casting to float8 before applying the aggregates.  You'll lose some
> >> precision but it'll likely be orders of magnitude faster.
>
> > The data are type double.
>
> Hmm.  In that case I think you have some other problem that's hidden in
> details you didn't show us.  It should not take "hours" to process only
> 130M rows.  This would best be taken up on pgsql-performance; please see
> http://wiki.postgresql.org/wiki/Slow_Query_Questions
>
>                        regards, tom lane

Tom,

I think you are absolutely right.  Some additional testing, with the arithmetic removed from the queries, still shows very slow performance.

I'll do some more digging, and perhaps take this to the performance list.  Thanks for your advice!

Matt
