Huge Data sets, simple queries

От: Mike Biamonte
Тема: Huge Data sets, simple queries
Дата: ,
Msg-id: 007101c623a9$82207c10$0200a8c0@videobox
(см: обсуждение, исходный текст)
Ответы: Re: Huge Data sets, simple queries  ("Jeffrey W. Baker")
Re: Huge Data sets, simple queries  (Tom Lane)
Re: Huge Data sets, simple queries  ("Luke Lonergan")
Re: Huge Data sets, simple queries  (Michael Adler)
Re: Huge Data sets, simple queries  ("Craig A. James")
Список: pgsql-performance

Скрыть дерево обсуждения

Huge Data sets, simple queries  ("Mike Biamonte", )
 Re: Huge Data sets, simple queries  ("Jeffrey W. Baker", )
 Re: Huge Data sets, simple queries  (Tom Lane, )
  Re: Huge Data sets, simple queries  ("Jeffrey W. Baker", )
   Re: Huge Data sets, simple queries  (Tom Lane, )
    Re: Huge Data sets, simple queries  (Tom Lane, )
 Re: Huge Data sets, simple queries  ("Luke Lonergan", )
  Re: Huge Data sets, simple queries  (hubert depesz lubaczewski, )
   Re: Huge Data sets, simple queries  (Michael Stone, )
  Re: Huge Data sets, simple queries  ("Jim C. Nasby", )
   Re: Huge Data sets, simple queries  ("Luke Lonergan", )
    Re: Huge Data sets, simple queries  (Kevin, )
    Re: Huge Data sets, simple queries  ("Jim C. Nasby", )
     Re: Huge Data sets, simple queries  ("Luke Lonergan", )
      Re: Huge Data sets, simple queries  ("Jim C. Nasby", )
       Re: Huge Data sets, simple queries  ("Luke Lonergan", )
        Re: Huge Data sets, simple queries  ("Jim C. Nasby", )
    Re: Huge Data sets, simple queries  ("Jeffrey W. Baker", )
     Re: Huge Data sets, simple queries  ("Luke Lonergan", )
      Re: Huge Data sets, simple queries  (PFC, )
       Re: Huge Data sets, simple queries  ("Luke Lonergan", )
      Re: Huge Data sets, simple queries  ("Steinar H. Gunderson", )
       Re: Huge Data sets, simple queries  ("Luke Lonergan", )
      Re: Huge Data sets, simple queries  ("Jeffrey W. Baker", )
       Re: Huge Data sets, simple queries  ("Luke Lonergan", )
        Re: Huge Data sets, simple queries  ("Jeffrey W. Baker", )
         Re: Huge Data sets, simple queries  (PFC, )
          Re: Huge Data sets, simple queries  ("Luke Lonergan", )
           Re: Huge Data sets, simple queries  ("Steinar H. Gunderson", )
           Re: Huge Data sets, simple queries  (Mike Rylander, )
         Re: Huge Data sets, simple queries  ("Luke Lonergan", )
       Re: Huge Data sets, simple queries  (Michael Stone, )
     Re: Huge Data sets, simple queries  (Alan Stange, )
 Re: Huge Data sets, simple queries  ("Luke Lonergan", )
  Re: Huge Data sets, simple queries  ("Jeffrey W. Baker", )
  Re: Huge Data sets, simple queries  (Charles Sprickman, )
   Re: Huge Data sets, simple queries  ("Luke Lonergan", )
  Re: Huge Data sets, simple queries  (hubert depesz lubaczewski, )
   Re: Huge Data sets, simple queries  ("Luke Lonergan", )
 Re: Huge Data sets, simple queries  (Michael Adler, )
 Re: Huge Data sets, simple queries  ("Craig A. James", )


Does anyone have any experience with extremely large data sets?
I'm mean hundreds of millions of rows.

The queries I need to run on my 200 million transactions are relatively
simple:

   select month, count(distinct(cardnum)) count(*), sum(amount) from
transactions group by month;

This query took 18 hours on PG 8.1 on a Dual Xeon, RHEL3, (2.4 Kernel) with
RAID-10 (15K drives)
and 12 GB Ram.  I was expecting it to take about 4 hours - based on some
experience with a
similar dataset on a different machine (RH9, PG7.3 Dual Xeon, 4GB RAM,
Raid-5 10K drives)

  This machine is COMPLETELY devoted to running these relatively simple
queries one at a
time. (No multi-user support needed!)    I've been tooling with the various
performance settings:
effective_cache at 5GB, shared_buffers at 2 GB, workmem, sortmem at 1 GB
each.
( Shared buffers puzzles me a it bit - my instinct says to set it as high as
possible,
but everything I read says that "too high" can hurt performance.)

   Any ideas for performance tweaking in this kind of application would be
greatly appreciated.
We've got indexes on the fields being grouped, and always vacuum analzye
after building them.

   It's difficult to just "try" various ideas because each attempt takes a
full day to test.  Real
experience is needed here!

Thanks much,

Mike



В списке pgsql-performance по дате сообщения:

От: "Jeffrey W. Baker"
Дата:
Сообщение: Re: Huge Data sets, simple queries
От: Tom Lane
Дата:
Сообщение: Re: Huge Data sets, simple queries