How to avoid "Out of memory" using aggregate functions?

Поиск
Список
Период
Сортировка
От Mark Dalphin
Тема How to avoid "Out of memory" using aggregate functions?
Дата
Msg-id 3846C8D3.DB066E78@amgen.com
обсуждение исходный текст
Ответы Re: [SQL] How to avoid "Out of memory" using aggregate functions?  (Tom Lane <tgl@sss.pgh.pa.us>)
Список pgsql-sql
Hi,

I am working with a simple table similar to the following:

CREATE TABLE Seq (   SeqID int8 PRIMARY KEY,   FileID int2 NOT NULL,   FileOffset int8 NOT NULL,   SeqLength int4,
FOREIGNKEY (FileID) REFERENCES File (FileID)
 
);

CREATE INDEX Seq_SeqLength ON Seq (SeqLength);

The table contains facts about a DNA sequence (eg its length) and how to access
that sequence from an ordinary Unix file (FileID and FileOffset) where it is
concatonated with all the other DNA sequences.

Currently the TABLE Seq contains 4,586,379 (times 2) rows and the overall size
of the Unix file is 3,660,117,293 bytes (ie I expect sum(SeqLength) to be in the
region of 3.6 Gbytes).

I want to use the PostgreSQL aggregate functions to learn more about the
sequences.

The following query works well and quickly (I need to cast as int8 to avoid
overflow of the sum() and perhaps for avg()):
SELECT    count(seqlength::int8),         min(seqlength::int8),         max(seqlength::int8),
avg(seqlength::int8),        sum(seqlength::int8)
 
FROM Seq
WHERE FileID=4 AND SeqLength>1000;

An EXPLAIN of the above query returns the following:
NOTICE:  QUERY PLAN:

Aggregate  (cost=202000.62 rows=918779 width=4) ->  Index Scan using seq_seqlength on seq  (cost=202000.62 rows=918779
width=4)

EXPLAIN


If I remove the SeqLength constraint, the query takes forever to execute and
then crashes with an out-of-memory (on my SGI, I have 256 Mbyte RAM and equal
sized swap space; I run out of swap space during the query which makes a very
unpleasent noise).

EXPLAIN
SELECT  count(seqlength::int8),       min(seqlength::int8),       max(seqlength::int8),       avg(seqlength::int8),
 sum(seqlength::int8)
 
FROM Seq
WHERE FileID=4;

NOTICE:  QUERY PLAN:

Aggregate  (cost=397916.69 rows=2756337 width=4) ->  Seq Scan on seq  (cost=397916.69 rows=2756337 width=4)

EXPLAIN

I do not understand how to interpret this Query Plan, except that I am doing a
sequential scan of the DB (hence slow), but, since all the elements requested
are aggregates, why do I run out of memory? I am not trying to load the whole
table into memory at once, am I?  Suggestions for how to make this query run
would be welcome; my database is rapidly growing larger!

Thanks,
Mark

--
Mark Dalphin                          email: mdalphin@amgen.com
Mail Stop: 29-2-A                     phone: +1-805-447-4951 (work)
One Amgen Center Drive                       +1-805-375-0680 (home)
Thousand Oaks, CA 91320                 fax: +1-805-499-9955 (work)





В списке pgsql-sql по дате отправления:

Предыдущее
От: Chris Griffin
Дата:
Сообщение: Logical and for int4?
Следующее
От: "Zot O'Connor"
Дата:
Сообщение: PHP/Postgres dieing in random but consistent place.