Re: How slow is DISTINCT?

Поиск
Список
Период
Сортировка
От Michael Contzen
Тема Re: How slow is DISTINCT?
Дата
Msg-id 3CB2BF53.77D41E79@dohle.com
обсуждение исходный текст
Ответ на How slow is DISTINCT?  (Wei Weng <wweng@kencast.com>)
Ответы Re: How slow is DISTINCT?  (Tom Lane <tgl@sss.pgh.pa.us>)
Re: How slow is DISTINCT?  ("Christopher Kings-Lynne" <chriskl@familyhealth.com.au>)
Список pgsql-sql
Hello,

our problem to 'select distinct' is similar: We have a big table of
about 320.000.000 rows of an datawarehouse application. 55 GB, yes, it
works fine! 

Createing an index on it quite fast (nearly as fast than oracle on the
same data and same machine) 1GHz P4, 4GB-Ram -> 30 Minutes. 

Then we tried 'select distinct one_field' which would result to about
200.000 different values. Postgres needed 6 hours while Oracle managed
it in about 30 minutes.

Looking into the pgsql_tmp directory of this db while doing this
selection showed me a lot of tempfiles nearly as big as the table.

Does postgres sort the whole table without projection to one column an
performs a unique on this whole table?
This would explain the big amount of disk usage in pgsql_tmp and the big
amount of time.

The statement, something could be wrong with the data, is not very
useful: This is data of our electronic cash-desks. Unfortunately our
customers buy every day nearly the same articles - therefor the
repeatition of data :-)


Kind regards

M.Contzen
Developer 
Dohle Systemberatung
Germany


Some facts of our test:
        Table "warenausg_ges" Column   |     Type      | Modifiers 
-----------+---------------+-----------ean       | numeric(13,0) | menge     | numeric(13,3) | lvkumsatz |
numeric(15,3)| vkumsatz  | numeric(15,3) | ekumsatz  | numeric(15,3) | rabatt    | numeric(12,0) | kdnr      |
numeric(10,0)| artnr     | numeric(10,0) | lfnr      | numeric(10,0) | wg        | integer       | aktion    |
character(1) | datum     | date          | status    | integer       | 
 
Indexes: warenausg_ges_inx

Index "warenausg_ges_inx"Column | Type 
--------+------datum  | date
btree

explain select distinct artnr from warenausg_ges;
NOTICE:  QUERY PLAN:

Unique  (cost=224522801.22..225315849.86 rows=31721946 width=12) ->  Sort  (cost=224522801.22..224522801.22
rows=317219456width=12)       ->  Seq Scan on warenausg_ges  (cost=0.00..165793667.00
 
rows=317219456 width=12)


В списке pgsql-sql по дате отправления:

Предыдущее
От: "Dorian Taylor"
Дата:
Сообщение: case statement in group by clause
Следующее
От: "Pierre-Andre Michel"
Дата:
Сообщение: How to use an index on a bigint column