BUG #4462: Adding COUNT to query causes massive slowdown

Поиск
Список
Период
Сортировка
От Jussi Pakkanen
Тема BUG #4462: Adding COUNT to query causes massive slowdown
Дата
Msg-id 200810090914.m999EjOJ073923@wwwmaster.postgresql.org
обсуждение исходный текст
Ответы Re: BUG #4462: Adding COUNT to query causes massive slowdown  (Peter Eisentraut <peter_e@gmx.net>)
Список pgsql-bugs
The following bug has been logged online:

Bug reference:      4462
Logged by:          Jussi Pakkanen
Email address:      jpakkane@gmail.com
PostgreSQL version: 8.3.3
Operating system:   Ubuntu x86 8/04
Description:        Adding COUNT to query causes massive slowdown
Details:

I have a table in the following format

code CHARACTER(9) NOT NULL
text VARCHAR(200)

I have built an INDEX on "code", VACUUMed and ANALYZEd the table.

I have about 32 million rows and roughly 200 000 unique "code" elements.

I determine the unique codes using the following SQL query:



EXPLAIN SELECT DISTINCT code FROM log;
                             QUERY PLAN
----------------------------------------------------------------------------
-----------
 Unique  (cost=0.00..1384173.89 rows=6393 width=10)
   ->  Index Scan using codeindex on log  (cost=0.00..1303930.83
rows=32097224 width=10)
(2 rows)

This takes about 4 minutes (it's a slow machine) but pretty much works as
expected.

However when I try to count the amount of distinct codes, I get this:


EXPLAIN SELECT COUNT(DISTINCT code) FROM log;
                                   QUERY PLAN
----------------------------------------------------------------------------
-----
 Aggregate  (cost=100801488.30..100801488.31 rows=1 width=10)
   ->  Seq Scan on log  (cost=100000000.00..100721245.24 rows=32097224
width=10)
(2 rows)


For some reason PostgreSQL wants to do a full table scan in this case. This
takes over 11 minutes.

Transferring the result set from the first query to a Python client program
and calculating the lines there takes about 4 seconds. This makes pg over
100 times slower than the naive implementation.

If I do the same COUNT using a view, it uses the index and is fast:

CREATE VIEW distcode AS SELECT DISTINCT code FROM log;

EXPLAIN SELECT COUNT(*) FROM distcode;
                                         QUERY PLAN

----------------------------------------------------------------------------
-----------------
 Aggregate  (cost=1384253.81..1384253.82 rows=1 width=0)
   ->  Unique  (cost=0.00..1384173.89 rows=6393 width=10)
         ->  Index Scan using codeindex on log  (cost=0.00..1303930.83
rows=320972

I tried setting seq_scan to off. It did not help.

Due to reasons beyond my control, I can't test version 8.3.4 until the next
Ubuntu is released (at the end of this month).

В списке pgsql-bugs по дате отправления:

Предыдущее
От: "Dmitry Orlov"
Дата:
Сообщение: BUG #4461: Should avoid create tblspcs in system location
Следующее
От: ivocs admin
Дата:
Сообщение: Locale (unsupported) bug. uk_UA.KOI8-U