BUG #4462: Adding COUNT to query causes massive slowdown
От | Jussi Pakkanen |
---|---|
Тема | BUG #4462: Adding COUNT to query causes massive slowdown |
Дата | |
Msg-id | 200810090914.m999EjOJ073923@wwwmaster.postgresql.org обсуждение исходный текст |
Ответы |
Re: BUG #4462: Adding COUNT to query causes massive slowdown
|
Список | pgsql-bugs |
The following bug has been logged online: Bug reference: 4462 Logged by: Jussi Pakkanen Email address: jpakkane@gmail.com PostgreSQL version: 8.3.3 Operating system: Ubuntu x86 8/04 Description: Adding COUNT to query causes massive slowdown Details: I have a table in the following format code CHARACTER(9) NOT NULL text VARCHAR(200) I have built an INDEX on "code", VACUUMed and ANALYZEd the table. I have about 32 million rows and roughly 200 000 unique "code" elements. I determine the unique codes using the following SQL query: EXPLAIN SELECT DISTINCT code FROM log; QUERY PLAN ---------------------------------------------------------------------------- ----------- Unique (cost=0.00..1384173.89 rows=6393 width=10) -> Index Scan using codeindex on log (cost=0.00..1303930.83 rows=32097224 width=10) (2 rows) This takes about 4 minutes (it's a slow machine) but pretty much works as expected. However when I try to count the amount of distinct codes, I get this: EXPLAIN SELECT COUNT(DISTINCT code) FROM log; QUERY PLAN ---------------------------------------------------------------------------- ----- Aggregate (cost=100801488.30..100801488.31 rows=1 width=10) -> Seq Scan on log (cost=100000000.00..100721245.24 rows=32097224 width=10) (2 rows) For some reason PostgreSQL wants to do a full table scan in this case. This takes over 11 minutes. Transferring the result set from the first query to a Python client program and calculating the lines there takes about 4 seconds. This makes pg over 100 times slower than the naive implementation. If I do the same COUNT using a view, it uses the index and is fast: CREATE VIEW distcode AS SELECT DISTINCT code FROM log; EXPLAIN SELECT COUNT(*) FROM distcode; QUERY PLAN ---------------------------------------------------------------------------- ----------------- Aggregate (cost=1384253.81..1384253.82 rows=1 width=0) -> Unique (cost=0.00..1384173.89 rows=6393 width=10) -> Index Scan using codeindex on log (cost=0.00..1303930.83 rows=320972 I tried setting seq_scan to off. It did not help. Due to reasons beyond my control, I can't test version 8.3.4 until the next Ubuntu is released (at the end of this month).
В списке pgsql-bugs по дате отправления: