Slow query with a lot of data

Поиск
Список
Период
Сортировка
От Moritz Onken
Тема Slow query with a lot of data
Дата
Msg-id 90AA1B69-4E9C-4710-9D58-F65620DDA022@houseofdesign.de
обсуждение исходный текст
Ответы Re: Slow query with a lot of data  (Matthew Wakeling <matthew@flymine.org>)
Re: Slow query with a lot of data  (Moritz Onken <onken@houseofdesign.de>)
Список pgsql-performance
Hi,

I run this query:

select max(a."user"), b.category, count(1) from result a,
domain_categories b where a."domain" = b."domain" group by b.category;

the table result contains all websites a user visited. And the table
domain_categories contains all categories a domain is in.
result has 20 Mio rows and domain_categories has about 12 Mio. There
are 500.000 different users.

I have indexes on result.domain, domain_categories.domain,
result.user, domain_categories.category. Clustered result on user and
domain_categories on domain.

explain analyze says (limited to one user with id 1337):

"HashAggregate  (cost=2441577.16..2441614.72 rows=2504 width=8)
(actual time=94667.335..94671.508 rows=3361 loops=1)"
"  ->  Merge Join  (cost=2119158.02..2334105.00 rows=14329622 width=8)
(actual time=63559.938..94621.557 rows=36308 loops=1)"
"        Merge Cond: (a.domain = b.domain)"
"        ->  Sort  (cost=395.52..405.49 rows=3985 width=8) (actual
time=0.189..0.211 rows=19 loops=1)"
"              Sort Key: a.domain"
"              Sort Method:  quicksort  Memory: 27kB"
"              ->  Index Scan using result_user_idx on result a
(cost=0.00..157.21 rows=3985 width=8) (actual time=0.027..0.108
rows=61 loops=1)"
"                    Index Cond: ("user" = 1337)"
"        ->  Materialize  (cost=2118752.28..2270064.64 rows=12104989
width=8) (actual time=46460.599..82336.116 rows=12123161 loops=1)"
"              ->  Sort  (cost=2118752.28..2149014.75 rows=12104989
width=8) (actual time=46460.592..59595.851 rows=12104989 loops=1)"
"                    Sort Key: b.domain"
"                    Sort Method:  external sort  Disk: 283992kB"
"                    ->  Seq Scan on domain_categories b
(cost=0.00..198151.89 rows=12104989 width=8) (actual
time=14.352..22572.869 rows=12104989 loops=1)"
"Total runtime: 94817.058 ms"

This is running on a pretty small server with 1gb of ram and a slow
sata hd. Shared_buffers is 312mb, max_fsm_pages = 153600. Everything
else is commented out. Postgresql v8.3.3. Operating system Ubuntu 8.04.

It would be great if someone could help improve this query. This is
for a research project at my university.

Thanks in advance,

Moritz


В списке pgsql-performance по дате отправления:

Предыдущее
От: Tom Lane
Дата:
Сообщение: Re: Cross Join Problem
Следующее
От: Matthew Wakeling
Дата:
Сообщение: Re: Slow query with a lot of data