Re: Serious performance problem

Поиск
Список
Период
Сортировка
От Hannu Krosing
Тема Re: Serious performance problem
Дата
Msg-id 3BDFD17C.C78B35E@tm.ee
обсуждение исходный текст
Ответ на Serious performance problem  ("Tille, Andreas" <TilleA@rki.de>)
Ответы Re: Serious performance problem  ("Tille, Andreas" <TilleA@rki.de>)
Список pgsql-hackers
"Tille, Andreas" wrote:
> 
> Hello,
> 
> I discussed a problem concerning the speed of PostgreSQL compared to
> MS SQL server heavily on postgres-general list.  The thread starts with
> message
> 
>     http://fts.postgresql.org/db/mw/msg.html?mid=1035557
> 
> Now I tried a snapshot of version 7.2 and got an increase of speed of
> about factor 2.  But sorry this is really not enough.  The very simple
> test I pointed to in my mail is even much to slow and the issue would
> probably spoil down the whole project which should be a complete open
> source solution and would perhaps and in any M$ stuff.  I´ve got under
> heavy preasur from my employer who was talking about the nice world
> of MS .net (while he is using MS-SQL exclusively).  To make the thing
> clear the issue is the gal database of infectious diseases in Germany
> runned by the Robert Koch-Institute.  So the beast could be of some
> importance for increasing the acceptance of PostgreSQL and Open Source
> in the field of medicine which is generally known for the money which
> is involved in.  So I really hope that some skilled programmers would
> be able to find a good way to solve the performance issue perhaps by
> just profiling the simple query
> 
>    SELECT Hauptdaten_Fall.MeldeKategorie, Count(Hauptdaten_Fall.ID) AS Anz FROM Hauptdaten_Fall WHERE
(((Hauptdaten_Fall.IstAktuell)=20))
> GROUP BY Hauptdaten_Fall.MeldeKategorie ORDER BY Hauptdaten_Fall.MeldeKategorie;
> 
> to the data set I put on
> 
>    http://www.foodborne-net.de/~tillea/ifsg/ifsgtest.dump.bz2
> 
> If this should take less than half a second on a modern PC I could
> continue to try mo realistic queries.


I tried some more on optimizing the query on my work computer 
(AMD ATHLON 850, 512MB, PostgreSQL 7.1.3 with default memory settings)


SELECT MeldeKategorie,       Count(ID) AS Anz FROM Hauptdaten_Fall WHERE IstAktuell=20
GROUP BY MeldeKategorie 
ORDER BY MeldeKategorie;

real    0m9.675s

create index i1 on Hauptdaten_Fall(IstAktuell,MeldeKategorie);

----------------------------
set enable_seqscan = off;
SELECT MeldeKategorie,       Count(ID) AS Anz FROM Hauptdaten_Fall WHERE IstAktuell=20
GROUP BY MeldeKategorie 
ORDER BY MeldeKategorie;

Aggregate  (cost=4497.30..4510.18 rows=258 width=16) ->  Group  (cost=4497.30..4503.74 rows=2575 width=16)       ->
Sort (cost=4497.30..4497.30 rows=2575 width=16)             ->  Index Scan using i1 on hauptdaten_fall 
 
(cost=0.00..4351.40 rows=2575 width=16)

real    0m7.131s

---------------------------

set enable_seqscan = off;
SELECT MeldeKategorie,       Count(ID) AS Anz FROM Hauptdaten_Fall WHERE IstAktuell=20
GROUP BY IstAktuell,MeldeKategorie
ORDER BY IstAktuell,MeldeKategorie;

Aggregate  (cost=4497.30..4510.18 rows=258 width=16) ->  Group  (cost=4497.30..4503.74 rows=2575 width=16)       ->
IndexScan using i1 on hauptdaten_fall  (cost=0.00..4351.40
 
rows=2575 width=16)

real    0m3.223s

-- same after doing

cluster i1 on Hauptdaten_Fall;

real    1.590 -- 1.600



select count(*) from Hauptdaten_Fall;

real    0m0.630s

---------------------------

The following query is marginally (about 0.1 sec) faster, though the 
plan looks the same down to cost estimates.

SET ENABLE_SEQSCAN = OFF;
SELECT MeldeKategorie,       Count(*) AS Anz FROM (select IstAktuell,MeldeKategorie from Hauptdaten_Fall where
IstAktuell=20) sub
GROUP BY IstAktuell,MeldeKategorie
ORDER BY IstAktuell,MeldeKategorie;

Aggregate  (cost=0.00..4370.72 rows=258 width=16) ->  Group  (cost=0.00..4364.28 rows=2575 width=16)       ->  Index
Scanusing i1 on hauptdaten_fall  (cost=0.00..4351.40
 
rows=2575 width=16)

real    0m1.438s - 1.506s

---------------------------

now I make the dataset bigger keeping the number of rows returned by
query the same

insert into hauptdaten_fall (istaktuell, meldekategorie)
select istaktuell + 20, meldekategorie 
from  hauptdaten_fall ;

INSERT 0 257530

insert into hauptdaten_fall (istaktuell, meldekategorie)
select istaktuell + 40, meldekategorie 
from  hauptdaten_fall ;

INSERT 0 515060
ifsgtest=# select count(*) from hauptdaten_fall; count  
---------1030120
(1 row)

cluster i1 on Hauptdaten_Fall;
vacuum analyze;


-- The query time is still the same 1.44 - 1.5 sec

SET ENABLE_SEQSCAN = OFF;
SELECT MeldeKategorie,       Count(*) AS Anz FROM (select IstAktuell,MeldeKategorie from Hauptdaten_Fall where
IstAktuell=20) sub
GROUP BY IstAktuell,MeldeKategorie
ORDER BY IstAktuell,MeldeKategorie;

Aggregate  (cost=0.00..4370.72 rows=258 width=16) ->  Group  (cost=0.00..4364.28 rows=2575 width=16)       ->  Index
Scanusing i1 on hauptdaten_fall  (cost=0.00..4351.40
 
rows=2575 width=16)

real    0m1.438s - 1.506s

----------------------------

now back to original data distribution, just 4 times bigger

ifsgtest=# update hauptdaten_fall
ifsgtest-# set istaktuell = case when istaktuell % 20 = 0 then 20 else
10 end
ifsgtest-# ;
UPDATE 1030120
ifsgtest=# vacuum analyze;
VACUUM

SET ENABLE_SEQSCAN = OFF;
SELECT MeldeKategorie,       Count(*) AS Anz FROM (select IstAktuell,MeldeKategorie from Hauptdaten_Fall where
IstAktuell=20) sub
GROUP BY IstAktuell,MeldeKategorie
ORDER BY IstAktuell,MeldeKategorie;

real    0m6.077 -- 6.606s

and after clustering:
cluster i1 on Hauptdaten_Fall;

real    0m5.683 - 5.750s

so it's linear growth here

----------------------------

Hannu


В списке pgsql-hackers по дате отправления:

Предыдущее
От: "Marc G. Fournier"
Дата:
Сообщение: Re: pgsql-committers?
Следующее
От: mlw
Дата:
Сообщение: Re: Serious performance problem