Re: Serious performance problem
From | Hannu Krosing
---|---
Subject | Re: Serious performance problem
Date |
Msg-id | 3BDFD17C.C78B35E@tm.ee
In reply to | Serious performance problem ("Tille, Andreas" <TilleA@rki.de>)
Responses | Re: Serious performance problem ("Tille, Andreas" <TilleA@rki.de>)
List | pgsql-hackers
"Tille, Andreas" wrote:
>
> Hello,
>
> I discussed a problem concerning the speed of PostgreSQL compared to
> MS SQL Server at length on the postgres-general list. The thread starts
> with the message
>
> http://fts.postgresql.org/db/mw/msg.html?mid=1035557
>
> Now I tried a snapshot of version 7.2 and got a speed increase of about
> a factor of 2. But sorry, this is really not enough. Even the very simple
> test I pointed to in my mail is much too slow, and the issue would
> probably spoil the whole project, which should be a complete open source
> solution and would otherwise perhaps end in some M$ stuff. I am under
> heavy pressure from my employer, who keeps talking about the nice world
> of MS .NET (while he uses MS SQL exclusively). To make the thing clear:
> the issue is the database of infectious diseases in Germany, run by the
> Robert Koch-Institut. So this project could be of some importance for
> increasing the acceptance of PostgreSQL and Open Source in the field of
> medicine, which is generally known for the amount of money involved in
> it. So I really hope that some skilled programmers would be able to find
> a good way to solve the performance issue, perhaps by just profiling the
> simple query
>
> SELECT Hauptdaten_Fall.MeldeKategorie, Count(Hauptdaten_Fall.ID) AS Anz
> FROM Hauptdaten_Fall
> WHERE (((Hauptdaten_Fall.IstAktuell)=20))
> GROUP BY Hauptdaten_Fall.MeldeKategorie
> ORDER BY Hauptdaten_Fall.MeldeKategorie;
>
> against the data set I put at
>
> http://www.foodborne-net.de/~tillea/ifsg/ifsgtest.dump.bz2
>
> If this should take less than half a second on a modern PC, I could
> continue to try more realistic queries.
I tried some more optimization of the query on my work computer
(AMD Athlon 850, 512MB, PostgreSQL 7.1.3 with default memory settings).

SELECT MeldeKategorie, Count(ID) AS Anz
FROM Hauptdaten_Fall
WHERE IstAktuell=20
GROUP BY MeldeKategorie
ORDER BY MeldeKategorie;

real 0m9.675s

create index i1 on Hauptdaten_Fall(IstAktuell,MeldeKategorie);

----------------------------

set enable_seqscan = off;

SELECT MeldeKategorie, Count(ID) AS Anz
FROM Hauptdaten_Fall
WHERE IstAktuell=20
GROUP BY MeldeKategorie
ORDER BY MeldeKategorie;

Aggregate (cost=4497.30..4510.18 rows=258 width=16)
  -> Group (cost=4497.30..4503.74 rows=2575 width=16)
    -> Sort (cost=4497.30..4497.30 rows=2575 width=16)
      -> Index Scan using i1 on hauptdaten_fall (cost=0.00..4351.40 rows=2575 width=16)

real 0m7.131s

---------------------------

set enable_seqscan = off;

SELECT MeldeKategorie, Count(ID) AS Anz
FROM Hauptdaten_Fall
WHERE IstAktuell=20
GROUP BY IstAktuell,MeldeKategorie
ORDER BY IstAktuell,MeldeKategorie;

Aggregate (cost=4497.30..4510.18 rows=258 width=16)
  -> Group (cost=4497.30..4503.74 rows=2575 width=16)
    -> Index Scan using i1 on hauptdaten_fall (cost=0.00..4351.40 rows=2575 width=16)

real 0m3.223s

-- same after doing

cluster i1 on Hauptdaten_Fall;

real 1.590s - 1.600s

select count(*) from Hauptdaten_Fall;

real 0m0.630s

---------------------------

The following query is marginally (about 0.1 sec) faster, though the
plan looks the same down to the cost estimates.
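The key move above is a composite index whose leading column serves the WHERE filter and whose second column matches the GROUP BY/ORDER BY, so the grouped count can be read from the index in already-sorted order. A minimal sketch of the same idea, using Python's built-in sqlite3 with toy data as a self-contained stand-in (the original ran against PostgreSQL 7.1 and the full ifsgtest dump):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute(
    "CREATE TABLE Hauptdaten_Fall "
    "(ID INTEGER PRIMARY KEY, IstAktuell INTEGER, MeldeKategorie TEXT)"
)
# hypothetical toy rows: two status values, a few report categories
cur.executemany(
    "INSERT INTO Hauptdaten_Fall (IstAktuell, MeldeKategorie) VALUES (?, ?)",
    [(20, "Salmonella"), (20, "Influenza"), (20, "Salmonella"), (10, "Influenza")],
)

# composite index: IstAktuell answers the WHERE, MeldeKategorie the GROUP BY/ORDER BY
cur.execute("CREATE INDEX i1 ON Hauptdaten_Fall (IstAktuell, MeldeKategorie)")

query = """SELECT MeldeKategorie, Count(ID) AS Anz
           FROM Hauptdaten_Fall
           WHERE IstAktuell = 20
           GROUP BY MeldeKategorie
           ORDER BY MeldeKategorie"""
print(cur.execute(query).fetchall())   # [('Influenza', 1), ('Salmonella', 2)]

# the plan should mention index i1 and need no separate sort step
for row in cur.execute("EXPLAIN QUERY PLAN " + query):
    print(row)
```

SQLite's planner differs from PostgreSQL 7.1's, of course; the sketch only illustrates why the (filter, group) column order matters for this query shape.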
SET ENABLE_SEQSCAN = OFF;

SELECT MeldeKategorie, Count(*) AS Anz
FROM (select IstAktuell,MeldeKategorie
      from Hauptdaten_Fall
      where IstAktuell=20) sub
GROUP BY IstAktuell,MeldeKategorie
ORDER BY IstAktuell,MeldeKategorie;

Aggregate (cost=0.00..4370.72 rows=258 width=16)
  -> Group (cost=0.00..4364.28 rows=2575 width=16)
    -> Index Scan using i1 on hauptdaten_fall (cost=0.00..4351.40 rows=2575 width=16)

real 0m1.438s - 1.506s

---------------------------

Now I make the dataset bigger, keeping the number of rows returned by
the query the same:

insert into hauptdaten_fall (istaktuell, meldekategorie)
select istaktuell + 20, meldekategorie from hauptdaten_fall;
INSERT 0 257530

insert into hauptdaten_fall (istaktuell, meldekategorie)
select istaktuell + 40, meldekategorie from hauptdaten_fall;
INSERT 0 515060

ifsgtest=# select count(*) from hauptdaten_fall;
  count
---------
 1030120
(1 row)

cluster i1 on Hauptdaten_Fall;
vacuum analyze;

-- The query time is still the same, 1.44 - 1.5 sec

SET ENABLE_SEQSCAN = OFF;

SELECT MeldeKategorie, Count(*) AS Anz
FROM (select IstAktuell,MeldeKategorie
      from Hauptdaten_Fall
      where IstAktuell=20) sub
GROUP BY IstAktuell,MeldeKategorie
ORDER BY IstAktuell,MeldeKategorie;

Aggregate (cost=0.00..4370.72 rows=258 width=16)
  -> Group (cost=0.00..4364.28 rows=2575 width=16)
    -> Index Scan using i1 on hauptdaten_fall (cost=0.00..4351.40 rows=2575 width=16)

real 0m1.438s - 1.506s

----------------------------

Now back to the original data distribution, just 4 times bigger:

ifsgtest=# update hauptdaten_fall
ifsgtest-# set istaktuell = case when istaktuell % 20 = 0 then 20 else 10 end
ifsgtest-# ;
UPDATE 1030120

ifsgtest=# vacuum analyze;
VACUUM

SET ENABLE_SEQSCAN = OFF;

SELECT MeldeKategorie, Count(*) AS Anz
FROM (select IstAktuell,MeldeKategorie
      from Hauptdaten_Fall
      where IstAktuell=20) sub
GROUP BY IstAktuell,MeldeKategorie
ORDER BY IstAktuell,MeldeKategorie;

real 0m6.077s - 6.606s

and after clustering:

cluster i1 on Hauptdaten_Fall;

real 0m5.683s - 5.750s

so it's linear growth here.

----------------------------

Hannu
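The dataset-growth trick above works because each inserted copy shifts istaktuell by a constant, so none of the new rows match IstAktuell=20: the table quadruples while the query's result set stays identical, isolating table size as the variable. A hedged sqlite3 sketch of the same construction (toy data; SQLite used only so the example is self-contained):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute(
    "CREATE TABLE hauptdaten_fall "
    "(id INTEGER PRIMARY KEY, istaktuell INTEGER, meldekategorie TEXT)"
)
cur.executemany(
    "INSERT INTO hauptdaten_fall (istaktuell, meldekategorie) VALUES (?, ?)",
    [(20, "A"), (20, "B"), (10, "A")],
)

query = """SELECT meldekategorie, Count(*) AS Anz
           FROM hauptdaten_fall
           WHERE istaktuell = 20
           GROUP BY meldekategorie
           ORDER BY meldekategorie"""
before = cur.execute(query).fetchall()

# double the table twice, shifting istaktuell so no copied row matches =20
for shift in (20, 40):
    cur.execute(
        "INSERT INTO hauptdaten_fall (istaktuell, meldekategorie) "
        "SELECT istaktuell + ?, meldekategorie FROM hauptdaten_fall",
        (shift,),
    )

after = cur.execute(query).fetchall()
total = cur.execute("SELECT count(*) FROM hauptdaten_fall").fetchone()[0]
print(before, after, total)   # [('A', 1), ('B', 1)] [('A', 1), ('B', 1)] 12
```

The starting values are 20 and 10, so the shifted copies land on 30/40 and then 50/60/70/80, never back on 20; the second INSERT doubles the already-doubled table, giving 4x the rows, just as the two INSERT ... SELECT statements in the post do.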