Re: Serious performance problem

Поиск

Список

Период

Сортировка

От	Tille, Andreas
Тема	Re: Serious performance problem
Дата	1 ноября 2001 г. 10:32:05
Msg-id	Pine.LNX.4.33.0111011541480.21752-100000@wr-linux02.rki.ivbb.bund.de обсуждение исходный текст
Ответ на	Re: Serious performance problem (Hannu Krosing <hannu@tm.ee>)
Список	pgsql-hackers

Дерево обсуждения

On Wed, 31 Oct 2001, Hannu Krosing wrote:

> I tried some more on optimizing the query on my work computer
> (AMD ATHLON 850, 512MB, PostgreSQL 7.1.3 with default memory settings)
>
>
> SELECT MeldeKategorie,
>        Count(ID) AS Anz
>   FROM Hauptdaten_Fall
>  WHERE IstAktuell=20
> GROUP BY MeldeKategorie
> ORDER BY MeldeKategorie;
>
> real    0m9.675s
>
> create index i1 on Hauptdaten_Fall(IstAktuell,MeldeKategorie);
>
> ----------------------------
> set enable_seqscan = off;
> SELECT MeldeKategorie,
>        Count(ID) AS Anz
>   FROM Hauptdaten_Fall
>  WHERE IstAktuell=20
> GROUP BY MeldeKategorie
> ORDER BY MeldeKategorie;
>
> Aggregate  (cost=4497.30..4510.18 rows=258 width=16)
>   ->  Group  (cost=4497.30..4503.74 rows=2575 width=16)
>         ->  Sort  (cost=4497.30..4497.30 rows=2575 width=16)
>               ->  Index Scan using i1 on hauptdaten_fall
> (cost=0.00..4351.40 rows=2575 width=16)
>
> real    0m7.131s
>
> ---------------------------
>
> set enable_seqscan = off;
> SELECT MeldeKategorie,
>        Count(ID) AS Anz
>   FROM Hauptdaten_Fall
>  WHERE IstAktuell=20
> GROUP BY IstAktuell,MeldeKategorie
> ORDER BY IstAktuell,MeldeKategorie;
>
> Aggregate  (cost=4497.30..4510.18 rows=258 width=16)
>   ->  Group  (cost=4497.30..4503.74 rows=2575 width=16)
>         ->  Index Scan using i1 on hauptdaten_fall  (cost=0.00..4351.40
> rows=2575 width=16)
>
> real    0m3.223s
Hmmm, could you please explain the theory behind that for quite a
beginner like me (perhaps on -general if you feel it apropriate)

The change in the second select is that you included IstAktuell in the
GROUP BY/ORDER BY clause and this gives a speed increas by factor 2.
It seems that the  "Sort" can be left out in this case if I look at the
plan, but why that?  The WHERE clause should select just all IstAktuell=20
data sets and so the GROUP BY/ORDER BY clauses should every time have the
same work - as for my humble understanding.

>
> -- same after doing
>
> cluster i1 on Hauptdaten_Fall;
>
> real    1.590 -- 1.600
That´s also interesting.  In reality the table Hauptdaten_Fall has many fields
with many indices.  If I understand things right it makes no sense to have
more than one clustered index, right? A further speed increase of factor two
would be welcome.  Could I expect this if I would find out the "sensitive"
index of my table for certain tasks?  Or is my understanging wrong and it
makes sense to cluster more than one index.  Unfortunately clustering the
index of a huge table takes some time.  Could I speed this up by some
tricks?

> select count(*) from Hauptdaten_Fall;
>
> real    0m0.630s
>
> ---------------------------
>
> The following query is marginally (about 0.1 sec) faster, though the
> plan looks the same down to cost estimates.
>
> SET ENABLE_SEQSCAN = OFF;
> SELECT MeldeKategorie,
>        Count(*) AS Anz
>   FROM (select IstAktuell,MeldeKategorie from Hauptdaten_Fall where
> IstAktuell=20) sub
> GROUP BY IstAktuell,MeldeKategorie
> ORDER BY IstAktuell,MeldeKategorie;
>
> Aggregate  (cost=0.00..4370.72 rows=258 width=16)
>   ->  Group  (cost=0.00..4364.28 rows=2575 width=16)
>         ->  Index Scan using i1 on hauptdaten_fall  (cost=0.00..4351.40
> rows=2575 width=16)
>
> real    0m1.438s - 1.506s
Hmm, perhaps this is nearly nothing or is there any theory that a
count(*) is faster than a count(<fieldname>)?

> ...
> real    0m6.077 -- 6.606s
>
> and after clustering:
> cluster i1 on Hauptdaten_Fall;
>
> real    0m5.683 - 5.750s
>
> so it's linear growth here
This is what my colleague was afraid of: We would have linear growth
compared to the log(n) growth which is to be expected on MS SQL server
(for this certain type of queries and for sure up to a far limit of
data where other constraints could get influence, but we are far from
this limit).  This would not convince him :-(.

Kind regards
        Andreas.

В списке pgsql-hackers по дате отправления:

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Re: Serious performance problem