Re: HashAggregate slower than sort?

Поиск

Список

Период

Сортировка

От	Jatinder Sangha
Тема	Re: HashAggregate slower than sort?
Дата	21 июня 2010 г. 09:59:23
Msg-id	519DE98D62C54D4A896E22A026CF1322AEE191@cdlon-ex.CoalitionDev.local обсуждение исходный текст
Ответ на	HashAggregate slower than sort? ("Jatinder Sangha" <js@coalition.com>)
Ответы	Re: HashAggregate slower than sort?
Список	pgsql-performance

Дерево обсуждения

Hi Kevin,

Thanks for the suggestions.

I've already converted all of my SQL to use "distinct on (...)" and this
is now always faster using the hash-aggregates than when using sorting.
The queries now only use sorting if the hashing would take up too much
memory.

Thanks,
--Jatinder

-----Original Message-----
From: Kevin Grittner [mailto:Kevin.Grittner@wicourts.gov]
Sent: 18 June 2010 18:59
To: Jatinder Sangha; pgsql-performance@postgresql.org
Subject: Re: [PERFORM] HashAggregate slower than sort?

"Jatinder Sangha" <js@coalition.com> wrote:

> I have a simple query that when planned either uses hash- aggregates
> or a sort depending on the amount of working memory available. The
> problem is that when it uses the hash-aggregates, the query runs 25%
> slower than when using the sort method.
>
> The table in question contains about 60 columns, many of which are
> boolean, 32-bit integers and some are 64-bit integers. Many fields are

> text - and some of these can be quite long (eg 32Kb).

> Obviously, I can re-write the query to use a "distinct on (...)"
> clause

Yeah, that seems prudent, to say the least.

> Why is the hash-aggregate slower than the sort?
>
> Is it something to do with the number of columns? ie. When sorting,
> the first few columns defined on the table (id, version) make the row
> unique - but when using the hash-aggregate feature, presumably every
> column needs to be hashed which takes longer especially for long text
> fields?

Sounds like a reasonable guess to me.  But since you're apparently
retrieving about 9,000 wide rows in (worst case) 56 ms, it would seem
that your active data set may be fully cached.  If so, you could try
reducing both random_page_cost and seq_page_cost to something in the 0.1
to 0.005 range and see if it improves the accuracy of the cost
estimates.  Not that you should go back to using DISTINCT on all 60
column, including big text columns; but these cost factors might help
other queries pick faster plans.

-Kevin

Coalition Development Ltd 1st  Floor, One Newhams Row, London, United Kingdom, SE1 3UZ
Registration Number - 04328897 Registered Office - Direct Control 3rd Floor, Marvic House, Bishops Road, London, United
Kingdom,SW6 7AD

В списке pgsql-performance по дате отправления:

Предыдущее

От: Scott Marlowe
Дата: 20 июня 2010 г., 20:31:22
Сообщение: Re: Obtaining the exact size of the database.

Следующее

От: "Kevin Grittner"
Дата: 21 июня 2010 г., 13:54:02
Сообщение: Re: HashAggregate slower than sort?

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Re: HashAggregate slower than sort?

Предыдущее

Следующее