Обсуждение: Performance tradeoff

Поиск

Список

Период

Сортировка

Performance tradeoff

От

"Shawn Chisholm"

Дата:

02 марта 2005 г., 20:52:19

Hi All,

I am wondering about the relative performance of  "insert into table1 select distinct a,b from ..." and "insert into
table1select a,b from ... group by a,b" when querying tables of different sizes (10K, 100K, 1s, 10s, 100s of millions
ofrows).   

The distinct way tends to sort/unique and the group by tends to hash aggregate... any opinions on which is better?

I can also change the schema to a certain extent, so would it be worthwhile to put indices on the queried tables (or
refactorthem) hoping the distinct does an index scan instead of sort...  would the query planner take advantage of
that?

Thanks,

Shawn

Re: Performance tradeoff

От

Josh Berkus

Дата:

03 марта 2005 г., 08:32:00

Shawn,

> I can also change the schema to a certain extent, so would it be worthwhile
> to put indices on the queried tables (or refactor them) hoping the distinct
> does an index scan instead of sort...  would the query planner take
> advantage of that?

Use the GROUP BY, with an index on the grouped columns and lots of work_mem
(sort_mem in 7.4).   This will give the planner the option of a hashaggregate
which could be significantly faster than the other methods.

--
Josh Berkus
Aglio Database Solutions
San Francisco

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Обсуждение: Performance tradeoff

Performance tradeoff

Re: Performance tradeoff