Re: Large Scale Aggregation (HashAgg Enhancement)

Поиск
Список
Период
Сортировка
От Simon Riggs
Тема Re: Large Scale Aggregation (HashAgg Enhancement)
Дата
Msg-id 1137438176.3180.160.camel@localhost.localdomain
обсуждение исходный текст
Ответ на Re: Large Scale Aggregation (HashAgg Enhancement)  (Rod Taylor <pg@rbt.ca>)
Список pgsql-hackers
On Mon, 2006-01-16 at 09:42 -0500, Rod Taylor wrote:
> On Mon, 2006-01-16 at 08:32 +0000, Simon Riggs wrote:
> > On Mon, 2006-01-16 at 00:07 -0500, Rod Taylor wrote:
> > > 
> > A question: Are the rows in your 3 B row table clumped together based
> > upon the 100M row key? (or *mostly* so) We might also be able to
> 
> They are randomly distributed. Fully sorting the data is quite painful.

...

> I don't understand how this helps. 

It wouldn't since your rows are randomly distributed. The idea was not
related to improving HashAgg, but to improving Aggregation for the case
of naturally grouped data.

> I think I need something closer to:
> 
> HashAgg
>     -> HashSort (to disk)
> 
> HashSort would create a number of files on disk with "similar" data.
> Grouping all similar keys into a single temporary file which HashAgg can
> deal with individually (100 loops by 1M target keys instead of 1 loop by
> 100M target keys). The results would be the same as partitioning by
> keyblock and running a HashAgg on each partition, but it would be
> handled by the Executor rather than by client side code.
> 
> > > I've written something similar using a client and COPY with temporary
> > > tables. Even with the Export/Import copy I still beat the Sort&Sum
> > > method PostgreSQL falls back to.

That is exactly how the spill to disk logic works for HashJoin (and
incidentally, identical to an Oracle one-pass hash join since both are
based upon the hybrid hash join algorithm). 

Multi-pass would only be required to handle very skewed hash
distributions, which HJ doesn't do yet.

So yes, this can be done. 

Best Regards, Simon Riggs



В списке pgsql-hackers по дате отправления:

Предыдущее
От: "Jim C. Nasby"
Дата:
Сообщение: Re: Surrogate keys (Was: enums)
Следующее
От: Simon Riggs
Дата:
Сообщение: Re: Large Scale Aggregation (HashAgg Enhancement)