Re: OLAP CUBE/ROLLUP Operators and GROUP BY grouping sets

From: Robert Bedell
Subject: Re: OLAP CUBE/ROLLUP Operators and GROUP BY grouping sets
Date:
Msg-id: 200312172308104.SM00984@xavier
In reply to: Re: OLAP CUBE/ROLLUP Operators and GROUP BY grouping sets  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses: Re: OLAP CUBE/ROLLUP Operators and GROUP BY grouping sets  (Tom Lane <tgl@sss.pgh.pa.us>)
Re: OLAP CUBE/ROLLUP Operators and GROUP BY grouping sets  (Tom Lane <tgl@sss.pgh.pa.us>)
Re: OLAP CUBE/ROLLUP Operators and GROUP BY grouping sets  (Sailesh Krishnamurthy <sailesh@cs.berkeley.edu>)
List: pgsql-hackers
> Hannu Krosing <hannu@tm.ee> writes:
> > OTOH, I'm not sure if hash aggregates can already spill to disk if not
> > enough memory is available for keeping them all.
> 
> They do not, which is something it'd be good to fix, since if the
> planner drastically underestimates the number of groups, you could end
> up with a hashtable far bigger than the advertised SortMem.  Even if
> this doesn't end up with an "out of memory" error, it could still drive
> you into swap hell.
> 
> (Maybe that should be Robert's get-your-feet-wet project.)

Ok, I've been looking through the executor - nodeGroup.c, nodeAgg.c,
dynahash.c, execGrouping.c, etc.  I've found the places where hashed
aggregation occurs.  It looks like hashed groups simply read the entire
result into memory via a function chain through lookup_hash_entry() ->
LookupTupleHashEntry() -> hash_search().  I think LookupTupleHashEntry()
is the best place to put the code to spill over onto disk, since it is
used only by the Group, Agg, and Subplan executor nodes.  Putting it there
gives the added benefit of letting hashed subselect results spill over onto
disk as well (not sure if that's the right thing to do).  I've put a rough
sketch of the control flow I'm picturing right after the questions below.
A few questions:

1) When does the optimizer set the Agg plan node to the HASHED strategy?  The
hashed implementation reads through its entire input before returning
anything, so presumably it isn't always the right choice.  Sorry if I should
RTFM on this one....

2) What mechanism would be best to use for storing the data on disk?  I know
there is a temporary table mechanism; I'll be hunting for that shortly.
(There's a sketch of one idea at the very end of this message.)

3) What should define the spillover point?  The documentation points to the
'sort_mem' parameter for this, but the code doesn't appear to actually
enforce it for hashed aggregation yet.  Similarly for the hash_search()
function - I didn't see anything there that spills to disk.

4) Once entries can spill to disk, should LookupTupleHashEntry() be worried
about the pointers it receives...and similarly for hash_search()?

Obviously (2) is the really big question.  What's the best way to do this?
I still have a bit more to go before I understand what's going on, but I'm
starting to grasp it.  Any tips would be appreciated! :)
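
For (2), the direction I keep coming back to is the tuplestore code in
src/backend/utils/sort/tuplestore.c, which already knows how to overflow into
BufFile temp files, so the executor node wouldn't have to care about the
on-disk format at all.  I haven't checked the exact signatures against CVS
tip yet - the calls below are from memory, so treat them as an assumption:

    /* Sketch only: function names/arguments quoted from memory.
     * 'tuple' stands for the current input HeapTuple. */
    Tuplestorestate *spillstore;

    /* Keep up to SortMem kilobytes in memory, then quietly switch to a
     * temp file. */
    spillstore = tuplestore_begin_heap(false,    /* no random access needed */
                                       false,    /* not inter-transaction */
                                       SortMem);

    /* While filling the hash table: any tuple whose group we refuse to
     * create gets stashed for the next pass. */
    tuplestore_puttuple(spillstore, (void *) tuple);

    /* Later pass: read the stashed tuples back and aggregate the groups
     * we skipped the first time around. */
    for (;;)
    {
        bool  should_free;
        void *spilled = tuplestore_gettuple(spillstore, true, &should_free);

        if (spilled == NULL)
            break;
        /* ... feed it back through lookup_hash_entry() ... */
        if (should_free)
            pfree(spilled);
    }

    tuplestore_end(spillstore);

If tuplestore turns out to be too heavyweight, using buffile.c directly would
be the other obvious candidate.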

Cheers!

Robert


