Re: Solving hash table overrun problems

Поиск

Список

Период

Сортировка

От	Aaron Birkland
Тема	Re: Solving hash table overrun problems
Дата	7 марта 2005 г. 00:46:34
Msg-id	19ab0ccd05030616463919926a@mail.gmail.com обсуждение исходный текст
Ответ на	Re: Solving hash table overrun problems (Tom Lane <tgl@sss.pgh.pa.us>)
Список	pgsql-hackers

Дерево обсуждения

> > This also brings up a line of thought I had a while ago on a related
> > topic.  Something like a "HashDistinct" might be useful, if it had no
> > startup cost.  

> We already have that: the planner will use a HashAgg node in this
> fashion in some contexts (I think just as one of the ways to do IN,
> at the moment).  

Hmm.. I see HashAggregate being used that way in the IN queries, but I
have not observed it used in a way that incurrs no startup cost.  It
looked to me that in doing hash aggregation in ExecAgg (nodeAgg.c),
agg_fill_hash_table() would have to be called, which iterate through
every output of the child plan building the hash table before it
returns, thus incurring at least the startup cost of executing the
entire subplan of the child node at the aggregation stage.  I'm not
too familiar with the code, so there is probably something I'm missing
somewhere :(

> It's not yet bright enough to consider doing it for
> SELECT DISTINCT.  The DISTINCT planning code is old and crufty and
> pretty tightly interwired with ORDER BY ... it needs work.

Yes, SELECT DISTINCT was my motivating example, though in my specific
application latency (i.e. not having to wait for the entire query
below the DISTINCT operation to finish) was also an important factor,
hence my thoughts on a zero startup cost hash aggregation and
wondering if it would really be any kind of win in the end.
 -Aaron

В списке pgsql-hackers по дате отправления:

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Re: Solving hash table overrun problems