Re: Solving hash table overrun problems

Поиск
Список
Период
Сортировка
От Aaron Birkland
Тема Re: Solving hash table overrun problems
Дата
Msg-id 19ab0ccd05030616463919926a@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Solving hash table overrun problems  (Tom Lane <tgl@sss.pgh.pa.us>)
Список pgsql-hackers
> > This also brings up a line of thought I had a while ago on a related
> > topic.  Something like a "HashDistinct" might be useful, if it had no
> > startup cost.  

> We already have that: the planner will use a HashAgg node in this
> fashion in some contexts (I think just as one of the ways to do IN,
> at the moment).  

Hmm.. I see HashAggregate being used that way in the IN queries, but I
have not observed it used in a way that incurrs no startup cost.  It
looked to me that in doing hash aggregation in ExecAgg (nodeAgg.c),
agg_fill_hash_table() would have to be called, which iterate through
every output of the child plan building the hash table before it
returns, thus incurring at least the startup cost of executing the
entire subplan of the child node at the aggregation stage.  I'm not
too familiar with the code, so there is probably something I'm missing
somewhere :(

> It's not yet bright enough to consider doing it for
> SELECT DISTINCT.  The DISTINCT planning code is old and crufty and
> pretty tightly interwired with ORDER BY ... it needs work.

Yes, SELECT DISTINCT was my motivating example, though in my specific
application latency (i.e. not having to wait for the entire query
below the DISTINCT operation to finish) was also an important factor,
hence my thoughts on a zero startup cost hash aggregation and
wondering if it would really be any kind of win in the end.
 -Aaron


В списке pgsql-hackers по дате отправления:

Предыдущее
От: mchron@aueb.gr
Дата:
Сообщение: ERROR: unrecognized node type in PostgresMain( )
Следующее
От: Tom Lane
Дата:
Сообщение: Re: ERROR: unrecognized node type in PostgresMain( )