Re: Status of DISTINCT-by-hashing work

Поиск
Список
Период
Сортировка
От Tom Lane
Тема Re: Status of DISTINCT-by-hashing work
Дата
Msg-id 9926.1217953786@sss.pgh.pa.us
обсуждение исходный текст
Ответ на Re: Status of DISTINCT-by-hashing work  (Gregory Stark <stark@enterprisedb.com>)
Ответы Re: Status of DISTINCT-by-hashing work  ("Hitoshi Harada" <umi.tanuki@gmail.com>)
Re: Status of DISTINCT-by-hashing work  (Tom Lane <tgl@sss.pgh.pa.us>)
Список pgsql-hackers
Gregory Stark <stark@enterprisedb.com> writes:
> "Tom Lane" <tgl@sss.pgh.pa.us> writes:
>> There are still two places in the system that hard-wire the use of
>> sorting for duplicate elimination:
>> 
>> * Set operations (UNION/INTERSECT/EXCEPT)

> Egads. Are you thinking to reimplement them more in line with the way other
> nodes work? Or just have them choose between hashing and sorting themselves?

Well, actually, after looking closer I'm realizing that it's harder than
I thought.  I had been thinking that we could just have the planner
choose whether to generate grouping instead of sorting nodes, but that
only works for plain UNION.  For INTERSECT/EXCEPT (with or without ALL),
you really need to maintain counters in each hashtable entry so you know
how many matching rows you got from each side of the set operation.
So it'd be necessary to either duplicate a large chunk of nodeAgg.c, or
make that code handle hashed INTERSECT/EXCEPT along with all its
existing duties.  Neither of which seems particularly appealing :-(.
I'm going to look at whether nodeAgg can be refactored to avoid this,
but I'm feeling a bit discouraged about it at the moment.

> I recall being quite mystified by how distinct aggregates work when the sort
> didn't appear anywhere in EXPLAIN output. If we could manage to expose that
> info in the plan somehow it would be a great improvement even if we didn't
> actually improve the plans available.

The problem is that each DISTINCT aggregate needs its own sort (or
hash), which doesn't seem to fit into our plan tree structure.

> Any idea what would the needed executor infrastructure look like? Would it
> have anything in common with the OLAP window functions infrastructure?

Possibly; I haven't paid much attention to the OLAP work yet.
        regards, tom lane


В списке pgsql-hackers по дате отправления:

Предыдущее
От: Russell Smith
Дата:
Сообщение: Re: DROP DATABASE always seeing database in use
Следующее
От: "Kevin Grittner"
Дата:
Сообщение: Re: IN vs EXISTS equivalence