Re: Execute query with EXCEPT, INTERSECT as anti-join, join?

Поиск
Список
Период
Сортировка
От Colin 't Hart
Тема Re: Execute query with EXCEPT, INTERSECT as anti-join, join?
Дата
Msg-id CAMon-aRv2URrkzdmuUEsRfjW4KNwf7f5DOBrfH14ioXtruGw8g@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Execute query with EXCEPT, INTERSECT as anti-join, join?  (Tom Lane <tgl@sss.pgh.pa.us>)
Ответы Re: Execute query with EXCEPT, INTERSECT as anti-join, join?  (Tom Lane <tgl@sss.pgh.pa.us>)
Список pgsql-hackers
On 11 November 2013 15:16, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> "Colin 't Hart" <colin@sharpheart.org> writes:
>> On 11 November 2013 14:34, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> No, and it probably won't ever be, since the semantics aren't the same.
>>> EXCEPT/INTERSECT imply duplicate elimination.
>
>> Can't we just use DISTINCT for that?
>
> If you have to do a DISTINCT it's not clear to me that you're going to get
> much win.
>
> (The bigger picture here is that pretty much zero optimization effort has
> been spent on EXCEPT/INTERSECT, because they're just not used that much
> compared to other places where we could put that effort.)

I'm asking because I just encountered several cases where the
anti-join was *much* faster. In each case <query_1>'s result was
relatively small compared to <query_2>'s result or the "related rows"
from <query_2> were a much smaller set than the whole result of
<query_2>. In these cases, when the executor new how the two halves of
the query were related -- and that's the crux here: by writing EXCEPT
the executor couldn't determine how the two halves of the query were
related -- the anti-join was about 1000 times faster.

I think it's similar to the NOT IN which most DBMSes solved about 20
years ago but before that everyone used to rewrite by hand as NOT
EXISTS: sometimes we want to write query as EXCEPT because it's
clearer but execute it as an anti-join with DISTINCT.

Would these be difficult to build in? While I know a lot about how
DBMS engines work I've not hacked at PG internals. I'd be more than
willing to look at it, but could use some pointers as to where to
start. In particular, does PG rewrite queries in any way? Is it
possible to simply "rewrite the query" and then pass to the optimizer
to see if it would result in a better plan?

if I can improve EXCEPT, the same could also be applied to INTERSECT
as a "DISTINCT join".

Thanks & regards,

Colin



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Andrew Dunstan
Дата:
Сообщение: Re: pg_dump and pg_dumpall in real life
Следующее
От: "Colin 't Hart"
Дата:
Сообщение: Re: TABLE not synonymous with SELECT * FROM?