Re: Allowing NOT IN to use ANTI joins

From: David Rowley
Subject: Re: Allowing NOT IN to use ANTI joins
Date:
Msg-id: CAApHDvrJrz-0xinyiqTiWs0mFX17GWD2Y8VZ+i92nuZsha8ocw@mail.gmail.com
In reply to: Re: Allowing NOT IN to use ANTI joins  (Tom Lane <tgl@sss.pgh.pa.us>)
List: pgsql-hackers
On Wed, Jul 16, 2014 at 9:11 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Simon Riggs <simon@2ndQuadrant.com> writes:
> On 15 July 2014 12:58, David Rowley <dgrowleyml@gmail.com> wrote:
>> I found that the call to is_NOTANY_compatible_with_antijoin adds about 0.2%
>> and 2.3% to total planning time. Though the 2.3% was quite an extreme case,
>> and the 0.2% was the most simple case I could think of.

> Is there a way we can only run this extra test when we have reasonable
> idea that there is potential to save significant costs?

Well, all of this cost occurs only when we find a NOT IN clause in a place
where we could conceivably turn it into an antijoin.  I think it's
unquestionably a win to make that transformation if possible.  My concern
about it is mainly that the whole thing is a band-aid for naively written
queries, and it seems likely to me that naively written queries aren't
going to be amenable to making the proof we need.  But we can't tell that
for sure without doing exactly the work the patch does.


I do think Simon has a good point. Maybe it's not something for this patch, but perhaps for other planner patches that potentially optimise queries which could have been executed more efficiently had they been written another way.
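To make the "naively written query" case concrete, here is an illustrative sketch (table and column names are hypothetical) of the two forms in question. `NOT IN` can only become an anti-join if the planner can prove the inner expression can never be NULL, because `x NOT IN (...)` evaluates to NULL rather than true whenever the subquery produces a NULL; the hand-rewritten `NOT EXISTS` form has no such hazard:

```sql
-- Naive form: eligible for an anti-join only if the planner can prove
-- b.id is never NULL (e.g. via a NOT NULL constraint), since a single
-- NULL in the subquery's output makes NOT IN return no rows.
SELECT * FROM a WHERE a.id NOT IN (SELECT b.id FROM b);

-- Rewritten form: NOT EXISTS ignores the NULL semantics problem, so
-- it is planned as an anti-join without any non-nullability proof.
SELECT * FROM a WHERE NOT EXISTS (SELECT 1 FROM b WHERE b.id = a.id);
```

The proof the patch performs is what lets the first form be treated like the second.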

Since planning time was added to EXPLAIN ANALYZE, I have noticed that in many very simple queries the planning time is quite often longer than the execution time, so I really do understand the concern about slowing the planner down for these super fast queries. On the other hand, if the extra 1-2 microseconds spent on the NOT IN optimisation were frowned upon and the patch rejected on that basis, then at the other end of the scale I wouldn't think it impossible for those 2 extra microseconds to translate into shaving a few hours off the execution time of a query in an OLAP-type workload. If that were the case, having refused to spend the 2 extra microseconds would seem quite funny. But there's no real way to tell, I guess, unless we invented something that skips the known costly optimisations on the first pass, then re-plans the whole query with the planning-strength knob turned to maximum if the final plan costs more than N.

Of course, such an idea would make bad plans even harder to debug, so it's far from perfect, but I'm not seeing other options for the best of both worlds.

Regards

David Rowley
