Re: Allowing NOT IN to use ANTI joins

Поиск
Список
Период
Сортировка
От David Rowley
Тема Re: Allowing NOT IN to use ANTI joins
Дата
Msg-id CAApHDvqRB-iFBy68=dCgqS46aRep7AuN2pou4KTwL8kX9YOcTQ@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Allowing NOT IN to use ANTI joins  (David Rowley <dgrowleyml@gmail.com>)
Ответы Re: Allowing NOT IN to use ANTI joins  (Tom Lane <tgl@sss.pgh.pa.us>)
Re: Allowing NOT IN to use ANTI joins  (Simon Riggs <simon@2ndQuadrant.com>)
Список pgsql-hackers
On Mon, Jul 14, 2014 at 8:55 PM, David Rowley <dgrowleyml@gmail.com> wrote:

Since the patch has not been marked as rejected I was thinking that I'd take a bash at fixing it up, but if you think this is a waste of time, please let me know.
 


I've made some changes to the patch so that it only allows the conversion to ANTI JOIN to take place if both the outer query's expressions AND the subquery's target list can be proved not to have NULLs.

I've attached a delta, which is the changes I've made on top of Tom's cleaned up version of my patch, and also a full patch.

I've also performed some benchmarks to try to determine how much time it takes to execute this null checking code. I ended up hacking the code a little for the benchmarks and just put the null checking function in a tight loop that performed 100000 iterations. 

Like:
if (under_not)
{
int x;
bool result;
for (x = 0; x < 100000; x++)
{
result = is_NOTANY_compatible_with_antijoin(parse, sublink);
}
if (!result)
return NULL;
}

I then ran 6 queries, 3 times each through the planner and grabbed the "Planning Time" from the explain analyze result.
I then removed the extra looping code (seen above) and compiled the code as it is with the attached patch.
I then ran each of the 6 queries again 3 times each and noted down the "Planning Time from the explain analyze result.

In my results I assumed that the  first set of times divided by 100000 would be the time taken to perform the NULL checks... This is not quite accurate, but all the other planning work was quite well drowned out by the 100k loop.

I found that the call to is_NOTANY_compatible_with_antijoin adds about 0.2% and 2.3% to total planning time. Though the 2.3% was quite an extreme case, and the 0.2% was the most simple case I could think of.

I've attached the complete results in html format. I've also attached the schema that I used and all 6 queries tested.

Here's 2 points which I think are important to note about the planning time overhead of this patch:
1. There is no additional overhead if the query has no NOT IN clause.
2. The test queries 3 and 6 were to benchmark the overhead of when the NOT NULL test fails. The slowest of these was test 3 which added just under 0.5% to the planning time. The query that added a 2.3% overhead performed an ANTI JOIN, so likely the reduction in execution time more than made up for the extra planning time.

Regards

David Rowley
Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Magnus Hagander
Дата:
Сообщение: Re: Removing dependency to wsock32.lib when compiling code on WIndows
Следующее
От: Michael Paquier
Дата:
Сообщение: Re: Removing dependency to wsock32.lib when compiling code on WIndows