Re: Allowing NOT IN to use ANTI joins

Поиск
Список
Период
Сортировка
От David Rowley
Тема Re: Allowing NOT IN to use ANTI joins
Дата
Msg-id CAApHDvoLkDtDS+AD28UzXDpYu3+_QRdcKyMj1mtjNRWk6BaLrg@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Allowing NOT IN to use ANTI joins  (Jeevan Chalke <jeevan.chalke@enterprisedb.com>)
Ответы Re: Allowing NOT IN to use ANTI joins  (Jeevan Chalke <jeevan.chalke@enterprisedb.com>)
Re: Allowing NOT IN to use ANTI joins  (Tom Lane <tgl@sss.pgh.pa.us>)
Список pgsql-hackers
On Wed, Jul 2, 2014 at 9:25 PM, Jeevan Chalke <jeevan.chalke@enterprisedb.com> wrote:

On Sun, Jun 29, 2014 at 4:18 PM, David Rowley <dgrowleyml@gmail.com> wrote:
I think I'm finally ready for a review again, so I'll update the commitfest app.


I have reviewed this on code level.

1. Patch gets applied cleanly.
2. make/make install/make check all are fine

No issues found till now.

However some cosmetic points:

1.
 * The API of this function is identical to convert_ANY_sublink_to_join's,
 * except that we also support the case where the caller has found NOT EXISTS,
 * so we need an additional input parameter "under_not".

Since now, we do support NOT IN handling in convert_ANY_sublink_to_join() we
do have "under_not" parameter there too. So above comments near
convert_EXISTS_sublink_to_join() function is NO more valid.


Nice catch. I've removed the last 2 lines from that comment now. 
 
2. Following is the unnecessary change. Can be removed:

@@ -506,6 +560,7 @@ pull_up_sublinks_qual_recurse(PlannerInfo *root, Node *node,
                    return NULL;
                }
            }
+
        }
        /* Else return it unmodified */
        return node;


Removed
 
 
3. Typos:

a.
+ * queryTargetListListCanHaveNulls
...
+queryTargetListCanHaveNulls(Query *query)

Function name is queryTargetListCanHaveNulls() not
queryTargetListListCanHaveNulls()


Fixed.
 
b.
     *    For example the presense of; col IS NOT NULL, or col = 42 would allow

presense => presence



Fixed. 
 
4. get_attnotnull() throws an error for invalid relid/attnum.
But I see other get_att* functions which do not throw an error rather
returning some invalid value, eg. NULL in case of attname, InvalidOid in
case of atttype and InvalidAttrNumber in case of attnum. I have observed
that we cannot return such invalid value for type boolean and I guess that's
the reason you are throwing an error. But somehow looks unusual. You had put
a note there which is good. But yes, caller of this function should be
careful enough and should not assume its working like other get_att*()
functions.
However for this patch, I would simply accept this solution. But I wonder if
someone thinks other way round.


hmmm, I remember thinking about that at the time. It was a choice between returning false or throwing an error. I decided that it probably should be an error, as that's what get_relid_attribute_name() was doing. Just search lsyscache.c for "cache lookup failed for attribute %d of relation %u".
 

Testing more on SQL level.


Thank you for reviewing this. I think in the attached I've managed to nail down the logic in find_inner_rels(). It was actually more simple than I thought. On working on this today I also noticed that RIGHT joins can still exist at this stage of planning and also that full joins are not transformed. e.g: a FULL JOIN b ON a.id=b.id WHERE a.is IS NOT NULL would later become a LEFT JOIN, but at this stage in planning, it's still a FULL JOIN. I've added some new regression tests which test some of these join types.

With further testing I noticed that the patch was not allowing ANTI joins in cases like this:

explain select * from a where id not in(select x from b natural join c);

even if b.x and b.c were NOT NULL columns. This is because the TargetEntry for x has a varno which belongs to neither b or c, it's actually the varno for the join itself. I've added some code to detect this in find_inner_rels(), but I'm not quite convinced yet that it's in the correct place... I'm wondering if a future use for find_inner_rels() would need to only see relations rather than join varnos. The code I added does get the above query using ANTI joins, but it does have a bit of a weird quirk, in that for it to perform an ANTI JOIN, b.x must be NOT NULL. If c.x is NOT NULL and b.x is nullable, then there will be no ANTI JOIN. There must be some code somewhere that chooses which of the 2 vars that should go into the target list in for naturals joins.

The above got me thinking that the join conditions can also be used to prove not nullness too. Take the query above as an example, x can never actually be NULL, the condition b.x = c.x ensures that. I've only gone as far as adding some comments which explain that this is a possible future optimisation. I've not had time to look at this yet and I'd rather see the patch go in without it than risk delaying this until the next commitfest... Unless of course someone thinks it's just too weird a quirk to have in the code.

Attached is the updated version of the patch. 

Regards

David Rowley


However, assigning it to author to think on above cosmetic issues.

Thanks

--
Jeevan B Chalke
Principal Software Engineer, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company


Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Amit Kapila
Дата:
Сообщение: Re: postgresql.auto.conf and reload
Следующее
От: "MauMau"
Дата:
Сообщение: Re: Proposing pg_hibernate