Re: Patch to support SEMI and ANTI join removal

Поиск
Список
Период
Сортировка
От David Rowley
Тема Re: Patch to support SEMI and ANTI join removal
Дата
Msg-id CAApHDvpDXXvKE+=ug1kA--nKfa=bjrjvK8Gp9G8UYwv6nHckVg@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Patch to support SEMI and ANTI join removal  (Tom Lane <tgl@sss.pgh.pa.us>)
Ответы Re: Patch to support SEMI and ANTI join removal  (Robert Haas <robertmhaas@gmail.com>)
Список pgsql-hackers
On Thu, Aug 28, 2014 at 6:23 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
If the majority of the added code is code that will be needed for
less-bogus optimizations, it might be all right; but I'd kind of want to
see the less-bogus optimizations working first.


That seems fair. Likely there'd be not a great deal of value to semi and anti joins removal alone. I was more just trying to spread weight of an inner join removal patch...

So In response to this, I've gone off and written an inner join removal support patch (attached), which turned out to be a bit less complex than I thought.


Here's a quick demo, of the patch at work:

test=# create table c (id int primary key);
CREATE TABLE
test=# create table b (id int primary key, c_id int not null references c(id));
CREATE TABLE
test=# create table a (id int primary key, b_id int not null references b(id));
CREATE TABLE
test=#
test=# explain select a.* from a inner join b on a.b_id = b.id inner join c on b.c_id = c.id;
                     QUERY PLAN
-----------------------------------------------------
 Seq Scan on a  (cost=0.00..31.40 rows=2140 width=8)
 Planning time: 1.061 ms
(2 rows)

Perhaps not a greatly useful example, but if you can imagine the joins are hidden in a view and the user is just requesting a small subset of columns, then it does seem quite powerful.

There's currently a few things with the patch that I'll list below, which may raise a few questions:

1. I don't think that I'm currently handling removing eclass members properly. So far the code just removes the Vars that belong to the relation being removed. I likely should also be doing bms_del_member(ec->ec_relids, relid); on the eclass, but perhaps I should just be marking the whole class as "ec_broken = true" and adding another eclass all everything that the broken one has minus the parts from the removed relation?

2. Currently the inner join removal is dis-allowed if the (would be) removal relation has *any* baserestrictinfo items. The reason for this is that we must ensure that the inner join gives us exactly 1 row match on the join condition, but a baserestrictinfo can void the proof that a foreign key would give us that a matching row does exist. However there is an exception to this that could allow that restriction to be relaxed. That is if the qual in baserestrictinfo use vars that are in an eclass, where the same eclass also has ec members vars that belong to the rel that we're using the foreign key for to prove the relation not needed.... umm.. that's probably better described by example:

Assume there's a foreign key a (x) reference b(x)

SELECT a.* FROM a INNER JOIN b ON a.x = b.x WHERE b.x = 1

relation b should be removable because an eclass will contain {a.x, b.x} and therefore s baserestrictinfo for a.x = 1 should also exist on relation a. Therefore removing relation b should produce equivalent results, i.e everything that gets filtered out on relation b will also be filtered out on relation a anyway.

I think the patch without this is still worth it, but if someone feels strongly about it I'll take a bash at supporting it.

3. Currently the inner join support does not allow removals using foreign keys which contain duplicate columns on the referencing side. e.g (a,a) references (x,y), this is basically because of the point I made in item 2. In this case a baserestrictinfo would exist on the referenced relation to say WHERE x = y. I'd have to remove the restriction described in item 2 and do a small change to the code that extracts the join condition from the eclass for this to work. But it's likely a corner case that's not worth too much trouble to support. I think probably if I saw an FK like that in the field, I'd probably scratch my head for a while, while trying to understanding why they bothered.

4. The patch currently only allows removals for eclass join types. If the rel has any joininfo items, then the join removal is disallowed. From what I can see equality type inner join conditions get described in eclasses, and only non-equality join conditions make it into the joininfo list, and since foreign keys only support equality operators, then I thought this was a valid restriction, however, if someone can show me a flaw in my assumption then I may need to improve this.

5. I've added a flag to pg_class called relhasfkey. Currently this gets set to true when a foreign key is added, though I've added nothing to set it back to false again. I notice that relhasindex gets set back to false during vacuum, if vacuum happens to find there to not be any indexes on the rel. I didn't put my logic here as I wasn't too sure if scanning pg_constraint during a vacuum seemed very correct, so I just left out the "setting it to false" logic based on the the fact that I noticed that relhaspkey gets away with quite lazy setting back to false logic (only when there's no indexes of any kind left on the relation at all).

The only think else I can think of is perhaps optimising a little. I was thinking likely most queries wont benefit from this too much, so I was thinking of adding some logic to skip all join removals by pulling out the varnos from the target list entries and skipping even attempting to perform a join removal for a relation that has its varno in the targetlist of the query. Though perhaps a few benchmarks will determine if this is worth it or not.

Comments are welcome. -- I'm really hoping this patch generates a bit more interest than the SEMI/ANTI join removal one!

Regards

David Rowley




Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Andres Freund
Дата:
Сообщение: Re: Scaling shared buffer eviction
Следующее
От: Amit Kapila
Дата:
Сообщение: Re: Support for N synchronous standby servers