Обсуждение: Outer joins and equivalence
I have a complex query where making a small change to the SQL increases run-time by > 1000 times. The first SQL statement is of the form A JOIN B ON (a.id = b.id) LEFT JOIN C ON (a.id = c.id) and the second is like this A JOIN B ON (a.id = b.id) LEFT JOIN C ON (b.id = c.id) the only difference is the substitution of a -> b This has been verified by examining EXPLAIN of SQL1, SQL2 and SQL1. The first and third EXPLAINs are equivalent. All ANALYZE etc has been run. All relevant join columns are INTEGERs. So we have a repeatable difference in plans attributable to a single change. The difference in run time occurs because the second form of the query uses a SeqScan of a large table, whereas the first form is able to use a nested loops join to access the large table, which then allows it to access just 3 rows rather than 85 million rows. There is a clear equivalence between the two forms of SQL, since the equivalence a = b is derived from a natural rather than an outer join. This can be applied from the left side to the right side of the join. So this looks to me like either a bug or just an un-implemented optimizer feature. The code I've just looked at for equivalent class relationships appears to refer to using this to propagate constant info only, so I'm thinking it is not a bug. and hence why it is reported here and not to pgsql-bugs. I do recognise that we would *not* be able to deduce this form of SQL A JOIN B ON (a.id = c.id) LEFT JOIN C ON (b.id = c.id) though that restriction on outer join equivalence is not relevant here. (SQL, EXPLAINs etc available off-list only, by request). I'm looking into this more now. -- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support
Simon Riggs <simon@2ndquadrant.com> writes: > I have a complex query where making a small change to the SQL increases > run-time by > 1000 times. > The first SQL statement is of the form > A JOIN B ON (a.id = b.id) LEFT JOIN C ON (a.id = c.id) > and the second is like this > A JOIN B ON (a.id = b.id) LEFT JOIN C ON (b.id = c.id) > the only difference is the substitution of a -> b Please provide an actual test case. regards, tom lane
On Tue, 27 May 2008, Simon Riggs wrote: > I do recognise that we would *not* be able to deduce this form of SQL > > A JOIN B ON (a.id = c.id) LEFT JOIN C ON (b.id = c.id) Surely that would not be valid SQL? Matthew -- Psychotics are consistently inconsistent. The essence of sanity is to be inconsistently inconsistent.
On Wed, 2008-05-28 at 11:45 +0100, Matthew Wakeling wrote: > On Tue, 27 May 2008, Simon Riggs wrote: > > I do recognise that we would *not* be able to deduce this form of SQL > > > > A JOIN B ON (a.id = c.id) LEFT JOIN C ON (b.id = c.id) > > Surely that would not be valid SQL? You are right, but my point was about inferences during SQL planning, not about initial analysis of the statement. -- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support
On Tue, 2008-05-27 at 17:43 -0400, Tom Lane wrote: > Simon Riggs <simon@2ndquadrant.com> writes: > > I have a complex query where making a small change to the SQL increases > > run-time by > 1000 times. > > > The first SQL statement is of the form > > > A JOIN B ON (a.id = b.id) LEFT JOIN C ON (a.id = c.id) > > > and the second is like this > > > A JOIN B ON (a.id = b.id) LEFT JOIN C ON (b.id = c.id) > > > the only difference is the substitution of a -> b > > Please provide an actual test case. Getting closer, but still not able to produce a moveable test case. Symptoms are * using partitioning * when none of the partitions are excluded * when equivalence classes ought to be able to reconcile join Still working on it -- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support
On Mon, 2008-06-02 at 18:10 +0100, Simon Riggs wrote: > On Tue, 2008-05-27 at 17:43 -0400, Tom Lane wrote: > > Simon Riggs <simon@2ndquadrant.com> writes: > > > I have a complex query where making a small change to the SQL increases > > > run-time by > 1000 times. > > > > > The first SQL statement is of the form > > > > > A JOIN B ON (a.id = b.id) LEFT JOIN C ON (a.id = c.id) > > > > > and the second is like this > > > > > A JOIN B ON (a.id = b.id) LEFT JOIN C ON (b.id = c.id) > > > > > the only difference is the substitution of a -> b > > > > Please provide an actual test case. > > Getting closer, but still not able to produce a moveable test case. I've got a test case which shows something related and weird, though not the exact case. The queries shown here have significantly different costs, depending upon whether we use tables a or b in the query. Since a and b are equivalent this result isn't expected at all. I suspect the plan variation in the original post is somehow cost related and we are unlikely to discover the exact plan. -- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support