Обсуждение: Planner question
Why does the planner consider both input variations of each symmetric merge join? The README says "there is not a lot ofdifference" between the two options. When are there any differences? -Tom Raney
On Fri, 2008-09-05 at 11:21 -0700, Tom Raney wrote: > Why does the planner consider both input variations of each symmetric merge join? The README says "there is not a lotof difference" between the two options. When are there any differences? > > -Tom Raney > http://archives.postgresql.org/pgsql-general/2008-08/msg00967.php My understanding from that thread is that if one table has high ndistinct and the other has low ndistinct, one plan may require more re-scanning than the other. Regards,Jeff Davis
Tom Raney <raneyt@cecs.pdx.edu> writes:
> Why does the planner consider both input variations of each symmetric merge join? The README says "there is not a
lotof difference" between the two options. When are there any differences?
The righthand side needs to support mark/restore, the left doesn't;
so depending on plan types one way might need a helper Materialize
node that the other way doesn't. Also, duplicated values are a bit
cheaper to process on the left than the right.
regards, tom lane
Tom Lane wrote:
> Tom Raney <raneyt@cecs.pdx.edu> writes:
>> Why does the planner consider both input variations of each symmetric merge join? The README says "there is not a
lotof difference" between the two options. When are there any differences?
>
> The righthand side needs to support mark/restore, the left doesn't;
> so depending on plan types one way might need a helper Materialize
> node that the other way doesn't. Also, duplicated values are a bit
> cheaper to process on the left than the right.
>
> regards, tom lane
>
Thank you for the explanation.
On a somewhat related issue, I am a bit stumped on the way path keys
function.
In the following query and debug data, why would an index scan on a
single relation contain a path key from a different relation?
optimizer/README says, "The PathKeys data structure represents what is
known about the sort order of the tuples generated by a particular
Path. A path's pathkeys field is a list of PathKey nodes, where the
n'th item represents the n'th sort key of the result."
Why does the index scan for tenk1 include a path key from
onek.unique2? Is it implying an equivalence there?
-Tom Raney
bench=# explain select * from tenk1 JOIN onek ON
tenk1.unique2=onek.unique2;
RELOPTINFO (tenk1): rows=10000 width=244 path list: SeqScan(tenk1) rows=10000 cost=0.00..434.00
IdxScan(tenk1)rows=10000 cost=0.00..583.25 pathkeys: ((tenk1.unique2, onek.unique2)) <---
cheapest startup path: SeqScan(tenk1) rows=10000 cost=0.00..434.00
cheapest total path: SeqScan(tenk1) rows=10000 cost=0.00..434.00
RELOPTINFO (onek): rows=1000 width=244 path list: SeqScan(onek) rows=1000 cost=0.00..44.00
IdxScan(onek)rows=1000 cost=0.00..72.25 pathkeys: ((tenk1.unique2, onek.unique2))
cheapest startup path: SeqScan(onek) rows=1000 cost=0.00..44.00
cheapest total path: SeqScan(onek) rows=1000 cost=0.00..44.00
RELOPTINFO (tenk1/onek): rows=1000 width=488 path list: MergeJoin(tenk1/onek) rows=1000 cost=0.52..144.24
clauses: tenk1.unique2 = onek.unique2 IdxScan(tenk1) rows=10000 cost=0.00..583.25
pathkeys: ((tenk1.unique2, onek.unique2)) IdxScan(onek) rows=1000 cost=0.00..72.25
pathkeys:((tenk1.unique2, onek.unique2)) NestLoop(tenk1/onek) rows=1000 cost=0.00..1756.96 clauses:
tenk1.unique2= onek.unique2 SeqScan(onek) rows=1000 cost=0.00..44.00 IdxScan(tenk1)
rows=10000cost=0.00..1.70
cheapest startup path: NestLoop(tenk1/onek) rows=1000 cost=0.00..1756.96 clauses: tenk1.unique2
=onek.unique2 SeqScan(onek) rows=1000 cost=0.00..44.00 IdxScan(tenk1) rows=10000
cost=0.00..1.70
cheapest total path: MergeJoin(tenk1/onek) rows=1000 cost=0.52..144.24 clauses: tenk1.unique2 =
onek.unique2 IdxScan(tenk1) rows=10000 cost=0.00..583.25 pathkeys: ((tenk1.unique2,
onek.unique2)) IdxScan(onek) rows=1000 cost=0.00..72.25 pathkeys: ((tenk1.unique2,
onek.unique2))
QUERY PLAN
----------------------------------------------------------------------------------------- Merge Join
(cost=0.52..144.24rows=1000 width=488) Merge Cond: (tenk1.unique2 = onek.unique2) -> Index Scan using
tenk1_unique2on tenk1 (cost=0.00..583.25
rows=10000 width=244) -> Index Scan using onek_unique2 on onek (cost=0.00..72.25 rows=1000
width=244)
(4 rows)
bench=#
Tom Raney <raneyt@cecs.pdx.edu> writes:
> Why does the index scan for tenk1 include a path key from
> onek.unique2? Is it implying an equivalence there?
> bench=# explain select * from tenk1 JOIN onek ON
> tenk1.unique2=onek.unique2;
Yes, for an example like that the planner knows that tenk1.unique2 and
onek.unique2 will have equal values in any valid join row, so it's okay
to suppose that a sort by one is the same as a sort by the other. So
the pathkey items actually reference sets of variables{tenk1.unique2, onek.unique2}
not just individual variables.
> RELOPTINFO (tenk1): rows=10000 width=244
> path list:
> SeqScan(tenk1) rows=10000 cost=0.00..434.00
> IdxScan(tenk1) rows=10000 cost=0.00..583.25
> pathkeys: ((tenk1.unique2, onek.unique2)) <---
> cheapest startup path:
> SeqScan(tenk1) rows=10000 cost=0.00..434.00
> cheapest total path:
> SeqScan(tenk1) rows=10000 cost=0.00..434.00
Hm, I don't recognize this output format, is it coming from some custom
code?
regards, tom lane
Tom Lane wrote:
> Tom Raney <raneyt@cecs.pdx.edu> writes:
>> Why does the index scan for tenk1 include a path key from
>> onek.unique2? Is it implying an equivalence there?
>
>> bench=# explain select * from tenk1 JOIN onek ON
>> tenk1.unique2=onek.unique2;
>
> Yes, for an example like that the planner knows that tenk1.unique2 and
> onek.unique2 will have equal values in any valid join row, so it's okay
> to suppose that a sort by one is the same as a sort by the other. So
> the pathkey items actually reference sets of variables
> {tenk1.unique2, onek.unique2}
> not just individual variables.
Thanks.
>
>> RELOPTINFO (tenk1): rows=10000 width=244
>> path list:
>> SeqScan(tenk1) rows=10000 cost=0.00..434.00
>> IdxScan(tenk1) rows=10000 cost=0.00..583.25
>> pathkeys: ((tenk1.unique2, onek.unique2)) <---
>
>> cheapest startup path:
>> SeqScan(tenk1) rows=10000 cost=0.00..434.00
>
>> cheapest total path:
>> SeqScan(tenk1) rows=10000 cost=0.00..434.00
>
> Hm, I don't recognize this output format, is it coming from some custom
> code?
Yes, it is. I thought it was easier to read the OPTIMIZER_DEBUG
output with the relation names instead of the relation indexes. I
will post a patch against CVS HEAD if you think it will help others.
-Tom
Tom Raney wrote: > >> RELOPTINFO (tenk1): rows=10000 width=244 > >> path list: > >> SeqScan(tenk1) rows=10000 cost=0.00..434.00 > >> IdxScan(tenk1) rows=10000 cost=0.00..583.25 > >> pathkeys: ((tenk1.unique2, onek.unique2)) <--- > > > >> cheapest startup path: > >> SeqScan(tenk1) rows=10000 cost=0.00..434.00 > > > >> cheapest total path: > >> SeqScan(tenk1) rows=10000 cost=0.00..434.00 > > > > Hm, I don't recognize this output format, is it coming from some custom > > code? > > Yes, it is. I thought it was easier to read the OPTIMIZER_DEBUG > output with the relation names instead of the relation indexes. I > will post a patch against CVS HEAD if you think it will help others. Personally I would like to see optimizer debug become a configuration parameter rather than a compile-time parameter. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + If your life is a hard drive, Christ can be your backup. +