> The problem here is that the planner is being way too optimistic about
> the selectivity of LIKE '%DAVID%' --- notice the estimate that only
> one matching row will be found in cliente, rather than 54 as with '%DA%'.
> So it chooses a plan that avoids the sort overhead needed for an
> efficient merge join with the other tables. That would be a win if
> there were only one matching row, but as soon as there are lots, it's
> a big loss, because the subquery to join the other tables gets redone
> for every matching row :-(
>
> >> Also, how many rows are there really that match '%DA%' and '%DAVID%'?
>
> > 1) 2672 rows -> 3.59 sec.
> > 2) 257 rows -> 364.69 sec.
>
> I am thinking that the rules for selectivity of LIKE patterns probably
> need to be modified. Presently the code assumes that a long constant
> string has probability of occurrence proportional to the product of the
> probabilities of the individual letters. That might be true in a random
> world, but people don't search for random strings. I think we need to
> back off the selectivity estimate by some large factor to account for
> the fact that the pattern being searched for is probably not random.
> Anyone have ideas how to do that?
Let's use the above example with the new FIXED_CHAR_SEL values:
With the new 0.20 value for FIXED_CHAR_SEL, we see for DA and DAVID
above:
DA 1) 0.20 ^ 2
.04
DAVID 2) 0.20 ^ 5
.00032
If we divide these two, we get:
> 0.04 / 0.00032
125
while looking at the total counts reported above, we get:
> 2672 / 257
~10.39688715953307392996
The 0.04 value gives a value of:
> 0.04 ^ 2
.0016
> 0.04 ^ 5
.0000001024
> .0016 / .0000001024
15625
Clearly the 0.20 value is 10x too large, while the 0.04 value is 1000x
too large. Because this was a contrived example, and because some have
more random text than DAVID in their field, I think 0.20 is the proper
value.
--
Bruce Momjian | http://candle.pha.pa.us
pgman@candle.pha.pa.us | (610) 853-3000
+ If your life is a hard drive, | 830 Blythe Avenue
+ Christ can be your backup. | Drexel Hill, Pennsylvania 19026