>>> On Wed, Feb 1, 2006 at 2:36 pm, in message
<4359.1138826175@sss.pgh.pa.us>,
Tom Lane <tgl@sss.pgh.pa.us> wrote:
> "Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes:
>> Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> I'm interested to poke at this ... are you in a position to provide
>>> a test case?
>
>> I can't supply the original data, since many of the tables have
>> millions of rows, with some of the data (related to juvenile,
>> paternity, sealed, and expunged cases) protected by law. I could try
>> to put together a self-contained example, but I'm not sure the best
>> way to do that, since the table sizes and value distributions may be
>> significant here. Any thoughts on that?
>
> I think that the only aspect of the data that really matters here is
> the number of distinct values, which would affect decisions about
> whether HashAggregate is appropriate or not. And you could probably
> get the same thing to happen with at most a few tens of thousands of
> rows.
>
> Also, all we need to worry about is the columns used in the
> WHERE/JOIN conditions, which looks to be mostly case numbers, dates,
> and county identification ... how much confidential info is there in
> that? At worst you could translate the case numbers to some randomly
> generated identifiers.
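
(For reference, a scrubbed data set along those lines could be mocked up with generate_series; the table and column names below are invented for illustration, not from the actual schema:)

```sql
-- Hypothetical mock-up: ~50,000 rows with a controlled number of
-- distinct case numbers (10,000) and counties (72), so the planner
-- sees comparable n_distinct statistics without any real data.
CREATE TABLE test_case_hist AS
SELECT 'CF-' || (i % 10000)            AS case_no,    -- 10,000 distinct values
       (i % 72) + 1                    AS county_no,  -- 72 distinct counties
       DATE '2005-01-01' + (i % 365)   AS event_date
FROM generate_series(1, 50000) AS s(i);

ANALYZE test_case_hist;
```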
OK, I could probably obliterate names, addresses, etc. in a copy of the
data (those aren't significant to the query anyway) and provide a test
case. However, I just found another clue.
Since you were so confident it couldn't be the outer join, I went
looking for what else I changed at the same time. I eliminated the code
referencing that table, which contained an OR. I've seen ORs cause
nasty problems with optimizers in the past. I took out the OR in the
where clause, without eliminating that last outer join, and it optimized
fine.
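
(To illustrate the shape of the change — the table and column names here are invented, not the actual query — the predicate went roughly from an OR reaching into the outer-joined table to a plain restriction on the driving table:)

```sql
-- Before: an OR spanning the outer-joined table's column; optimizers
-- can struggle to push such a predicate down or estimate it well.
SELECT h.case_no, h.event_date
FROM case_hist h
LEFT OUTER JOIN charge c ON c.case_no = h.case_no
WHERE h.county_no = 13
   OR c.severity = 'F';

-- After: same outer join, OR removed; this version optimized fine.
SELECT h.case_no, h.event_date
FROM case_hist h
LEFT OUTER JOIN charge c ON c.case_no = h.case_no
WHERE h.county_no = 13;
```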
I'll hold off a bit to see if you still need the test case. ;-)
-Kevin