Re: avoiding seq scans when two columns are very correlated
| From | Ruslan Zakirov |
|---|---|
| Subject | Re: avoiding seq scans when two columns are very correlated |
| Date | |
| Msg-id | CAMOxC8vgTEUa3YbJ4e===QL1uvz67nf5qUkQH0YAtvrTKG56gw@mail.gmail.com |
| In reply to | Re: avoiding seq scans when two columns are very correlated (Tom Lane <tgl@sss.pgh.pa.us>) |
| List | pgsql-performance |
On Fri, Nov 11, 2011 at 7:36 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Ruslan Zakirov <ruz@bestpractical.com> writes:
>> A table has two columns, id and EffectiveId. The first is the primary key.
>> EffectiveId is almost always equal to id (95% of rows) unless records are
>> merged. Many queries have an id = EffectiveId condition. Both columns have
>> very many distinct values, and Pg reasonably decides that the condition has
>> very low selectivity and picks a sequential scan.
>
> I think the only way is to rethink your data representation.  PG doesn't
> have cross-column statistics at all, and even if it did, you'd be asking
> for an estimate of conditions in the "long tail" of the distribution.
> That's unlikely to be very accurate.
Rethinking the schema is an option, but it requires more consideration, as we
have done it this way for years and run the product on MySQL, Pg and Oracle.
The issue also affects Oracle, but there it can be worked around by dropping
indexes, or maybe by building correlation statistics in 11g (I haven't tried
that yet).
I wonder if the "CROSS COLUMN STATISTICS" patch that is floating around would
help in such a case?
> Consider adding a "merged" boolean, or defining effectiveid differently.
> For instance you could set it to null in unmerged records; then you
> could get the equivalent of the current meaning with
> COALESCE(effectiveid, id).  In either case, PG would then have
> statistics that bear directly on the question of how many merged vs
> unmerged records there are.
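A minimal sketch of the change Tom describes, assuming the `Tickets(id, EffectiveId)` layout discussed in this thread and purely hypothetical data: unmerged rows store NULL instead of their own id, `EffectiveId IS NULL` then means "not merged", and `COALESCE(EffectiveId, id)` gives back the old value. It uses Python's stdlib `sqlite3` as a stand-in for PostgreSQL, so it only checks the semantics of the migration, not any row estimates. In PostgreSQL the share of unmerged rows would then be visible after ANALYZE as the column's NULL fraction (`null_frac` in `pg_stats`), which is presumably the kind of statistic Tom has in mind.

```python
# Sketch only: SQLite (Python stdlib) stands in for PostgreSQL here, so this
# checks the semantics of the change, not the planner's row estimates.
import sqlite3


def migrate_and_compare(rows):
    """rows: (id, EffectiveId) pairs in the current representation,
    where an unmerged ticket has EffectiveId = id."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE Tickets (id INTEGER PRIMARY KEY, EffectiveId INTEGER)")
    con.executemany("INSERT INTO Tickets (id, EffectiveId) VALUES (?, ?)", rows)

    # Old meaning of "unmerged": the highly correlated id = EffectiveId test.
    old_unmerged = [r[0] for r in con.execute(
        "SELECT id FROM Tickets WHERE id = EffectiveId ORDER BY id")]

    # Proposed representation: unmerged tickets store NULL instead of their own id.
    con.execute("UPDATE Tickets SET EffectiveId = NULL WHERE EffectiveId = id")

    # "Unmerged" becomes a plain IS NULL test ...
    new_unmerged = [r[0] for r in con.execute(
        "SELECT id FROM Tickets WHERE EffectiveId IS NULL ORDER BY id")]
    # ... and COALESCE(EffectiveId, id) recovers the old EffectiveId value.
    recovered = con.execute(
        "SELECT id, COALESCE(EffectiveId, id) FROM Tickets ORDER BY id").fetchall()
    con.close()
    return old_unmerged, new_unmerged, recovered


if __name__ == "__main__":
    # Hypothetical data: ticket 3 was merged into ticket 1, ticket 5 into 4.
    data = [(1, 1), (2, 2), (3, 1), (4, 4), (5, 4)]
    old, new, recovered = migrate_and_compare(data)
    print(old == new)         # True
    print(recovered == data)  # True
```

A "merged" boolean, Tom's other option, would work the same way, with `WHERE NOT merged` taking the place of `IS NULL`.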
NULL in EffectiveId is the way to go; however, when we actually need
those records (not a frequent situation), the query becomes frightening:
```sql
SELECT main.* FROM Tickets main
    JOIN Tickets te
        ON te.EffectiveId = main.id
        OR (te.id = main.id AND te.EffectiveId IS NULL)
    JOIN OtherTable ot
        ON ot.Ticket = te.id
```
Past experience suggests that joins with ORs are handled poorly by many optimizers.
In the current schema, the join condition is very straightforward and efficient.
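For illustration only: under the NULL-EffectiveId representation, the OR in the join above is equivalent to a single equality on `COALESCE(te.EffectiveId, te.id)`. It is also equivalent to a `UNION ALL` of two plain joins, because any given `te` row satisfies at most one branch (its EffectiveId is either NULL or not). The sketch checks this on hypothetical data, with Python's stdlib `sqlite3` standing in for PostgreSQL. It says nothing about which form any particular planner handles well. Selecting `main.id, te.id, ot.id` instead of `main.*`, and the `OtherTable.id` column, are assumptions made so that rows can be compared exactly.

```python
# Sketch only: SQLite (Python stdlib) stands in for PostgreSQL; this checks
# that the rewrites return the same rows, not how any planner costs them.
import sqlite3
from collections import Counter

OR_JOIN = """
SELECT main.id, te.id, ot.id FROM Tickets main
    JOIN Tickets te
        ON te.EffectiveId = main.id
        OR (te.id = main.id AND te.EffectiveId IS NULL)
    JOIN OtherTable ot ON ot.Ticket = te.id
"""

# Same predicate folded into one expression: NULL means "not merged".
COALESCE_JOIN = """
SELECT main.id, te.id, ot.id FROM Tickets main
    JOIN Tickets te ON COALESCE(te.EffectiveId, te.id) = main.id
    JOIN OtherTable ot ON ot.Ticket = te.id
"""

# The two OR branches are mutually exclusive, so UNION ALL of two plain
# joins yields the same multiset of rows as the OR join.
UNION_JOIN = """
SELECT main.id, te.id, ot.id FROM Tickets main
    JOIN Tickets te ON te.EffectiveId = main.id
    JOIN OtherTable ot ON ot.Ticket = te.id
UNION ALL
SELECT main.id, te.id, ot.id FROM Tickets main
    JOIN Tickets te ON te.id = main.id AND te.EffectiveId IS NULL
    JOIN OtherTable ot ON ot.Ticket = te.id
"""


def run_all(tickets, other):
    """Run the three query forms and return each result as a row multiset."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE Tickets (id INTEGER PRIMARY KEY, EffectiveId INTEGER)")
    con.execute("CREATE TABLE OtherTable (id INTEGER PRIMARY KEY, Ticket INTEGER)")
    con.executemany("INSERT INTO Tickets VALUES (?, ?)", tickets)
    con.executemany("INSERT INTO OtherTable VALUES (?, ?)", other)
    results = {name: Counter(con.execute(sql).fetchall())
               for name, sql in [("or", OR_JOIN), ("coalesce", COALESCE_JOIN),
                                 ("union", UNION_JOIN)]}
    con.close()
    return results


if __name__ == "__main__":
    # Hypothetical data: ticket 3 merged into 1, ticket 5 into 4; NULL = unmerged.
    tickets = [(1, None), (2, None), (3, 1), (4, None), (5, 4)]
    other = [(10, 1), (11, 3), (12, 3), (13, 2), (14, 5)]
    res = run_all(tickets, other)
    print(res["or"] == res["coalesce"] == res["union"])  # True
```

If the COALESCE form looks attractive, an expression index on `COALESCE(EffectiveId, id)` might be worth testing in PostgreSQL, since ANALYZE also gathers statistics on indexed expressions; that is untested here.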
>                        regards, tom lane
--
Best regards, Ruslan.
		
	В списке pgsql-performance по дате отправления: