Re: Odd problem with performance in duplicate database

Поиск
Список
Период
Сортировка
От Tom Lane
Тема Re: Odd problem with performance in duplicate database
Дата
Msg-id 27573.1060648162@sss.pgh.pa.us
обсуждение исходный текст
Ответ на Re: Odd problem with performance in duplicate database  ("Peter Darley" <pdarley@kinesis-cem.com>)
Ответы Re: Odd problem with performance in duplicate database
Re: Odd problem with performance in duplicate database
Список pgsql-performance
Josh Berkus <josh@agliodbs.com> writes:
> Tom,
>> Okay, here's our problem:
>>
>> live DB: tgroup_id  n_distinct = -1
>>
>> test DN: tgroup_id  n_distinct = 11
>>
>> The former estimate actually means that it thinks tgroup_id is a unique
>> column, whereas the latter says there are only 11 distinct values in the
>> column.  I assume the former is much nearer to the truth (how many rows
>> in cases, and how many distinct tgroup_id values)?

> The real case is that there are 113 distinct tgroup_ids, which cover
> about 10% of the population of cases.  The other 90% is NULL.  The
> average tgroup_id is shared between 4.7 cases.

> So this seems like sampling error.

Partly.  The numbers suggest that in ANALYZE's default sample of 3000
rows, it's only finding about a dozen non-null tgroup_ids (yielding the
0.996 null_frac value); and that in one case all dozen are different and
in the other case there are two duplicates.  It would help if you
boosted the stats target for this column by a factor of 10.  (You can
do that in 7.2, btw --- IIRC the only problem is that a pg_dump won't
show that you did so.)

But the other part of the problem is that in 7.2, the join selectivity
estimator is way off when you are joining a unique column (like the pkey
on the other side) to a column with a very large fraction of nulls.
We only discovered this recently; it's fixed as of 7.3.3:

2003-04-15 01:18  tgl

    * src/backend/utils/adt/selfuncs.c (REL7_3_STABLE): eqjoinsel's
    logic for case where MCV lists are not present should account for
    NULLs; in hindsight this is obvious since the code for the
    MCV-lists case would reduce to this when there are zero entries in
    both lists.  Per example from Alec Mitchell.

Possibly you could backpatch that into 7.2, although I'd think an update
to 7.3.4 would be a more profitable use of time.

            regards, tom lane

В списке pgsql-performance по дате отправления:

Предыдущее
От: Tom Lane
Дата:
Сообщение: Re: Odd problem with performance in duplicate database
Следующее
От: "Christopher Kings-Lynne"
Дата:
Сообщение: Re: Perfomance Tuning