no MCV list of tiny table with unique columns

Поиск
Список
Период
Сортировка
От Justin Pryzby
Тема no MCV list of tiny table with unique columns
Дата
Msg-id 20161102185318.GA3987@telsasoft.com
обсуждение исходный текст
Ответы Re: no MCV list of tiny table with unique columns  (Tom Lane <tgl@sss.pgh.pa.us>)
Список pgsql-performance
We have a report query which joins (multiple times, actually) against this
trivial, tiny table:

ts=# \d bsm_to_switch
Table "public.bsm_to_switch"
 Column | Type | Modifiers
--------+------+-----------
 bsm    | text | not null
 switch | text | not null

ts=# SELECT length(bsm), length(switch) FROM bsm_to_switch;
 length | length
--------+--------
     10 |      6
     10 |      6
(2 rows)

The column values are distinct.

I believe the join is being (badly) underestimated, leading to a crappy plan
involving multiple nested loop joins, which takes 2.5 hours instead of a
handful of seconds; I believe that might be resolved by populating its MCV
list..

..however, on reading commands/analyze.c, the issue is these columns have no
duplicates, and also postgres decides that "since the number of distinct rows
is greater than 10% of the total number of rows", that ndistinct should be -1
(meaning it scales with the table size).  That's fine, except that it then
effectively precludes populating the MCV list.

|    if (nmultiple == 0)
|    {
|        /*
|         * If we found no repeated non-null values, assume it's a unique
|         * column; but be sure to discount for any nulls we found.
|         */
|        stats->stadistinct = -1.0 * (1.0 - stats->stanullfrac);
|    }
|    else if (track_cnt < track_max && toowide_cnt == 0 &&
|             nmultiple == track_cnt)
|    {
|        /*
|         * Our track list includes every value in the sample, and every
|         * value appeared more than once.  Assume the column has just
|         * these values.  (This case is meant to address columns with
|         * small, fixed sets of possible values, such as boolean or enum
|         * columns.  If there are any values that appear just once in the
|         * sample, including too-wide values, we should assume that that's
|         * not what we're dealing with.)
|         */
|        stats->stadistinct = track_cnt;
|    }

ts=# SELECT attname, inherited, null_frac, avg_width, n_distinct, most_common_vals FROM pg_stats WHERE
tablename='bsm_to_switch';
 attname | inherited | null_frac | avg_width | n_distinct | most_common_vals
---------+-----------+-----------+-----------+------------+------------------
 bsm     | f         |         0 |        11 |         -1 |
 switch  | f         |         0 |         7 |         -1 |
(2 rows)


Any ideas?  I tried setting n_distinct=2, but that seems to not have any effect
within ANALYZE itself.

ts=# SELECT attname, inherited, null_frac, avg_width, n_distinct, most_common_vals FROM pg_stats WHERE
tablename='bsm_to_switch';
 attname | inherited | null_frac | avg_width | n_distinct | most_common_vals
---------+-----------+-----------+-----------+------------+------------------
 bsm     | f         |         0 |        11 |          2 |
 switch  | f         |         0 |         7 |          2 |
(2 rows)

Thanks in advance.

Justin


В списке pgsql-performance по дате отправления:

Предыдущее
От: Rick Otten
Дата:
Сообщение: Re: Perf decreased although server is better
Следующее
От: Joao Junior
Дата:
Сообщение: archive_command too slow.