Re: O(N^2) when building multi-column MCV lists

Поиск
Список
Период
Сортировка
От Tomas Vondra
Тема Re: O(N^2) when building multi-column MCV lists
Дата
Msg-id 20190621102550.x6ohcp6hwg5efkc7@development
обсуждение исходный текст
Ответ на O(N^2) when building multi-column MCV lists  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Список pgsql-hackers
Hi,

Attached is a WIP/PoC fix addressing the O(N^2) behavior in ANALYZE with
high statistic target values. It needs more work, but it's good enough to
show some measurements.

For benchmark, I've created a simple 2-column table, with MCV list on
those two columns:

  CREATE TABLE t (a int, b int);
  CREATE STATISTICS s (mcv) ON a,b FROM t;

and then loaded data sets with different numbers of random combinations,
determined by number of values in each column. For example with 10 values
in a column, you get ~100 combinations.

  INSERT INTO t
  SELECT 10*random(), 10*random() FROM generate_series(1,3e6);

The 3M rows is picked because that's the sample size with target 10000.

The results with different statistic targets look like this:

1) master

    values        100       1000        5000       10000
    ====================================================
        10        103        586        2419        3041
       100        116        789        4778        8934
      1000        116        690        3162      499748

2) patched

    values        100       1000        5000       10000
    ====================================================
        10        113        606        2460        3716
       100        143        711        3371        5231
      1000        156        994        3836        6002

3) comparison (patched / master)

    values        100       1000        5000       10000
    ====================================================
        10       110%       103%        102%        122%
       100       123%        90%         71%         59%
      1000       134%       144%        121%          1%


So clearly, the issue for large statistic targets is resolved (duration
drops from 500s to just 6s), but there is measurable regression for the
other cases. That needs more investigation & fix before commit.


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Peter Eisentraut
Дата:
Сообщение: allow_system_table_mods stuff
Следующее
От: Stephen Frost
Дата:
Сообщение: Re: ldapbindpasswdfile