Re: O(N^2) when building multi-column MCV lists
От | Tomas Vondra |
---|---|
Тема | Re: O(N^2) when building multi-column MCV lists |
Дата | |
Msg-id | 20190621102550.x6ohcp6hwg5efkc7@development обсуждение исходный текст |
Ответ на | O(N^2) when building multi-column MCV lists (Tomas Vondra <tomas.vondra@2ndquadrant.com>) |
Список | pgsql-hackers |
Hi, Attached is a WIP/PoC fix addressing the O(N^2) behavior in ANALYZE with high statistic target values. It needs more work, but it's good enough to show some measurements. For benchmark, I've created a simple 2-column table, with MCV list on those two columns: CREATE TABLE t (a int, b int); CREATE STATISTICS s (mcv) ON a,b FROM t; and then loaded data sets with different numbers of random combinations, determined by number of values in each column. For example with 10 values in a column, you get ~100 combinations. INSERT INTO t SELECT 10*random(), 10*random() FROM generate_series(1,3e6); The 3M rows is picked because that's the sample size with target 10000. The results with different statistic targets look like this: 1) master values 100 1000 5000 10000 ==================================================== 10 103 586 2419 3041 100 116 789 4778 8934 1000 116 690 3162 499748 2) patched values 100 1000 5000 10000 ==================================================== 10 113 606 2460 3716 100 143 711 3371 5231 1000 156 994 3836 6002 3) comparison (patched / master) values 100 1000 5000 10000 ==================================================== 10 110% 103% 102% 122% 100 123% 90% 71% 59% 1000 134% 144% 121% 1% So clearly, the issue for large statistic targets is resolved (duration drops from 500s to just 6s), but there is measurable regression for the other cases. That needs more investigation & fix before commit. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Вложения
В списке pgsql-hackers по дате отправления: