Re: multivariate statistics v14
От | Tomas Vondra |
---|---|
Тема | Re: multivariate statistics v14 |
Дата | |
Msg-id | eb8522d2-f1ee-7401-2118-e7d6b8990483@2ndquadrant.com обсуждение исходный текст |
Ответ на | Re: multivariate statistics v14 (Tatsuo Ishii <ishii@postgresql.org>) |
Ответы |
Re: multivariate statistics v14
|
Список | pgsql-hackers |
On 03/23/2016 06:20 AM, Tatsuo Ishii wrote: >>> I am now looking into the create statistics doc to see if the example >>> appearing in it is working. I will get back if I find any. > > I have the ref doc: CREATE STATISTICS > > There are nice examples how the multivariate statistics gives better > row number estimation. So I gave them a try. > > "Create table t1 with two functionally dependent columns, > i.e. knowledge of a value in the first column is sufficient for > determining the value in the other column" The example creates table > "t1", then populates it using generate_series. After CREATE > STATISTICS, ANALYZE and EXPLAIN. I expected the EXPLAIN demonstrates > how result rows estimation is enhanced by using the multivariate > statistics. > > Here is the EXPLAIN output using the multivariate statistics: > > EXPLAIN ANALYZE SELECT * FROM t1 WHERE (a = 1) AND (b = 1); > QUERY PLAN > --------------------------------------------------------------------------------------------------- > Seq Scan on t1 (cost=0.00..19425.00 rows=98 width=8) (actual time=76.876..76.876 rows=0 loops=1) > Filter: ((a = 1) AND (b = 1)) > Rows Removed by Filter: 1000000 > Planning time: 0.146 ms > Execution time: 76.896 ms > (5 rows) > > Here is the EXPLAIN output without the multivariate statistics: > > EXPLAIN ANALYZE SELECT * FROM t1 WHERE (a = 1) AND (b = 1); > QUERY PLAN > -------------------------------------------------------------------------------------------------- > Seq Scan on t1 (cost=0.00..19425.00 rows=1 width=8) (actual time=78.867..78.867 rows=0 loops=1) > Filter: ((a = 1) AND (b = 1)) > Rows Removed by Filter: 1000000 > Planning time: 0.102 ms > Execution time: 78.885 ms > (5 rows) > > It seems the row numbers estimation (98) using the multivariate > statistics is actually *worse* than the one (1) not using the > statistics because the actual row number is 0. Yes, there's a mistake in the first query, because the conditions actually are not compatible. I.e. (i/100)=1 and (i/500)=1 have no overlapping rows, clearly. It should be EXPLAIN ANALYZE SELECT * FROM t1 WHERE (a = 1) AND (b = 0); instead. Will fix. > > Next example (using table "t2") is much better than the case using t1. > > Here is the EXPLAIN output using the multivariate statistics: > > EXPLAIN ANALYZE SELECT * FROM t2 WHERE (a = 1) AND (b = 1); > QUERY PLAN > -------------------------------------------------------------------------------------------------------- > Seq Scan on t2 (cost=0.00..19425.00 rows=9633 width=8) (actual time=0.012..75.350 rows=10000 loops=1) > Filter: ((a = 1) AND (b = 1)) > Rows Removed by Filter: 990000 > Planning time: 0.107 ms > Execution time: 75.680 ms > (5 rows) > > Here is the EXPLAIN output without the multivariate statistics: > > EXPLAIN ANALYZE SELECT * FROM t2 WHERE (a = 1) AND (b = 1); > QUERY PLAN > ------------------------------------------------------------------------------------------------------ > Seq Scan on t2 (cost=0.00..19425.00 rows=91 width=8) (actual time=0.008..76.614 rows=10000 loops=1) > Filter: ((a = 1) AND (b = 1)) > Rows Removed by Filter: 990000 > Planning time: 0.067 ms > Execution time: 76.935 ms > (5 rows) > > This time it seems the row numbers estimation (9633) using the > multivariate statistics is much better than the one (91) not using the > statistics because the actual row number is 10000. > > The last example (using table "t3") seems no effect by multivariate statistics. Yes. There's a typo in the example - it analyzes the wrong table (t2 instead of t3). Once I fix that, the estimates are much better. > In summary, the only case which shows the effect of the multivariate > statistics is the "t2" case. So I don't see why other examples are > shown in the manual. Am I missing something? No, thanks for spotting those mistakes. I'll fix them and submit a new version of the patch - either later today or perhaps tomorrow. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
В списке pgsql-hackers по дате отправления: