Odd statistics behaviour in 7.2
От | Gordon A. Runkle |
---|---|
Тема | Odd statistics behaviour in 7.2 |
Дата | |
Msg-id | 1013587861.21160.60.camel@spiff.runkleinc.com обсуждение исходный текст |
Список | pgsql-hackers |
Hello, all, I'm having a strange problem with v7.2 relating to statistics collection and plan calculation. I'm not sure if this relates to the problems Marc was seeing, but here goes. I have a table with 1,066,673 rows. The column I'm interested in has this distribution of values: tdnr_ct | ct ---------+-------- 16 | 1 4 | 1 3 | 58 2 | 68904 1 | 928171 This means that 'ct' records have 'tdnr_ct' duplicate values. As you can see, the index I have on this column is highly selective, and should be used to look up records based on this column. In v7.1.3, it always does. Under v7.2, it only sometimes does. I've looked at the statistics, thanks to what I learned from Tom and Marc's discussion, and I see that sometimes when I VACUUM ANALYZE the table, 'n_distinct' for this column gets a value of '-1' (desireable), and other times a value such as 56596 or something. This is with the default setting for the statistics. Doing a 'SET STATISTICS 40' on the column got me to '-0.106047', which is better. But even so, the values do change somewhat over subsequent runs of VACUUM ANALYZE. And sometimes I get the coveted '-1'. The query I'm running is fairly complex. The difference between getting the index lookup versus the sequential scan causes an order of magnitude difference in run time. The query plans are below. Same query, no changes, just the difference in statistics. The desireable query plan: Unique (cost=176572.08..177673.89 rows=3673 width=176) -> Sort (cost=176572.08..176572.08 rows=36727 width=176) -> Merge Join (cost=172982.30..173787.35 rows=36727 width=176) -> Sort (cost=169436.41..169436.41 rows=27883width=142) -> Nested Loop (cost=0.00..167377.66 rows=27883 width=142) -> Seq Scan on pprv_ticket ptk (cost=0.00..3345.83 rows=27883 width=125) -> Index Scan using xie01_cat24 on cat24_ticket_doc_id c24 (cost=0.00..5.87 rows=1 width=17) -> Sort (cost=3545.89..3545.89 rows=37048 width=34) -> Seq Scan on pprv_violation pe (cost=0.00..734.48 rows=37048 width=34) SubPlan -> Aggregate (cost=5.87..5.87 rows=1 width=17) -> Index Scan using xie01_cat24 on cat24_ticket_doc_id (cost=0.00..5.87 rows=1 width=17) -> Aggregate (cost=5.88..5.88 rows=1 width=17) -> Index Scan using xie01_cat24 on cat24_ticket_doc_id (cost=0.00..5.88 rows=1 width=17) The undesireable query plan: Unique (cost=1129322.57..1187392.58 rows=193567 width=176) -> Sort (cost=1129322.57..1129322.57 rows=1935667 width=176) -> Merge Join (cost=204226.57..249046.32 rows=1935667 width=176) -> Merge Join (cost=200135.91..209436.90 rows=525268 width=142) -> Sort (cost=6435.89..6435.89 rows=27883 width=125) -> Seq Scan on pprv_ticket ptk (cost=0.00..3335.83 rows=27883 width=125) -> Sort (cost=193700.02..193700.02 rows=1066173 width=17) -> Seq Scan on cat24_ticket_doc_id c24 (cost=0.00..50164.73 rows=1066173 width=17) -> Sort (cost=4090.66..4090.66 rows=37048 width=34) -> Seq Scan on pprv_violation pv (cost=0.00..734.48 rows=37048 width=34) SubPlan -> Aggregate (cost=74.72..74.72 rows=1 width=17) -> Index Scan using xie01_cat24 on cat24_ticket_doc_id (cost=0.00..74.67 rows=19 width=17) -> Aggregate (cost=29.12..29.12 rows=1 width=17) -> Index Scan using xie07_cat24 on cat24_ticket_doc_id (cost=0.00..29.12 rows=1 width=17) I hope I've given enough information that it makes sense. If there's anything I can do my end to help figure this out, let me know. Thanks, Gordon. -- "Far and away the best prize that life has to offer is the chance to work hard at work worth doing." -- Theodore Roosevelt
В списке pgsql-hackers по дате отправления: