Re: Improving N-Distinct estimation by ANALYZE
От | Simon Riggs |
---|---|
Тема | Re: Improving N-Distinct estimation by ANALYZE |
Дата | |
Msg-id | 1136802512.21025.386.camel@localhost.localdomain обсуждение исходный текст |
Ответ на | Re: Improving N-Distinct estimation by ANALYZE (Greg Stark <gsstark@mit.edu>) |
Ответы |
Re: Improving N-Distinct estimation by ANALYZE
Re: Improving N-Distinct estimation by ANALYZE Re: Improving N-Distinct estimation by ANALYZE |
Список | pgsql-hackers |
On Fri, 2006-01-06 at 16:13 -0500, Greg Stark wrote: > "Jim C. Nasby" <jnasby@pervasive.com> writes: > > > Before we start debating merits of proposals based on random reads, can > > someone confirm that the sampling code actually does read randomly? I > > looked at it yesterday; there is a comment that states that blocks to be > > scanned are passed to the analyze function in physical order, and AFAICT > > the function that chooses blocks does so based strictly on applying a > > probability function to block numbers as it increments a counter. It > > seems that any reading is actually sequential and not random, which > > makes all the random_page_cost hand-waving null and void. > > Hm. I'm curious just how much that behaves like a sequential scan actually. I > think I'll do some experiments. > > Reading 1% (1267 read, 126733 skipped): 7748264us > Reading 2% (2609 read, 125391 skipped): 12672025us > Reading 5% (6502 read, 121498 skipped): 19005678us > Reading 5% (6246 read, 121754 skipped): 18509770us > Reading 10% (12975 read, 115025 skipped): 19305446us > Reading 20% (25716 read, 102284 skipped): 18147151us > Reading 50% (63656 read, 64344 skipped): 18089229us > Reading 100% (128000 read, 0 skipped): 18173003us > > These numbers don't make much sense to me. It seems like 5% is about as slow > as reading the whole file which is even worse than I expected. I thought I was > being a bit pessimistic to think reading 5% would be as slow as reading 20% of > the table. Just to put a few rumours to bed: - the current code does *not* use block sampling, it uses random row sampling. (There is a part of the code that selects the "next block" but that should not confuse you into thinking that the whole block is sampled). - yes, the random sampling is random - please read the code and comments - yes, I would expect the results you get. If you sample 5% of rows and each block has on average at least 20 rows, then we should expect the majority of blocks to be hit. Best Regards, Simon Riggs
В списке pgsql-hackers по дате отправления: