Re: [PERFORM] Bad n_distinct estimation; hacks suggested?
| From | John A Meinel |
|---|---|
| Subject | Re: [PERFORM] Bad n_distinct estimation; hacks suggested? |
| Date | |
| Msg-id | 42781B1D.7070101@arbash-meinel.com |
| In reply to | Re: [PERFORM] Bad n_distinct estimation; hacks suggested? (Josh Berkus <josh@agliodbs.com>) |
| List | pgsql-hackers |

Josh Berkus wrote:
> Mischa,
>
> >> Okay, although given the track record of page-based sampling for
> >> n-distinct, it's a bit like looking for your keys under the streetlight,
> >> rather than in the alley where you dropped them :-)
>
> Bad analogy, but funny.
>
> The issue with page-based vs. pure random sampling is that to do, for example,
> 10% of rows purely randomly would actually mean loading 50% of pages. With
> 20% of rows, you might as well scan the whole table.
>
> Unless, of course, we use indexes for sampling, which seems like a *really
> good* idea to me ....

But doesn't an index only sample one column at a time, whereas with
page-based sampling you can sample all of the columns at once? And not
all columns would have indexes, though it could be assumed that if a
column doesn't have an index, then it doesn't matter as much for
calculations such as n_distinct.

But if you had 5 indexed columns in your table, then doing it index-wise
means you would have to make 5 passes instead of just one.

Though I agree that page-based sampling is important for performance
reasons.

John
=:->
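
The "10% of rows means 50% of pages" figure quoted above can be roughly reproduced with a small calculation. This is only a sketch under stated assumptions, not anything from the PostgreSQL source: it assumes every page holds the same number of rows and that each row is sampled independently with the given probability. The function name `expected_page_fraction` and the rows-per-page values are hypothetical, chosen for illustration.

```python
def expected_page_fraction(row_fraction: float, rows_per_page: int) -> float:
    """Expected fraction of pages that contain at least one sampled row.

    Assumes each row is picked independently with probability
    `row_fraction`, and every page holds exactly `rows_per_page` rows.
    A page is skipped only if none of its rows are picked.
    """
    return 1.0 - (1.0 - row_fraction) ** rows_per_page


if __name__ == "__main__":
    # Rows-per-page values are illustrative assumptions, not measurements.
    for rows_per_page in (7, 50, 100):
        for row_fraction in (0.10, 0.20):
            pages = expected_page_fraction(row_fraction, rows_per_page)
            print(
                f"{rows_per_page:>3} rows/page, "
                f"{row_fraction:.0%} of rows -> {pages:.1%} of pages"
            )
```

Under this model, about 7 rows per page gives roughly half the pages at a 10% row sample, and the fraction climbs quickly toward the whole table as rows per page or the sample rate grows.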