Re: ANALYZE sampling is too good

Поиск

Список

Период

Сортировка

От	Tom Lane
Тема	Re: ANALYZE sampling is too good
Дата	11 декабря 2013 г. 19:04:03
Msg-id	16676.1386777819@sss.pgh.pa.us обсуждение исходный текст
Ответ на	Re: ANALYZE sampling is too good (Greg Stark <stark@mit.edu>)
Ответы	Re: ANALYZE sampling is too good
Список	pgsql-hackers

Дерево обсуждения

Greg Stark <stark@mit.edu> writes:
> So I've done some empirical tests for a table generated by:
> create table sizeskew as (select i,j,repeat('i',i) from
> generate_series(1,1000) as i, generate_series(1,1000) as j);

> I find that using the whole block doesn't cause any problem with the
> avg_width field for the "repeat" column.That does reinforce my belief
> that we might not need any particularly black magic here.

> It does however cause a systemic error in the histogram bounds. It
> seems the median is systematically overestimated by more and more the
> larger the number of rows per block are used:

Hm.  You can only take N rows from a block if there actually are at least
N rows in the block.  So the sampling rule I suppose you are using is
"select up to N rows from each sampled block" --- and that is going to
favor the contents of blocks containing narrower-than-average rows.

Now in this case, it looks like that ought to favor rows with *smaller* i
values, but you say the median goes up not down.  So I'm not sure what's
going on.  I thought at first that TOAST compression might be part of the
explanation, but TOAST shouldn't kick in on rows with raw representation
narrower than 2KB.

Did you do a run with no upper limit on the number of rows per block?
Because I'm not sure that tests with a limit in place are a good guide
to what happens without it.
        regards, tom lane

В списке pgsql-hackers по дате отправления:

Предыдущее

От: Andres Freund
Дата: 11 декабря 2013 г., 19:01:21
Сообщение: Re: Why the buildfarm is all pink

Следующее

От: Tom Lane
Дата: 11 декабря 2013 г., 19:07:15
Сообщение: Re: [PATCH] Add transforms feature

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Re: ANALYZE sampling is too good

Предыдущее

Следующее