ANALYZE sampling is too good

From: Greg Stark
Subject: ANALYZE sampling is too good
Date:
Msg-id: CAM-w4HOjRbNPMW=SHjHw_Qfapcuu5Ege1tMdR0ZQU+kqX8Qeug@mail.gmail.com
Responses: Re: ANALYZE sampling is too good  (Claudio Freire <klaussfreire@gmail.com>)
Re: ANALYZE sampling is too good  (Robert Haas <robertmhaas@gmail.com>)
Re: ANALYZE sampling is too good  (Jeff Janes <jeff.janes@gmail.com>)
Re: ANALYZE sampling is too good  (Bruce Momjian <bruce@momjian.us>)
List: pgsql-hackers
At multiple conferences I've heard about people trying all sorts of
gymnastics to avoid ANALYZE, which they expect to take too long and
consume too much I/O. This is especially a big complaint after
upgrades: people do pg_upgrade to avoid an extended downtime, then
their new database performs poorly until the new statistics are in,
and they complain about ANALYZE taking hours.

I always gave the party line that ANALYZE only takes a small
constant-sized sample, so even very large tables should be very quick.
But after hearing the same story again at Heroku I looked into it a
bit further. I was kind of shocked by the numbers.

ANALYZE takes a sample of 300 * statistics_target rows. That sounds
pretty reasonable, but with default_statistics_target set to 100
that's 30,000 rows. If I'm reading the code right, it takes this
sample by sampling 30,000 blocks and then (if the table is large
enough) taking an average of one row per block. Each block is 8192
bytes, so that means it's reading about 240MB of each table. That's a
lot more than I realized.
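
To spell out the arithmetic (a rough sketch in Python, assuming the
default 8192-byte block size and the worst case where each sampled row
comes from a distinct block):

    # back-of-envelope numbers for the default ANALYZE sample
    stats_target = 100                     # default_statistics_target
    sample_rows = 300 * stats_target       # 30,000 rows sampled
    block_size = 8192                      # bytes per block
    bytes_read = sample_rows * block_size  # one distinct block per row
    print(sample_rows, "rows, ~%.0f MiB read" % (bytes_read / 1024**2))
    # -> 30000 rows, ~234 MiB read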

It means if your table is anywhere up to 240MB you're effectively
doing a full table scan and then throwing out nearly all the data
read.

Worse, my experience from benchmarking posix_fadvise is that on
spinning media reading one out of every 16 blocks takes about the same
time as reading them all. Presumably this is because the seek time
between tracks dominates, and reading one out of every 16 blocks still
touches every track. So in fact if your table is up to about 3-4G,
ANALYZE is still effectively going to do a full table scan, at least
as far as I/O time goes.

The current algorithm seems like it was designed with a 100G+ table in
mind, but the consequences for the more common 100M-100G tables
weren't really considered. Consider what this means for partitioned
tables: if you partition a terabyte table into 10 partitions, ANALYZE
will suddenly want to do 10x as much I/O, which seems like a perverse
consequence.
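
A quick illustration of that, under the same assumptions as above (the
sample size is per table, so it scales with the number of partitions):

    # the sample is taken per table, so partitioning multiplies the I/O
    sample_blocks_per_table = 30000
    partitions = 10
    blocks_unpartitioned = sample_blocks_per_table             # 30,000 blocks
    blocks_partitioned = sample_blocks_per_table * partitions  # 300,000 blocks
    print(blocks_partitioned / blocks_unpartitioned, "x as many blocks read")
    # -> 10.0 x as many blocks read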

I'm not sure I have a prescription, but my general feeling is that
we're spending an awful lot of resources going after a statistically
valid sample when we could spend a lot less and get something 90% as
good. Or, if we're really going to read that much data, we might as
well use more of the rows we find.
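
For a rough sense of how much of the data read gets thrown away,
here's a back-of-envelope estimate; the ~100-byte row size (roughly 80
rows per 8K block) is just an assumption for illustration:

    # assumption: ~100-byte rows, so roughly 80 rows per 8K block
    rows_per_block = 80
    blocks_sampled = 30000
    rows_read = rows_per_block * blocks_sampled  # ~2.4 million rows pulled off disk
    rows_kept = 30000                            # one per block ends up in the sample
    print("%.2f%% of the rows read are actually used"
          % (100.0 * rows_kept / rows_read))     # -> 1.25%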

-- 
greg


