ANALYZE sampling is too good
От | Greg Stark |
---|---|
Тема | ANALYZE sampling is too good |
Дата | |
Msg-id | CAM-w4HOjRbNPMW=SHjHw_Qfapcuu5Ege1tMdR0ZQU+kqX8Qeug@mail.gmail.com обсуждение исходный текст |
Ответы |
Re: ANALYZE sampling is too good
(Claudio Freire <klaussfreire@gmail.com>)
Re: ANALYZE sampling is too good (Robert Haas <robertmhaas@gmail.com>) Re: ANALYZE sampling is too good (Jeff Janes <jeff.janes@gmail.com>) Re: ANALYZE sampling is too good (Bruce Momjian <bruce@momjian.us>) |
Список | pgsql-hackers |
At multiple conferences I've heard about people trying all sorts of gymnastics to avoid ANALYZE which they expect to take too long and consume too much I/O. This is especially a big complain after upgrades when their new database performs poorly until the new statistics are in and they did pg_upgrade to avoid an extended downtime and complain about ANALYZE taking hours. I always gave the party line that ANALYZE only takes a small constant-sized sample so even very large tables should be very quick. But after hearing the same story again in Heroku I looked into it a bit further. I was kind of shocked but the numbers. ANALYZE takes a sample of 300 * statistics_target rows. That sounds pretty reasonable but with default_statistics_target set to 100 that's 30,000 rows. If I'm reading the code right It takes this sample by sampling 30,000 blocks and then (if the table is large enough) taking an average of one row per block. Each block is 8192 bytes so that means it's reading 240MB of each table.That's a lot more than I realized. It means if your table is anywhere up to 240MB you're effectively doing a full table scan and then throwing out nearly all the data read. Worse, my experience with the posix_fadvise benchmarking is that on spinning media reading one out of every 16 blocks takes about the same time as reading them all. Presumably this is because the seek time between tracks dominates and reading one out of every 16 blocks is still reading every track. So in fact if your table is up to about 3-4G ANALYZE is still effectively going to do a full table scan, at least as far as I/O time goes. The current algorithm seems like it was designed with a 100G+ table in mind but the consequences on the more common 100M-100G tables weren't really considered. Consider what this means for partitioned tables. If they partition their terabyte table into 10 partitions ANALYZE will suddenly want to use 10x as much I/O which seems like a perverse consequence. I'm not sure I have a prescription but my general feeling is that we're spending an awful lot of resources going after a statistically valid sample when we can spend a lot less resources and get something 90% as good. Or if we're really going to read that much data that we might as well use more of the rows we find. -- greg
В списке pgsql-hackers по дате отправления: