Re: statistics
От | Paul Thomas |
---|---|
Тема | Re: statistics |
Дата | |
Msg-id | 20040408004835.A8476@bacon обсуждение исходный текст |
Ответ на | statistics ("Jaime Casanova" <el_vigia_ec@hotmail.com>) |
Список | pgsql-performance |
On 07/04/2004 22:05 Jaime Casanova wrote: > What the statistics are? Where can i view it? where can i find info > about its field and why are they valuable information to performance? > > thanx in advance, > > Jaime Casanova OK. An idiot's guide to statistics by a full-time idiot... Let's start with a simple premise. I'm a RDBMS (forget that I'm actually an idiot for a moment...) and I've been asked for select * from foo where bar = 7; How do I go about fulfilling the reequest in the most efficient manner? (i.e., ASAP!) One way might be to read through the whole table and return only those rows which match the where criteron - a sequential scan on the table. But wait a minute, there is an index on column bar. Could I use this instead? Well, of course, I could use it but I have to keep sight of the goal of returning the data ASAP and I know that the act of reading index/reading table/... will have a performance penalty due to a lot more head movement on the disk. So how do I make chose between a sequential scan and an index scan? Let's lokk at a couple of extreme scenarios: 1) let's look at the condition where all or virtually all of the bar columns are populated wityh the value 7. In this case it would be more efficient to read sequentially through the table. 2) the opposite of (1) - very few of the bar columns have the value 7. In this case using the index could be a winner. So generalising, I need to be able to estimate whether doing a sequential scan is more efficient that an index scan and this comes down to 2 factors: a) the cost of moving the disk heads all over the place (random page cost) b) the spread of values in the selecting column(s) (a) is specfified in postgresql.conf (see archives for much discusion about what the value should be..) (b) is determined by the dastardly trick of actually sampling the data in the table!!! That's what analyze does. It samples your table(s) and uses the result to feeede into it's descision about when to flip between sequential and index scans. Hope this makes some kind of sense... -- Paul Thomas +------------------------------+---------------------------------------------+ | Thomas Micro Systems Limited | Software Solutions for Business | | Computer Consultants | http://www.thomas-micro-systems-ltd.co.uk | +------------------------------+---------------------------------------------+
В списке pgsql-performance по дате отправления: