Re: Much Ado About COUNT(*)
От | Jeff Davis |
---|---|
Тема | Re: Much Ado About COUNT(*) |
Дата | |
Msg-id | 1105564677.2886.380.camel@jeff обсуждение исходный текст |
Ответ на | Re: Much Ado About COUNT(*) ("Jonah H. Harris" <jharris@tvi.edu>) |
Ответы |
Re: Much Ado About COUNT(*)
("Jonah H. Harris" <jharris@tvi.edu>)
|
Список | pgsql-hackers |
> > > We seem to be in agreement. I'm looking for faster/smarter access to > data, not the monetary cost of doing so. Isn't it faster/smarter to > satisfy a query with the index rather than sequentially scanning an > entire relation if it is possible? > You have to scan every tuple's visibility information no matter what. Sequential i/o is fastest (per byte), so the most efficient possible method is seqscan. Unfortunately for count(*), the tables also store columns, which are really just clutter as you're moving through the table looking for visibility information. Indexes are designed for searches, not exhaustive retrieval of all records. If you can store that visibility information in a seperate place so that it's not cluttered by unneeded attributes that could be helpful, but an index is not the place for that. If you store that in the index, you are really imposing a new cost on yourself: the cost to do random i/o as you're jumping around the index trying to access every entry, plus the index metadata. You could make a narrow table and join with the other attributes. That might be a good place that wouldn't clutter up the visibility information much (it would just need a primary key). A seqscan over that would be quite efficient. > If this is such a bad idea, why do other database systems use it? As a > businessperson myself, it doesn't seem logical to me that commercial > database companies would spend money on implementing this feature if it > wouldn't be used. Remember guys, I'm just trying to help. > Some databases use an internal counter to count rows as they are added/deleted. This does not give accurate results in a database that supports ACID -- more specifically a database that implements the "isolation" part of ACID. Two concurrent transactions, if the database supports proper isolation, could have two different results for count(*) and both would be correct. That makes all the "optimized count(*)" databases really just give a close number, not the real number. If you just want a close number, there are other ways of doing that in PostgreSQL that people have already mentioned. If you know of a database that supports proper isolation and also has a faster count(*) I would be interested to know what it is. There may be a way to do it without sacrificing in other areas, but I don't know what it is. Does someone know exactly what oracle actually does? Regards,Jeff Davis
В списке pgsql-hackers по дате отправления: