Re: Improving count(*)
| От | Simon Riggs | 
|---|---|
| Тема | Re: Improving count(*) | 
| Дата | |
| Msg-id | 1132272483.4959.299.camel@localhost.localdomain обсуждение исходный текст | 
| Ответ на | Re: Improving count(*) ("Kevin Grittner" <Kevin.Grittner@wicourts.gov>) | 
| Ответы | Re: Improving count(*) Re: Improving count(*) Re: Improving count(*) | 
| Список | pgsql-hackers | 
On Thu, 2005-11-17 at 16:30 -0600, Kevin Grittner wrote: > In Sybase ASE (and I'm pretty sure the same is true in Microsoft SQL > Server) the leaf level of the narrowest index on the table is scanned, > following a linked list of leaf pages. Leaf pages can be pretty dense > under Sybase, because they do use prefix compression. A count(*) > on a table with 100 million rows is going to take a few minutes, but it > is going to be at least an order of magnitude faster than a data page > scan -- maybe two orders of magnitude faster. > > What I don't understand is why people need to do such things so > frequently that it's a major issue, rather than an occassional > annoyance. Agreed, completely. (And it galls me to agree with multiple, potentially opposed opinions on my own thread). The trouble is, people moan and constantly. Perhaps we should stick to our guns and say, why do you care? From here, I think we should say, "show me an application package that needs this so badly we'll change PostgreSQL just for them". Prove it and we'll do it. Kinda polite in the TODO, but I think we should put something in there that says "things we haven't yet had any good reason to improve". > A solution which not only helped the count(*) issue > but also allowed index scans to skip the trip to the data page to > see if it's an active version seems like it would boost performance > overall. As pointed out elsewhere, it could also allow new > techniques for vacuum which could be beneficial. > > My view is that when tables get so big that a count(*) takes that > much time, you don't typiclally need an EXACT count anyway -- > you could normally check the statistics from your nightly analyze. Amen. >From here, another proposal. We have a GUC called count_uses_estimate that is set to off by default. If set to true, then a count(*) will use the planner logic to estimate number of rows in the table and return that as the answer, rather than actually count the row. Unless analyze statistics are not available, in which case it does the real count. Best Regards, Simon Riggs
В списке pgsql-hackers по дате отправления: