Re: Proposal: COUNT(*) (and related) speedup
От | Robert Haas |
---|---|
Тема | Re: Proposal: COUNT(*) (and related) speedup |
Дата | |
Msg-id | CA+TgmoY-wi6oYkd6uxG1Mb0e1=Gan7_mGu4PRMTwG0cgw2ni2g@mail.gmail.com обсуждение исходный текст |
Ответ на | Re: Proposal: COUNT(*) (and related) speedup (Tom Lane <tgl@sss.pgh.pa.us>) |
Список | pgsql-hackers |
On Fri, Apr 4, 2014 at 1:19 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Joshua Yanovski <pythonesque@gmail.com> writes: >>> But worse, what happens if a count(*) >>> is in progress? It might or might not have scanned this page already, >>> and there's no way to get the right answer in both cases. Counter >>> updates done by VACUUM would have a similar race-condition problem. > >> I don't think the first part really matters. If the page was visible >> when COUNT(*) started and then got dirtied by some other transaction, >> any changes by the second transaction shouldn't alter the actual count >> in our transaction anyway, so whether we scan the page needlessly or >> not seems beside the point. > > The question is not whether you scan a page "needlessly" or not, it's > whether you double-count the tuples on it. I don't think it's possible to > be sure that when you add the central counter value into your local sum, > you are neither double-counting nor omitting pages whose all-visible state > changed while you were scanning the table. I think it would work if you also had information about which pages you needed to scan. For example, suppose you had a data structure sorta like the visibility map, except that instead of one bit per page you have one 16-bit integer per page. The integer is -1 if the page isn't known to be all-visible, and is otherwise the count of tuples on the page. Now, we can do a count(*) as follows: 1. Lock the first as-yet-unexamined page of the data structure. 2. Add all the values that aren't -1 to your running count of tuples, and remember the page numbers corresponding to the remaining entries. 3. Unlock the page you locked in step 1. 4. Visit each heap page you remembered in step 2 and add the number of tuples visible to your scan to your running count of tuples. 5. Provided there's a page of the data structure you haven't visited yet, go to step 1. On the write side, any operation that clears the visibility map bit must also set the entry for that page to -1. When the page is all-visible, the value can be set to the total number of tuples on the page. This is safe because, even if the page ceases to be all-visible after we add its tuples to the count, the new tuples that were added aren't visible to our scan, and any tuples deleted are still visible to our scan - because the transaction making the changes, in either case, is obviously still running, and therefore didn't commit before we took our snapshot. Now, this wouldn't be O(1) but it would presumably be quite fast if your visibility map bits are mostly set. One 8kB page of 16-bit counters would cover 32MB of heap. The bad news is that I am pretty sure there's no easy way for an index AM to get callbacks at the right time, so it's really unclear how something like this would fit in with our existing abstractions. And even if we answered that question, it would be a lot of work for a pretty narrow use case. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
В списке pgsql-hackers по дате отправления: