index-only scans
От | Robert Haas |
---|---|
Тема | index-only scans |
Дата | |
Msg-id | CA+Tgmoae-hfNL7eK3BHCeVXTaoU+n1n0jsHSrSLg-K2Kp9PqLg@mail.gmail.com обсуждение исходный текст |
Ответы |
Re: index-only scans
("Greg Sabino Mullane" <greg@turnstep.com>)
Re: index-only scans (Cédric Villemain <cedric.villemain.debian@gmail.com>) Re: index-only scans (Oleg Bartunov <oleg@sai.msu.su>) Re: index-only scans (Heikki Linnakangas <heikki.linnakangas@enterprisedb.com>) Re: index-only scans (Greg Smith <greg@2ndQuadrant.com>) Re: index-only scans (Tom Lane <tgl@sss.pgh.pa.us>) Re: index-only scans (Tom Lane <tgl@sss.pgh.pa.us>) Re: index-only scans (Tom Lane <tgl@sss.pgh.pa.us>) |
Список | pgsql-hackers |
Please find attached a patch implementing a basic version of index-only scans. This patch is the work of my colleague Ibrar Ahmed and myself, and also incorporates some code from previous patches posted by Heikki Linnakanagas. I'm able to demonstrate a significant performance benefit from this patch, even in a situation where it doesn't save any disk I/O, due to reduced thrashing of shared_buffers. Configuration settings: max_connections = 100 shared_buffers = 400MB maintenance_work_mem = 256MB synchronous_commit = off checkpoint_segments = 100 checkpoint_timeout = 30min checkpoint_completion_target = 0.9 checkpoint_warning = 90s seq_page_cost = 1.0 random_page_cost = 1.0 effective_cache_size = 3GB Test setup: pgbench -i -s 50 create table sample_data as select (random()*5000000)::int as aid, repeat('a', 1000) as filler from generate_series(1,100000); Test queries: select sum(aid) from sample_data a1 where exists (select * from pgbench_accounts a where a.aid = a1.aid and a.aid != 1234567); select sum(aid) from sample_data a1 where exists (select * from pgbench_accounts a where a.aid = a1.aid and a.bid != 1234567); On my laptop, the first query executes in about 555 ms, while the second one takes about 1125 ms. Inspection via pg_buffercache reveals that the second one thrashes shared_buffers something fierce, while the first one does not. You can actually get the time for the first query down to about 450 ms if you can persuade PostgreSQL to cache the entire sample_data table - which is difficult, due the BufferAccessStrategy stuff - and as soon as you run the second query, it blows the table out of cache, so practically speaking you're not going to get that faster time very often. I expect that you could get an even larger benefit from this type of query if you could avoid actual disk I/O, rather than just buffer cache thrashing, but I haven't come up with a suitable test cases for that yet (volunteers?). There are several things about this patch that probably need further thought and work, or at least some discussion. 1. The way that nodeIndexscan.c builds up the faux heap tuple is perhaps susceptible to improvement. I thought about building a virtual tuple, but then what do I do with an OID column, if I have one? Or maybe this should be done some other way altogether. 2. Suppose we scan one tuple on a not-all-visible page followed by 99 tuples on all-visible pages. The code as written will hold the pin on the first heap page for the entire scan. As soon as we hit the end of the scan or another tuple where we have to actually visit the page, the old pin will be released, but until then we hold onto it. This isn't totally stupid: the next tuple that's on a not-all-visible page could easily be on the same not-all-visible page we already have pinned. And in 99% cases holding the pin for slightly longer is probably completely harmless. On the flip side, it could increase the chances of interfering with VACUUM. Then again, a non-index-only scan would still interfere with the same VACUUM, so maybe we don't care. 3. The code in create_index_path() builds up a bitmapset of heap attributes that get used for any purpose anywhere in the query, and hangs it on the RelOptInfo so it doesn't need to be rebuilt for every index under consideration. However, if it were somehow possible to have the rel involved without using any attributes at all, we'd rebuild the cache over and over, since it would never become non-NULL. I don't think that can happen right now, but future planner changes might make it possible. 4. There are a couple of cases that use index-only scans even though the EXPLAIN output sort of makes it look like they shouldn't. For example, in the above queries, an index-only scan is chosen even though the query does "SELECT *" from the table being scanned. Even though the EXPLAIN (VERBOSE) output makes it look otherwise, it seems that the target list of an EXISTS query is in fact discarded, e.g.: create or replace function error() returns int as $$begin select 1/0; end$$ language plpgsql; select * from pgbench_accounts a where exists (select error() from pgbench_branches b where b.bid = a.aid); -- doesn't error out! Along similar lines, COUNT(*) does not preclude an index-only scan, because the * is apparently just window dressing. You'll still get just a seq-scan unless you have an indexable qual in the query somewhere, because... 5. We haven't made any planner changes at all, not even for cost estimation. It is not clear to me what the right way to do cost estimation here is. It seems that it would be useful to know what percentage of the table is all-visible at planning time, but even if we did, there could (and probably often will) be correlation between the pages the query is interested in and which visibility map bits are set. So I'm concerned about being overly optimistic about the cost savings. Also, there's the problem of figuring out how to keep the percent-of-table-that-has-the-vm-bits-set statistic up to date. Maybe that's a job for ANALYZE but I haven't thought about it much yet. 6. I'm sure there are probably some statements in the documentation that need updating, but I haven't tracked 'em down yet. Comments, testing, review appreciated... -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Вложения
В списке pgsql-hackers по дате отправления:
Предыдущее
От: Jeff DavisДата:
Сообщение: Re: Extra check in 9.0 exclusion constraint unintended consequences