bitmap-index-scan faster than seq-scan on full-table-scan (gin index)
От | Jesper Krogh |
---|---|
Тема | bitmap-index-scan faster than seq-scan on full-table-scan (gin index) |
Дата | |
Msg-id | 4C041388.50605@krogh.cc обсуждение исходный текст |
Ответы |
Re: bitmap-index-scan faster than seq-scan on full-table-scan (gin index)
(Tom Lane <tgl@sss.pgh.pa.us>)
|
Список | pgsql-hackers |
Hi. The test data a set of generated terms using this perl-script http://shrek.krogh.cc/~jesper/build-test.pl and http://shrek.krogh.cc/~jesper/words.txt I have generated a test dataset with an average tsvector length of around 250 and 200.000 tuples in the dataset. Conceptually searching for the "full dataset" would always be fastest solved by a seq-scan. The query planner enforces this so much, so not even "enable_seqscan=off" can convince it to to something else. So in the next two explain analyze I compare a query searching 99% of the table up with a seqscan. The 98% case is enforced to be a "bitmap-index-scan" I would expect the runtime of the seqscan to be shortest and the bitmap-index-scan to be quite a lot larger, due to "random access" and the fact that the index-data also needs to be read in from disk. Bot runs are run with a freshly started postgresql backend and "echo 3 > /proc/sys/vm/drop_caches" so the os caching should not come into play. ftstest=# EXPLAIN ANALYZE select id from ftstest where body_fts @@ to_tsquery('commonterm98'); QUERY PLAN -------------------------------------------------------------------------------------------------------------------------------------------- BitmapHeap Scan on ftstest (cost=6579.81..992733.57 rows=195976 width=4) (actual time=4813.258..7081.277 rows=195976 loops=1) Recheck Cond: (body_fts @@ to_tsquery('commonterm98'::text)) -> Bitmap Index Scan on ftstest_gist_idx (cost=0.00..6530.82 rows=195976 width=0) (actual time=4787.513..4787.513 rows=195976 loops=1) Index Cond: (body_fts @@ to_tsquery('commonterm98'::text))Total runtime: 7389.346 ms (5 rows) ftstest=# set enable_bitmapscan = off; WARNING: terminating connection because of crash of another server process DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory. HINT: In a moment you should be able to reconnect to the database and repeat your command. server closed the connection unexpectedly This probably means the server terminated abnormally before or while processingthe request. The connection to the server was lost. Attempting reset: Succeeded. ftstest=# set enable_bitmapscan = off; SET ftstest=# EXPLAIN ANALYZE select id from ftstest where body_fts @@ to_tsquery('commonterm98'); QUERY PLAN ---------------------------------------------------------------------------------------------------------------------- SeqScan on ftstest (cost=0.00..1006314.00 rows=195976 width=4) (actual time=96.077..60092.080 rows=195976 loops=1) Filter: (body_fts @@ to_tsquery('commonterm98'::text)) Total runtime:60436.556 ms (3 rows) So searching the full table via a bitmap-index-scan is actually 9 times cheaper than a seq-scan. (same on 9.0b1 and 8.4). Digging more into it reveals that the body_fts tsvector is indeed needed for the "filter" in the SeqScan. The tsvector data is stored in a TOAST table and the in the bitmap-index-scan case it only needs to read in the main table for checking visibillity. In the end it translates to reading in 1.4GB of TOAST-data vs. reading in 34MB of table data. Thinking a bit, then I dont think this is a particular rare case, allthough the ratio between the tables may be a real cornercase. The ratio is not 1:33 in the dataset that looks like the production dataset, but more 1:10, but in all cases in "production" there would be a much higher cache-hit ratio on the gin-index and the main table pages than on the TOAST table, so even with a ratio of 1:1 there most likely would be a real-world benefit. Would it be possible to implement the "Filtering" using the gin-index and a subsequent visibillity-check as on the index-scan? The same end up being the case for queries ordered by btree indexes and "filtered" by gin-indexes. Jesper -- Jesper
В списке pgsql-hackers по дате отправления:
Предыдущее
От: Dimitri FontaineДата:
Сообщение: Re: functional call named notation clashes with SQL feature