Optimizer difference using function index between 7.3 and 7.4
От | Jeff Boes |
---|---|
Тема | Optimizer difference using function index between 7.3 and 7.4 |
Дата | |
Msg-id | 403383A6.7090101@nexcerpt.com обсуждение исходный текст |
Ответы |
Re: Optimizer difference using function index between 7.3 and 7.4
Re: Optimizer difference using function index between 7.3 and 7.4 |
Список | pgsql-performance |
We have a large (several million row) table with a field containing URLs. Now, funny thing about URLs: they mostly start with a common substring ("http://www."). But not all the rows start with this, so we can't just lop off the first N characters. However, we noticed some time ago that an index on this field wasn't as effective as an index on the REVERSE of the field. So ... CREATE OR REPLACE FUNCTION fn_urlrev(text) returns text as ' return reverse(lc($_[0])) ' language 'plperl' with (iscachable,isstrict); and then CREATE UNIQUE INDEX ix_links_3 ON links (fn_urlrev(path_base)); seemed to be much faster. When we have to look up a single entry in "links", we do so by something like -- SELECT * FROM links WHERE fn_urlrev(path_base) = ?; and it's rather fast. When we have a bunch of them to do, under 7.3 we found it useful to create a temporary table, fill it with reversed URLs, and join: INSERT INTO temp_link_urls VALUES (fn_urlrev(?)); SELECT l.path_base,l.link_id FROM links l JOIN temp_link_urls t ON (fn_urlrev(l.path_base) = t.rev_path_base); Here are query plans from the two versions (using a temp table with 200 rows, after ANALYZE on the temp table): 7.3: # explain select link_id from links l join clm_tmp_links t on (fn_urlrev(l.path_base) = t.rev_path_base); QUERY PLAN ----------------------------------------------------------------------------------------- Nested Loop (cost=0.00..3936411.13 rows=2000937 width=152) -> Seq Scan on clm_tmp_links t (cost=0.00..5.00 rows=200 width=74) -> Index Scan using ix_links_3 on links l (cost=0.00..19531.96 rows=10005 width=78) Index Cond: (fn_urlrev(l.path_base) = "outer".rev_path_base) (4 rows) 7.4: # explain select link_id from links l join clm_tmp_links t on (fn_urlrev(l.path_base) = t.rev_path_base); QUERY PLAN ------------------------------------------------------------------------------ Hash Join (cost=5.50..88832.88 rows=1705551 width=4) Hash Cond: (fn_urlrev("outer".path_base) = "inner".rev_path_base) -> Seq Scan on links l (cost=0.00..50452.50 rows=1705550 width=78) -> Hash (cost=5.00..5.00 rows=200 width=74) -> Seq Scan on clm_tmp_links t (cost=0.00..5.00 rows=200 width=74) (5 rows) Although the cost for the 7.4 query is lower, the 7.3 plan executes in about 3 seconds, while the 7.4 plan executes in 59.8 seconds! Now the odd part: if I change the query to this: # explain analyze select link_id from links l join clm_tmp_links t on (fn_urlrev(l.path_base) = fn_urlrev(t.rev_path_base)); QUERY PLAN -------------------------------------------------------------------------------------------------------------------------------------- Merge Join (cost=12.64..219974.16 rows=1705551 width=4) (actual time=17.928..17.928 rows=0 loops=1) Merge Cond: (fn_urlrev("outer".path_base) = "inner"."?column2?") -> Index Scan using ix_links_3 on links l (cost=0.00..173058.87 rows=1705550 width=78) (actual time=0.229..0.285 rows=7 loops=1) -> Sort (cost=12.64..13.14 rows=200 width=74) (actual time=9.652..9.871 rows=200 loops=1) Sort Key: fn_urlrev(t.rev_path_base) -> Seq Scan on clm_tmp_links t (cost=0.00..5.00 rows=200 width=74) (actual time=0.166..5.753 rows=200 loops=1) Total runtime: 18.125 ms (i.e., apply the function to the data in the temp table), it runs a whole lot faster! Is this a bug in the optimizer? Or did something change about the way functional indexes are used? -- Jeff Boes vox 269.226.9550 ext 24 Database Engineer fax 269.349.9076 Nexcerpt, Inc. http://www.nexcerpt.com ...Nexcerpt... Extend your Expertise
В списке pgsql-performance по дате отправления:
Следующее
От: Tom LaneДата:
Сообщение: Re: Optimizer difference using function index between 7.3 and 7.4