Re: procost for to_tsvector
От | Andrew Gierth |
---|---|
Тема | Re: procost for to_tsvector |
Дата | |
Msg-id | 87h9trs0zm.fsf@news-spur.riddles.org.uk обсуждение исходный текст |
Ответ на | Re: procost for to_tsvector (Tom Lane <tgl@sss.pgh.pa.us>) |
Ответы |
Re: procost for to_tsvector
(Jeff Janes <jeff.janes@gmail.com>)
|
Список | pgsql-hackers |
>>>>> "Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes: Tom> Nyet ... at least not without you actually making that argument,Tom> with numbers, rather than just handwaving. Weuse 100 for plpgsqlTom> and suchlike functions. I'd be OK with making it 10 just onTom> general principles, but claimingthat it's as expensive as aTom> plpgsql function requires evidence. [TL/DR: 10 isn't enough, even 100 may be too low] On a text corpus consisting of ~18 thousand blog comments + ~5% of dead rows, median length 302 bytes, only about 3% long enough to be toasted, and selecting a common word (~22% of the table): explain analyze select * from comments where to_tsvector('english',message) @@ '''one'''::tsquery; QUERY PLAN ----------------------------------------------------------------------------------------------------------------Seq Scanon comments (cost=0.00..2406.18 rows=4140 width=792) (actual time=0.601..3946.589 rows=4056 loops=1) Filter: (to_tsvector('english'::regconfig,message) @@ '''one'''::tsquery) Rows Removed by Filter: 14310Planning time: 0.270 msExecutiontime: 3954.745 ms (5 rows) QUERY PLAN -----------------------------------------------------------------------------------------------------------------------------------------Bitmap HeapScan on comments (cost=204.09..2404.30 rows=4140 width=792) (actual time=2.401..11.564 rows=4056 loops=1) Recheck Cond:(to_tsvector('english'::regconfig, message) @@ '''one'''::tsquery) Heap Blocks: exact=1911 -> Bitmap Index Scan oncomments_to_tsvector_idx (cost=0.00..203.05 rows=4140 width=0) (actual time=1.974..1.974 rows=4313 loops=1) IndexCond: (to_tsvector('english'::regconfig, message) @@ '''one'''::tsquery)Planning time: 0.278 msExecution time: 17.640ms (7 rows) (strangely, the seqscan plan is picked despite having a cost more than a point higher? what's up with that?) So for two plans with virtually identical cost, we have an execution time difference on the order of 200x. We can rule out the performance of the @@ by using a precalculated tsvector: explain analyze select * from comments where tsv @@ '''one'''::tsquery; QUERYPLAN --------------------------------------------------------------------------------------------------------------Seq Scan oncomments (cost=0.00..2359.31 rows=4140 width=792) (actual time=0.023..47.746 rows=4056 loops=1) Filter: (tsv @@ '''one'''::tsquery) Rows Removed by Filter: 14310Planning time: 0.262 msExecution time: 54.220 ms (5 rows) So we're looking at an execution time for to_tsvector on the order of 200us, which is a seriously big deal when looking at a potential seqscan. That's not just _as_ expensive as a plpgsql function, but more than 50 times as expensive as a simple one like this: create function f1(text) returns integer language plpgsqlas $f$ begin return length($1); end; $f$; select sum(length(message)) from comments; -- 89ms select sum(f1(message)) from comments; -- 155ms 66ms difference divided by 18366 rows = 3.6us per call Now, obviously the default cost for plpgsql functions is assuming that the function is a whole lot more complex than that, so one wouldn't argue that to_tsvector should cost 5000. But there's a strong case for arguing that it should cost a whole lot more than 100, because even at that value the relative costs for the first two plans in this post only differ by 2x, compared to a 200x runtime difference. A value of 10 would be inadequate in many cases; in this example it leaves the slower plan with a cost only ~15% higher, which is way too close to be comfortable. (As another example, a function with a simple query in it, such as obj_description, can have runtimes on the order of 40us, still 5x faster than to_tsvector.) -- Andrew (irc:RhodiumToad)
В списке pgsql-hackers по дате отправления:
Предыдущее
От: Jim NasbyДата:
Сообщение: Re: proposal: searching in array function - array_position