Discussion: Re: Planner selects different execution plans depending on limit


Re: Planner selects different execution plans depending on limit

From: Bill Martin
Date:

On 10/09/12 16:24, bill_martin@freenet.de wrote:

Hi All

I've created the following index:

CREATE INDEX ft_simple_core_content_content_idx
  ON core_content
  USING gin
  (to_tsvector('simple'::regconfig, content) );

If I'm searching for a word which is NOT in the column content, the query plan and the execution time differ with the given limit.
If I choose 3927 or any higher number, the query execution takes only a few milliseconds; with 3926 or any lower number it takes more than fifty seconds:

select * from core_content content where
to_tsvector('simple', content.content) @@ tsquery(plainto_tsquery('simple', 'asdasdadas') :: varchar || ':*')=true
limit 3926

"Limit  (cost=0.00..19302.23 rows=3926 width=621) (actual time=52147.149..52147.149 rows=0 loops=1)"
"  ->  Seq Scan on core_content content  (cost=0.00..98384.34 rows=20011 width=621) (actual time=52147.147..52147.147 rows=0 loops=1)"
"        Filter: (to_tsvector('simple'::regconfig, content) @@ '''asdasdadas'':*'::tsquery)"
"Total runtime: 52147.173 ms"

Is there any possibility to improve the performance even if the limit is only 10? Is it possible to make the query optimizer take the fast bitmap heap scan instead of the slow seq scan?


The big hammer is "set enable_seqscan = off", but if you tell us which PG version you're on, there may be something better to try. I'd suggest you start by bumping the statistics target for the column to 10000 and running ANALYZE to see what that changes.
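Written out as SQL, those two suggestions would look roughly like this (a sketch using the names from this thread; note that since the index here is built on an expression, the statistics that matter are gathered for the expression itself when the table is analyzed):

```sql
-- Session-level "big hammer": discourage sequential scans entirely.
SET enable_seqscan = off;

-- Per-column statistics target, as suggested above:
ALTER TABLE core_content ALTER COLUMN content SET STATISTICS 10000;

-- Because the GIN index is on to_tsvector(...), an expression, its
-- statistics are collected at default_statistics_target, so raising
-- that for the session before ANALYZE is another way to get there:
SET default_statistics_target = 10000;
ANALYZE core_content;
```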

--
Jesper

 

Hi,

my email client deleted a lot of the content of the original thread message. Here is the full content:

 

Hi All

I've created the following table, which contains one million records.

CREATE TABLE core_content
(
  id bigint NOT NULL,
  content text NOT NULL,
  short_content text,
  CONSTRAINT core_content_pkey PRIMARY KEY (id )
)

CREATE INDEX ft_simple_core_content_content_idx
  ON core_content
  USING gin
  (to_tsvector('simple'::regconfig, content) );

If I'm searching for a word which is not in the column content, the query plan and the execution time differ with the given limit.
If I choose 3927 or any higher number, the query execution takes only a few milliseconds.

select * from core_content content where
to_tsvector('simple', content.content) @@ tsquery(plainto_tsquery('simple', 'asdasdadas') :: varchar || ':*')=true
limit 3927

 

"Limit  (cost=10091.09..19305.68 rows=3927 width=621) (actual time=0.255..0.255 rows=0 loops=1)"
"  ->  Bitmap Heap Scan on core_content content  (cost=10091.09..57046.32 rows=20011 width=621) (actual time=0.254..0.254 rows=0 loops=1)"
"        Recheck Cond: (to_tsvector('simple'::regconfig, content) @@ '''asdasdadas'':*'::tsquery)"
"        ->  Bitmap Index Scan on ft_simple_core_content_content_idx  (cost=0.00..10086.09 rows=20011 width=0) (actual time=0.251..0.251 rows=0 loops=1)"
"              Index Cond: (to_tsvector('simple'::regconfig, content) @@ '''asdasdadas'':*'::tsquery)"
"Total runtime: 0.277 ms"

 

If I choose 3926 or any lower number (e.g. 10), the query execution takes more than fifty seconds.

select * from core_content content where
to_tsvector('simple', content.content) @@ tsquery(plainto_tsquery('simple', 'asdasdadas') :: varchar || ':*')=true
limit 3926

"Limit  (cost=0.00..19302.23 rows=3926 width=621) (actual time=52147.149..52147.149 rows=0 loops=1)"
"  ->  Seq Scan on core_content content  (cost=0.00..98384.34 rows=20011 width=621) (actual time=52147.147..52147.147 rows=0 loops=1)"
"        Filter: (to_tsvector('simple'::regconfig, content) @@ '''asdasdadas'':*'::tsquery)"
"Total runtime: 52147.173 ms"

 

Is there any possibility to improve the performance even if the limit is only 10? Is it possible to make the query optimizer take the fast bitmap heap scan instead of the slow seq scan?

I use PostgreSQL 9.1.5; Intel i5-2400 @ 3.1 GHz, 16 GB RAM; Windows 7 64 Bit.

 

Regards,

Bill Martin

 

Re: Planner selects different execution plans depending on limit

From: Tom Lane
Date:
Bill Martin <bill.martin@communote.com> writes:
> I've created the following table which contains one million records.
> ...

> "Limit  (cost=10091.09..19305.68 rows=3927 width=621) (actual time=0.255..0.255 rows=0 loops=1)"
> "  ->  Bitmap Heap Scan on core_content content  (cost=10091.09..57046.32 rows=20011 width=621) (actual time=0.254..0.254 rows=0 loops=1)"
> "        Recheck Cond: (to_tsvector('simple'::regconfig, content) @@ '''asdasdadas'':*'::tsquery)"
> "        ->  Bitmap Index Scan on ft_simple_core_content_content_idx  (cost=0.00..10086.09 rows=20011 width=0) (actual time=0.251..0.251 rows=0 loops=1)"
> "              Index Cond: (to_tsvector('simple'::regconfig, content) @@ '''asdasdadas'':*'::tsquery)"
> "Total runtime: 0.277 ms"

> Is there any possibility to improve the performance even if the limit is only 10?

The problem is the way-off rowcount estimate (20011 rows when it's
really none); with a smaller estimate there, the planner wouldn't decide
to switch to a seqscan.

Did you take the advice to increase the column's statistics target?
Because 20011 looks suspiciously close to the default estimate that
tsquery_opr_selec will fall back on if it hasn't got enough stats
to come up with a trustworthy estimate for a *-pattern query.

(I think there are probably some bugs in tsquery_opr_selec's estimate
for this, as I just posted about on pgsql-hackers.  But this number
looks like you're not even getting to the estimation code, for lack
of enough statistics entries.)
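A quick way to see whether ANALYZE has produced any statistics entries for the expression at all is to look in pg_stats; statistics for an expression index appear there under the index's name rather than the table's (a sketch, assuming the names from this thread):

```sql
-- If this returns no rows, ANALYZE has not yet gathered statistics
-- for the indexed tsvector expression.
SELECT attname, null_frac, n_distinct
FROM pg_stats
WHERE tablename = 'ft_simple_core_content_content_idx';
```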

The other thing that seems kind of weird here is that the cost estimate
for the bitmap index scan seems out of line even given the
20000-entries-to-fetch estimate.  I'd have expected a cost estimate of a
few hundred for that, not 10000.  Perhaps this index is really bloated,
and it's time to REINDEX it?
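For example, one could compare the index's on-disk size before and after a rebuild (a sketch; note that REINDEX blocks writes to the table while it runs):

```sql
-- How big is the index on disk?
SELECT pg_size_pretty(pg_relation_size('ft_simple_core_content_content_idx'));

-- Rebuild it from scratch to shed any bloat:
REINDEX INDEX ft_simple_core_content_content_idx;
```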

            regards, tom lane