Обсуждение: SeqScan with full text search

Поиск
Список
Период
Сортировка

SeqScan with full text search

От
Tomek Walkuski
Дата:
Hello group!

I have query like this:

SELECT
  employments.candidate_id AS candidate_id,
  SUM(TS_RANK(employers.search_vector, TO_TSQUERY('simple', 'One:* |
Two:* | Three:* | Four:*'), 2)) AS ts_rank
FROM
  employments
INNER JOIN
  employers ON employments.employer_id = employers.id
AND
  employers.search_vector @@ TO_TSQUERY('simple', 'One:* | Two:* |
Three:* | Four:*')
GROUP BY
  candidate_id;

And it results with this:

http://explain.depesz.com/s/jLM

The JOIN between employments and employers is the culprit. I'm unable
to get rid of the seq scan, and setting enable_seqscan to off makes
things even worse.

Is there any way to get rid of this JOIN?

What info should I post to debug this easier?

Thanks!

Re: SeqScan with full text search

От
Tomas Vondra
Дата:
On 16.4.2012 16:02, Tomek Walkuski wrote:
> Hello group!
>
> I have query like this:
>
> SELECT
>   employments.candidate_id AS candidate_id,
>   SUM(TS_RANK(employers.search_vector, TO_TSQUERY('simple', 'One:* |
> Two:* | Three:* | Four:*'), 2)) AS ts_rank
> FROM
>   employments
> INNER JOIN
>   employers ON employments.employer_id = employers.id
> AND
>   employers.search_vector @@ TO_TSQUERY('simple', 'One:* | Two:* |
> Three:* | Four:*')
> GROUP BY
>   candidate_id;
>
> And it results with this:
>
> http://explain.depesz.com/s/jLM
>
> The JOIN between employments and employers is the culprit. I'm unable
> to get rid of the seq scan, and setting enable_seqscan to off makes
> things even worse.
>
> Is there any way to get rid of this JOIN?

Well, it's clearly the seqscan that takes most time, and it seems that
you really need to scan the whole table because you're asking 'for each
employment of all the candidates ...'

So you really need to scan all 1.6 million rows to get the result. And
seqscan is the best way to do that.

I don't see a way to remove the join and/or seqscan, unless you want to
keep a 'materialized view' maintained by a trigger or something ...

Another option is to make the employment table as small as possible
(e.g. removing columns that are not needed etc.) so that the seqscan is
faster.

>
> What info should I post to debug this easier?

1) structures of the tables
2) what amount of data are we talking about
3) was this the first run (with cold caches) or have  you run that
   several times?
4) basic system info (RAM, CPU, shared_buffers etc.)

Tomas

Re: SeqScan with full text search

От
Merlin Moncure
Дата:
On Mon, Apr 16, 2012 at 9:02 AM, Tomek Walkuski
<tomek.walkuski@gmail.com> wrote:
> Hello group!
>
> I have query like this:
>
> SELECT
>  employments.candidate_id AS candidate_id,
>  SUM(TS_RANK(employers.search_vector, TO_TSQUERY('simple', 'One:* |
> Two:* | Three:* | Four:*'), 2)) AS ts_rank
> FROM
>  employments
> INNER JOIN
>  employers ON employments.employer_id = employers.id
> AND
>  employers.search_vector @@ TO_TSQUERY('simple', 'One:* | Two:* |
> Three:* | Four:*')
> GROUP BY
>  candidate_id;
>
> And it results with this:
>
> http://explain.depesz.com/s/jLM
>
> The JOIN between employments and employers is the culprit. I'm unable
> to get rid of the seq scan, and setting enable_seqscan to off makes
> things even worse.
>
> Is there any way to get rid of this JOIN?

get rid of the join?  the seq scan is natural since it seems like
you're querying the whole table, right?  maybe if you explained the
problem in more detail (especially why you think the seq scan might
not be required)?

merlin

Re: SeqScan with full text search

От
Віталій Тимчишин
Дата:


2012/4/16 Tomek Walkuski <tomek.walkuski@gmail.com>
Hello group!

I have query like this:

SELECT
 employments.candidate_id AS candidate_id,
 SUM(TS_RANK(employers.search_vector, TO_TSQUERY('simple', 'One:* |
Two:* | Three:* | Four:*'), 2)) AS ts_rank
FROM
 employments
INNER JOIN
 employers ON employments.employer_id = employers.id
AND
 employers.search_vector @@ TO_TSQUERY('simple', 'One:* | Two:* |
Three:* | Four:*')
GROUP BY
 candidate_id;

And it results with this:

http://explain.depesz.com/s/jLM

The JOIN between employments and employers is the culprit. I'm unable
to get rid of the seq scan, and setting enable_seqscan to off makes
things even worse.

Is there any way to get rid of this JOIN?


Have you got an index on employments.employer_id? It seems for me that only some employments get out of join, so index would help here. What's the plan with seq_scan off?

P.S. I don't see why all employments are needed. May be I am reading something wrong? For me it's max 2616 employments out of 1606432.

Best regards, Vitalii Tymchyshyn

--
Best regards,
 Vitalii Tymchyshyn