Re: PostgreSQL GIN index not used when ts_query language is fetched from a column

Поиск
Список
Период
Сортировка
От Jaap Roes
Тема Re: PostgreSQL GIN index not used when ts_query language is fetched from a column
Дата
Msg-id C8E95F2B-BF9E-47A3-9965-7750043C9822@leukeleu.nl
обсуждение исходный текст
Ответ на PostgreSQL GIN index not used when ts_query language is fetched from a column  (Jaap Roes <jroes@leukeleu.nl>)
Список pgsql-general
> On 15 Sep 2016, at 14:46, Jaap Roes <jroes@leukeleu.nl> wrote:
>
> I've got a table that stores some multilingual content:
>
>    CREATE TABLE search (
>      content text NOT NULL,
>      language regconfig NOT NULL,
>      fulltext tsvector
>    );
>    CREATE INDEX search_fulltext ON search USING GIN(fulltext);
...
> To make sure I always search in the correct language I use these queries:
>
>    SELECT FROM search WHERE fulltext @@ to_tsquery(language, 'shine');
>    (1 row)
>
>    SELECT FROM search WHERE fulltext @@ to_tsquery(language, 'vlieg');
>    (1 row)
>
> Because hardcoding the language doesn't give the correct results:
>
>    SELECT FROM search WHERE fulltext @@ to_tsquery('dutch', 'shine');
>    (0 rows)
>
>    SELECT FROM search WHERE fulltext @@ to_tsquery('english', 'vlieg');
>    (0 rows)
>
> The problem however is that PostgreSQL doesn't use the GIN index when using the first set of queries and instead does
asequential scan: 
...
>    EXPLAIN ANALYZE SELECT * FROM search WHERE fulltext @@ to_tsquery(language, 'shine’);
>
>    Seq Scan on search  (cost=0.00..17.35 rows=2 width=136) (actual time=0.040..0.044 rows=1 loops=1)
>        Filter: (fulltext @@ to_tsquery(language, 'shine'::text))
>        Rows Removed by Filter: 5
>    Planning time: 0.039 ms
>    Execution time: 0.064 ms
>
> While it does when hardcoding a language:
>
>    EXPLAIN ANALYZE SELECT FROM search WHERE fulltext @@ to_tsquery('dutch', 'vlieg');
>    Bitmap Heap Scan on search  (cost=12.63..23.66 rows=82 width=0) (actual time=0.044..0.044 rows=1 loops=1)
>      Recheck Cond: (fulltext @@ '''vlieg'''::tsquery)
>      Heap Blocks: exact=1
>        ->  Bitmap Index Scan on search_fulltext  (cost=0.00..12.61 rows=82 width=0) (actual time=0.037..0.037 rows=1
loops=1)
>          Index Cond: (fulltext @@ '''vlieg'''::tsquery)
>    Planning time: 0.128 ms
>    Execution time: 0.065 ms
>
> So my question is: Is it at all possible to use a column in the ts_query to use the correct language config and still
havePostgres use the GIN index? 

So I came up with a solution. Pre-localise the query in a join and use that to filter the matches:

    SELECT * FROM search s
    INNER JOIN (
        SELECT 'dutch'::regconfig AS language, to_tsquery('dutch', 'shine') as q
        UNION SELECT 'english'::regconfig AS language, to_tsquery('english', 'shine') as q
        UNION SELECT 'simple'::regconfig AS language, to_tsquery('simple', 'shine') as q
    ) q ON (s.language=q.language)
    WHERE fulltext @@ q;

This seems to work, but the query plan looks a bit confusing, so I’m not super confident about the correctness:

    Nested Loop  (cost=205.44..1327.12 rows=188 width=1590) (actual time=3.350..7.010 rows=16 loops=1)
      ->  Unique  (cost=0.08..0.11 rows=3 width=0) (actual time=0.010..0.021 rows=3 loops=1)
            ->  Sort  (cost=0.08..0.09 rows=3 width=0) (actual time=0.008..0.011 rows=3 loops=1)
                  Sort Key: ('dutch'::regconfig), ('''vlieg'''::tsquery)
                  Sort Method: quicksort  Memory: 25kB
                  ->  Append  (cost=0.00..0.06 rows=3 width=0) (actual time=0.001..0.002 rows=3 loops=1)
                        ->  Result  (cost=0.00..0.01 rows=1 width=0) (actual time=0.001..0.001 rows=1 loops=1)
                        ->  Result  (cost=0.00..0.01 rows=1 width=0) (actual time=0.001..0.001 rows=1 loops=1)
                        ->  Result  (cost=0.00..0.01 rows=1 width=0) (actual time=0.000..0.000 rows=1 loops=1)
      ->  Bitmap Heap Scan on search s  (cost=205.36..441.70 rows=63 width=1554) (actual time=2.310..2.318 rows=5
loops=3)
            Recheck Cond: ((fulltext @@ ('''vlieg'''::tsquery)) AND ((language)::oid = (('dutch'::regconfig))::oid))
            Heap Blocks: exact=16
            ->  BitmapAnd  (cost=205.36..205.36 rows=63 width=0) (actual time=2.303..2.303 rows=0 loops=3)
                  ->  Bitmap Index Scan on search_fulltext  (cost=0.00..17.41 rows=188 width=0) (actual
time=0.018..0.018rows=16 loops=3) 
                        Index Cond: (fulltext @@ ('''vlieg'''::tsquery))
                  ->  Bitmap Index Scan on search_language  (cost=0.00..187.67 rows=12539 width=0) (actual
time=2.277..2.277rows=12539 loops=3) 
                        Index Cond: ((language_config)::oid = (('dutch'::regconfig))::oid)
    Planning time: 0.228 ms
    Execution time: 7.058 ms

Is this the way to go? Or is there a better way, I’m eager to find out!



В списке pgsql-general по дате отправления:

Предыдущее
От: Leonardo M. Ramé
Дата:
Сообщение: Re: Test letter
Следующее
От: carpenterc17
Дата:
Сообщение: Testers Needed For New Database Monitoring Platform