Обсуждение: not using my GIN index in JOIN expression

Поиск
Список
Период
Сортировка

not using my GIN index in JOIN expression

От
Jean-Max Reymond
Дата:
I am running the last version of PostgreSQL 9.3.3
I have two tables detm and corm and a lot of datas in the column
cormdata of corm table (1.4 GB).

I have a GIN index on cormdata:
CREATE INDEX ix_corm_fulltext_cormdata  ON corm
   USING gin (to_tsvector('french'::regconfig, cormdata))
   WHERE cormishtml IS FALSE AND length(cormdata) < 20000;

select distinct b.detmmailid from corm b where
(to_tsvector('french',b.cormdata) @@ to_tsquery('mauritanie') and
b.cormishtml is false and length(b.cormdata) < 20000)
is very fast and use the GIN index.

"HashAggregate  (cost=2027.72..2031.00 rows=328 width=52)"
"  ->  Bitmap Heap Scan on corm b  (cost=24.25..2026.35 rows=548 width=52)"
"        Recheck Cond: ((to_tsvector('french'::regconfig, cormdata) @@
to_tsquery('mauritanie'::text)) AND (cormishtml IS FALSE) AND
(length(cormdata) < 20000))"
"        ->  Bitmap Index Scan on ix_corm_fulltext_cormdata
(cost=0.00..24.11 rows=548 width=0)"
"              Index Cond: (to_tsvector('french'::regconfig, cormdata)
@@ to_tsquery('mauritanie'::text))"


With a join an another table detm, GIN index is not used


  explain select distinct a.detmmailid from detm a  JOIN corm b on
a.detmmailid = b.detmmailid  where ((to_tsvector('french',b.cormdata) @@
to_tsquery('mauritanie') and b.cormishtml is false and
length(b.cormdata) < 20000)  OR ( detmobjet ~* 'mauritanie' ))

"HashAggregate  (cost=172418.27..172423.98 rows=571 width=52)"
"  ->  Hash Join  (cost=28514.92..172416.85 rows=571 width=52)"
"        Hash Cond: (b.detmmailid = a.detmmailid)"
"        Join Filter: (((to_tsvector('french'::regconfig, b.cormdata) @@
to_tsquery('mauritanie'::text)) AND (b.cormishtml IS FALSE) AND
(length(b.cormdata) < 20000)) OR (a.detmobjet ~* 'mauritanie'::text))"
"        ->  Seq Scan on corm b  (cost=0.00..44755.07 rows=449507
width=689)"
"        ->  Hash  (cost=19322.74..19322.74 rows=338574 width=94)"
"              ->  Seq Scan on detm a  (cost=0.00..19322.74 rows=338574
width=94)"


If I remove   OR ( detmobjet ~* 'mauritanie' ) in the select, the GIN
index is used
  explain select distinct a.detmmailid from detm a  JOIN corm b on
a.detmmailid = b.detmmailid  where ((to_tsvector('french',b.cormdata) @@
to_tsquery('mauritanie') and b.cormishtml is false and
length(b.cormdata) < 20000))

"HashAggregate  (cost=4295.69..4301.17 rows=548 width=52)"
"  ->  Nested Loop  (cost=24.67..4294.32 rows=548 width=52)"
"        ->  Bitmap Heap Scan on corm b  (cost=24.25..2026.35 rows=548
width=52)"
"              Recheck Cond: ((to_tsvector('french'::regconfig,
cormdata) @@ to_tsquery('mauritanie'::text)) AND (cormishtml IS FALSE)
AND (length(cormdata) < 20000))"
"              ->  Bitmap Index Scan on ix_corm_fulltext_cormdata
(cost=0.00..24.11 rows=548 width=0)"
"                    Index Cond: (to_tsvector('french'::regconfig,
cormdata) @@ to_tsquery('mauritanie'::text))"
"        ->  Index Only Scan using pkey_detm on detm a  (cost=0.42..4.13
rows=1 width=52)"
"              Index Cond: (detmmailid = b.detmmailid)"

How can i force the use of the GIN index ?
thanks for your tips,

--
Jean-Max Reymond
CKR Solutions Open Source http://www.ckr-solutions.com


Re: not using my GIN index in JOIN expression

От
Heikki Linnakangas
Дата:
On 02/27/2014 04:06 PM, Jean-Max Reymond wrote:
> I am running the last version of PostgreSQL 9.3.3
> I have two tables detm and corm and a lot of datas in the column
> cormdata of corm table (1.4 GB).
>
> I have a GIN index on cormdata:
> CREATE INDEX ix_corm_fulltext_cormdata  ON corm
>     USING gin (to_tsvector('french'::regconfig, cormdata))
>     WHERE cormishtml IS FALSE AND length(cormdata) < 20000;
>
> select distinct b.detmmailid from corm b where
> (to_tsvector('french',b.cormdata) @@ to_tsquery('mauritanie') and
> b.cormishtml is false and length(b.cormdata) < 20000)
> is very fast and use the GIN index.
>
> "HashAggregate  (cost=2027.72..2031.00 rows=328 width=52)"
> "  ->  Bitmap Heap Scan on corm b  (cost=24.25..2026.35 rows=548 width=52)"
> "        Recheck Cond: ((to_tsvector('french'::regconfig, cormdata) @@
> to_tsquery('mauritanie'::text)) AND (cormishtml IS FALSE) AND
> (length(cormdata) < 20000))"
> "        ->  Bitmap Index Scan on ix_corm_fulltext_cormdata
> (cost=0.00..24.11 rows=548 width=0)"
> "              Index Cond: (to_tsvector('french'::regconfig, cormdata)
> @@ to_tsquery('mauritanie'::text))"
>
>
> With a join an another table detm, GIN index is not used
>
>
>    explain select distinct a.detmmailid from detm a  JOIN corm b on
> a.detmmailid = b.detmmailid  where ((to_tsvector('french',b.cormdata) @@
> to_tsquery('mauritanie') and b.cormishtml is false and
> length(b.cormdata) < 20000)  OR ( detmobjet ~* 'mauritanie' ))
>
> "HashAggregate  (cost=172418.27..172423.98 rows=571 width=52)"
> "  ->  Hash Join  (cost=28514.92..172416.85 rows=571 width=52)"
> "        Hash Cond: (b.detmmailid = a.detmmailid)"
> "        Join Filter: (((to_tsvector('french'::regconfig, b.cormdata) @@
> to_tsquery('mauritanie'::text)) AND (b.cormishtml IS FALSE) AND
> (length(b.cormdata) < 20000)) OR (a.detmobjet ~* 'mauritanie'::text))"
> "        ->  Seq Scan on corm b  (cost=0.00..44755.07 rows=449507
> width=689)"
> "        ->  Hash  (cost=19322.74..19322.74 rows=338574 width=94)"
> "              ->  Seq Scan on detm a  (cost=0.00..19322.74 rows=338574
> width=94)"
>
>
> If I remove   OR ( detmobjet ~* 'mauritanie' ) in the select, the GIN
> index is used
>    explain select distinct a.detmmailid from detm a  JOIN corm b on
> a.detmmailid = b.detmmailid  where ((to_tsvector('french',b.cormdata) @@
> to_tsquery('mauritanie') and b.cormishtml is false and
> length(b.cormdata) < 20000))
>
> "HashAggregate  (cost=4295.69..4301.17 rows=548 width=52)"
> "  ->  Nested Loop  (cost=24.67..4294.32 rows=548 width=52)"
> "        ->  Bitmap Heap Scan on corm b  (cost=24.25..2026.35 rows=548
> width=52)"
> "              Recheck Cond: ((to_tsvector('french'::regconfig,
> cormdata) @@ to_tsquery('mauritanie'::text)) AND (cormishtml IS FALSE)
> AND (length(cormdata) < 20000))"
> "              ->  Bitmap Index Scan on ix_corm_fulltext_cormdata
> (cost=0.00..24.11 rows=548 width=0)"
> "                    Index Cond: (to_tsvector('french'::regconfig,
> cormdata) @@ to_tsquery('mauritanie'::text))"
> "        ->  Index Only Scan using pkey_detm on detm a  (cost=0.42..4.13
> rows=1 width=52)"
> "              Index Cond: (detmmailid = b.detmmailid)"
>
> How can i force the use of the GIN index ?
> thanks for your tips,

The problem with the OR detmobject ~* 'mauritanie' restriction is that
the rows that match that condition cannot be found using the GIN index.
I think you'd want the system to fetch all the rows that match the other
condition using the GIN index, and do something else to find the other
rows. The planner should be able to do that if you rewrite the query as
a UNION:

select a.detmmailid from detm a JOIN corm b on
a.detmmailid = b.detmmailid
where (to_tsvector('french',b.cormdata) @@ to_tsquery('mauritanie') and
b.cormishtml is false and length(b.cormdata) < 20000)
union
select a.detmmailid from detm a  JOIN corm b on
a.detmmailid = b.detmmailid
where detmobjet ~* 'mauritanie'

Note that that will not return rows in 'detm' that have no matching rows
in 'corm' table, even if they match the "detmobjet ~* 'mauritanie"
condition. That's what your original query also did, but if that's not
what you want, leave out the JOIN from the second part of the union.

- Heikki


Re: not using my GIN index in JOIN expression

От
Jean-Max Reymond
Дата:
Le 27/02/2014 15:19, Heikki Linnakangas a écrit :
> On 02/27/2014 04:06 PM, Jean-Max Reymond wrote:
>> I am running the last version of PostgreSQL 9.3.3
>> I have two tables detm and corm and a lot of datas in the column
>> cormdata of corm table (1.4 GB).
>>
>> I have a GIN index on cormdata:
>> CREATE INDEX ix_corm_fulltext_cormdata  ON corm
>>     USING gin (to_tsvector('french'::regconfig, cormdata))
>>     WHERE cormishtml IS FALSE AND length(cormdata) < 20000;
>>
>> select distinct b.detmmailid from corm b where
>> (to_tsvector('french',b.cormdata) @@ to_tsquery('mauritanie') and
>> b.cormishtml is false and length(b.cormdata) < 20000)
>> is very fast and use the GIN index.
>>
>> "HashAggregate  (cost=2027.72..2031.00 rows=328 width=52)"
>> "  ->  Bitmap Heap Scan on corm b  (cost=24.25..2026.35 rows=548
>> width=52)"
>> "        Recheck Cond: ((to_tsvector('french'::regconfig, cormdata) @@
>> to_tsquery('mauritanie'::text)) AND (cormishtml IS FALSE) AND
>> (length(cormdata) < 20000))"
>> "        ->  Bitmap Index Scan on ix_corm_fulltext_cormdata
>> (cost=0.00..24.11 rows=548 width=0)"
>> "              Index Cond: (to_tsvector('french'::regconfig, cormdata)
>> @@ to_tsquery('mauritanie'::text))"
>>
>>
>> With a join an another table detm, GIN index is not used
>>
>>
>>    explain select distinct a.detmmailid from detm a  JOIN corm b on
>> a.detmmailid = b.detmmailid  where ((to_tsvector('french',b.cormdata) @@
>> to_tsquery('mauritanie') and b.cormishtml is false and
>> length(b.cormdata) < 20000)  OR ( detmobjet ~* 'mauritanie' ))
>>
>> "HashAggregate  (cost=172418.27..172423.98 rows=571 width=52)"
>> "  ->  Hash Join  (cost=28514.92..172416.85 rows=571 width=52)"
>> "        Hash Cond: (b.detmmailid = a.detmmailid)"
>> "        Join Filter: (((to_tsvector('french'::regconfig, b.cormdata) @@
>> to_tsquery('mauritanie'::text)) AND (b.cormishtml IS FALSE) AND
>> (length(b.cormdata) < 20000)) OR (a.detmobjet ~* 'mauritanie'::text))"
>> "        ->  Seq Scan on corm b  (cost=0.00..44755.07 rows=449507
>> width=689)"
>> "        ->  Hash  (cost=19322.74..19322.74 rows=338574 width=94)"
>> "              ->  Seq Scan on detm a  (cost=0.00..19322.74 rows=338574
>> width=94)"
>>
>>
>> If I remove   OR ( detmobjet ~* 'mauritanie' ) in the select, the GIN
>> index is used
>>    explain select distinct a.detmmailid from detm a  JOIN corm b on
>> a.detmmailid = b.detmmailid  where ((to_tsvector('french',b.cormdata) @@
>> to_tsquery('mauritanie') and b.cormishtml is false and
>> length(b.cormdata) < 20000))
>>
>> "HashAggregate  (cost=4295.69..4301.17 rows=548 width=52)"
>> "  ->  Nested Loop  (cost=24.67..4294.32 rows=548 width=52)"
>> "        ->  Bitmap Heap Scan on corm b  (cost=24.25..2026.35 rows=548
>> width=52)"
>> "              Recheck Cond: ((to_tsvector('french'::regconfig,
>> cormdata) @@ to_tsquery('mauritanie'::text)) AND (cormishtml IS FALSE)
>> AND (length(cormdata) < 20000))"
>> "              ->  Bitmap Index Scan on ix_corm_fulltext_cormdata
>> (cost=0.00..24.11 rows=548 width=0)"
>> "                    Index Cond: (to_tsvector('french'::regconfig,
>> cormdata) @@ to_tsquery('mauritanie'::text))"
>> "        ->  Index Only Scan using pkey_detm on detm a  (cost=0.42..4.13
>> rows=1 width=52)"
>> "              Index Cond: (detmmailid = b.detmmailid)"
>>
>> How can i force the use of the GIN index ?
>> thanks for your tips,
>
> The problem with the OR detmobject ~* 'mauritanie' restriction is that
> the rows that match that condition cannot be found using the GIN index.
> I think you'd want the system to fetch all the rows that match the other
> condition using the GIN index, and do something else to find the other
> rows. The planner should be able to do that if you rewrite the query as
> a UNION:
>
> select a.detmmailid from detm a JOIN corm b on
> a.detmmailid = b.detmmailid
> where (to_tsvector('french',b.cormdata) @@ to_tsquery('mauritanie') and
> b.cormishtml is false and length(b.cormdata) < 20000)
> union
> select a.detmmailid from detm a  JOIN corm b on
> a.detmmailid = b.detmmailid
> where detmobjet ~* 'mauritanie'
>
> Note that that will not return rows in 'detm' that have no matching rows
> in 'corm' table, even if they match the "detmobjet ~* 'mauritanie"
> condition. That's what your original query also did, but if that's not
> what you want, leave out the JOIN from the second part of the union.
>
> - Heikki

It works great: thanks a lot :-)

--
Jean-Max Reymond
CKR Solutions Open Source http://www.ckr-solutions.com