Обсуждение: plan using BTree VS GIN

Поиск
Список
Период
Сортировка

plan using BTree VS GIN

От
Nicolas Seinlet
Дата:
Hello, we faced an issue with a select query on a relatively large table on our database. The query involves one single table. The table has more than 10 million records. It's mainly composed of varchar fields, have a primary key (id) of type serial, and when records of this table are shown to users, they are sorted users 2 fields, display_name (varchar) and id (the primary key). Because this table is heavily used in various contexts in our application, we have multiple indexes on it. Among other index, we have gin index on some fields of the table. Among other things, we're using unaccent. We are aware the unaccent function is mutable, but we have an immutable version of unaccent. the table is similar to (I can give you all the fields of the table if needed):                                                     Table "public.res_partner"               Column               |            Type             | Collation | Nullable |                 Default                 -----------------------------------+-----------------------------+-----------+----------+-----------------------------------------  id                                | integer                     |           | not null | nextval('res_partner_id_seq'::regclass)  active                            | boolean                     |           |          |  name                              | character varying           |           |          |  display_name                      | character varying           |           |          |  ref                               | character varying           |           |          |  email                             | character varying           |           |          |  vat                               | character varying           |           |          |  type                              | character varying           |           |          |  company_registry                  | character varying           |           |          | Gin Index: "res_partner_unaccent_tgm_ref" gin (unaccent(ref::text) gin_trgm_ops) WHERE ref IS NOT NULL "res_partner_unaccent_tgm_vat" gin (unaccent(vat::text) gin_trgm_ops) WHERE vat IS NOT NULL "res_partner_unaccent_tgm_idx_gin2" gin (unaccent(name::text) gin_trgm_ops, unaccent(display_name::text) gin_trgm_ops, unaccent(ref::text) gin_trgm_ops, unaccent(email::text) gin_trgm_ops, unaccent(vat::text) gin_trgm_ops) "res_partner_name_tgm_idx_gin" gin (name gin_trgm_ops, display_name gin_trgm_ops, ref gin_trgm_ops, email gin_trgm_ops, vat gin_trgm_ops) "res_partner_unaccent_tgm_display_namee" gin (unaccent(display_name::text) gin_trgm_ops) "res_partner_unaccent_tgm_email" gin (unaccent(email::text) gin_trgm_ops) WHERE email IS NOT NULL "res_partner_comp_reg_idx3" gin (unaccent(company_registry::text) gin_trgm_ops) WHERE company_registry IS NOT NULL BTree index: "res_partner_displayname_id_idx" btree (display_name, id) WHERE active "res_partner_comp_reg_idx2" btree (unaccent(company_registry::text)) WHERE company_registry IS NOT NULL The btree index res_partner_displayname_id_idx have been added lately and perfectly match a criteria (where active) and sorting (display_name, id) we have in quite all our queries on this table. The query that cause the issue is this one: SELECT "res_partner"."id"   FROM "res_partner"  WHERE (("res_partner"."active" = true) AND          (          (            (              ((unaccent("res_partner"."display_name"::text) ilike unaccent('%nse%'))            OR (unaccent("res_partner"."email"::text) ilike unaccent('%nse%')))         OR (unaccent("res_partner"."ref"::text) ilike unaccent('%nse)%')))      OR (unaccent("res_partner"."vat"::text) ilike unaccent('%nse%')))    OR (unaccent("res_partner"."company_registry"::text) ilike unaccent('%nse)%'))))  AND ((("res_partner"."type" != 'private') OR "res_partner"."type" IS NULL) OR "res_partner"."type" IS NULL ) ORDER BY "res_partner"."display_name" ,"res_partner"."id"      LIMIT 100 We have the common criteria  (active=true), the common sorting, a limit, and a search on various fields. The fields on which we're searching with criteria like '% whatever%' are gin indexed.  Here is the query plan:  Limit  (cost=0.56..10703.36 rows=100 width=25) (actual time=56383.794..86509.036 rows=1 loops=1)    Output: id, display_name    Buffers: shared hit=4322296 read=1608998 dirtied=1 written=1247    ->  Index Scan using res_partner_displayname_id_idx on public.res_partner  (cost=0.56..1200212.37 rows=11214 width=25) (actual time=56383.793..86509.022 rows=1 loops=1)          Output: id, display_name          Filter: ((((res_partner.type)::text <> 'private'::text) OR (res_partner.type IS NULL) OR (res_partner.type IS NULL)) AND ((unaccent((res_partner.display_name)::text) ~~* '%nse%'::text) OR (unaccent((res_partner.email)::text) ~~ * '%nse%'::text) OR (unaccent((res_partner.ref)::text) ~~* '%nse%'::text) OR (unaccent((res_partner.vat)::text) ~~* '%nse%'::text) OR (unaccent((res_partner.company_registry)::text) ~~* '%nse%'::text)))          Rows Removed by Filter: 6226870          Buffers: shared hit=4322296 read=1608998 dirtied=1 written=1247  Planning Time: 0.891 ms  Execution Time: 86509.070 ms (10 rows) It's not using our gin index at all, but the btree one. If we change some things, like:    remove criteria on company_registry field or limit=10k or longer string than 3 characters, like '%full name of partner%' or sometimes, but not every time and for a limited period of time, analyze the table or set enable_indexscan=FALSE the plan switch to:  Limit  (cost=3061.03..3061.28 rows=100 width=25) (actual time=496.092..496.095 rows=1 loops=1)    Output: id, display_name    Buffers: shared hit=4527 read=2791    ->  Sort  (cost=3061.03..3065.70 rows=1868 width=25) (actual time=496.091..496.094 rows=1 loops=1)          Output: id, display_name          Sort Key: res_partner.display_name, res_partner.id          Sort Method: quicksort  Memory: 25kB          Buffers: shared hit=4527 read=2791          ->  Bitmap Heap Scan on public.res_partner  (cost=958.14..2989.64 rows=1868 width=25) (actual time=496.050..496.053 rows=1 loops=1)                Output: id, display_name                Recheck Cond: ((unaccent((res_partner.display_name)::text) ~~* '%nse%'::text) OR (unaccent((res_partner.email)::text) ~~* '%nse%'::text) OR (unaccent((res_partner.ref)::text) ~~* '%nse%'::text) OR (unaccent((res_partner.vat)::text) ~~* '%nse%'::text) OR (unaccent((res_partner.company_registry)::text) ~~* '%nse%'::text))                Filter: (res_partner.active AND (((res_partner.type)::text <> 'private'::text) OR (res_partner.type IS NULL) OR (res_partner.type IS NULL)))                Heap Blocks: exact=1                Buffers: shared hit=4521 read=2791                ->  BitmapOr  (cost=958.14..958.14 rows=1960 width=0) (actual time=496.020..496.022 rows=0 loops=1)                      Buffers: shared hit=4520 read=2791                      ->  Bitmap Index Scan on res_partner_unaccent_tgm_idx_gin2  (cost=0.00..230.83 rows=645 width=0) (actual time=228.725..228.725 rows=1 loops=1)                            Index Cond: (unaccent((res_partner.display_name)::text) ~~* '%nse%'::text)                            Buffers: shared hit=2169 read=1374                      ->  Bitmap Index Scan on res_partner_unaccent_tgm_idx_gin2  (cost=0.00..230.81 rows=642 width=0) (actual time=256.083..256.083 rows=0 loops=1)                            Index Cond: (unaccent((res_partner.email)::text) ~~* '%nse%'::text)                            Buffers: shared hit=1906 read=1348                      ->  Bitmap Index Scan on res_partner_unaccent_tgm_idx_gin2  (cost=0.00..226.00 rows=1 width=0) (actual time=2.693..2.693 rows=0 loops=1)                            Index Cond: (unaccent((res_partner.ref)::text) ~~* '%nse%'::text)                            Buffers: shared hit=178 read=7                      ->  Bitmap Index Scan on res_partner_unaccent_tgm_idx_gin2  (cost=0.00..226.14 rows=19 width=0) (actual time=8.414..8.415 rows=0 loops=1)                            Index Cond: (unaccent((res_partner.vat)::text) ~~* '%nse%'::text)                            Buffers: shared hit=225 read=62                      ->  Bitmap Index Scan on res_partner_comp_reg_idx3  (cost=0.00..42.01 rows=654 width=0) (actual time=0.099..0.099 rows=0 loops=1)                            Index Cond: (unaccent((res_partner.company_registry)::text) ~~* '%nse%'::text)                            Buffers: shared hit=42  Planning Time: 20.921 ms  Execution Time: 496.296 ms (33 rows) How can we have PostgreSQL choose the second query plan? If we look at the expected VS actual rows, statistics look quite accurate. Thanks for reading, Nicolas.
Вложения

Re: plan using BTree VS GIN

От
Laurenz Albe
Дата:
On Fri, 2023-06-23 at 12:08 +0000, Nicolas Seinlet wrote:
> we faced an issue with a select query on a relatively large table on our database.
> The query involves one single table. The table has more than 10 million records.
> It's mainly composed of varchar fields, have a primary key (id) of type serial,
> and when records of this table are shown to users, they are sorted users 2 fields,
> display_name (varchar) and id (the primary key). Because this table is heavily used
> in various contexts in our application, we have multiple indexes on it. Among other
> index, we have gin index on some fields of the table.
>
> The btree index res_partner_displayname_id_idx have been added lately and perfectly
> match a criteria (where active) and sorting (display_name, id) we have in quite all
> our queries on this table.
>
> The query that cause the issue is this one:
> SELECT "res_partner"."id"
>   FROM "res_partner"
>  WHERE (("res_partner"."active" = true) AND
>          (
>          (
>            (
>              ((unaccent("res_partner"."display_name"::text) ilike unaccent('%nse%'))
>            OR (unaccent("res_partner"."email"::text) ilike unaccent('%nse%')))
>         OR (unaccent("res_partner"."ref"::text) ilike unaccent('%nse)%')))
>      OR (unaccent("res_partner"."vat"::text) ilike unaccent('%nse%')))
>    OR (unaccent("res_partner"."company_registry"::text) ilike unaccent('%nse)%'))))
>
>  AND ((("res_partner"."type" != 'private') OR "res_partner"."type" IS NULL) OR "res_partner"."type" IS NULL )
>
> ORDER BY "res_partner"."display_name" ,"res_partner"."id"  
>    LIMIT 100
>
> We have the common criteria  (active=true), the common sorting, a limit, and a search
> on various fields. The fields on which we're searching with criteria like '% whatever%' are gin indexed.
>
>  Here is the query plan:
>  Limit  (cost=0.56..10703.36 rows=100 width=25) (actual time=56383.794..86509.036 rows=1 loops=1)
>    Output: id, display_name
>    Buffers: shared hit=4322296 read=1608998 dirtied=1 written=1247
>    ->  Index Scan using res_partner_displayname_id_idx on public.res_partner  (cost=0.56..1200212.37 rows=11214
width=25)(actual time=56383.793..86509.022 rows=1 loops=1) 
>          Output: id, display_name
>          Filter: ((((res_partner.type)::text <> 'private'::text) OR (res_partner.type IS NULL) OR (res_partner.type
ISNULL)) AND ((unaccent((res_partner.display_name)::text) ~~* '%nse%'::text) OR 
> (unaccent((res_partner.email)::text) ~~
> * '%nse%'::text) OR (unaccent((res_partner.ref)::text) ~~* '%nse%'::text) OR (unaccent((res_partner.vat)::text) ~~*
'%nse%'::text)OR (unaccent((res_partner.company_registry)::text) ~~* 
> '%nse%'::text)))
>          Rows Removed by Filter: 6226870
>          Buffers: shared hit=4322296 read=1608998 dirtied=1 written=1247
>  Planning Time: 0.891 ms
>  Execution Time: 86509.070 ms
> (10 rows)
>
> It's not using our gin index at all, but the btree one.

The problem is that PostgreSQL estimates that the index scan will return 11214
rows, when it is actually one.  This makes the plan to scan the table using
an index that matches the ORDER BY clause appealing: we might find 100 rows
quickly and avoid a sort.

You can try to improve the estimates with more detailed statistics,
but if that doesn't do the job, you can modify the ORDER BY clause so
that it cannot use the bad index:

  ORDER BY res_partner.display_name ,res_partner.id + 0

Yours,
Laurenz Albe



Re: plan using BTree VS GIN

От
Nicolas Seinlet
Дата:
On Friday, June 23rd, 2023 at 2:52 PM, Laurenz Albe <laurenz.albe@cybertec.at> wrote:


>

>

> On Fri, 2023-06-23 at 12:08 +0000, Nicolas Seinlet wrote:
>

> > we faced an issue with a select query on a relatively large table on our database.
> > The query involves one single table. The table has more than 10 million records.
> > It's mainly composed of varchar fields, have a primary key (id) of type serial,
> > and when records of this table are shown to users, they are sorted users 2 fields,
> > display_name (varchar) and id (the primary key). Because this table is heavily used
> > in various contexts in our application, we have multiple indexes on it. Among other
> > index, we have gin index on some fields of the table.
> >

> > The btree index res_partner_displayname_id_idx have been added lately and perfectly
> > match a criteria (where active) and sorting (display_name, id) we have in quite all
> > our queries on this table.
> >

> > The query that cause the issue is this one:
> > SELECT "res_partner"."id"
> > FROM "res_partner"
> > WHERE (("res_partner"."active" = true) AND
> > (
> > (
> > (
> > ((unaccent("res_partner"."display_name"::text) ilike unaccent('%nse%'))
> > OR (unaccent("res_partner"."email"::text) ilike unaccent('%nse%')))
> > OR (unaccent("res_partner"."ref"::text) ilike unaccent('%nse)%')))
> > OR (unaccent("res_partner"."vat"::text) ilike unaccent('%nse%')))
> > OR (unaccent("res_partner"."company_registry"::text) ilike unaccent('%nse)%'))))
> >

> > AND ((("res_partner"."type" != 'private') OR "res_partner"."type" IS NULL) OR "res_partner"."type" IS NULL )
> >

> > ORDER BY "res_partner"."display_name" ,"res_partner"."id"
> > LIMIT 100
> >

> > We have the common criteria (active=true), the common sorting, a limit, and a search
> > on various fields. The fields on which we're searching with criteria like '% whatever%' are gin indexed.
> >

> > Here is the query plan:
> > Limit (cost=0.56..10703.36 rows=100 width=25) (actual time=56383.794..86509.036 rows=1 loops=1)
> > Output: id, display_name
> > Buffers: shared hit=4322296 read=1608998 dirtied=1 written=1247
> > -> Index Scan using res_partner_displayname_id_idx on public.res_partner (cost=0.56..1200212.37 rows=11214
width=25)(actual time=56383.793..86509.022 rows=1 loops=1) 
> > Output: id, display_name
> > Filter: ((((res_partner.type)::text <> 'private'::text) OR (res_partner.type IS NULL) OR (res_partner.type IS
NULL))AND ((unaccent((res_partner.display_name)::text) ~~* '%nse%'::text) OR 
> > (unaccent((res_partner.email)::text) ~~
> > * '%nse%'::text) OR (unaccent((res_partner.ref)::text) ~~* '%nse%'::text) OR (unaccent((res_partner.vat)::text) ~~*
'%nse%'::text)OR (unaccent((res_partner.company_registry)::text) ~~* 
> > '%nse%'::text)))
> > Rows Removed by Filter: 6226870
> > Buffers: shared hit=4322296 read=1608998 dirtied=1 written=1247
> > Planning Time: 0.891 ms
> > Execution Time: 86509.070 ms
> > (10 rows)
> >

> > It's not using our gin index at all, but the btree one.
>

>

> The problem is that PostgreSQL estimates that the index scan will return 11214
> rows, when it is actually one. This makes the plan to scan the table using
> an index that matches the ORDER BY clause appealing: we might find 100 rows
> quickly and avoid a sort.
>

> You can try to improve the estimates with more detailed statistics,
> but if that doesn't do the job, you can modify the ORDER BY clause so
> that it cannot use the bad index:
>

> ORDER BY res_partner.display_name ,res_partner.id + 0
>

> Yours,
> Laurenz Albe

Hello,

First of all, thanks, this solves the issue for the given query.

Some more questions then,
> we might find 100 rows quickly
The cost estimate for 11214 rows is 1200212.37
If I look at the other plan, none of the estimated cost reach such levels (~2k for indexes + 1k for the BitmapOr, 3k
forBitmap Heap Scan, and finally 1k for sort and limit), roughly 7k 

And that's part of what I didn't understand. How is the first cost estimated? If we divide by 110 the cost to go from
11krecords to 100, it's still ~10k, more than the other plan. 

Thanks again,

Nicolas.


Вложения