row estimate for partial index

Поиск
Список
Период
Сортировка
От Harmen
Тема row estimate for partial index
Дата
Msg-id 20230113150159.GJ91568@arp.lijzij.de
обсуждение исходный текст
Ответы Re: row estimate for partial index  (Tom Lane <tgl@sss.pgh.pa.us>)
Список pgsql-general
Hi,

I'm trying to figure out why a certain query doesn't use a partial index.

The context:
The table has about 100M rows with "contacts". Every contact belongs to an
"org", and can be marked "deleted". Since we're generally only interested in
non-deleted contacts we have a (partial) index on the table:

    "contact_organization_id" btree (org_id, id) WHERE deleted IS NULL

for queries such as: "select * from contacts where org_id=123 and deleted is null order by id"

Works well enough. However, we now have an org_id which has > 10% of the rows,
but only a handful rows where "deleted is null" matches (so the org has a lot
of "deleted" contacts). The planner doesn't like this and it falls back to a
full table scan for the above query.



I've added a dedicated index just for that org_id, to see if that helps:

    "org123" btree (id) WHERE deleted IS NULL AND org_id = 123

The planner seems to use it now, however the row estimate is way off:

    my_db=> explain SELECT c.id, COALESCE(c.first_name, '') FROM contacts AS c WHERE c.organization_id = 123 AND
c.deletedIS NULL ORDER BY id ASC;
 
                                                 QUERY PLAN                                             
    ────────────────────────────────────────────────────────────────────────────────────────────────────
     Gather Merge  (cost=1761204.72..1883324.98 rows=1046672 width=36)
       Workers Planned: 2
       ->  Sort  (cost=1760204.70..1761513.04 rows=523336 width=36)
             Sort Key: id
             ->  Parallel Bitmap Heap Scan on contact c  (cost=318.13..1696183.14 rows=523336 width=36)
                   Recheck Cond: ((deleted IS NULL) AND (organization_id = 8448))
                   ->  Bitmap Index Scan on org123  (cost=0.00..4.13 rows=1256006 width=0)
    (7 rows)

    Time: 3.337 ms

That "1256006" estimate from the org123 index is wrong. There are 7 contacts in
that org:
               ->  Bitmap Index Scan on org8448  (cost=0.00..4.13 rows=1256006 width=0) (actual time=0.841..0.841
rows=7loops=1)
 


Is this expected? Is there a way to improve this?

Thanks!
Harmen



В списке pgsql-general по дате отправления:

Предыдущее
От: Dimitrios Apostolou
Дата:
Сообщение: Re: Why is a hash join preferred when it does not fit in work_mem
Следующее
От: Ron
Дата:
Сообщение: Directly embedding a psql SET variable inside another string?