Hi,
I'm trying to figure out why a certain query doesn't use a partial index.
The context:
The table has about 100M rows with "contacts". Every contact belongs to an
"org", and can be marked "deleted". Since we're generally only interested in
non-deleted contacts we have a (partial) index on the table:
"contact_organization_id" btree (org_id, id) WHERE deleted IS NULL
for queries such as: "select * from contacts where org_id=123 and deleted is null order by id"
Works well enough. However, we now have an org_id which has > 10% of the rows,
but only a handful rows where "deleted is null" matches (so the org has a lot
of "deleted" contacts). The planner doesn't like this and it falls back to a
full table scan for the above query.
I've added a dedicated index just for that org_id, to see if that helps:
"org123" btree (id) WHERE deleted IS NULL AND org_id = 123
The planner seems to use it now, however the row estimate is way off:
my_db=> explain SELECT c.id, COALESCE(c.first_name, '') FROM contacts AS c WHERE c.organization_id = 123 AND
c.deletedIS NULL ORDER BY id ASC;
QUERY PLAN
────────────────────────────────────────────────────────────────────────────────────────────────────
Gather Merge (cost=1761204.72..1883324.98 rows=1046672 width=36)
Workers Planned: 2
-> Sort (cost=1760204.70..1761513.04 rows=523336 width=36)
Sort Key: id
-> Parallel Bitmap Heap Scan on contact c (cost=318.13..1696183.14 rows=523336 width=36)
Recheck Cond: ((deleted IS NULL) AND (organization_id = 8448))
-> Bitmap Index Scan on org123 (cost=0.00..4.13 rows=1256006 width=0)
(7 rows)
Time: 3.337 ms
That "1256006" estimate from the org123 index is wrong. There are 7 contacts in
that org:
-> Bitmap Index Scan on org8448 (cost=0.00..4.13 rows=1256006 width=0) (actual time=0.841..0.841
rows=7loops=1)
Is this expected? Is there a way to improve this?
Thanks!
Harmen