Table with many NULLS for indexed column yields strange query plan

Поиск
Список
Период
Сортировка
От greigwise
Тема Table with many NULLS for indexed column yields strange query plan
Дата
Msg-id 1583456933425-0.post@n3.nabble.com
обсуждение исходный текст
Ответы Re: Table with many NULLS for indexed column yields strange queryplan
Список pgsql-general
I have a query like this:

SELECT  "table1".* FROM "table1"
INNER JOIN "table2" ON "table2"."table1_id" = "table1"."id"
INNER JOIN "table3" ON "table3"."id" = "table2"."table3_id" WHERE
"table3"."number" = 'xxxx'
AND ("table2"."type") IN ('Standard') ;

table2 has a large number of NULLS in the column table3_id.  There is an
index on this column.  Here is the result of explain analyze:

Merge Join  (cost=1001.20..4076.67 rows=17278 width=167) (actual
time=284.918..300.167 rows=2244 loops=1)
   Merge Cond: (table2.table3_id = table3.id)
   ->  Gather Merge  (cost=1000.93..787825.78 rows=621995 width=175) (actual
time=5.786..283.269 rows=64397 loops=1)
         Workers Planned: 4
         Workers Launched: 4
         ->  Nested Loop  (cost=0.87..712740.12 rows=155499 width=175)
(actual time=0.091..102.708 rows=13107 loops=5)
               ->  Parallel Index Scan using index_table2_on_table3_id on
table2  (cost=0.43..489653.08 rows=155499 width=16) (actual
time=0.027..22.327 rows=13107 loops=5)
                     Filter: ((type)::text = 'Standard'::text)
               ->  Index Scan using table1_pk on table1  (cost=0.44..1.43
rows=1 width=167) (actual time=0.005..0.005 rows=1 loops=65535)
                     Index Cond: (id = table2.table1_id)
   ->  Index Scan using table3_pkey on table3  (cost=0.27..53.40 rows=1
width=8) (actual time=0.041..0.048 rows=1 loops=1)
         Filter: ((number)::text = 'xxxx'::text)
         Rows Removed by Filter: 35
 Planning time: 0.450 ms
 Execution time: 310.230 ms

You can see the row estimate there is way off on the Parallel Index Scan.
I suspect that this is because it's including the rows with null in the
selectivity estimate even though the table3_id can't possibly be null here
due to the inner join.

If I modify the query like this:

SELECT  "table1".* FROM "table1"
INNER JOIN "table2" ON "table2"."table1_id" = "table1"."id"
INNER JOIN "table3" ON "table3"."id" = "table2"."table3_id" WHERE
"table3"."number" = 'xxxx'
AND ("table2"."type") IN ('Standard') and table3_id is not null;

Just adding in table3_id is not null at the end there, I get a much better
plan.

Nested Loop  (cost=1.14..290.04 rows=66 width=167) (actual
time=0.058..11.258 rows=2244 loops=1)
   ->  Nested Loop  (cost=0.70..64.46 rows=66 width=8) (actual
time=0.049..2.873 rows=2244 loops=1)
         ->  Index Scan using table3_pkey on table3  (cost=0.27..53.40
rows=1 width=8) (actual time=0.030..0.035 rows=1 loops=1)
               Filter: ((number)::text = 'xxxx'::text)
               Rows Removed by Filter: 35
         ->  Index Scan using index_table2_on_table3_id on table2
(cost=0.43..11.05 rows=1 width=16) (actual time=0.017..2.102 rows=2244
loops=1)
               Index Cond: ((table3_id = table3.id) AND (table3_id IS NOT
NULL))
               Filter: ((type)::text = 'Standard'::text)
   ->  Index Scan using table1_pk on table1  (cost=0.44..3.42 rows=1
width=167) (actual time=0.003..0.003 rows=1 loops=2244)
         Index Cond: (id = table2. id)
 Planning time: 0.403 ms
 Execution time: 11.672 ms

Can I do anything statistics wise so that I get a better plan here or do I
have to modify the query.  It seems kinda hacky that I would have to specify
is not null on that column since like I said it can't possibly be null.

Thanks,

Greig Wise



--
Sent from: https://www.postgresql-archive.org/PostgreSQL-general-f1843780.html



В списке pgsql-general по дате отправления:

Предыдущее
От: Rory Campbell-Lange
Дата:
Сообщение: Re: Advice request : simultaneous function/data updates on manydatabases
Следующее
От: Mark Haylock
Дата:
Сообщение: What do null column values for pg_stat_progress_vacuum mean?