[PERFORM] anti-join with small table via text/varchar cannot estimate rowscorrectly
От | Stefan Andreatta |
---|---|
Тема | [PERFORM] anti-join with small table via text/varchar cannot estimate rowscorrectly |
Дата | |
Msg-id | 4de52c93-68e1-7c4b-86ff-4dc17ae400bb@synedra.com обсуждение исходный текст |
Ответы |
Re: [PERFORM] anti-join with small table via text/varchar cannotestimate rows correctly
("David G. Johnston" <david.g.johnston@gmail.com>)
Re: [PERFORM] anti-join with small table via text/varchar cannot estimate rows correctly (Tom Lane <tgl@sss.pgh.pa.us>) |
Список | pgsql-performance |
Hello,
I have encountered a strange problem when doing an anti-join with a very small table via a varchar or text field as opposed to an integer field. Postgres version is 9.5.3
I did some experiments to extract the problem in a simple form. FIrst generate two tables with a series of numbers - once as integers once as text. The first table has 10,000 rows the second table just one:
Thanks for any help,
Stefan
I have encountered a strange problem when doing an anti-join with a very small table via a varchar or text field as opposed to an integer field. Postgres version is 9.5.3
I did some experiments to extract the problem in a simple form. FIrst generate two tables with a series of numbers - once as integers once as text. The first table has 10,000 rows the second table just one:
=# select generate_series(1, 10000) as id, generate_series(1,10000)::text as text into table tmp_san_1; SELECT 10000 =# select generate_series(1, 1) as id, generate_series(1,1)::text as text into table tmp_san_2; SELECT 1 =# analyze tmp_san_1; ANALYZE =# analyze tmp_san_2; ANALYZE =# \d tmp_san_* Table "public.tmp_san_1"Column | Type | Modifiers --------+---------+-----------id | integer | text | text | Table "public.tmp_san_2"Column | Type | Modifiers --------+---------+-----------id | integer | text | text |Now I do an anti-join between the two tables via the id field (integer). The number of resulting rows are estimated correctly as 9,999:
=# explain analyze select tmp_san_1.id from tmp_san_1 left join tmp_san_2 on tmp_san_1.id = tmp_san_2.id where tmp_san_2.id is null; QUERY PLAN -------------------------------------------------------------------------------------------------------------------Hash Anti Join (cost=1.02..281.26 rows=9999 width=4) (actual time=0.019..2.743 rows=9999 loops=1) Hash Cond: (tmp_san_1.id = tmp_san_2.id) -> Seq Scan on tmp_san_1 (cost=0.00..154.00 rows=10000 width=4) (actual time=0.007..1.023 rows=10000 loops=1) -> Hash (cost=1.01..1.01 rows=1 width=4) (actual time=0.004..0.004 rows=1 loops=1) Buckets: 1024 Batches: 1 Memory Usage: 9kB -> Seq Scan on tmp_san_2 (cost=0.00..1.01 rows=1 width=4) (actual time=0.002..0.002 rows=1 loops=1)Planning time: 0.138 msExecution time: 3.218 ms (8 rows)The same anti-join using the text fields, however estimates just 1 resulting row, while there are still of course 9,999 of them:
=# explain analyze select tmp_san_1.id from tmp_san_1 left join tmp_san_2 on tmp_san_1.text = tmp_san_2.text where tmp_san_2.id is null; QUERY PLAN -------------------------------------------------------------------------------------------------------------------Hash Left Join (cost=1.02..192.53 rows=1 width=4) (actual time=0.020..3.091 rows=9999 loops=1) Hash Cond: (tmp_san_1.text = tmp_san_2.text) Filter: (tmp_san_2.id IS NULL) Rows Removed by Filter: 1 -> Seq Scan on tmp_san_1 (cost=0.00..154.00 rows=10000 width=8) (actual time=0.008..0.983 rows=10000 loops=1) -> Hash (cost=1.01..1.01 rows=1 width=6) (actual time=0.004..0.004 rows=1 loops=1) Buckets: 1024 Batches: 1 Memory Usage: 9kB -> Seq Scan on tmp_san_2 (cost=0.00..1.01 rows=1 width=6) (actual time=0.002..0.002 rows=1 loops=1)Planning time: 0.173 msExecution time: 3.546 ms (10 rows)I cannot explain that behavior and much less think of a fix or workaround. Unfortunately my real-world example has to use varchar for the join.
Thanks for any help,
Stefan
В списке pgsql-performance по дате отправления:
Следующее
От: "David G. Johnston"Дата:
Сообщение: Re: [PERFORM] anti-join with small table via text/varchar cannotestimate rows correctly