Обсуждение: bpchar, text and indexes

Поиск
Список
Период
Сортировка

bpchar, text and indexes

От
Victor Yegorov
Дата:
Greetings.

I'd like to understand why Postgres behaves the way it does.
I was not able to find relevant mail thread myself, if one exists — please, point at it.

Test setup:

PostgreSQL 9.4.6 on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-16), 64-bit

create table t(t_id int4, sn_c char(20));
insert into t select id, chr((random()*26)::int4+65)||chr((random()*26)::int4+65)||((random()*99999)::int4+1) from generate_series(1, 10000) id;
create index i_t_sn_c on t(sn_c);
vacuum analyze t;

Now, if I do a typical query, all is good:

postgres=# EXPLAIN (analyze, costs off) SELECT sn_c FROM t WHERE sn_c = 'AB1234';
                                  QUERY PLAN
-------------------------------------------------------------------------------
 Index Only Scan using i_t_sn_c on t (actual time=0.015..0.015 rows=0 loops=1)
   Index Cond: (sn_c = 'AB1234'::bpchar)
   Heap Fetches: 0


If I explicitly cast constant to `text`, then Postgres will add `(sn_c)::text` cast, which disables index:

postgres=# EXPLAIN (analyze, costs off) SELECT sn_c FROM t WHERE sn_c = 'AB1234'::text;
                       QUERY PLAN
---------------------------------------------------------
 Seq Scan on t (actual time=5.729..5.729 rows=0 loops=1)
   Filter: ((sn_c)::text = 'AB1234'::text)
   Rows Removed by Filter: 10000


Although, if I will use LIKE instead of equality, then index is used:

postgres=# EXPLAIN (analyze, costs off) SELECT sn_c FROM t WHERE sn_c ~~ 'AB1234'::text;
                                  QUERY PLAN
-------------------------------------------------------------------------------
 Index Only Scan using i_t_sn_c on t (actual time=0.012..0.012 rows=0 loops=1)
   Index Cond: (sn_c = 'AB1234'::bpchar)
   Filter: (sn_c ~~ 'AB1234'::text)
   Heap Fetches: 0


And what I also see is — `varchar` has no such effect:

postgres=# EXPLAIN (analyze, costs off) SELECT sn_c FROM t WHERE sn_c = 'AB1234'::varchar;
                                  QUERY PLAN
-------------------------------------------------------------------------------
 Index Only Scan using i_t_sn_c on t (actual time=0.041..0.041 rows=0 loops=1)
   Index Cond: (sn_c = 'AB1234'::bpchar)
   Heap Fetches: 0



My questions are:

1. according to `pg_cast`, `text` => `bpchar` is binary coercible. Why Postgres is casting `sn_c` to `text` here, disabling index usage?

2. as I can see in `pg_cast`, setup for `varchar` is pretty much the same: `varchar` => `bpchar` is also binary coercible. So why for `varchar` behaviour is different?


Thanks in advance.


--
Victor Y. Yegorov

Re: bpchar, text and indexes

От
Thomas Kellerer
Дата:
Victor Yegorov schrieb am 22.02.2016 um 16:45:
> Test setup:
>
> PostgreSQL 9.4.6 on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-16), 64-bit
>
> create table t(t_id int4, sn_c char(20));
> insert into t select id, chr((random()*26)::int4+65)||chr((random()*26)::int4+65)||((random()*99999)::int4+1) from
generate_series(1,10000) id; 
> create index i_t_sn_c on t(sn_c);
> vacuum analyze t;
>
> Now, if I do a typical query, all is good:
>
> postgres=# EXPLAIN (analyze, costs off) SELECT sn_c FROM t WHERE sn_c = 'AB1234';
>                                   QUERY PLAN
> -------------------------------------------------------------------------------
>  Index Only Scan using i_t_sn_c on t (actual time=0.015..0.015 rows=0 loops=1)
>    Index Cond: (sn_c = 'AB1234'::bpchar)
>    Heap Fetches: 0
>
>
> If I explicitly cast constant to `text`, then Postgres will add `(sn_c)::text` cast, which disables index:
>
> postgres=# EXPLAIN (analyze, costs off) SELECT sn_c FROM t WHERE sn_c = 'AB1234'::text;
>                        QUERY PLAN
> ---------------------------------------------------------
>  Seq Scan on t (actual time=5.729..5.729 rows=0 loops=1)
>    Filter: ((sn_c)::text = 'AB1234'::text)
>    Rows Removed by Filter: 10000
>

I assume that this has to do with the fact that char(n) is blank padded to 20 character.

To be able to correctly compare that to a text value, sn_c has to be casted to text and then the index (which contains
blankpadded values) can not be used any more.  

Another very good example why the dreaded char() should not be used ;)

If you use varchar(20) instead of char(20) both queries yield the same execution plan (at least on my local 9.5)

Thomas


Re: bpchar, text and indexes

От
Victor Yegorov
Дата:
2016-02-22 18:00 GMT+02:00 Thomas Kellerer <spam_eater@gmx.net>:
I assume that this has to do with the fact that char(n) is blank padded to 20 character.

Yes, this was my thought too, but I do not understand what is going on in details.


To be able to correctly compare that to a text value, sn_c has to be casted to text and then the index (which contains blank padded values) can not be used any more.

Well, for `varchar` type Postgres is able to do `varchar` -> `bpchar` cast for my constant. I do not understand why for `text` it cannot and casts column instead.


--
Victor Y. Yegorov

Re: bpchar, text and indexes

От
Tom Lane
Дата:
Victor Yegorov <vyegorov@gmail.com> writes:
> Well, for `varchar` type Postgres is able to do `varchar` -> `bpchar` cast
> for my constant. I do not understand why for `text` it cannot and casts
> column instead.

In cross-type comparisons like these, the parser basically has a choice
between whether to apply texteq or bpchareq.  It's going to have to cast
one side or the other either way.  It prefers to cast to text, not away
from text, because text is the preferred type in the string category.
So "bpchar = text" is resolved as texteq.

In the "bpchar = varchar" case, that doesn't happen because neither side
is text to start with, so bpchareq is chosen on the grounds of requiring
fewer casts.

There's no varchareq operator (because varchar just relies on text's
operators).  If there were, you'd probably get an "ambiguous operator,
please cast" error from "bpchar = varchar", because neither of those are
preferred types so there would be no way to prefer one interpretation
over the other.  Similarly, if text weren't treated as a preferred type
then "bpchar = text" would just fail, it would still not be resolved
the way you want.

See
http://www.postgresql.org/docs/9.5/static/typeconv-oper.html
for a more detailed explanation.

            regards, tom lane