Обсуждение: CHAR vs NVARCHAR vs TEXT performance

Поиск
Список
Период
Сортировка

CHAR vs NVARCHAR vs TEXT performance

От
Rob
Дата:
Hi Guys,

I wanted to get some thoughts about a type-specific performance problem 
we hit
through our application tier.

The full conversation is here: 
https://github.com/npgsql/npgsql/issues/2283

Basically, if a table exists with a PK which is CHAR(n) and a query is 
sent with
VARCHAR or CHAR then it uses an Index Scan. If the query is sent with 
TEXT as the
type then postgresql casts the column to TEXT (rather than the value to 
CHAR) and
it does a Seq Scan.

So far this has only showed itself on npgsql (I've been unable to 
reproduce on
other drivers), I think it's because npgsql only sends TEXT whereas 
other drivers
tend to send VARCHAR (other drivers includes the official JDBC driver).

I guess the root question is: is TEXT supposed to be identical to 
VARCHAR in all scenarios?

Thanks,
Rob



Re: CHAR vs NVARCHAR vs TEXT performance

От
Tom Lane
Дата:
Rob <postgresql@mintsoft.net> writes:
> Basically, if a table exists with a PK which is CHAR(n) and a query is
> sent with VARCHAR or CHAR then it uses an Index Scan. If the query is
> sent with TEXT as the type then postgresql casts the column to TEXT
> (rather than the value to CHAR) and it does a Seq Scan.

Yeah, this is an artifact of the fact that text is considered a
"preferred type" so it wins out in the parser's choice of which
type to promote to.  See

https://www.postgresql.org/docs/current/typeconv-oper.html

> I guess the root question is: is TEXT supposed to be identical to 
> VARCHAR in all scenarios?

It's not for this purpose, because varchar isn't a preferred type.

FWIW, my recommendation for this sort of thing is almost always
to not use CHAR(n).  The use-case for that datatype pretty much
disappeared with the last IBM Model 029 card punch.

            regards, tom lane



Re: CHAR vs NVARCHAR vs TEXT performance

От
Rob
Дата:
I agree in principle, however in this particular scenario it's not
our schema so we're a little reluctant to migrate the types etc.

We're in a bit of a bad place because the combination of NHibernate
+ npgsql3/4 + this table = seqScans everywhere. Basically when npgsql
changed their default type for strings from VARCHAR to TEXT it caused
this behaviour.

I suppose the follow up question is: should drivers
default to sending types that are preferred by postgres (i.e. TEXT)
rather than compatible types (VARCHAR). If so, is there a reason why
the JDBC driver doesn't send TEXT  (possibly a question for the JDBC
guys rather than here)?

Thanks,
Rob

On 2019-04-30 00:16, Thomas Munro wrote:
> On Tue, Apr 30, 2019 at 5:44 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> FWIW, my recommendation for this sort of thing is almost always
>> to not use CHAR(n).  The use-case for that datatype pretty much
>> disappeared with the last IBM Model 029 card punch.
> 
> +1 on the recommendation for PostgreSQL.
> 
> I do think it's useful on slightly more recent IBM technology than the
> 029 though.  It's been a few years since I touched it, but DB2 manuals
> and experts in this decade recommended fixed size types in some
> circumstances, and they might in theory be useful on any
> in-place-update system (and maybe us in some future table AM?).  For
> example, you can completely exclude the possibility of having to spill
> to another page when updating (DB2 DBAs measure and complain about
> rate of 'overflow' page usage which they consider failure and we
> consider SOP), you can avoid wasting space on the length (at the cost
> of wasting space on trailing spaces, if the contents vary in length),
> you can get O(1) access to fixed sized attributes (perhaps even
> updating single attributes).  These aren't nothing, and I've seen DB2
> DBAs get TPS improvements from that kind of stuff.  (From memory this
> type of thing was also a reason to think carefully about which tables
> should use compression, because the fixed size space guarantees went
> out the window.).




Re: CHAR vs NVARCHAR vs TEXT performance

От
Steve Crawford
Дата:

> On Tue, Apr 30, 2019 at 5:44 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> FWIW, my recommendation for this sort of thing is almost always
>> to not use CHAR(n).  The use-case for that datatype pretty much
>> disappeared with the last IBM Model 029 card punch.
...



Perhaps the "tip" on the character datatype page (https://www.postgresql.org/docs/11/datatype-character.html) should be updated as the statement "There is no performance difference among these three types..." could easily lead a reader down the wrong path. The statement may be true if one assumes the planner is able to make an optimal choice but clearly there are cases that prevent that. If the situation is better explained elsewhere in the documentation then just a link to that explanation may be all that is needed.

Cheers,
Steve