Discussion: character type value is not padded with spaces
Character type values that include multibyte characters are not padded
with spaces. This reproduces on 7.3.x, 7.4.x and 8.0.x.
create table t (a char(10));
insert into t values ('XXXXX'); -- X is a 2-byte character.
I expect 'XXXXX     ' (padded to 10 characters) to be inserted, but 'XXXXX' is inserted.
select a, octet_length(a) from t;
a | octet_length
-------+--------------
XXXXX | 10
If the value were padded with spaces, octet_length(a) would be 15. The
problem occurs because the string length is calculated from the byte
length (VARSIZE) in exprTypmod().
I attach a patch for this problem.
Regards,
--
Yoshiyuki Asaba
y-asaba@sra.co.jp
*** parse_expr.c.orig 2005-01-13 02:32:36.000000000 +0900
--- parse_expr.c 2005-05-22 17:12:37.000000000 +0900
***************
*** 18,23 ****
--- 18,24 ----
#include "catalog/pg_operator.h"
#include "catalog/pg_proc.h"
#include "commands/dbcommands.h"
+ #include "mb/pg_wchar.h"
#include "miscadmin.h"
#include "nodes/makefuncs.h"
#include "nodes/params.h"
***************
*** 34,40 ****
#include "utils/lsyscache.h"
#include "utils/syscache.h"
-
bool Transform_null_equals = false;
static Node *transformColumnRef(ParseState *pstate, ColumnRef *cref);
--- 35,40 ----
***************
*** 1491,1497 ****
{
case BPCHAROID:
if (!con->constisnull)
! return VARSIZE(DatumGetPointer(con->constvalue));
break;
default:
break;
--- 1491,1503 ----
{
case BPCHAROID:
if (!con->constisnull)
! {
! int32 len = VARSIZE(DatumGetPointer(con->constvalue)) - VARHDRSZ;
!
! if (pg_database_encoding_max_length() > 1)
! len = pg_mbstrlen_with_len(VARDATA(DatumGetPointer(con->constvalue)), len);
! return len + VARHDRSZ;
! }
break;
default:
break;
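To illustrate the change in the patch above: the old code derived the constant's type modifier from its byte length, while the patched code counts characters when the database encoding is multibyte. The following is a minimal standalone sketch, not the PostgreSQL source. `utf8_char_count` is a hypothetical stand-in for `pg_mbstrlen_with_len()` that assumes valid UTF-8 input, and `HDRSZ` is a stand-in for `VARHDRSZ`, assumed here to be 4 bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for VARHDRSZ; assumed to be 4 bytes in this sketch. */
#define HDRSZ 4

/*
 * Hypothetical stand-in for pg_mbstrlen_with_len(): count the characters
 * in the first len bytes of s, assuming valid UTF-8. Continuation bytes
 * (bit pattern 10xxxxxx) do not start a character and are not counted.
 */
int
utf8_char_count(const char *s, size_t len)
{
	int			chars = 0;
	size_t		i;

	for (i = 0; i < len; i++)
	{
		if (((unsigned char) s[i] & 0xC0) != 0x80)
			chars++;
	}
	return chars;
}

/* Pre-patch behaviour: typmod derived from the byte length. */
int
typmod_by_bytes(const char *s)
{
	return (int) strlen(s) + HDRSZ;
}

/* Patched behaviour: typmod derived from the character count. */
int
typmod_by_chars(const char *s)
{
	return utf8_char_count(s, strlen(s)) + HDRSZ;
}
```

For five 2-byte characters, `typmod_by_bytes` gives 10 + 4 = 14 and `typmod_by_chars` gives 5 + 4 = 9. For pure ASCII input the two functions agree, which is consistent with the patch only calling the multibyte routine when `pg_database_encoding_max_length() > 1`.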
Hackers,
The problem he found is not limited to Japanese characters; it affects
any multibyte encoding, including UTF-8. The patch looks good to me,
and I will commit it to the 7.3, 7.4 and 8.0 stable branches and to
current if there are no objections.
--
Tatsuo Ishii
Ahemm,...
UNICODE DB:
create table t (a char(10));
set client_encoding = iso88591;
insert into t VALUES ('æøå');
select a, octet_length(a),length(a) from t;
a | octet_length | length
------------+--------------+--------
æøå | 13 | 3
(1 row)
This is with 8.0.2.
Just FYI.
... John
> -----Original Message-----
> From: pgsql-patches-owner@postgresql.org
> [mailto:pgsql-patches-owner@postgresql.org] On Behalf Of Tatsuo Ishii
> Sent: Tuesday, May 24, 2005 8:52 AM
> To: y-asaba@sra.co.jp
> Cc: pgsql-patches@postgresql.org; pgsql-hackers@postgresql.org
> Subject: Re: [PATCHES] character type value is not padded with spaces
I think you need to test with 5 characters, not 3.
--
Tatsuo Ishii
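The arithmetic behind the 3-versus-5 character remark can be sketched as follows. This is an illustration under stated assumptions, not PostgreSQL code: the function names are hypothetical, `HDRSZ` stands in for `VARHDRSZ` (assumed 4 bytes), and the sketch assumes that the parser skips the length coercion (and therefore the padding) when the constant's typmod already equals the column's typmod.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in for VARHDRSZ; assumed to be 4 bytes in this sketch. */
#define HDRSZ 4

/* Typmod of a char(n) column: declared length plus the header size. */
int
column_typmod(int n)
{
	return n + HDRSZ;
}

/*
 * Assumption: padding happens only when the constant's typmod differs
 * from the column's typmod, because otherwise the coercion is skipped.
 */
bool
padding_applied(int const_typmod, int n)
{
	return const_typmod != column_typmod(n);
}

/* octet_length expected after padding a value to n characters. */
int
padded_octets(int bytes, int chars, int n)
{
	return bytes + (n - chars);
}
```

With the pre-patch byte-based typmod, three 2-byte characters give 6 + 4 = 10, which differs from 14, so the value is padded and `padded_octets(6, 3, 10)` is 13, matching the output reported with 8.0.2. Five 2-byte characters give 10 + 4 = 14, which matches the column typmod of `char(10)`, so under this assumption the padding step is skipped and the stored value stays at 10 bytes instead of the expected 15.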
Ahhh...
> -----Original Message-----
> From: Tatsuo Ishii [mailto:t-ishii@sra.co.jp]
> Sent: Tuesday, May 24, 2005 9:26 AM
> To: John Hansen
> Cc: y-asaba@sra.co.jp; pgsql-patches@postgresql.org;
> pgsql-hackers@postgresql.org
> Subject: Re: [PATCHES] character type value is not padded with spaces
I see Tatsuo already applied this, which is great. I added a little
comment:
/* if multi-byte, take len and find # characters */
--
Bruce Momjian | http://candle.pha.pa.us
pgman@candle.pha.pa.us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073