Обсуждение: Parser bug (?)

Поиск
Список
Период
Сортировка

Parser bug (?)

От
"Oliver Elphick"
Дата:
Hi Tom,

This looks like a parser bug in 6.4.  I found it while trying to run
pg_upgrade:

bray=> create table junk ("singleton" "char");
CREATE
bray=> drop table junk;
DROP
bray=> create table junk ("singleton" "char" default 'M');
ERROR:  DEFAULT: const type mismatched
bray=> create table junk ("singleton" char default 'M');
CREATE

pg_dump puts all type names in double quotes, which triggers this bug
when pg_upgrade is run, with the result that the upgrade fails.

[Machine is i686 Linux 2.1.125 with glibc 2.0.7u]

-- 
Oliver Elphick                                Oliver.Elphick@lfix.co.uk
Isle of Wight                              http://www.lfix.co.uk/oliver              PGP key from public servers; key
ID32B8FAA1                ========================================    "Go ye therefore, and teach all nations,
baptizingthem     in the name of the Father, and of the Son, and of the      Holy Ghost; Teaching them to observe all
things      whatsoever I have commanded you; and, lo, I am with      you alway, even unto the end of the world. Amen."
             Matthew 28:19,20 
 




Re: [HACKERS] Parser bug (?)

От
Tom Lane
Дата:
"Oliver Elphick" <olly@lfix.co.uk> writes:
> This looks like a parser bug in 6.4.  I found it while trying to run
> pg_upgrade:
> bray=> create table junk ("singleton" "char");
> CREATE
> bray=> drop table junk;
> DROP
> bray=> create table junk ("singleton" "char" default 'M');
> ERROR:  DEFAULT: const type mismatched
> bray=> create table junk ("singleton" char default 'M');
> CREATE

Oh my, that's interesting ... and it looks like it is closely related
to my gripe a few days ago that type char is actually implemented as
char(1), or more accurately type bpchar(1).

I didCREATE TABLE t1 (c1 char, c2 "char");

and then looked at the info for the relation in pg_attribute:

attname|atttypid|attlen|attnum|attnelems|atttypmod|attbyval|attalign
-------+--------+------+------+---------+---------+--------+--------
c1     |    1042|    -1|     1|        0|        5|f       |i       
c2     |      18|     1|     2|        0|       -1|t       |c       

pg_type contains

oid|typname|typowner|typlen|typprtlen|typbyval|typtype|typisdefined|typdelim|typrelid|typelem|typinput|typoutput|typreceive|typsend
|typalign|typdefault
 

----+-------+--------+------+---------+--------+-------+------------+--------+--------+-------+--------+---------+----------+---------+--------+----------
18|char  |     256|     1|        1|t       |b      |t           |,       |       0|      0|charin  |charout  |charin
|charout  |c       |
 
1042|bpchar |     256|    -1|       -1|f       |b      |t           |,       |       0|
18|bpcharin|bpcharout|bpcharin |bpcharout|i       |
 


So it appears that *something* (probably the parser) is converting the
unquoted type name char to a bpchar (blank-padded char array),
but if it's quoted then you actually get the primitive char type.

Seems to me that that's a bug: if you ask for char, and not char(1), you
should get char.  At least I'd like it to be a bug, because I want to be
able to use the unvarnished char type, and I don't want to depend on
some weird quoting behavior to get at it.

The second bug exhibited by Oliver's example is that a literal 'M'
is not recognized as compatible with the primitive char type.
That won't do either.

I tried coercing the 'M' to a char explicitly, just to see what would
happen, and got a coredump:

create table junk ("singleton" "char" default 'M'::char);
pqReadData() -- backend closed the channel unexpectedly.       This probably means the backend terminated abnormally
beforeor while processing the request.
 
We have lost the connection to the backend, so further processing is impossible.  Terminating.

Backtrace is

#0  0x800c4234 in _strlen ()
#1  0xca87c in FlattenStringList (list=0x4008dd58) at gram.y:5047
#2  0xc6380 in yyparse () at gram.y:845
#3  0xcac8c in parser (   str=0x7b034218 "create table junk (\"singleton\" \"char\" default 'M'::char);", typev=0x0,
nargs=0)at parser.c:53
 
#4  0xf1c0c in pg_parse_and_plan (   query_string=0x7b034218 "create table junk (\"singleton\" \"char\" default
'M'::char);",typev=0x7b038550, nargs=0, queryListP=0x7b036358, dest=Remote,   aclOverride=0) at postgres.c:420
 

Ooops.  Looks like FlattenStringList is getting handed a node type it
was not expecting.  The elements of the list it is handed are:

$6 = {type = T_String, val = {str = 0x41b6c "CAST", ival = 269164,   dval = 5.7116487507937656e-309}}
$8 = {type = T_String, val = {str = 0x4008d910 "'M'", ival = 1074321680,   dval = 3.105987548828125}}
$10 = {type = T_String, val = {str = 0x41b74 "AS", ival = 269172,   dval = 5.7118185104570428e-309}}
$12 = {type = T_TypeName, val = {str = 0x5 <Address 0x5 out of bounds>,   ival = 5, dval = 1.0609978954826362e-313}}

So that's another bug, though possibly unrelated.
        regards, tom lane


Re: [HACKERS] Parser bug (?)

От
"Thomas G. Lockhart"
Дата:
> Oh my, that's interesting ... and it looks like it is closely related
> to my gripe a few days ago that type char is actually implemented as
> char(1), or more accurately type bpchar(1).

That's a feature! Sorry I missed your gripe session, but it apparently
flew by in the e-mail and I didn't spot it. Do you want to dredge it up
again, or did one gripe get you feeling better?

> So it appears that *something* (probably the parser) is converting the
> unquoted type name char to a bpchar (blank-padded char array),
> but if it's quoted then you actually get the primitive char type.
> Seems to me that that's a bug: if you ask for char, and not char(1), 
> you should get char.

I'm pretty sure that any benefit to a visible "unvarnished char" are
outweighed by the burden of fully implementing all of the "uvchar"
support functions. It would need to be entirely transparent when mixed
with char(), varchar(), and text types, and it probably isn't at the
moment.

As a reminder to others (as I'm sure you are already aware of this, but
want the feature anyway :), an apparent space savings of 4 bytes (1
bytes vs 5 bytes) for char vs char(1) is not the 80% savings it appears;
there is a large per-tuple overhead that pretty much swamps any tiny
data type.

> The second bug exhibited by Oliver's example is that a literal 'M'
> is not recognized as compatible with the primitive char type.
> That won't do either.
> I tried coercing the 'M' to a char explicitly, just to see what would
> happen, and got a coredump:
> So that's another bug, though possibly unrelated.

We should continue the discussion before I try for a fix, to make sure
that you don't get railroaded into *never* being able to use "uvchar" ;)
                    - Tom


Re: [HACKERS] Parser bug (?)

От
Tom Lane
Дата:
"Thomas G. Lockhart" <lockhart@alumni.caltech.edu> writes:
>> ... my gripe a few days ago that type char is actually implemented as
>> char(1), or more accurately type bpchar(1).

> That's a feature! Sorry I missed your gripe session, but it apparently
> flew by in the e-mail and I didn't spot it.

I think I sent it to pgsql-sql not the hackers list.

> Do you want to dredge it up again, or did one gripe get you feeling
> better?

No, I'm still unhappy.

Basically, what I'm using "plain char" for is as a poor man's
enumerated type.  I have a lot of fields that have only half a dozen
legal values, and what I do is assign somewhat-mnemonic character
codes to each of the allowed values.  This is compact, easy to process,
and I can examine the raw data at need without straining my tired
brain cells very far to remember what the codes stand for.  (It'd be
even better to have a *real* enumerated-type facility, I suppose,
but this works fine and doesn't take a major amount of work to
implement...)

Only problem is that the fields aren't nearly as compact as I
thought they were.

> As a reminder to others (as I'm sure you are already aware of this, but
> want the feature anyway :), an apparent space savings of 4 bytes (1
> bytes vs 5 bytes) for char vs char(1) is not the 80% savings it appears;

You forgot the 4-byte alignment requirement of char(n).  These things
really take 8 bytes each, not the 1 byte each that several "uvchar"s
in a row would take.

> there is a large per-tuple overhead that pretty much swamps any tiny
> data type.

When you have half a dozen of these per tuple (which I do), it starts
to add up.  Besides, the per-tuple overhead has redeeming social
value --- all those fields have a defensible reason for being there.
A 700% storage overhead with no defensible purpose is harder to
swallow.

> I'm pretty sure that any benefit to a visible "unvarnished char" are
> outweighed by the burden of fully implementing all of the "uvchar"
> support functions. It would need to be entirely transparent when mixed
> with char(), varchar(), and text types, and it probably isn't at the
> moment.

Well, all I really give a darn about is whether I can insert, update,
and select on equality.  (Indeed, since I'm using these as enums,
I have no need whatever for them to be transparently interchangeable
with char-string types.  However, I do want them to display as "char"
and not "int1", since small integer codes would lose the mnemonic
aspect...)  A quick look at pg_operator.h indicates that char is
reasonably well supplied with functions, anyway.

Once I found out that I could get to "uvchar" with this little quoting
hack, I satisfied myself that it does everything I need done.  But I
don't care for relying on what is presumably just a parser bug to get
at a fundamental data type.

I didn't object to removing char2, char4, and the other fixed-width
special character types, but I think that plain char has applications
that justify leaving it in there.  I'm willing to consider giving
it a different name if you really insist that char should be char(1)
(though I don't agree with that).
        regards, tom lane


Re: [HACKERS] Parser bug (?)

От
"Thomas G. Lockhart"
Дата:
All points are well taken, but I will object to the "700%" number. Still
misleading imho. But that's not the point...

> I didn't object to removing char2, char4, and the other fixed-width
> special character types, but I think that plain char has applications
> that justify leaving it in there.  I'm willing to consider giving
> it a different name if you really insist that char should be char(1)
> (though I don't agree with that).

The SQL92 standard sez that char is shorthand for char(1). Trying to
make the behavior of char vs char() *completely* transparent as called
for in the standard is one reason why I squashed them together from a
user's point of view. I only left char in the mix because it is used
internally in Postgres system tables and I didn't want to open that can
of worms (and don't intend to, so don't panic).

I shouldn't hide behind the SQL92 standards argument, since there are
some parts of the standard which are so hideous that they should be
ignored. But I'm not certain this is one of them.

Another detail: we would need to figure out how to do locale and
multibyte support for this "char" type to allow equivalence with
multibyte char(1). Not sure how to do that since this type *is* used
internally and probably can't be resized like that.

Speaking of which, are you or Bruce (or anyone else) thinking of testing
the unsigned int vs size_t for the Size typedef recently mentioned by
Oliver? Can we count on all platforms to have size_t defined? If so, we
could consider adding it in for v6.4.1. I'm currently working on getting
min() and max() to work with string types, but could drop that to polish
things for a v6.4.1 release.
                      - Tom


Re: [HACKERS] Parser bug (?)

От
Tom Lane
Дата:
"Thomas G. Lockhart" <lockhart@alumni.caltech.edu> writes:
> All points are well taken, but I will object to the "700%" number. Still
> misleading imho.

OK, you understated for effect, I overstated for effect, we're even ;-)

> The SQL92 standard sez that char is shorthand for char(1).

Um.  Well, what do you think of calling it "char1" or some such?

> I only left char in the mix because it is used
> internally in Postgres system tables and I didn't want to open that can
> of worms (and don't intend to, so don't panic).

Now that you mention it, I've noticed a few places where plain char is
used in the system tables in *exactly* the way I'm talking about, ie,
as a simple form of enumerated type.  For example, the typtype and
typalign fields in pg_type.

All I'm asking for is a reliable way to get at that same functionality
in a user table.

> Another detail: we would need to figure out how to do locale and
> multibyte support for this "char" type to allow equivalence with
> multibyte char(1). Not sure how to do that since this type *is* used
> internally and probably can't be resized like that.

Urgh.  Multibyte char support is the *last* thing I'm looking for
in this context.  How about we rename the type
"justoneplainvanillaasciichar" and have done with it ;-) ?


[ caution, topic drift ahead ]

> Speaking of which, are you or Bruce (or anyone else) thinking of testing
> the unsigned int vs size_t for the Size typedef recently mentioned by
> Oliver? Can we count on all platforms to have size_t defined?

I was looking at that.  size_t exists on all platforms, the trouble
is that pre-ANSI platforms are not very consistent about exactly
which system header file(s) define it.  If we make c.h (and c.h, not
postgres.h, is what defines Size) depend on size_t then we may find
a few compilation failures due to .c files not pulling in all the
right system headers before including c.h.

On the other hand, c.h unconditionally requires <stdlib.h> which
itself is an ANSI-ism.  And stdlib is one of the headers that ANSI
specifies to define size_t --- so any ANSI-compliant header fileset
*will* support this change.  It's only not-quite-ANSI systems that
we risk problems with here.  So probably I'm being too conservative
to worry at all.  If you don't have a reasonably ANSI-conformant
compiler and header fileset you're gonna have a heck of an
unpleasant time building Postgres anyway, I suspect.

c.h says that Size is intended to represent the result type of
sizeof, and that most certainly is size_t, *not* any other type,
according to the ANSI spec.  So if Size is being used in the
code to represent the size of in-memory objects then it definitely
ought to be size_t.

In theory this change is absolutely correct, and my guess is we
should do it.  But if we see a few glitches on obsolete platforms,
don't say I didn't warn you ;-).
        regards, tom lane


Re: [HACKERS] Parser bug (?)t

От
Bruce Momjian
Дата:
> Speaking of which, are you or Bruce (or anyone else) thinking of testing
> the unsigned int vs size_t for the Size typedef recently mentioned by
> Oliver? Can we count on all platforms to have size_t defined? If so, we
> could consider adding it in for v6.4.1. I'm currently working on getting
> min() and max() to work with string types, but could drop that to polish
> things for a v6.4.1 release.

I think everyone has size_t. It is s POSIX requirement.  Sounds like a
good change.

--  Bruce Momjian                        |  http://www.op.net/~candle maillist@candle.pha.pa.us            |  (610)
853-3000+  If your life is a hard drive,     |  830 Blythe Avenue +  Christ can be your backup.        |  Drexel Hill,
Pennsylvania19026
 


Re: [HACKERS] Parser bug (?)t

От
"Thomas G. Lockhart"
Дата:
> I think everyone has size_t. It is s POSIX requirement.  Sounds like a
> good change.

Tom Lane points out that some of our more esoteric platforms may be
missing it, so it sounds like it should be an optional v6.4.1 patch and
a v6.5 built-in feature (to allow platform testing). Who is testing now?
              - Tom


Re: [HACKERS] Parser bug (?)

От
Bruce Momjian
Дата:
Is this fixed?


> Hi Tom,
> 
> This looks like a parser bug in 6.4.  I found it while trying to run
> pg_upgrade:
> 
> bray=> create table junk ("singleton" "char");
> CREATE
> bray=> drop table junk;
> DROP
> bray=> create table junk ("singleton" "char" default 'M');
> ERROR:  DEFAULT: const type mismatched
> bray=> create table junk ("singleton" char default 'M');
> CREATE
> 
> pg_dump puts all type names in double quotes, which triggers this bug
> when pg_upgrade is run, with the result that the upgrade fails.
> 
> [Machine is i686 Linux 2.1.125 with glibc 2.0.7u]
> 
> -- 
> Oliver Elphick                                Oliver.Elphick@lfix.co.uk
> Isle of Wight                              http://www.lfix.co.uk/oliver
>                PGP key from public servers; key ID 32B8FAA1
>                  ========================================
>      "Go ye therefore, and teach all nations, baptizing them
>       in the name of the Father, and of the Son, and of the 
>       Holy Ghost; Teaching them to observe all things  
>       whatsoever I have commanded you; and, lo, I am with 
>       you alway, even unto the end of the world. Amen."     
>             Matthew 28:19,20 
> 
> 
> 
> 


--  Bruce Momjian                        |  http://www.op.net/~candle maillist@candle.pha.pa.us            |  (610)
853-3000+  If your life is a hard drive,     |  830 Blythe Avenue +  Christ can be your backup.        |  Drexel Hill,
Pennsylvania19026
 


Re: [HACKERS] Parser bug (?)

От
Bruce Momjian
Дата:
> Speaking of which, are you or Bruce (or anyone else) thinking of testing
> the unsigned int vs size_t for the Size typedef recently mentioned by
> Oliver? Can we count on all platforms to have size_t defined? If so, we
> could consider adding it in for v6.4.1. I'm currently working on getting
> min() and max() to work with string types, but could drop that to polish
> things for a v6.4.1 release.

Added to current tree.  6.4.1 tree too likely to break.

--  Bruce Momjian                        |  http://www.op.net/~candle maillist@candle.pha.pa.us            |  (610)
853-3000+  If your life is a hard drive,     |  830 Blythe Avenue +  Christ can be your backup.        |  Drexel Hill,
Pennsylvania19026
 


Re: [HACKERS] Parser bug (?)

От
"Thomas G. Lockhart"
Дата:
> Is this fixed?

No, not yet. Put it on the "show stopper" list for v6.4.1...

The explanation is that CHAR is a "keyword" so is passed from the
scanner to the parser explicitly, which then makes sure that the various
legal forms of char type declarations get translated into the internal
bpchar type. 

However, tokens surrounded by double quotes are passed as IDENTs, which
bypasses this mechanism completely.

I'll look at modifying pg_dump to leave out the double quotes on type
fields unless there is mixed or upper case. Also, I'll look at having
some automatic conversions between bpchar and char (though this might
not be available for v6.4.1 since it may require catalog changes).
                 - Tom

> > This looks like a parser bug in 6.4.  I found it while trying to run
> > pg_upgrade:
> > bray=> create table junk ("singleton" "char" default 'M');
> > ERROR:  DEFAULT: const type mismatched
> > pg_dump puts all type names in double quotes, which triggers this 
> > bug when pg_upgrade is run, with the result that the upgrade fails.