Обсуждение: Case Sensitivity
Hi,
I'm moving over from MS SQL. I've been googling this for ages now and suprisingly cannot find a clear answer.
I want my data in tables to be case insensitive.
This is so i can:
1. Put keys on natural key fields, for instance a product part number. I product number 'ABC123' inserted i need to disallow 'abc123' to be inserted as a second row. Please don't tell me i have to add another column holding a lowered version of the product number. My database is littered with this need and i would end up bloating my table schema.
2. I need to query case insensitively. SELECT * FROM product WHERE product_number = 'ABC123' should return the same row as SELECT * FROM product WHERE product_number = 'abc123'
Is there a database wide collation setting i can make, or a case insensitive character type. There are lots of online posts regarding using LOWER function for querying. This is a workaround for point (2) but does not remedy point (1) above.
Many thanks
Phillip
Phillip Smith
Hello you should to use a citext datatype http://www.postgresql.org/docs/9.0/interactive/citext.html regards Pavel Stehule 2011/1/13 Phillip Smith <phillip@softworks.com>: > Hi, > I'm moving over from MS SQL. I've been googling this for ages now and > suprisingly cannot find a clear answer. > > I want my data in tables to be case insensitive. > > This is so i can: > 1. Put keys on natural key fields, for instance a product part number. I > product number 'ABC123' inserted i need to disallow 'abc123' to be inserted > as a second row. Please don't tell me i have to add another column holding a > lowered version of the product number. My database is littered with this > need and i would end up bloating my table schema. > > 2. I need to query case insensitively. SELECT * FROM product WHERE > product_number = 'ABC123' should return the same row as SELECT * FROM > product WHERE product_number = 'abc123' > > Is there a database wide collation setting i can make, or a case > insensitive character type. There are lots of online posts regarding using > LOWER function for querying. This is a workaround for point (2) but does not > remedy point (1) above. > > Many thanks > > Phillip > > Phillip Smith >
> you should to use a citext datatype > > http://www.postgresql.org/docs/9.0/interactive/citext.html Or: - use a functional index with lower() to ensure uniqueness - use a BEFORE trigger to lower() the inserted data Karsten -- GMX DSL Doppel-Flat ab 19,99 Euro/mtl.! Jetzt mit gratis Handy-Flat! http://portal.gmx.net/de/go/dsl
Thanks Pavel,
Is this the only way?. I would prefer to use a collation setting if this is possible. Do you know whether there would be a performance hit using non standard character strings?
Phillip
Hi Karsten,
modifying user inputted data is not an option for me. I need to maintain the original data. I had read about indexing on a lower function. This all seems a bit of a bodgy workaround to me. A clean Case Insensitive collation setting is cleaner. Is this supported in postgreSQL 9?
Regards
Phillip
Phillip
2011/1/13 Phillip Smith <phillip@softworks.com>: > Thanks Pavel, > > Is this the only way?. I would prefer to use a collation setting if this is > possible. Do you know whether there would be a performance hit using non > standard character strings? It is preferred way. PostgreSQL doesn't support a collations, and what I know collation in 9.1 will not allow a comparation between case sensitive and case insensitive text, because collation in PostgreSQL is based on unix locale, and there are not case insensitive locale. I don't know about performance problems. It does same work like case insensitive collation. Regards Pavel Stehule > > Phillip > >
On Thu, Jan 13, 2011 at 03:29:03PM -0000, Phillip Smith wrote: > modifying user inputted data is not an option for me. I need to > maintain the original data. I had read about indexing on a lower > function. This all seems a bit of a bodgy workaround to me. A clean > Case Insensitive collation setting is cleaner. Is this supported in > postgreSQL 9? There isn't such a collation setting as far as I know, and it's actually not as clean as you think. It turns out to work reasonably well in ASCII-land, but not very well in other circumstances. In general, the case-insensitive but case-preserving technique that English-speaking computer users have come to find normal is a bad fit for a wide variety of languages. (Even "Latin" characterset-using languages have trouble, because of historic ways of handling accents. Is the capital version of é E or É? Well, both, it turns out, depending on whom you believe. I could bore you about the effects of this in the DNS all day long, but I'll resist the temptation.) Putting an index on lower(column) and then doing all your comparisons with lower(datum) works fairly well, and I don't see how it's any more bodgy than a database-wide case insensitive collation. For instance, I can assure you that customers named Leblanc and LeBlanc care about whether those two compare equally. In your customer name field, if you have a database-wide collation setting, you can't make the distinction. A -- Andrew Sullivan ajs@crankycanuck.ca