Re: PGDay.it collation discussion notes

From: Heikki Linnakangas
Subject: Re: PGDay.it collation discussion notes
Message-ID: 48FC4F4D.2040403@enterprisedb.com
In reply to: Re: PGDay.it collation discussion notes  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses: Re: PGDay.it collation discussion notes  ("Dave Gudeman" <dave.gudeman@gmail.com>)
List: pgsql-hackers

Tom Lane wrote:
> Another objection to this design is that it's completely unclear that
> functions from text to text should necessarily yield the same collation
> that went into them, but if you treat collation as a hard-wired part of
> the expression syntax tree you aren't going to be able to do anything else.
> (What will you do about functions/operators taking more than one text
> argument?)

Whatever the spec says. Collation is intimately associated with the 
comparison operations, and doesn't make any sense anywhere else. The way 
the default collation for a given operation is determined, by bubbling 
up the collation from the operands, through function calls and other 
expressions, is just to make life a bit easier for the developer who's 
writing the SQL. We could demand that you always explicitly specify a 
collation when you use the text equality or inequality operators, but 
because that would be quite tiresome, a reasonable default is derived 
from the context.

I believe the spec stipulates how that default is derived, so I don't
think we need to fret over it. We'll need it eventually, but the parser
changes are not the critical part. We can start off by deriving the
collation from a GUC variable, for example.
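
To make the "bubbling up" concrete, here is a minimal sketch in Python. It is not the spec's actual derivation rules, only a simplified stand-in: an explicit COLLATE beats a column's declared collation, which beats a bare constant, and if nothing is derived the comparison falls back to a session default (playing the role of the GUC). All class and function names here are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Tuple, Union

@dataclass
class Column:
    name: str
    collation: str            # collation declared on the column

@dataclass
class Const:
    value: str                # a bare literal carries no collation

@dataclass
class Collate:
    arg: "Expr"
    collation: str            # explicit "expr COLLATE ident"

@dataclass
class FuncCall:
    name: str
    args: Tuple["Expr", ...]

Expr = Union[Column, Const, Collate, FuncCall]

# Simplified strengths (an assumption of this sketch, not the spec's wording).
NONE, IMPLICIT, EXPLICIT = 0, 1, 2

def derive(expr: Expr) -> Tuple[int, Optional[str]]:
    """Return (strength, collation) bubbled up from the operands."""
    if isinstance(expr, Collate):
        return (EXPLICIT, expr.collation)
    if isinstance(expr, Column):
        return (IMPLICIT, expr.collation)
    if isinstance(expr, Const):
        return (NONE, None)
    # FuncCall: the strongest derivation among the arguments wins.
    derived = [derive(arg) for arg in expr.args]
    strength = max((s for s, _ in derived), default=NONE)
    names = {c for s, c in derived if s == strength}
    if len(names) > 1:
        raise ValueError(f"conflicting collations: {sorted(names)}")
    return (strength, names.pop() if names else None)

def comparison_collation(left: Expr, right: Expr, session_default: str) -> str:
    """Collation used by a comparison; falls back to the session default."""
    _, collation = derive(FuncCall("compare", (left, right)))
    return collation if collation is not None else session_default
```

For example, comparing lower(name) against a literal picks up the collation of the name column, while comparing two bare literals uses the session default. Note that the value never carries a collation at run time here; everything is decided from the expression tree.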

> I think it would be better to treat the collation indicator as part of
> string *values* and let it bubble up through expressions that way.
> The "expr COLLATE ident" syntax would be a simple run-time operation
> that pokes a new collation into a string value.  The notion of a column
> having a particular collation would then amount to a check constraint on
> the values going into the column.

Looking at an individual value, collation just doesn't make sense.
Collation is a property of the comparison operation, not of a value.

In the parser, we might have to do something like that though, because
according to the standard you can tack a COLLATE clause onto string
constants and have it bubble up. But let's keep that ugliness just
inside the parser.

One impractical way to implement collation would be to have one
operator class per collation. In fact you could do that today, with no 
backend changes, to support multiple collations. It's totally 
impractical, because for starters you'd need different comparison 
operators, with different names, for each collation. But it's the right 
mental model.
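
As an illustration of that mental model, the sketch below keeps one comparison function per "collation" and sorts the same strings with each. The two collations are toy stand-ins (code point order and a case-folded order), not real locale collations, and the operator class names are made up.

```python
from functools import cmp_to_key

def cmp_codepoint(a: str, b: str) -> int:
    """Compare by raw code point order."""
    return (a > b) - (a < b)

def cmp_caseless(a: str, b: str) -> int:
    """Compare after case folding, ignoring letter case."""
    ka, kb = a.casefold(), b.casefold()
    return (ka > kb) - (ka < kb)

# One "operator class" per collation (hypothetical names).
OPCLASSES = {
    "text_ops_codepoint": cmp_codepoint,
    "text_ops_caseless": cmp_caseless,
}

def sort_with(opclass: str, values):
    """Sort values using the comparison function of the given operator class."""
    return sorted(values, key=cmp_to_key(OPCLASSES[opclass]))

words = ["banana", "Cherry", "apple"]
by_codepoint = sort_with("text_ops_codepoint", words)  # uppercase sorts first
by_caseless = sort_with("text_ops_caseless", words)    # case is ignored
```

The strings are identical in both calls; only the comparison differs, which is the sense in which collation belongs to the operation rather than to the value.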

I think the right approach is to invent a new concept called "operator
modifier". It's basically a 3rd argument to operators. It can be
specified explicitly when an operator is used, with syntax like "<left>
Op <right> USING <modifier>", or, in the case of collation, it's derived
from the context, per the SQL spec. The operator modifier is tacked on to
OpExprs and SortClauses in the parser, and passed as a 3rd argument to
the function implementing the operator at execution time.
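
A rough sketch of that flow, in Python rather than backend C: the node below is a hypothetical stand-in for an OpExpr, the "parser" attaches either the explicit USING modifier or a derived one, and the executor passes it as the third argument. The modifier names and sort keys are toy assumptions.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Toy modifiers: each maps a string to the key it is compared by.
SORT_KEYS = {
    "codepoint": lambda s: s,
    "caseless": str.casefold,
}

def text_lt(left: str, right: str, modifier: str) -> bool:
    """The function behind '<': the modifier arrives as a 3rd argument."""
    key = SORT_KEYS[modifier]
    return key(left) < key(right)

@dataclass
class OpExpr:
    func: Callable[[str, str, str], bool]
    left: str
    right: str
    modifier: str              # tacked on at parse time

def parse_lt(left: str, right: str,
             using: Optional[str] = None,
             derived: str = "codepoint") -> OpExpr:
    """Attach the explicit USING modifier if given, else the derived one."""
    modifier = using if using is not None else derived
    return OpExpr(text_lt, left, right, modifier)

def execute(node: OpExpr) -> bool:
    """Call the operator's function with the modifier as 3rd argument."""
    return node.func(node.left, node.right, node.modifier)
```

With these definitions, execute(parse_lt("Cherry", "apple")) compares by code point, while execute(parse_lt("Cherry", "apple", using="caseless")) compares the same two strings case-insensitively and gives the opposite answer.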

When an index is created, if the operators in the operator class take an 
operator modifier, it's stored at creation time into a new column in 
pg_index (needs to be a vector or array to handle multi-column indexes). 
The planner needs to check the modifier when it determines whether an 
index can be used or not.
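
The planner-side check could look roughly like the sketch below. It is only an assumption about the shape of the test: a hypothetical IndexInfo record holds one modifier per index column (mirroring the proposed array column in pg_index), and an index qualifies for a comparison only if the modifier stored for that column matches the one on the operator. A real planner considers far more than this.

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass(frozen=True)
class IndexInfo:
    name: str
    columns: Tuple[str, ...]
    modifiers: Tuple[str, ...]   # one per column, fixed at index creation

def index_usable(index: IndexInfo, column: str, query_modifier: str) -> bool:
    """True if the index covers the column with a matching modifier."""
    if column not in index.columns:
        return False
    position = index.columns.index(column)
    return index.modifiers[position] == query_modifier

# A two-column index whose columns were created with different modifiers.
idx = IndexInfo(
    name="people_name_city_idx",
    columns=("name", "city"),
    modifiers=("it_IT", "de_DE"),
)
```

Here index_usable(idx, "name", "it_IT") holds, but the same index cannot serve a comparison on name that uses de_DE, because its entries are ordered under a different collation.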

BTW, this reminds me of the discussions we had about the tsearch default 
configuration. It's different, though, because in full text search, 
there's a separate tsvector data type, and the problem was with 
expression indexes, not regular ones.

Another consideration is LC_CTYPE. Just like we want to support
different collations, we should support different character
classifications for upper()/lower(). We might want to tie it into
collation, as using different ctype and collation doesn't usually make
sense, but it's something to keep in mind.
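
The classic case is Turkish, where the uppercase of "i" is the dotted capital "İ" rather than "I". The toy function below shows why upper() needs to know the character classification. The ctype parameter and the hand-written Turkish rule are assumptions for illustration; a real implementation would rely on the platform's locale support.

```python
def upper(s: str, ctype: str = "default") -> str:
    """Uppercase s under a named character classification (toy version)."""
    if ctype == "tr_TR":
        # Turkish: dotted i -> U+0130 (dotted capital I),
        #          dotless U+0131 -> plain capital I.
        s = s.replace("i", "\u0130").replace("\u0131", "I")
    return s.upper()

default_result = upper("istanbul")           # "ISTANBUL"
turkish_result = upper("istanbul", "tr_TR")  # "\u0130STANBUL"
```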

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com
