PGDay.it collation discussion notes
From: Gregory Stark
Subject: PGDay.it collation discussion notes
Date:
Msg-id: 87abd21k3b.fsf@oxford.xeocode.com
Responses: Re: PGDay.it collation discussion notes (Martijn van Oosterhout <kleptog@svana.org>)
List: pgsql-hackers
Radek, Zdenek, and Heikki had an extended discussion here at PGDay.it on collation support. I was volunteered to be the note-taker (!). Here is the plan we came up with:

Firstly, the ANSI standard makes collations schema-qualified names, which pretty much forces us to have a collation catalog table. The ANSI standard syntax, where the COLLATE keyword can follow just about any string value in an expression and then bubbles up the expression tree until an operation needs to pick a collation, seemed very weird to us. Hooking that into the parser was pretty intimidating, but we thought we could postpone that till later and solve the meat of the problem first.

Instead, at least for now, we'll have a GUC variable which decides the current collation. Eventually that would just be the default if the query (or column, or whatever) doesn't override it with an explicitly selected collation. (This would mean we need to treat the default collation like search_path for things like plpgsql cached plans etc. Perhaps we could have a flag to indicate whether plans actually need to be invalidated or are collation-independent.)

In a query, the current collation would be grabbed by the planner when it's deciding on operators. The planner would grab that collation OID and track it in the sort key along with the nulls first/last and asc/desc properties. It's important that it do this immediately in the planner rather than just reference the GUC later, partly because eventually it needs to be possible to override the collation at this point, and partly so that changing the GUC during query execution (i.e., in a function) would not affect the execution of the current query.

pg_index would need an additional column to go along with indkey and indoptions. It would be an oidvector of collation OIDs. When planning a query or building an index, we need to determine whether the opclass is collation-aware or not.
It seemed best that we not hard-code which data types are collation-aware and instead look at either the operator, or the pg_proc entry for the function implementing the sort operator, or perhaps the btree support function for the opclass, to see if it takes a third argument. Or perhaps we would have a boolean column in pg_opclass which asserts that it does and that the third argument should be the collation. If so, then the collation is stored in the sort key and the indcollation slots and is passed to the operator or the btree support function as the third argument.

We think that even without the parser changes this would be useful enough to take on its own. It would allow building indexes in different collations and having different sessions use different collations. You could even hack together queries which combine collations by defining a function to handle subparts of the query and attaching the GUC to it the same way we attach search_path to security definer functions.

To finish the feature, the catalogs need to add a default collation to every column and some other database objects. Then the parser needs to check those objects first before falling back to the session variable, and bubble that value up as it builds expressions so that it's available at the comparison operator or sort node. We didn't read the part of the spec which covers the rules for this, but it does have such rules, which should turn up if you search for "collation derivation".

--
Gregory Stark
EnterpriseDB          http://www.enterprisedb.com
Ask me about EnterpriseDB's PostGIS support!