Re: Collations and Replication; Next Steps

From Matthew Kelly
Subject Re: Collations and Replication; Next Steps
Date
Msg-id 76A634FB-0BEC-4FCF-AC9C-B6EA2C50C290@tripadvisor.com
In response to Re: Collations and Replication; Next Steps  (Martijn van Oosterhout <kleptog@svana.org>)
Responses Re: Collations and Replication; Next Steps  (Robert Haas <robertmhaas@gmail.com>)
Re: Collations and Replication; Next Steps  (Martijn van Oosterhout <kleptog@svana.org>)
Re: Collations and Replication; Next Steps  (Peter Eisentraut <peter_e@gmx.net>)
Re: Collations and Replication; Next Steps  (Bruce Momjian <bruce@momjian.us>)
List pgsql-hackers
Here is where I think the timezone and PostGIS cases are fundamentally different:
I can pretty easily make sure that all my servers run in the same timezone.  That's just good practice.  I'm also going
to install the same version of PostGIS everywhere in a cluster.  I'll build PostGIS and its dependencies from the exact
same source files, regardless of when I build the machine.

Timezone is a user level setting; PostGIS is a user level library used by a subset.

glibc, however, is a system level library, and text is a core data type.  Changing glibc to a version that doesn't
match the kernel can lead to system level instability, broken linkers, etc.  (I know because I tried.)  Here are some
other subtle problems that fall out:
* Upgrading glibc, the kernel, and linker through the package manager in order to get security updates can cause the
  corruption.
* A basebackup that is taken in production and placed on a backup server might not be valid on that server,
  or your desktop machine, or on the spare you keep to do PITR when someone screws up.
* Unless you keep _all_ of your clusters on the same OS, machines from your database spare pool probably won't be
  the right OS when you add them to the cluster because a member failed.
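
To make the corruption concrete, here is a minimal sketch (an illustration, not PostgreSQL code) of what happens when an index built under one sort order is searched under another. The two collation functions are hypothetical stand-ins for two glibc versions that disagree on ordering; the sorted list plays the role of a B-tree on a text column.

```python
from bisect import bisect_left

def build_index(words, collate):
    """Sort words by a collation key, the way a B-tree orders text."""
    return sorted(words, key=collate)

def index_lookup(index, word, collate):
    """Binary search that assumes `index` is ordered by `collate`."""
    keys = [collate(w) for w in index]
    pos = bisect_left(keys, collate(word))
    return pos < len(index) and index[pos] == word

# Hypothetical collations standing in for two library versions.
def collation_old(s):
    # Plain code point order: uppercase sorts before lowercase.
    return s

def collation_new(s):
    # Case-insensitive first, code point order as a tie-breaker.
    return (s.lower(), s)

words = ["apple", "Banana", "cherry", "Date"]

# Index built on the "old" machine, then shipped to a replica.
index = build_index(words, collation_old)

found_on_primary = index_lookup(index, "apple", collation_old)
found_on_replica = index_lookup(index, "apple", collation_new)
```

The row is still physically in the index, but the search on the "replica" walks the wrong way and reports it missing. Nothing errors out, which is what makes the problem subtle.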

Keep in mind that by OS I mean CentOS versions here.  (We're running a mix of late 5.x and 6.x because of our numerous
issues with the 6.x kernel.)

The problem with LC_IDENTIFICATION is that every machine I have seen reports revision "1.0", date "2000-06-24".  It
doesn't seem like the versioning is being actively maintained.
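
For anyone who wants to check their own machines, a rough sketch follows. It assumes a glibc system where `locale -k LC_IDENTIFICATION` prints `keyword="value"` lines; the helper names are made up for illustration. If every host returns the same pair, the fields can't be used to tell collation versions apart.

```python
import os
import subprocess

def parse_locale_keywords(text):
    """Parse `keyword="value"` lines into a dict (assumed output format)."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep:
            fields[key.strip()] = value.strip().strip('"')
    return fields

def collation_fingerprint(locale_name):
    """Return (revision, date) reported by LC_IDENTIFICATION for a locale.

    Hypothetical helper: shells out to the `locale` utility, so it only
    works where that utility and the named locale are installed.
    """
    env = dict(os.environ, LC_ALL=locale_name)
    out = subprocess.run(
        ["locale", "-k", "LC_IDENTIFICATION"],
        env=env, capture_output=True, text=True, check=True,
    ).stdout
    fields = parse_locale_keywords(out)
    return fields.get("revision"), fields.get("date")

# The values I keep seeing, fed through the parser as sample input.
sample = 'revision="1.0"\ndate="2000-06-24"\n'
parsed = parse_locale_keywords(sample)
```

Running `collation_fingerprint("en_US.UTF-8")` on each host and comparing the results is the intended use; the locale name is only an example.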

I'm with Martijn here: let's go ICU, if only because it moves sorting to a user level library instead of a system
level one.  Martijn, do you have a link to the out of tree patch?  If not, I'll find it.  I'd like to apply it to a
branch and start playing with it.

- Matt K


On Sep 17, 2014, at 7:39 AM, Martijn van Oosterhout <kleptog@svana.org> wrote:

> On Tue, Sep 16, 2014 at 02:57:00PM -0700, Peter Geoghegan wrote:
>> On Tue, Sep 16, 2014 at 2:07 PM, Peter Eisentraut <peter_e@gmx.net> wrote:
>>> Clearly, this is worth documenting, but I don't think we can completely
>>> prevent the problem.  There has been talk of a built-in index integrity
>>> checking tool.  That would be quite useful.
>>
>> We could at least use the GNU facility for versioning collations where
>> available, LC_IDENTIFICATION [1]. By not versioning collations, we are
>> going against the express advice of the Unicode consortium (they also
>> advise to do a strcmp() tie-breaker, something that I think we
>> independently discovered in 2005, because of a bug report - this is
>> what I like to call "the Hungarian issue". They know what our
>> constraints are.). I recognize it's a tricky problem, because of our
>> historic dependence on OS collations, but I think we should definitely
>> do something. That said, I'm not volunteering for the task, because I
>> don't have time. While I'm not sure of what the long term solution
>> should be, it *is not* okay that we don't version collations. I think
>> that even the best possible B-Tree check tool is not a solution.
>
> Personally I think we should just support ICU as an option. FreeBSD has
> been maintaining an out of tree patch for 10 years now so we know it
> works.
>
> The FreeBSD patch is not optimal though, these days ICU supports UTF-8
> directly so many of the push-ups FreeBSD does are no longer necessary.
> It is often faster than glibc and the key sizes for strxfrm are more
> compact [1] which is relevant for the recent optimisation patch.
>
> Let's solve this problem once and for all.
>
> [1] http://site.icu-project.org/charts/collation-icu4c48-glibc
>
> --
> Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
>> He who writes carelessly confesses thereby at the very outset that he does
>> not attach much importance to his own thoughts.
>   -- Arthur Schopenhauer



