Обсуждение: pgsql: Get rid of USE_WIDE_UPPER_LOWER dependency in trigram constructi
Get rid of USE_WIDE_UPPER_LOWER dependency in trigram construction. contrib/pg_trgm's make_trigrams() was coded to ignore multibyte character boundaries and just make trigrams from bytes if USE_WIDE_UPPER_LOWER wasn't defined. This is a bit odd, since there's no obvious reason why trigram compaction rules should depend on the presence of towlower() and friends. What's more, there was an Assert() that would fail if that code path was fed any multibyte characters. We need to do something about this since the pending regex-indexing patch has an assumption that you get just one "trgm" from any three characters. The best solution seems to be to remove the USE_WIDE_UPPER_LOWER dependency, which shouldn't really have been there in the first place. The second loop in make_trigrams() is now just a fast path and not a potentially incompatible algorithm. If there is anybody still using Postgres on machines without wcstombs() or towlower(), and they have non-ASCII data indexed by pg_trgm, they'll need to REINDEX those indexes after pg_upgrade to 9.3, else searches may fail incorrectly. It seems likely that there are no such installations, though. In passing, rename cnt_trigram to compact_trigram, which seems to better describe its functionality, and improve make_trigrams' test for whether it has to use the slow path or not (per a suggestion from Alexander Korotkov). Branch ------ master Details ------- http://git.postgresql.org/pg/commitdiff/7844608e54a3a2e3dee461b00fd6ef028a845d7c Modified Files -------------- contrib/pg_trgm/trgm_op.c | 17 ++++++++++------- 1 files changed, 10 insertions(+), 7 deletions(-)
Re: pgsql: Get rid of USE_WIDE_UPPER_LOWER dependency in trigram constructi
От
Dimitri Fontaine
Дата:
Tom Lane <tgl@sss.pgh.pa.us> writes: > If there is anybody still using Postgres on machines without wcstombs() or > towlower(), and they have non-ASCII data indexed by pg_trgm, they'll need > to REINDEX those indexes after pg_upgrade to 9.3, else searches may fail > incorrectly. It seems likely that there are no such installations, though. Those conditions seem just complex enough to require a test script that will check that for you. What if we created a new binary responsible for auto checking all those release-note items that are possible to machine check, then issue a WARNING containing the URL to the release notes you should be reading, and a SQL script (ala pg_upgrade) to run after upgrade? $ pg_checkupgrade -d "connection=string" > upgrade.sql NOTICE: checking 9.3 upgrade release notes WARNING: RN-93-0001 index idx_trgm_abc is not on-disk compatible with 9.3 WARNING: TN-93-0012 … WARNING: This script is NOT comprehensive, read release notes at … The target version would be hard coded on the binary itself for easier maintaining of it, and that proposal includes a unique identifier for any release note worthy warning that we know about, that would be included in the output of the program. I think most of the checks would only have to be SQL code, and some of them should include running some binary code the server side. When that's possible, we could maybe expose that binary code in a server side extension so as to make the client side binary life's easier. That would also be an excuse for the project to install some upgrade material on the old server, which has been discussed in the past for preparing pg_upgrade when we have a page format change. What do you think? -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
Re: pgsql: Get rid of USE_WIDE_UPPER_LOWER dependency in trigram constructi
От
Stefan Kaltenbrunner
Дата:
On 04/08/2013 10:11 AM, Dimitri Fontaine wrote: > Tom Lane <tgl@sss.pgh.pa.us> writes: >> If there is anybody still using Postgres on machines without wcstombs() or >> towlower(), and they have non-ASCII data indexed by pg_trgm, they'll need >> to REINDEX those indexes after pg_upgrade to 9.3, else searches may fail >> incorrectly. It seems likely that there are no such installations, though. > > Those conditions seem just complex enough to require a test script that > will check that for you. What if we created a new binary responsible for > auto checking all those release-note items that are possible to machine > check, then issue a WARNING containing the URL to the release notes you > should be reading, and a SQL script (ala pg_upgrade) to run after > upgrade? > > $ pg_checkupgrade -d "connection=string" > upgrade.sql > NOTICE: checking 9.3 upgrade release notes > WARNING: RN-93-0001 index idx_trgm_abc is not on-disk compatible with 9.3 > WARNING: TN-93-0012 … > WARNING: This script is NOT comprehensive, read release notes at … > > The target version would be hard coded on the binary itself for easier > maintaining of it, and that proposal includes a unique identifier for > any release note worthy warning that we know about, that would be > included in the output of the program. > > I think most of the checks would only have to be SQL code, and some of > them should include running some binary code the server side. When > that's possible, we could maybe expose that binary code in a server side > extension so as to make the client side binary life's easier. That would > also be an excuse for the project to install some upgrade material on > the old server, which has been discussed in the past for preparing > pg_upgrade when we have a page format change. given something like this also will have to be dealt with by pg_upgrade, why not fold it into that (like into -c) completly and recommend running that? on the flipside if people don't read the release notes they will also not run any kind of binary/script mentioned there... Stefan