Обсуждение: Changes to pg_dump/psql following collation "C" in the catalog

Поиск
Список
Период
Сортировка

Changes to pg_dump/psql following collation "C" in the catalog

От
"Daniel Verite"
Дата:
 Hi,

One consequence of using the "C" collation in the catalog versus
the db collation is that pg_dump -t with a regexp may not find
the same tables as before. It happens when these conditions are
all met:
- the collation of the database is not "C"
- the regexp has locale-dependant parts
- the names to match include characters that are sensitive to
locale-dependant matching

For instance a table named "rapport_journée_12" in an fr_FR.UTF-8 db
used to be found by pg_dump -t 'rapport_\w+_\d+', and is now missed
in the devel version.

It seems that to fix that, we could qualify the references to columns such
as "relname" and "schema_name" with COLLATE "default" clauses in the
queries that use pattern-matching in client-side tools, AFAICS
pg_dump and psql.

Before going any further with this idea, is there agreement that it's an
issue to address and does this look like the best way to do that?


Best regards,
--
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite


Re: Changes to pg_dump/psql following collation "C" in the catalog

От
Tom Lane
Дата:
"Daniel Verite" <daniel@manitou-mail.org> writes:
> One consequence of using the "C" collation in the catalog versus
> the db collation is that pg_dump -t with a regexp may not find
> the same tables as before. It happens when these conditions are
> all met:
> - the collation of the database is not "C"
> - the regexp has locale-dependant parts
> - the names to match include characters that are sensitive to
> locale-dependant matching

Hm, interesting.

> It seems that to fix that, we could qualify the references to columns such
> as "relname" and "schema_name" with COLLATE "default" clauses in the
> queries that use pattern-matching in client-side tools, AFAICS
> pg_dump and psql.

Seems reasonable.  I was initially worried that this might interfere with
query optimization, but some experimentation says that the planner
successfully derives prefix index clauses anyway (which is correct,
because matching a fixed regex prefix doesn't depend on locale).

It might be better to attach the COLLATE clause to the pattern constant
instead of the column name; that'd be less likely to break if sent to
an older server.

> Before going any further with this idea, is there agreement that it's an
> issue to address and does this look like the best way to do that?

That is a question worth asking.  We're going to be forcing people to get
used to this when working directly in SQL, so I don't know if masking it
in a subset of tools is really a win or not.

            regards, tom lane


Re: Changes to pg_dump/psql following collation "C" in the catalog

От
"Daniel Verite"
Дата:
    Tom Lane wrote:

> "Daniel Verite" <daniel@manitou-mail.org> writes:
> > One consequence of using the "C" collation in the catalog versus
> > the db collation is that pg_dump -t with a regexp may not find
> > the same tables as before. It happens when these conditions are
> > all met:
> > - the collation of the database is not "C"
> > - the regexp has locale-dependant parts
> > - the names to match include characters that are sensitive to
> > locale-dependant matching
>
> Hm, interesting.
>
> > It seems that to fix that, we could qualify the references to columns such
> > as "relname" and "schema_name" with COLLATE "default" clauses in the
> > queries that use pattern-matching in client-side tools, AFAICS
> > pg_dump and psql.
>
> Seems reasonable.  I was initially worried that this might interfere with
> query optimization, but some experimentation says that the planner
> successfully derives prefix index clauses anyway (which is correct,
> because matching a fixed regex prefix doesn't depend on locale).
>
> It might be better to attach the COLLATE clause to the pattern constant
> instead of the column name; that'd be less likely to break if sent to
> an older server.
>
> > Before going any further with this idea, is there agreement that it's an
> > issue to address and does this look like the best way to do that?
>
> That is a question worth asking.  We're going to be forcing people to get
> used to this when working directly in SQL, so I don't know if masking it
> in a subset of tools is really a win or not.

I think psql and pg_dump need to adjust, just like the 3rd party tools
will, at least those that support collation-aware search in the catalog.
PFA a patch that implements the slight changes needed.

I'll add an entry for it in the next CF.

Best regards,
--
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite

Вложения

Re: Changes to pg_dump/psql following collation "C" in the catalog

От
Chapman Flack
Дата:
>> "Daniel Verite" <daniel@manitou-mail.org> writes:
>>> One consequence of using the "C" collation in the catalog versus
>>> the db collation

As an intrigued Person Following At Home, I was happy when I found
this little three-message thread had more context in [1] and [2]. :)

-Chap

[1] https://postgr.es/m/15938.1544377821@sss.pgh.pa.us
[2] https://postgr.es/m/5978.1544030694@sss.pgh.pa.us



Re: Changes to pg_dump/psql following collation "C" in the catalog

От
Tom Lane
Дата:
"Daniel Verite" <daniel@manitou-mail.org> writes:
> I think psql and pg_dump need to adjust, just like the 3rd party tools
> will, at least those that support collation-aware search in the catalog.
> PFA a patch that implements the slight changes needed.
> I'll add an entry for it in the next CF.

Hm, if that's as much as we have to touch, I think there's a good
argument for squeezing it into v12 rather than waiting.  The point
here is mostly to avoid a behavior change from pre-v12, but if we
allow v12 to have a different behavior, it's questionable whether
we'd want v13 to change it back.

Just looking at the patch, I wonder whether it doesn't need some
server-version checks.  At the very least this would break with
pre-9.1 servers, which lack COLLATE altogether.

            regards, tom lane



Re: Changes to pg_dump/psql following collation "C" in the catalog

От
"Daniel Verite"
Дата:
    Chapman Flack wrote:

> >> "Daniel Verite" <daniel@manitou-mail.org> writes:
> >>> One consequence of using the "C" collation in the catalog versus
> >>> the db collation
>
> As an intrigued Person Following At Home, I was happy when I found
> this little three-message thread had more context in [1] and [2]. :)
>
> -Chap
>
> [1] https://postgr.es/m/15938.1544377821@sss.pgh.pa.us
> [2] https://postgr.es/m/5978.1544030694@sss.pgh.pa.us

Yes. The concrete issue that the patch addresses can be illustrated
with this example:

  psql (12devel)
  Type "help" for help.

  postgres=# show lc_collate ;
   lc_collate
  -------------
   fr_FR.UTF-8
  (1 row)

  postgres=# create table année_2018();
  CREATE TABLE

  postgres=# \dt '\\w+_\\d+'
  psql: error: Did not find any relation named "\w+_\d+".

In previous versions it would have found the table with the accent
in the name. With 12devel it doesn't, because the match is done with
the collation of the column, now "C", which does not consider the
accented character to be a letter.
This also affects pg_dump with the -t and -n switches that
accept patterns and I think pretty much all \d* commands
that accept patterns too.

The purpose of the fix is for the patterns to give the same results as
before. It's done by simply adding explicit collate clauses to use the
collation of the database for these comparisons.


Best regards,
--
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite



Re: Changes to pg_dump/psql following collation "C" in the catalog

От
"Daniel Verite"
Дата:
    Tom Lane wrote:

> Hm, if that's as much as we have to touch, I think there's a good
> argument for squeezing it into v12 rather than waiting.  The point
> here is mostly to avoid a behavior change from pre-v12

Yes. I was mentioning the next CF because ISTM that nowadays
non-committers are expected to file patches in there, committers
picking up patches both in the current and next CF based on
their evaluation of priorities.
But if you plan to process this one shortly, a CF entry is probably
superfluous.

> Just looking at the patch, I wonder whether it doesn't need some
> server-version checks.  At the very least this would break with
> pre-9.1 servers, which lack COLLATE altogether.

PFA a new version adding the clause for only 12 and up, since the
previous versions are not concerned, and as you mention, really old
versions would fail otherwise.


Best regards,
--
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite

Вложения

Re: Changes to pg_dump/psql following collation "C" in the catalog

От
Tom Lane
Дата:
"Daniel Verite" <daniel@manitou-mail.org> writes:
>> Hm, if that's as much as we have to touch, I think there's a good
>> argument for squeezing it into v12 rather than waiting.  The point
>> here is mostly to avoid a behavior change from pre-v12

> Yes. I was mentioning the next CF because ISTM that nowadays
> non-committers are expected to file patches in there, committers
> picking up patches both in the current and next CF based on
> their evaluation of priorities.

Yeah, it's good practice to make a CF entry to ensure the patch doesn't
slip through the cracks.  There's an awful lot of traffic on pgsql-hackers
these days...

>> Just looking at the patch, I wonder whether it doesn't need some
>> server-version checks.  At the very least this would break with
>> pre-9.1 servers, which lack COLLATE altogether.

> PFA a new version adding the clause for only 12 and up, since the
> previous versions are not concerned, and as you mention, really old
> versions would fail otherwise.

Pushed with some fiddling with the comments, and changing the collation
names to be schema-qualified for paranoia's sake.

            regards, tom lane



Re: Changes to pg_dump/psql following collation "C" in the catalog

От
"Daniel Verite"
Дата:
    Tom Lane wrote:

> > PFA a new version adding the clause for only 12 and up, since the
> > previous versions are not concerned, and as you mention, really old
> > versions would fail otherwise.
>
> Pushed with some fiddling with the comments, and changing the collation
> names to be schema-qualified for paranoia's sake.

Thanks !

Best regards,
--
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite