Re: BUG #16583: merge join on tables with different DB collation behind postgres_fdw fails

Поиск
Список
Период
Сортировка
От Tom Lane
Тема Re: BUG #16583: merge join on tables with different DB collation behind postgres_fdw fails
Дата
Msg-id 2438715.1632510693@sss.pgh.pa.us
обсуждение исходный текст
Ответ на Re: BUG #16583: merge join on tables with different DB collation behind postgres_fdw fails  (Etsuro Fujita <etsuro.fujita@gmail.com>)
Ответы Re: BUG #16583: merge join on tables with different DB collation behind postgres_fdw fails  (Etsuro Fujita <etsuro.fujita@gmail.com>)
Список pgsql-hackers
Etsuro Fujita <etsuro.fujita@gmail.com> writes:
> One thing I noticed is that collatable operators/functions sent to the
> remote might also cause an unexpected result when the default
> collations are not compatible.  Consider this example (even with your
> patch):
> ...
> where ft1 is a foreign table with an integer column c1.  As shown
> above, the sort using the collatable function chr() is performed
> remotely, so the select query might produce the result in an
> unexpected sort order when the default collations are not compatible.

I don't think there's anything really new there --- it's still assuming
that COLLATE "default" means the same locally and remotely.

As a short-term answer, I propose that we apply (and back-patch) the
attached documentation changes.

Longer-term, it seems like we really have to be able to represent
the notion of a remote column that has an "unknown" collation (that
is, one that doesn't match any local collation, or at least is not
known to do so).  My previous patch essentially makes "default" act
that way, but conflating "unknown" with "default" has too many
downsides.  A rough sketch for making this happen is:

1. Create a built-in "unknown" entry in pg_collation.  Insert some
hack or other to prevent this from being applied to any real, local
column; but allow foreign-table columns to have it.

2. Apply mods, probably fairly similar to my patch, that prevent
postgres_fdw from believing that "unknown" matches any local
collation.  (Hm, actually maybe no special code change will be
needed here, once "unknown" has its own OID?)

3. Change postgresImportForeignSchema so that it can substitute
the "unknown" collation at need.  The exact rules for this could
be debated depending on whether you'd rather prioritize safety or
ease-of-use, but I think at least we should use "unknown" whenever
import_collate is turned off.  Perhaps there should be an option
to substitute it for remote "default" as well.  (Further down the
road, perhaps that could be generalized to allow a user-controlled
mapping from remote to local collations.)

Anyway, I think I should withdraw the upthread patch; we don't
want to go that way.

            regards, tom lane

diff --git a/doc/src/sgml/postgres-fdw.sgml b/doc/src/sgml/postgres-fdw.sgml
index bf95da9721..dbc11694a0 100644
--- a/doc/src/sgml/postgres-fdw.sgml
+++ b/doc/src/sgml/postgres-fdw.sgml
@@ -95,8 +95,8 @@
   referenced columns of the remote table.  Although <filename>postgres_fdw</filename>
   is currently rather forgiving about performing data type conversions at
   need, surprising semantic anomalies may arise when types or collations do
-  not match, due to the remote server interpreting <literal>WHERE</literal> clauses
-  slightly differently from the local server.
+  not match, due to the remote server interpreting query conditions
+  differently from the local server.
  </para>

  <para>
@@ -537,6 +537,17 @@ OPTIONS (ADD password_required 'false');
        need to turn this off if the remote server has a different set of
        collation names than the local server does, which is likely to be the
        case if it's running on a different operating system.
+       If you do so, however, there is a very severe risk that the imported
+       table columns' collations will not match the underlying data, resulting
+       in anomalous query behavior.
+      </para>
+
+      <para>
+       Even when this parameter is set to <literal>true</literal>, importing
+       columns whose collation is the remote server's default can be risky.
+       They will be imported with <literal>COLLATE "default"</literal>, which
+       will select the local server's default collation, which could be
+       different.
       </para>
      </listitem>
     </varlistentry>

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Robert Haas
Дата:
Сообщение: Re: decoupling table and index vacuum
Следующее
От: Tomas Vondra
Дата:
Сообщение: Re: logical decoding and replication of sequences