Re: [HACKERS] Varying results when using merge joins over postgres_fdw vs hash joins

From: Ashutosh Bapat
Subject: Re: [HACKERS] Varying results when using merge joins over postgres_fdw vs hash joins
Date:
Msg-id: CAFjFpRevO0cocUgA47BV3CCkXdCPZkX5uCV8_BinsLkHWLUNww@mail.gmail.com
In reply to: [HACKERS] Varying results when using merge joins over postgres_fdw vs hash joins  (Corey Huinker <corey.huinker@gmail.com>)
Responses: Re: [HACKERS] Varying results when using merge joins over postgres_fdw vs hash joins  (Corey Huinker <corey.huinker@gmail.com>)
Re: [HACKERS] Varying results when using merge joins over postgres_fdw vs hash joins  (Peter Geoghegan <pg@bowt.ie>)
List: pgsql-hackers
On Wed, Sep 20, 2017 at 5:07 AM, Corey Huinker <corey.huinker@gmail.com> wrote:
> We are having an issue with a query that will return no results when the
> query does a merge join with a foreign table, but (correctly) returns
> results when using a hash join.
>
> Here is the situation on the "remote" database (9.5):
>
>
> # \d+ table_with_en_us_utf8_encoding
>                    Table "public.table_with_en_us_utf8_encoding"
>  Column |          Type          | Modifiers | Storage  | Stats target |
> Description
> --------+------------------------+-----------+----------+--------------+-------------
>  id     | bigint                 |           | plain    |              |
>  str1   | character varying(255) |           | extended |              |
>  str2   | character varying(255) |           | extended |              |
>  str3   | character varying(255) |           | extended |              |
>  str4   | character varying(3)   |           | extended |              |
>
> analytics=# select encoding, datcollate, datctype from pg_database where
> datname = current_database();
>  encoding | datcollate  |  datctype
> ----------+-------------+-------------
>         6 | en_US.UTF-8 | en_US.UTF-8
>
>
>
>
> And here's what we do on the local side (9.6):
>
> # select encoding, datcollate, datctype from pg_database where datname =
> current_database();
>  encoding | datcollate | datctype
> ----------+------------+----------
>         6 | C          | C
>
> # import foreign schema public limit to (table_with_en_us_utf8_encoding)
> from server primary_replica into public;
>
> # \d+ table_with_en_us_utf8_encoding
>                                       Foreign table
> "public.table_with_en_us_utf8_encoding"
>  Column |          Type          | Collation | Nullable | Default |     FDW
> options      | Storage  | Stats target | Description
>
> --------+------------------------+-----------+----------+---------+----------------------+----------+--------------+-------------
>  id     | bigint                 |           |          |         |
> (column_name 'id')   | plain    |              |
>  str1   | character varying(255) |           |          |         |
> (column_name 'str1') | extended |              |
>  str2   | character varying(255) |           |          |         |
> (column_name 'str2') | extended |              |
>  str3   | character varying(255) |           |          |         |
> (column_name 'str3') | extended |              |
>  str4   | character varying(3)   |           |          |         |
> (column_name 'str4') | extended |              |
> Server: primary_replica
> FDW options: (schema_name 'public', table_name
> 'table_with_en_us_utf8_encoding')
>

The Collation column is empty here, which means that the str* columns
use the local default collation, i.e. C. That isn't right, since the
default collation on the foreign server differs from the default
collation of the local database. AFAIU, IMPORT FOREIGN SCHEMA should
have set the appropriate collation on the foreign table's columns.
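
As a sketch of a workaround (assuming the remote collation is
en_US.UTF-8 and that a collation of that name exists on the local
server), the remote collation could be declared explicitly on the
imported column, so the local planner stops assuming the C ordering
for it:

```sql
-- Hypothetical workaround: tell the local server that str2 on the
-- remote side is ordered under en_US.UTF-8, not the local default "C".
ALTER FOREIGN TABLE table_with_en_us_utf8_encoding
    ALTER COLUMN str2 TYPE character varying(255) COLLATE "en_US.UTF-8";
```

With the column's collation recorded correctly, the planner should no
longer treat the remote ORDER BY as compatible with a C-collated merge
key.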

> # create temporary table tmp_on_c_collated_foreign_server (str2 text);
>
> # insert into tmp_on_c_collated_foreign_server (str2) values ('576228972');
> # insert into tmp_on_c_collated_foreign_server (str2) values ('576228972');
> # insert into tmp_on_c_collated_foreign_server (str2) values ('576228972');
>
> --
> -- query with merge join, returns zero rows
> --
> # explain (analyze, verbose) select e.str1, e.str2, e.str3 from
> tmp_on_c_collated_foreign_server c left join table_with_en_us_utf8_encoding
> e  on c.str2 = e.str2 where e.str4='2' ;
>
> QUERY PLAN
>
> ------------------------------------------------------------------------------------------------------------------------------------------------------------
>  Merge Join  (cost=18041.88..22322.92 rows=229221 width=1548) (actual
> time=102.849..102.849 rows=0 loops=1)
>    Output: e.str1, e.str2, e.str3
>    Merge Cond: ((e.str2)::text = c.str2)
>    ->  Foreign Scan on public.table_with_en_us_utf8_encoding e
> (cost=17947.50..18705.95 rows=33709 width=93) (actual time=102.815..102.815
> rows=1 loops=1)
>          Output: e.id, e.str1, e.str2, e.str3, e.str4
>          Remote SQL: SELECT str1, str2, str3 FROM
> public.table_with_en_us_utf8_encoding WHERE ((str4 = '2'::text)) ORDER BY
> str2 ASC NULLS LAST
>    ->  Sort  (cost=94.38..97.78 rows=1360 width=32) (actual
> time=0.028..0.029 rows=7 loops=1)
>          Output: c.str2
>          Sort Key: c.str2
>          Sort Method: quicksort  Memory: 25kB
>          ->  Seq Scan on pg_temp_3.tmp_on_c_collated_foreign_server c
> (cost=0.00..23.60 rows=1360 width=32) (actual time=0.010..0.011 rows=7
> loops=1)
>                Output: c.str2
>  Planning time: 4.285 ms
>  Execution time: 104.458 ms
> (14 rows)
>

Since the rows returned by the foreign server are sorted according to
the foreign server's collation, their order does not match the order
the local server expects, so the merge join produces wrong results.
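
The ordering divergence is easy to see in isolation. A minimal
illustration (collation names assumed to exist on the server; the
exact en_US ordering depends on the libc/ICU version) is that "C"
sorts strictly by byte value while en_US largely ignores punctuation
at the primary level, so the same strings come back in different
orders:

```sql
-- Byte-wise ordering: '-' (0x2D) sorts before 'a', so 'a-b' comes first.
SELECT s FROM (VALUES ('a-b'), ('aa'), ('ab')) v(s)
ORDER BY s COLLATE "C";

-- Linguistic ordering: the hyphen is weighted differently, giving a
-- different sequence for the same three values.
SELECT s FROM (VALUES ('a-b'), ('aa'), ('ab')) v(s)
ORDER BY s COLLATE "en_US.UTF-8";
```

A merge join fed one of these orders while expecting the other will
silently skip matching rows, which is exactly the zero-row result
above.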

>
> --
> -- query with hash join, returns rows
>
> --
>
> -- the default for the foreign server is to use remote estimates, so we turn
> that off...
>
> # alter foreign table table_with_en_us_utf8_encoding OPTIONS (ADD
> use_remote_estimate 'false');
> ALTER FOREIGN TABLE
>
> -- and then run the same query again
>
> # explain (analyze, verbose) select e.str1, e.str2, e.str3 from
> tmp_on_c_collated_foreign_server c left join table_with_en_us_utf8_encoding
> e  on c.str2 = e.str2 where e.str4='2' ;
>
> QUERY PLAN
>
> --------------------------------------------------------------------------------------------------------------------------------------------------------------
>  Hash Join  (cost=110.68..139.45 rows=7 width=1548) (actual
> time=154.280..154.286 rows=7 loops=1)
>    Output: e.str1, e.str2, e.str3
>    Hash Cond: (c.str2 = (e.str2)::text)
>    ->  Seq Scan on pg_temp_3.tmp_on_c_collated_foreign_server c
> (cost=0.00..23.60 rows=1360 width=32) (actual time=0.006..0.008 rows=7
> loops=1)
>          Output: c.str2
>    ->  Hash  (cost=110.67..110.67 rows=1 width=1548) (actual
> time=154.264..154.264 rows=33418 loops=1)
>          Output: e.str1, e.str2, e.str3
>          Buckets: 65536 (originally 1024)  Batches: 1 (originally 1)  Memory
> Usage: 4003kB
>          ->  Foreign Scan on public.table_with_en_us_utf8_encoding e
> (cost=100.00..110.67 rows=1 width=1548) (actual time=8.289..144.210
> rows=33418 loops=1)
>                Output: e.str1, e.str2, e.str3
>                Remote SQL: SELECT str1, str2, str3 FROM
> public.table_with_en_us_utf8_encoding WHERE ((str4 = '2'::text))
>  Planning time: 0.153 ms
>  Execution time: 156.557 ms
> (13 rows)
>
>

In this case, both tables use the same collation when comparing rows,
so the result differs from the merge join result. A hash join executed
on the local server and the same join executed on the foreign server
(by importing the local table into the foreign server) would likewise
differ.
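
As a session-level diagnostic (not a fix), one could force the planner
away from merge joins to confirm that the wrong result is tied to the
remote-supplied sort order rather than to the join predicate itself:

```sql
-- Diagnostic sketch: with merge joins disabled, the comparison happens
-- entirely under the local collation, so the rows should come back.
SET enable_mergejoin = off;

EXPLAIN (ANALYZE, VERBOSE)
SELECT e.str1, e.str2, e.str3
FROM tmp_on_c_collated_foreign_server c
LEFT JOIN table_with_en_us_utf8_encoding e ON c.str2 = e.str2
WHERE e.str4 = '2';
```

If the row count changes with the plan shape, the collation metadata
on the foreign table, not the data, is the culprit.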


-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
