postgres_fdw how to pushdown parent table's JOIN

Поиск
Список
Период
Сортировка
От 德哥
Тема postgres_fdw how to pushdown parent table's JOIN
Дата
Msg-id d18e862.7bde.15620cc522a.Coremail.digoal@126.com
обсуждение исходный текст
Список pgsql-general

PostgreSQL continued in sharding technology fdw based roots, the 9.6 Subject to the conditions set, and JOIN support SORT pushed down to the data node execution.
Here is a test
Create several shard databases

for subfix in 0 1 2 3
do
psql -c "create database db $ subfix"
done
Create a master database

psql -c "create database master;"
psql master -c "create extension postgres_fdw;"
Create a foreign server and user mapping in the master database

for subfix in 0 1 2 3
do
psql master -c "create server db $ subfix foreign data wrapper postgres_fdw options (hostaddr 'xxx.xxx.xxx.xxx', dbname 'db $ subfix', port '1923');"
psql master -c "create user mapping for postgres server db $ subfix options (user 'postgres', password 'postgres');"
done
Create slice table shard databases

for subfix in 0 1 2 3
do
psql db $ subfix -c "drop table if exists tbl; create table tbl (id int primary key, info text)"
psql db $ subfix -c "drop table if exists tab; create table tab (id int primary key, info text)"
done
Create a foreign table in the master database, and set constraints

for subfix in 0 1 2 3
do
psql master -c "drop foreign table if exists tbl $ subfix; create foreign table tbl $ subfix (id int not null, info text) server db $ subfix options (schema_name 'public', table_name 'tbl');"
psql master -c "alter foreign table tbl $ subfix add constraint ck1 check (mod (id, 4) = $ subfix);"

psql master -c "drop foreign table if exists tab $ subfix; create foreign table tab $ subfix (id int not null, info text) server db $ subfix options (schema_name 'public', table_name 'tab');"
psql master -c "alter foreign table tab $ subfix add constraint ck1 check (mod (id, 4) = $ subfix);"
done
View

psql master << EOF
\ Det
EOF
result

 List of foreign tables
 Schema | Table | Server
-------- + ------- + --------
 public | tab0 | db0
 public | tab1 | db1
 public | tab2 | db2
 public | tab3 | db3
 public | tbl0 | db0
 public | tbl1 | db1
 public | tbl2 | db2
 public | tbl3 | db3
(8 rows)
Create the parent table in the master database

psql master -c "create table tbl (id int primary key, info text);"
psql master -c "create table tab (id int primary key, info text);"
Creating inheritance foreign and parent tables in the master database

for subfix in 0 1 2 3
do
psql master -c "alter foreign table tbl $ subfix inherit tbl;"
psql master -c "alter foreign table tab $ subfix inherit tab;"
done
Test JOIN pushdown

master = # explain verbose select * from tbl1, tab1 where tab1.id = tbl1.id and mod (tbl1.id, 4) = 1;
                                                                     QUERY PLAN
-------------------------------------------------- -------------------------------------------------- ------------------------------------------------
 Foreign Scan (cost = 100.00..226.75 rows = 48 width = 72)
   Output: tbl1.id, tbl1.info, tab1.id, tab1.info
   Relations: (public.tbl1) INNER JOIN (public.tab1)
   Remote SQL: SELECT r1.id, r1.info, r2.id, r2.info FROM (public.tbl r1 INNER JOIN public.tab r2 ON (((r1.id = r2.id)) AND ((mod (r1 .id, 4) = 1))))
(4 rows)
Need to close the current sort pushdown switch enable_sort optimizer will pushdown is worth improvement

master = # set enable_sort = off;
SET
master = # explain verbose select * from tbl1 where mod (id, 4) = mod (100,4) order by id;
                                            QUERY PLAN
-------------------------------------------------- -------------------------------------------------
 Foreign Scan on public.tbl1 (cost = 100.00..136.71 rows = 7 width = 36)
   Output: id, info
   Remote SQL: SELECT id, info FROM public.tbl WHERE ((mod (id, 4) = 0)) ORDER BY id ASC NULLS LAST
(3 rows)


Also in need of improvement
What such a query optimizer to optimize?
1. If you want more pure sharding, the parent table should not be involved in the calculation, it is just an alias. Thus JOIN pushdown can prerequisite.
2. tab.id = tbl.id and mod (tbl.id, 4) = 1 can deduce and mod (tab.id, 4) = 1. Therefore tab table only needs to scan tab1.

master = # explain verbose select * from tbl, tab where tab.id = tbl.id and mod (tbl.id, 4) = 1;
                                           QUERY PLAN
-------------------------------------------------- ----------------------------------------------
 Gather (cost = 0.00..0.00 rows = 0 width = 0)
   Output: tbl.id, tbl.info, tab.id, tab.info
   Workers Planned: 1
   Single Copy: true
   -> Hash Join (cost = 130.71..757.17 rows = 218 width = 72)
         Output: tbl.id, tbl.info, tab.id, tab.info
         Hash Cond: (tab.id = tbl.id)
         -> Append (cost = 0.00..603.80 rows = 5461 width = 36)
               -> Seq Scan on public.tab (cost = 0.00..0.00 rows = 1 width = 36)
                     Output: tab.id, tab.info
               -> Foreign Scan on public.tab0 (cost = 100.00..150.95 rows = 1365 width = 36)
                     Output: tab0.id, tab0.info
                     Remote SQL: SELECT id, info FROM public.tab
               -> Foreign Scan on public.tab1 (cost = 100.00..150.95 rows = 1365 width = 36)
                     Output: tab1.id, tab1.info
                     Remote SQL: SELECT id, info FROM public.tab
               -> Foreign Scan on public.tab2 (cost = 100.00..150.95 rows = 1365 width = 36)
                     Output: tab2.id, tab2.info
                     Remote SQL: SELECT id, info FROM public.tab
               -> Foreign Scan on public.tab3 (cost = 100.00..150.95 rows = 1365 width = 36)
                     Output: tab3.id, tab3.info
                     Remote SQL: SELECT id, info FROM public.tab
         -> Hash (cost = 130.61..130.61 rows = 8 width = 36)
               Output: tbl.id, tbl.info
               -> Append (cost = 0.00..130.61 rows = 8 width = 36)
                     -> Seq Scan on public.tbl (cost = 0.00..0.00 rows = 1 width = 36)
                           Output: tbl.id, tbl.info
                           Filter: (mod (tbl.id, 4) = 1)
                     -> Foreign Scan on public.tbl1 (cost = 100.00..130.61 rows = 7 width = 36)
                           Output: tbl1.id, tbl1.info
                           Remote SQL: SELECT id, info FROM public.tbl WHERE ((mod (id, 4) = 1))
(31 rows)
so i modify the source code, allows to delete the inherited constraints
vi src / backend / commands / tablecmds.c

                / * Do not drop inherited constraints * /
// If (con-> coninhcount> 0 &&! Recursing)
// Ereport (ERROR,
// (Errcode (ERRCODE_INVALID_TABLE_DEFINITION),
// Errmsg ( "can not drop inherited constraint \"% s \ "of relation \"% s \ "",
// ConstrName, RelationGetRelationName (rel))));
Recompile and restart

make && make install
pg_ctl restart -m fast
Add Constraints primary table, delete the child table constraints, resulting in the illusion of the main table is not accessed.

alter table tab add constraint tab_ck check (mod (id, 4) is null);
alter table tbl add constraint tbl_ck check (mod (id, 4) is null);

alter table only tab0 drop constraint tab_ck;
alter table only tab1 drop constraint tab_ck;
alter table only tab2 drop constraint tab_ck;
alter table only tab3 drop constraint tab_ck;

alter table only tbl0 drop constraint tbl_ck;
alter table only tbl1 drop constraint tbl_ck;
alter table only tbl2 drop constraint tbl_ck;
alter table only tbl3 drop constraint tbl_ck;
My goal is to be able to do so pushdown JOIN, but not actually push down, this is very painful.
A real sub-library of middleware should solve this problem.

master = # explain verbose select * from tbl, tab where tab.id = tbl.id and mod (tbl.id, 4) = 1 and mod (tab.id, 4) = 1;
                                        QUERY PLAN
-------------------------------------------------- ----------------------------------------
 Nested Loop (cost = 200.00..261.98 rows = 7 width = 72)
   Output: tbl1.id, tbl1.info, tab1.id, tab1.info
   Join Filter: (tbl1.id = tab1.id)
   -> Append (cost = 100.00..130.61 rows = 7 width = 36)
         -> Foreign Scan on public.tbl1 (cost = 100.00..130.61 rows = 7 width = 36)
               Output: tbl1.id, tbl1.info
               Remote SQL: SELECT id, info FROM public.tbl WHERE ((mod (id, 4) = 1))
   -> Materialize (cost = 100.00..130.65 rows = 7 width = 36)
         Output: tab1.id, tab1.info
         -> Append (cost = 100.00..130.61 rows = 7 width = 36)
               -> Foreign Scan on public.tab1 (cost = 100.00..130.61 rows = 7 width = 36)
                     Output: tab1.id, tab1.info
                     Remote SQL: SELECT id, info FROM public.tab WHERE ((mod (id, 4) = 1))
(13 rows)
This is what I want results

                                                                                  QUERY PLAN
-------------------------
 Foreign Scan (cost = 100.00..161.58 rows = 1 width = 72)
   Output: tbl.id, tbl.info, tab.id, tab.info
   Relations: (public.tbl1 tbl) INNER JOIN (public.tab1 tab)
   Remote SQL: SELECT r1.id, r1.info, r2.id, r2.info FROM (public.tbl r1 INNER JOIN public.tab r2 ON (((r1.id = r2.id)) AND ((mod (r2 .id, 4) = 1)) AND ((mod (r1.id, 4) = 1))))
(4 rows)



--
公益是一辈子的事,I'm Digoal,Just Do It.

В списке pgsql-general по дате отправления:

Предыдущее
От: Sameer Kumar
Дата:
Сообщение: Database and Table stats gets reset automatically
Следующее
От: Peter Devoy
Дата:
Сообщение: Re: Return results of join with polymorphically-defined table in pl/pgsql