Обсуждение: distribute_restrictinfo_to_rels if restrictinfo contains volatile functions

Поиск

Список

Период

Сортировка

distribute_restrictinfo_to_rels if restrictinfo contains volatile functions

От

Zhenghua Lyu

Дата:

10 июля 2020 г., 07:23:04

Hi all,

consider the following SQL:

================================================================================================

gpadmin=# explain (verbose, costs off)

select * from t,

(select a from generate_series(1, 1)a)x,

(select a from generate_series(1, 1)a)y

where ((x.a+y.a)/4.0) > random();

QUERY PLAN
----------------------------------------------------------------------------------------
Nested Loop
Output: t.a, t.b, a.a, a_1.a
-> Nested Loop
Output: a.a, a_1.a
Join Filter: (((((a.a + a_1.a))::numeric / 4.0))::double precision > random())
-> Function Scan on pg_catalog.generate_series a
Output: a.a
Function Call: generate_series(1, 1)
-> Function Scan on pg_catalog.generate_series a_1
Output: a_1.a
Function Call: generate_series(1, 1)
-> Seq Scan on public.t
Output: t.a, t.b
(13 rows)

================================================================================================

The where clause is "pushed down to the x,y" because it only references these two relations.

The original query tree's join tree is like:

FromExpr []
[fromlist]
RangeTblRef [rtindex=1]
RangeTblRef [rtindex=4]
RangeTblRef [rtindex=5]
[quals]
OpExpr [opno=674 opfuncid=297 opresulttype=16 opretset=false]
FuncExpr [funcid=1746 funcresulttype=701 funcretset=false funcvariadic=false
funcformat=COERCE_IMPLICIT_CAST]
OpExpr [opno=1761 opfuncid=1727 opresulttype=1700 opretset=false]
FuncExpr [funcid=1740 funcresulttype=1700 funcretset=false funcvariadic=false
funcformat=COERCE_IMPLICIT_CAST]
OpExpr [opno=551 opfuncid=177 opresulttype=23 opretset=false]
Var [varno=4 varattno=1 vartype=23 varnoold=4 varoattno=1]
Var [varno=5 varattno=1 vartype=23 varnoold=5 varoattno=1]
Const [consttype=1700 constlen=-1 constvalue=94908966309104 constisnull=false
constbyval=false]
FuncExpr [funcid=1598 funcresulttype=701 funcretset=false funcvariadic=false
funcformat=COERCE_EXPLICIT_CALL]

It seems the semantics it wants to express is: filter after join all the tables.

Thus maybe a plan like

Nested Loop
Join Filter: (((((a.a + a_1.a))::numeric / 4.0))::double precision > random())
-> Nested Loop
-> Function Scan on generate_series a
-> Function Scan on generate_series a_1
-> Seq Scan on t (cost=0.00..32.60 rows=2260 width=8)

May also be reasonable because it is just the direct translation from the original query tree.

The above plans may have different property:

* the first one, if we push down, can only produce 2 results: 0 rows, or 10 rows. No third possibility

* the second one, will output 0 ~ 10 rows with equal probability.

I am wondering if we should consider volatile functions in restrictinfo when try to distribute_restrictinfo_to_rels?

Best,

Zhenghua Lyu

Re: distribute_restrictinfo_to_rels if restrictinfo contains volatile functions

От

Tom Lane

Дата:

10 июля 2020 г., 17:10:46

Zhenghua Lyu <zlyu@vmware.com> writes:
>         The where clause is "pushed down to the x,y" because it only references these two relations.

Yeah.  I agree that it's somewhat unprincipled, but changing it doesn't
seem like a great idea.  There are a lot of users out there who aren't
terribly careful about marking their UDFs as non-volatile, but would be
unhappy if the optimizer suddenly crippled their queries because of
being picky about this.

Also, we specifically document that order of evaluation in WHERE clauses
is not guaranteed, so I feel no need to make promises about how often
volatile functions there will be evaluated.  (Volatiles in SELECT lists
are a different story.)

This behavior has stood for a couple of decades with few user complaints,
so why are you concerned about changing it?

            regards, tom lane

Re: distribute_restrictinfo_to_rels if restrictinfo contains volatile functions

От

Zhenghua Lyu

Дата:

11 июля 2020 г., 03:32:32

Hi,

Thanks for your reply.

I find the problem in a distributed database based on Postgres (Greenplum). In distributed database

there may be distributed tables:

every single node only contain subpart of the data and combine them all will get the full data

I think it may also be a problem for Postgres's parallel computing.

1. What postgres planner do for parallel scan a table and then join a generate_series() function scan?

2. What postgres planner do for parallel scan a table and then join a generate_series() function scan with a volatile filter?

Thus running the SQL in the above case, since generate_series functions can can be taken as the same every where,

And generate_series join generate_series also have this property: the data is complete in every single node. This property

is very helpful in a distributed join: A distributed table join generate_series function can just join in every local node and then

gather the result back to a single node.

But things are different when there are volatile functions: volatile functions may be in where clause, targetlist and somewhere.

That is why I come up with the above case and ask here.

To be honest, I do not care the push down so much. It is not normal usage to writing volatile functions in where clause.

I just find it lose the property.

Best,

Zhenghua Lyu

From: Tom Lane <tgl@sss.pgh.pa.us>
Sent: Friday, July 10, 2020 10:10 PM
To: Zhenghua Lyu <zlyu@vmware.com>
Cc: pgsql-hackers@lists.postgresql.org <pgsql-hackers@lists.postgresql.org>
Subject: Re: distribute_restrictinfo_to_rels if restrictinfo contains volatile functions

Zhenghua Lyu <zlyu@vmware.com> writes:
> The where clause is "pushed down to the x,y" because it only references these two relations.

Yeah. I agree that it's somewhat unprincipled, but changing it doesn't
seem like a great idea. There are a lot of users out there who aren't
terribly careful about marking their UDFs as non-volatile, but would be
unhappy if the optimizer suddenly crippled their queries because of
being picky about this.

Also, we specifically document that order of evaluation in WHERE clauses
is not guaranteed, so I feel no need to make promises about how often
volatile functions there will be evaluated. (Volatiles in SELECT lists
are a different story.)

This behavior has stood for a couple of decades with few user complaints,
so why are you concerned about changing it?

regards, tom lane

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Обсуждение: distribute_restrictinfo_to_rels if restrictinfo contains volatile functions

distribute_restrictinfo_to_rels if restrictinfo contains volatile functions

Re: distribute_restrictinfo_to_rels if restrictinfo contains volatile functions

Re: distribute_restrictinfo_to_rels if restrictinfo contains volatile functions