Обсуждение: Fail Fast In CTAS/CMV If Relation Already Exists To Avoid Unnecessary Rewrite, Planning Costs

Поиск
Список
Период
Сортировка

Fail Fast In CTAS/CMV If Relation Already Exists To Avoid Unnecessary Rewrite, Planning Costs

От
Bharath Rupireddy
Дата:
Hi,

Currently, for CTAS or CREATE MATERIALIZED VIEW(CMV) without
if-not-exists clause, the existence of the relation gets checked
during the execution of the select part and an error is thrown there.
All the unnecessary rewrite and planning for the select part would
have happened just to fail later. However, if if-not-exists clause is
present, then a notice is issued and returned immediately without any
further rewrite or planning for . This seems somewhat inconsistent to
me.

I propose to check the relation existence early in ExecCreateTableAs()
as well as in ExplainOneUtility() and throw an error in case it exists
already to avoid unnecessary rewrite, planning and execution of the
select part.

Attaching a patch. Note that I have not added any test cases as the
existing test cases in create_table.sql and matview.sql would cover
the code.

Thoughts?

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com

Вложения

RE: Fail Fast In CTAS/CMV If Relation Already Exists To Avoid Unnecessary Rewrite, Planning Costs

От
"Hou, Zhijie"
Дата:
> Currently, for CTAS or CREATE MATERIALIZED VIEW(CMV) without if-not-exists
> clause, the existence of the relation gets checked during the execution
> of the select part and an error is thrown there.
> All the unnecessary rewrite and planning for the select part would have
> happened just to fail later. However, if if-not-exists clause is present,
> then a notice is issued and returned immediately without any further rewrite
> or planning for . This seems somewhat inconsistent to me.
> 
> I propose to check the relation existence early in ExecCreateTableAs() as
> well as in ExplainOneUtility() and throw an error in case it exists already
> to avoid unnecessary rewrite, planning and execution of the select part.
> 
> Attaching a patch. Note that I have not added any test cases as the existing
> test cases in create_table.sql and matview.sql would cover the code.
> 
> Thoughts?

Personally, I think it make sense, as other CMD(such as create extension/index ...) throw that error
before any further operation too.

I am just a little worried about the behavior change of [explain CTAS].
May be someone will complain the change from normal explaininfo to error output.

And I took a look into the patch.

+        StringInfoData emsg;
+
+        initStringInfo(&emsg);
+
+        if (level == NOTICE)
+            appendStringInfo(&emsg,

Using variable emsg and level seems a little complicated to me.
How about just:

if (!is_explain && ctas->if_not_exists)
ereport(NOTICE,xxx
else
ereport(ERROR,xxx

Best regards,
houzj




Re: Fail Fast In CTAS/CMV If Relation Already Exists To Avoid Unnecessary Rewrite, Planning Costs

От
Bharath Rupireddy
Дата:
Thanks for taking a look at this.

On Fri, Dec 11, 2020 at 6:33 AM Hou, Zhijie <houzj.fnst@cn.fujitsu.com> wrote:
>
> > Currently, for CTAS or CREATE MATERIALIZED VIEW(CMV) without if-not-exists
> > clause, the existence of the relation gets checked during the execution
> > of the select part and an error is thrown there.
> > All the unnecessary rewrite and planning for the select part would have
> > happened just to fail later. However, if if-not-exists clause is present,
> > then a notice is issued and returned immediately without any further rewrite
> > or planning for . This seems somewhat inconsistent to me.
> >
> > I propose to check the relation existence early in ExecCreateTableAs() as
> > well as in ExplainOneUtility() and throw an error in case it exists already
> > to avoid unnecessary rewrite, planning and execution of the select part.
> >
> > Attaching a patch. Note that I have not added any test cases as the existing
> > test cases in create_table.sql and matview.sql would cover the code.
> >
> > Thoughts?
>
> Personally, I think it make sense, as other CMD(such as create extension/index ...) throw that error
> before any further operation too.
>
> I am just a little worried about the behavior change of [explain CTAS].
> May be someone will complain the change from normal explaininfo to error output.

I think we are clear with the patch for plain i.e. non-EXPLAIN and
EXPLAIN ANALYZE CTAS/CMV cases.

The behaviour for EXPLAIN is as follows:

1)EXPLAIN without ANALYZE, without patch: select part is planned(note
that the relations in the select part are checked for their existence
while planning, fails any of them don't exist) , relation(CTAS/CMV
being created) existence is not checked as we will not create the
relation and execute the plan.

2)EXPLAIN with ANALYZE, without patch: select part is planned, as we
execute the plan, relation(CTAS/CMV) existence is checked during the
execution and fails there if it exists.

3) EXPLAIN without ANALYZE, with patch: relation(CTAS/CMV) existence
is checked before the planning and fails if it exists, without going
further to the planning for select part.

4)EXPLAIN with ANALYZE, with patch: relation(CTAS/CMV) existence is
checked before the rewrite, planning and fails if it exists, without
going further.

IMO, let's not change the 1) behaviour to 3) with the patch. If
agreed, I can do the following way in ExplainOneUtility and will add a
comment on why we are doing this.

if (es->analyze)
 (void) CheckRelExistenceInCTAS(ctas, true);

Thoughts?

> And I took a look into the patch.
>
> +               StringInfoData emsg;
> +
> +               initStringInfo(&emsg);
> +
> +               if (level == NOTICE)
> +                       appendStringInfo(&emsg,
>
> Using variable emsg and level seems a little complicated to me.
> How about just:
>
> if (!is_explain && ctas->if_not_exists)
> ereport(NOTICE,xxx
> else
> ereport(ERROR,xxx

I will modify it in the next version of the patch which I plan to send
once agreed on the above point.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com



RE: Fail Fast In CTAS/CMV If Relation Already Exists To Avoid Unnecessary Rewrite, Planning Costs

От
"Hou, Zhijie"
Дата:

> IMO, let's not change the 1) behaviour to 3) with the patch. If agreed,

> I can do the following way in ExplainOneUtility and will add a comment on

> why we are doing this.

>

> if (es->analyze)

>  (void) CheckRelExistenceInCTAS(ctas, true);

>

> Thoughts?

 

Agreed.

 

Just in case, I took a look at Oracle 12s behavior about [explain CTAS].

Oracle 12 will output the plan without throwing any msg in this case.

 

Best regards,

houzj

 

 

 

Re: Fail Fast In CTAS/CMV If Relation Already Exists To Avoid Unnecessary Rewrite, Planning Costs

От
Bharath Rupireddy
Дата:
On Fri, Dec 11, 2020 at 12:13 PM Hou, Zhijie <houzj.fnst@cn.fujitsu.com> wrote:
> > IMO, let's not change the 1) behaviour to 3) with the patch. If agreed,
>
> > I can do the following way in ExplainOneUtility and will add a comment on
>
> > why we are doing this.
>
> > if (es->analyze)
>
> >  (void) CheckRelExistenceInCTAS(ctas, true);
>
> > Thoughts?
>
> Agreed.

Thanks!

So, I will post an updated patch soon.

> Just in case, I took a look at Oracle 12’s behavior about [explain CTAS].
>
> Oracle 12 will output the plan without throwing any msg in this case.

I'm not quite sure how other databases behave. If I go by the main
intention of EXPLAIN without ANALYZE, that should do the planning,
show it in the output and no execution of the query should happen. For
EXPLAIN CTAS/CMV, only thing that gets planned is the SELECT part and
no execution happens so no existence check for the CTAS/CMV relation
that will get created if the CTAS/CMV is executed. Having said that,
the existence of the relations that are in the SELECT part are anyways
checked during planning for EXPLAIN without ANALYZE.

IMHO, let's not alter the existing behaviour, if needed, that can be
discussed separately.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com



Re: Fail Fast In CTAS/CMV If Relation Already Exists To Avoid Unnecessary Rewrite, Planning Costs

От
Michael Paquier
Дата:
On Fri, Dec 11, 2020 at 12:48:49PM +0530, Bharath Rupireddy wrote:
> I'm not quite sure how other databases behave. If I go by the main
> intention of EXPLAIN without ANALYZE, that should do the planning,
> show it in the output and no execution of the query should happen. For
> EXPLAIN CTAS/CMV, only thing that gets planned is the SELECT part and
> no execution happens so no existence check for the CTAS/CMV relation
> that will get created if the CTAS/CMV is executed. Having said that,
> the existence of the relations that are in the SELECT part are anyways
> checked during planning for EXPLAIN without ANALYZE.

I think that it is tricky to define IF NOT EXISTS for a CTAS with
EXPLAIN.  How would you for example treat an EXPLAIN ANALYZE with a
query that includes an INSERT RETURNING in a WITH clause.  Would you
say that we do nothing if the relation exists?  Or would you execute
it, still insert nothing on the result relation because it already
exists, even if the inner query may have inserted something as part of
its execution on a different relation?
--
Michael

Вложения

Re: Fail Fast In CTAS/CMV If Relation Already Exists To Avoid Unnecessary Rewrite, Planning Costs

От
Bharath Rupireddy
Дата:
On Fri, Dec 11, 2020 at 1:40 PM Michael Paquier <michael@paquier.xyz> wrote:
> On Fri, Dec 11, 2020 at 12:48:49PM +0530, Bharath Rupireddy wrote:
> > I'm not quite sure how other databases behave. If I go by the main
> > intention of EXPLAIN without ANALYZE, that should do the planning,
> > show it in the output and no execution of the query should happen. For
> > EXPLAIN CTAS/CMV, only thing that gets planned is the SELECT part and
> > no execution happens so no existence check for the CTAS/CMV relation
> > that will get created if the CTAS/CMV is executed. Having said that,
> > the existence of the relations that are in the SELECT part are anyways
> > checked during planning for EXPLAIN without ANALYZE.
>
> I think that it is tricky to define IF NOT EXISTS for a CTAS with
> EXPLAIN. How would you for example treat an EXPLAIN ANALYZE with a
> query that includes an INSERT RETURNING in a WITH clause.  Would you
> say that we do nothing if the relation exists?  Or would you execute
> it, still insert nothing on the result relation because it already
> exists, even if the inner query may have inserted something as part of
> its execution on a different relation?

I may not have got your above scenario correctly(it will be good if
you can provide the use case in case I want to check something there).
I tried the following way, all the involved relations are being
checked for existence even though for EXPLAIN:
postgres=#   EXPLAIN WITH temp1 AS (SELECT * FROM t1) INSERT INTO
t1_does_not_exit VALUES (1);
ERROR:  relation "t1_does_not_exit" does not exist
LINE 1: ...LAIN WITH temp1 AS (SELECT * FROM t1) INSERT INTO t1_does_no...
                                                             ^
IIUC, is it that we want the following behaviour in case the relation
CTAS/CMV is trying to create does not exist? Note that the sample
queries are run on latest master branch:

EXPLAIN: throw an error, instead of the query showing select plan on
master branch currently?
postgres=# explain create table t2 as select * from t1;
                     QUERY PLAN
----------------------------------------------------
 Seq Scan on t1  (cost=0.00..2.00 rows=100 width=8)

EXPLAIN ANALYZE: throw an error as it does on master branch?
postgres=# explain analyze create table t2 as select * from t1;
ERROR:  relation "t2" already exists

EXPLAIN with if-not-exists clause: throw a warning and an empty plan
from ExplainOneUtility? If not an empty plan, we should be doing the
relation existence check before we come to explain routines, maybe in
gram.c? On the master branch it doesn't happen, the query shows the
plan for select part as shown below.
postgres=# explain create table if not exists t2 as select * from t1;
                     QUERY PLAN
----------------------------------------------------
 Seq Scan on t1  (cost=0.00..2.00 rows=100 width=8)

EXPLAIN ANALYZE with if-not-exists clause: (ideally, for if-not-exists
clause we expect a warning to be issued, but currently relation
existence error is thrown) a warning and an empty plan from
ExplainOneUtility? If not an empty plan, we should be doing the
relation existence check before we come to explain routines, maybe in
gram.c? On the master branch an ERROR is thrown.
postgres=# explain analyze create table if not exists t2 as select * from t1;
ERROR:  relation "t2" already exists

For plain CTAS -> throw an error as it happens on master branch.
postgres=# create table t2 as select * from t1;
ERROR:  relation "t2" already exists

For plain CTAS with if-not-exists clause -> a warning is issued as it
happens on master branch.
postgres=# create table if not exists t2 as select * from t1;
NOTICE:  relation "t2" already exists, skipping
CREATE TABLE AS

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com



Re: Fail Fast In CTAS/CMV If Relation Already Exists To Avoid Unnecessary Rewrite, Planning Costs

От
Michael Paquier
Дата:
On Fri, Dec 11, 2020 at 03:03:46PM +0530, Bharath Rupireddy wrote:
> I may not have got your above scenario correctly(it will be good if
> you can provide the use case in case I want to check something there).

It is possible to have DML queries in WITH clauses, as long as they
use RETURNING to feed tuples to the outer query.  Just imagine
something like that:
=# explain analyze
     create table if not exists aa as
       with insert_query as
         (insert into aa values (1) returning a)
       select * from insert_query;

Please note that this case fails with your patch, but the presence of
IF NOT EXISTS should ensure that we don't fail and issue a NOTICE
instead, no?   Taking this case specifically (OK, I am playing with
the rules a bit to insert data into the relation itself, still), this
query may finish by adding tuples to the table whose creation should
have been bypassed but the query got executed and inserted tuples.
That's one example of behavior that may be confusing.  There may be
others, but it seems to me that it may be simpler to execute or even
plan the query at all if the relation already exists.
--
Michael

Вложения

Re: Fail Fast In CTAS/CMV If Relation Already Exists To Avoid Unnecessary Rewrite, Planning Costs

От
Michael Paquier
Дата:
On Mon, Dec 14, 2020 at 03:15:12PM +0900, Michael Paquier wrote:
> Please note that this case fails with your patch, but the presence of
> IF NOT EXISTS should ensure that we don't fail and issue a NOTICE
> instead, no?   Taking this case specifically (OK, I am playing with
> the rules a bit to insert data into the relation itself, still), this
> query may finish by adding tuples to the table whose creation should
> have been bypassed but the query got executed and inserted tuples.
> That's one example of behavior that may be confusing.  There may be
> others, but it seems to me that it may be simpler to execute or even
> plan the query at all if the relation already exists.

Er..  Sorry. I meant here to *not* execute or even *not* plan the
query at all if the relation already exists.
--
Michael

Вложения

Re: Fail Fast In CTAS/CMV If Relation Already Exists To Avoid Unnecessary Rewrite, Planning Costs

От
Bharath Rupireddy
Дата:
On Mon, Dec 14, 2020 at 11:52 AM Michael Paquier <michael@paquier.xyz> wrote:
> On Mon, Dec 14, 2020 at 03:15:12PM +0900, Michael Paquier wrote:
> > Please note that this case fails with your patch, but the presence of
> > IF NOT EXISTS should ensure that we don't fail and issue a NOTICE
> > instead, no?

Thanks for the use case. The provided use case (or for that matter any
use case with explain analyze ctas if-not-exists) fails if the
relation already exists. It happens on the master branch, please have
a look at tests [1]. You are right in saying that whether it is
explain/explain analyze ctas if there is if-not-exists we should issue
notice instead of error as with plain ctas.

Do we want to fix this behaviour for explain/explain analyze ctat with
if-not-exists cases? Thoughts?

If yes, we could change the code in ExplainOneUtility() such that we
check relation existence before rewrite/planning, issue notice and
return. Then. the user sees a notice and an empty plan as we are
returning from ExplainOneUtility(). Is it okay to show a warning and
an empty plan to the user? Thoughts?

>> Taking this case specifically (OK, I am playing with
> > the rules a bit to insert data into the relation itself, still), this
> > query may finish by adding tuples to the table whose creation should
> > have been bypassed but the query got executed and inserted tuples.

IIUC, with the use case provided, the tuples will not be inserted as
the query later fails (and the txn gets aborted) if the relation
exists.

> > That's one example of behavior that may be confusing.  There may be
> > others, but it seems to me that it may be simpler to execute or even
> > plan the query at all if the relation already exists.
>
> Er..  Sorry. I meant here to *not* execute or even *not* plan the
> query at all if the relation already exists.

+1 to not plan and execute the query at all if the relation which
ctas/cmv is trying to create already exists.

[1] -
postgres=#   explain analyze
postgres-#      create table if not exists aa as
postgres-#        with insert_query as
postgres-#          (insert into aa values (1) returning a1)
postgres-#        select * from insert_query;
ERROR:  relation "aa" already exists

postgres=#  explain analyze
postgres-#      create table aa as
postgres-#        with insert_query as
postgres-#          (insert into aa values (1) returning a1)
postgres-#        select * from insert_query;
ERROR:  relation "aa" already exists

postgres=#  explain
postgres-#      create table aa as
postgres-#        with insert_query as
postgres-#          (insert into aa values (1) returning a1)
postgres-#        select * from insert_query;
                         QUERY PLAN
------------------------------------------------------------
 CTE Scan on insert_query  (cost=0.01..0.03 rows=1 width=4)
   CTE insert_query
     ->  Insert on aa  (cost=0.00..0.01 rows=1 width=4)
           ->  Result  (cost=0.00..0.01 rows=1 width=4)

postgres=#   explain
postgres-#       create table if not exists aa as
postgres-#        with insert_query as
postgres-#          (insert into aa values (1) returning a1)
postgres-#        select * from insert_query;
                         QUERY PLAN
------------------------------------------------------------
 CTE Scan on insert_query  (cost=0.01..0.03 rows=1 width=4)
   CTE insert_query
     ->  Insert on aa  (cost=0.00..0.01 rows=1 width=4)
           ->  Result  (cost=0.00..0.01 rows=1 width=4)

postgres=#   create table aa as
postgres-#        with insert_query as
postgres-#          (insert into aa values (1) returning a1)
postgres-#        select * from insert_query;
ERROR:  relation "aa" already exists

postgres=#   create table if not exists aa as
postgres-#        with insert_query as
postgres-#          (insert into aa values (1) returning a1)
postgres-#        select * from insert_query;
NOTICE:  relation "aa" already exists, skipping
CREATE TABLE AS

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com



Re: Fail Fast In CTAS/CMV If Relation Already Exists To Avoid Unnecessary Rewrite, Planning Costs

От
Bharath Rupireddy
Дата:
On Mon, Dec 14, 2020 at 1:54 PM Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote:
> On Mon, Dec 14, 2020 at 11:52 AM Michael Paquier <michael@paquier.xyz> wrote:
> > On Mon, Dec 14, 2020 at 03:15:12PM +0900, Michael Paquier wrote:
> > > Please note that this case fails with your patch, but the presence of
> > > IF NOT EXISTS should ensure that we don't fail and issue a NOTICE
> > > instead, no?
>
> Thanks for the use case. The provided use case (or for that matter any
> use case with explain analyze ctas if-not-exists) fails if the
> relation already exists. It happens on the master branch, please have
> a look at tests [1]. You are right in saying that whether it is
> explain/explain analyze ctas if there is if-not-exists we should issue
> notice instead of error as with plain ctas.
>
> Do we want to fix this behaviour for explain/explain analyze ctat with
> if-not-exists cases? Thoughts?
>
> If yes, we could change the code in ExplainOneUtility() such that we
> check relation existence before rewrite/planning, issue notice and
> return. Then. the user sees a notice and an empty plan as we are
> returning from ExplainOneUtility(). Is it okay to show a warning and
> an empty plan to the user? Thoughts?
>
> >> Taking this case specifically (OK, I am playing with
> > > the rules a bit to insert data into the relation itself, still), this
> > > query may finish by adding tuples to the table whose creation should
> > > have been bypassed but the query got executed and inserted tuples.
>
> IIUC, with the use case provided, the tuples will not be inserted as
> the query later fails (and the txn gets aborted) if the relation
> exists.
>
> > > That's one example of behavior that may be confusing.  There may be
> > > others, but it seems to me that it may be simpler to execute or even
> > > plan the query at all if the relation already exists.
> >
> > Er..  Sorry. I meant here to *not* execute or even *not* plan the
> > query at all if the relation already exists.
>
> +1 to not plan and execute the query at all if the relation which
> ctas/cmv is trying to create already exists.

Posting a v2 patch after modifying the new function CheckRelExistenceInCTAS() a bit as suggested earlier.

The behavior of the ctas/cmv, in case the relation already exists is as shown in [1]. The things that have been changed with the patch are: 1) In any case we do not rewrite or plan the select part if the relation already exists 2) For explain ctas/cmv (without analyze), now the relation existence is checked early and the error is thrown as highlighted in [1].

With patch, there is no behavioral change(from that of master branch) in explain analyze ctas/cmv with if-not-exists i.e. error is thrown not the notice.

Thoughts?

[1]
With patch:
postgres=# create table foo as select 1;
ERROR:  relation "foo" already exists
postgres=# create table if not exists foo as select 1;
NOTICE:  relation "foo" already exists, skipping
CREATE TABLE AS
postgres=# explain analyze create table foo as select 1;
ERROR:  relation "foo" already exists
postgres=# explain analyze create table if not exists foo as select 1;
ERROR:  relation "foo" already exists
postgres=# explain create table foo as select 1;
ERROR:  relation "foo" already exists
postgres=# explain create table if not exists foo as select 1;
ERROR:  relation "foo" already exists


On master/without patch:
postgres=# create table foo as select 1;
ERROR:  relation "foo" already exists
postgres=# create table if not exists foo as select 1;
NOTICE:  relation "foo" already exists, skipping
CREATE TABLE AS
postgres=# explain analyze create table foo as select 1;
ERROR:  relation "foo" already exists
postgres=# explain analyze create table if not exists foo as select 1;
ERROR:  relation "foo" already exists
postgres=# explain create table foo as select 1;
                QUERY PLAN                
------------------------------------------
 Result  (cost=0.00..0.01 rows=1 width=4)
(1 row)
postgres=# explain create table if not exists foo as select 1;
                QUERY PLAN                
------------------------------------------
 Result  (cost=0.00..0.01 rows=1 width=4)
(1 row)


With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com
Вложения

Re: Fail Fast In CTAS/CMV If Relation Already Exists To Avoid Unnecessary Rewrite, Planning Costs

От
Michael Paquier
Дата:
On Thu, Dec 17, 2020 at 03:06:59PM +0530, Bharath Rupireddy wrote:
> The behavior of the ctas/cmv, in case the relation already exists is as
> shown in [1]. The things that have been changed with the patch are: 1) In
> any case we do not rewrite or plan the select part if the relation already
> exists 2) For explain ctas/cmv (without analyze), now the relation
> existence is checked early and the error is thrown as highlighted in [1].
>
> With patch, there is no behavioral change(from that of master branch) in
> explain analyze ctas/cmv with if-not-exists i.e. error is thrown not the
> notice.
>
> Thoughts?

HEAD is already a mixed bad of behaviors, and the set of results you
are presenting here is giving a similar impression.  It brings in some
sanity by just ignoring the effects of the IF NOT EXISTS clause all
the time still that's not consistent with the queries not using
EXPLAIN.  Hmm.  Looking for similar behaviors, I can see one case in
select_into.sql where we just never execute the plan when using WITH
NO DATA but still show the plan, meaning that the query gets planned
but it just gets marked as "(never executed)" if attempting to use
ANALYZE.  There may be use cases for that as the user directly asked
directly for an EXPLAIN.

Note: the patch needs tests for all the patterns you would like to
stress.  This way it is easier to follow the patterns that are
changing with your patch and compare them with the HEAD behavior (like
looking at the diffs with the tests of the patch, but without the
diffs in src/backend/).
--
Michael

Вложения

Re: Fail Fast In CTAS/CMV If Relation Already Exists To Avoid Unnecessary Rewrite, Planning Costs

От
Bharath Rupireddy
Дата:
On Fri, Dec 18, 2020 at 7:18 AM Michael Paquier <michael@paquier.xyz> wrote:
> On Thu, Dec 17, 2020 at 03:06:59PM +0530, Bharath Rupireddy wrote:
> > The behavior of the ctas/cmv, in case the relation already exists is as
> > shown in [1]. The things that have been changed with the patch are: 1) In
> > any case we do not rewrite or plan the select part if the relation already
> > exists 2) For explain ctas/cmv (without analyze), now the relation
> > existence is checked early and the error is thrown as highlighted in [1].
> >
> > With patch, there is no behavioral change(from that of master branch) in
> > explain analyze ctas/cmv with if-not-exists i.e. error is thrown not the
> > notice.
> >
> > Thoughts?
>
> HEAD is already a mixed bad of behaviors, and the set of results you
> are presenting here is giving a similar impression.  It brings in some
> sanity by just ignoring the effects of the IF NOT EXISTS clause all
> the time still that's not consistent with the queries not using
> EXPLAIN.

I tried to make it consistent by issuing NOTICE (not an error) even
for EXPLAIN/EXPLAIN ANALYZE IF NOT EXISTS case. If issue notice and
exit from the ExplainOneUtility, we could output an empty plan to the
user because, by now ExplainResultDesc would have been called at the
start of the explain via PortalStart(). I didn't find a clean way of
coding if we are not okay to show notice and empty plan to the user.

Any suggestions on achieving above?

> Hmm.  Looking for similar behaviors, I can see one case in
> select_into.sql where we just never execute the plan when using WITH
> NO DATA but still show the plan, meaning that the query gets planned
> but it just gets marked as "(never executed)" if attempting to use
> ANALYZE.

Yes, with no data we would see "(never executed)" for explain analyze
if the relation does not already exist. If the relation does exist,
then the error/notice.

>There may be use cases for that as the user directly asked directly for an EXPLAIN.

IMHO, in any case checking for the existence of the relations
specified in a query is must before we output something to the user.
For instance, the query "explain select * from non_existent_tbl;"
where non_existent_tbl doesn't exist, throws an error. Similarly,
"explain create table already_existing_tbl as select * from
another_tbl;" where the table ctas/select into trying to create
already exists, should also throw error. But that's not happening
currently on master. Which seems to be a problem to me. So, with the
patch proposed here, we error out in this case.

If the user really wants to see the explain plan, then he/she should
use the correct query.

> Note: the patch needs tests for all the patterns you would like to
> stress.  This way it is easier to follow the patterns that are
> changing with your patch and compare them with the HEAD behavior (like
> looking at the diffs with the tests of the patch, but without the
> diffs in src/backend/).

Sure, I will add test cases and post v3 patch.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com



Re: Fail Fast In CTAS/CMV If Relation Already Exists To Avoid Unnecessary Rewrite, Planning Costs

От
Bharath Rupireddy
Дата:
On Fri, Dec 18, 2020 at 8:15 AM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
> On Fri, Dec 18, 2020 at 7:18 AM Michael Paquier <michael@paquier.xyz> wrote:
> > On Thu, Dec 17, 2020 at 03:06:59PM +0530, Bharath Rupireddy wrote:
> > > The behavior of the ctas/cmv, in case the relation already exists is as
> > > shown in [1]. The things that have been changed with the patch are: 1) In
> > > any case we do not rewrite or plan the select part if the relation already
> > > exists 2) For explain ctas/cmv (without analyze), now the relation
> > > existence is checked early and the error is thrown as highlighted in [1].
> > >
> > > With patch, there is no behavioral change(from that of master branch) in
> > > explain analyze ctas/cmv with if-not-exists i.e. error is thrown not the
> > > notice.
> > >
> > > Thoughts?
> >
> > HEAD is already a mixed bad of behaviors, and the set of results you
> > are presenting here is giving a similar impression.  It brings in some
> > sanity by just ignoring the effects of the IF NOT EXISTS clause all
> > the time still that's not consistent with the queries not using
> > EXPLAIN.
>
> I tried to make it consistent by issuing NOTICE (not an error) even
> for EXPLAIN/EXPLAIN ANALYZE IF NOT EXISTS case. If issue notice and
> exit from the ExplainOneUtility, we could output an empty plan to the
> user because, by now ExplainResultDesc would have been called at the
> start of the explain via PortalStart(). I didn't find a clean way of
> coding if we are not okay to show notice and empty plan to the user.
>
> Any suggestions on achieving above?
>
> > Hmm.  Looking for similar behaviors, I can see one case in
> > select_into.sql where we just never execute the plan when using WITH
> > NO DATA but still show the plan, meaning that the query gets planned
> > but it just gets marked as "(never executed)" if attempting to use
> > ANALYZE.
>
> Yes, with no data we would see "(never executed)" for explain analyze
> if the relation does not already exist. If the relation does exist,
> then the error/notice.
>
> >There may be use cases for that as the user directly asked directly for an EXPLAIN.
>
> IMHO, in any case checking for the existence of the relations
> specified in a query is must before we output something to the user.
> For instance, the query "explain select * from non_existent_tbl;"
> where non_existent_tbl doesn't exist, throws an error. Similarly,
> "explain create table already_existing_tbl as select * from
> another_tbl;" where the table ctas/select into trying to create
> already exists, should also throw error. But that's not happening
> currently on master. Which seems to be a problem to me. So, with the
> patch proposed here, we error out in this case.
>
> If the user really wants to see the explain plan, then he/she should
> use the correct query.
>
> > Note: the patch needs tests for all the patterns you would like to
> > stress.  This way it is easier to follow the patterns that are
> > changing with your patch and compare them with the HEAD behavior (like
> > looking at the diffs with the tests of the patch, but without the
> > diffs in src/backend/).
>
> Sure, I will add test cases and post v3 patch.

Attaching v3 patch that also contains test cases. Please review it further.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com

Вложения

Re: Fail Fast In CTAS/CMV If Relation Already Exists To Avoid Unnecessary Rewrite, Planning Costs

От
Michael Paquier
Дата:
On Mon, Dec 21, 2020 at 12:01:38PM +0530, Bharath Rupireddy wrote:
> On Fri, Dec 18, 2020 at 8:15 AM Bharath Rupireddy
>> I tried to make it consistent by issuing NOTICE (not an error) even
>> for EXPLAIN/EXPLAIN ANALYZE IF NOT EXISTS case. If issue notice and
>> exit from the ExplainOneUtility, we could output an empty plan to the
>> user because, by now ExplainResultDesc would have been called at the
>> start of the explain via PortalStart(). I didn't find a clean way of
>> coding if we are not okay to show notice and empty plan to the user.
>>
>> Any suggestions on achieving above?

I was looking at your patch today, and I actually found the conclusion
to output an empty plan while issuing a NOTICE to be quite intuitive
if the caller uses IF NOT EXISTS with EXPLAIN.

> Attaching v3 patch that also contains test cases. Please review it further.

Thanks for adding some test cases!  Some of them were exact
duplicates, so it is possible to reduce the number of queries without
impacting the coverage.  I have also chosen a query that forces an
error within the planner.

Please see the attached.  IF NOT EXISTS implies that CTAS or CREATE
MATVIEW will never ERROR if the relation already exists, with or
without EXPLAIN, EXECUTE or WITH NO DATA, so that gets us a consistent
behavior across all the patterns.

Note: I'd like to think that we could choose a better name for
CheckRelExistenceInCTAS().
--
Michael

Вложения

Re: Fail Fast In CTAS/CMV If Relation Already Exists To Avoid Unnecessary Rewrite, Planning Costs

От
Bharath Rupireddy
Дата:
On Tue, Dec 22, 2020 at 2:07 PM Michael Paquier <michael@paquier.xyz> wrote:
> I was looking at your patch today, and I actually found the conclusion
> to output an empty plan while issuing a NOTICE to be quite intuitive
> if the caller uses IF NOT EXISTS with EXPLAIN.

Thanks!

> Thanks for adding some test cases!  Some of them were exact
> duplicates, so it is possible to reduce the number of queries without
> impacting the coverage.  I have also chosen a query that forces an
> error within the planner.
> Please see the attached.  IF NOT EXISTS implies that CTAS or CREATE
> MATVIEW will never ERROR if the relation already exists, with or
> without EXPLAIN, EXECUTE or WITH NO DATA, so that gets us a consistent
> behavior across all the patterns.

LGTM.

> Note: I'd like to think that we could choose a better name for
> CheckRelExistenceInCTAS().

I changed it to IsCTASRelCreationAllowed() and attached a v5 patch.
Please let me know if this is okay.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com

Вложения

Re: Fail Fast In CTAS/CMV If Relation Already Exists To Avoid Unnecessary Rewrite, Planning Costs

От
Michael Paquier
Дата:
On Tue, Dec 22, 2020 at 03:12:15PM +0530, Bharath Rupireddy wrote:
> On Tue, Dec 22, 2020 at 2:07 PM Michael Paquier <michael@paquier.xyz> wrote:
>> Note: I'd like to think that we could choose a better name for
>> CheckRelExistenceInCTAS().
>
> I changed it to IsCTASRelCreationAllowed() and attached a v5 patch.
> Please let me know if this is okay.

After thinking about that, using "CTAS" while other routines in the
same area use "CreateTableAs" looks inconsistent to me.  So I have
come up with CreateTableAsRelExists() as name.

As the same time, I have looked at the git history to note 9bd27b7
where we had better not give an empty output for non-text formats.  So
I'd like to think that it makes sense to use ExplainDummyGroup() if
the relation exists with IF NOT EXISTS, keeping some consistency.

What do you think?
--
Michael

Вложения

Re: Fail Fast In CTAS/CMV If Relation Already Exists To Avoid Unnecessary Rewrite, Planning Costs

От
Bharath Rupireddy
Дата:
On Wed, Dec 23, 2020 at 6:01 PM Michael Paquier <michael@paquier.xyz> wrote:
> On Tue, Dec 22, 2020 at 03:12:15PM +0530, Bharath Rupireddy wrote:
> > On Tue, Dec 22, 2020 at 2:07 PM Michael Paquier <michael@paquier.xyz> wrote:
> >> Note: I'd like to think that we could choose a better name for
> >> CheckRelExistenceInCTAS().
> >
> > I changed it to IsCTASRelCreationAllowed() and attached a v5 patch.
> > Please let me know if this is okay.
>
> After thinking about that, using "CTAS" while other routines in the
> same area use "CreateTableAs" looks inconsistent to me.  So I have
> come up with CreateTableAsRelExists() as name.

I think CreateTableAsRelExists() can return true if the relation
already exists and false otherwise, to keep in sync with the function
name. I updated this and attached v7 patch.

> As the same time, I have looked at the git history to note 9bd27b7
> where we had better not give an empty output for non-text formats.  So
> I'd like to think that it makes sense to use ExplainDummyGroup() if
> the relation exists with IF NOT EXISTS, keeping some consistency.
>
> What do you think?

+1. Shall we add some test cases(with xml, yaml, json formats as is
currently being done in explain.sql) to cover that? We can have the
explain_filter() function to remove the unstable parts in the output,
it looks something like below. If yes, please let me know I can add
them to matview and select_into.

postgres=#  select explain_filter('explain(analyze, format xml) create
table if not exists t1 as select 1;');
NOTICE:  relation "t1" already exists, skipping
                    explain_filter
-------------------------------------------------------
 <explain xmlns="http://www.postgresql.org/N/explain">+
   <CREATE-TABLE-AS />                                +
 </explain>
(1 row)

postgres=#  select explain_filter('explain(analyze, format yaml)
create table if not exists t1 as select 1;');
NOTICE:  relation "t1" already exists, skipping
   explain_filter
---------------------
 - "CREATE TABLE AS"
(1 row)

postgres=#  select explain_filter('explain(analyze, format json)
create table if not exists t1 as select 1;');
NOTICE:  relation "t1" already exists, skipping
   explain_filter
---------------------
 [                  +
   "CREATE TABLE AS"+
 ]
(1 row)

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com

Вложения

Re: Fail Fast In CTAS/CMV If Relation Already Exists To Avoid Unnecessary Rewrite, Planning Costs

От
Michael Paquier
Дата:
On Wed, Dec 23, 2020 at 07:13:33PM +0530, Bharath Rupireddy wrote:
> +1. Shall we add some test cases(with xml, yaml, json formats as is
> currently being done in explain.sql) to cover that? We can have the
> explain_filter() function to remove the unstable parts in the output,
> it looks something like below. If yes, please let me know I can add
> them to matview and select_into.

I am not sure that we need tests for all the formats, but having at
least one of them sounds good to me.  I leave the choice up to you.

What we have here looks rather committable.  Let's wait until the
period of vacations is over before wrapping this up to give others the
occasion to comment.
--
Michael

Вложения

Re: Fail Fast In CTAS/CMV If Relation Already Exists To Avoid Unnecessary Rewrite, Planning Costs

От
Bharath Rupireddy
Дата:
On Thu, Dec 24, 2020 at 7:40 AM Michael Paquier <michael@paquier.xyz> wrote:
>
> On Wed, Dec 23, 2020 at 07:13:33PM +0530, Bharath Rupireddy wrote:
> > +1. Shall we add some test cases(with xml, yaml, json formats as is
> > currently being done in explain.sql) to cover that? We can have the
> > explain_filter() function to remove the unstable parts in the output,
> > it looks something like below. If yes, please let me know I can add
> > them to matview and select_into.
>
> I am not sure that we need tests for all the formats, but having at
> least one of them sounds good to me.  I leave the choice up to you.

Since I tested that with all the formats manually here and it works,
so I don't want to make the test cases complicated with adding
explain_filter() function into matview.sql and select_into.sql and all
that. I'm okay without those test cases.

> What we have here looks rather committable.  Let's wait until the
> period of vacations is over before wrapping this up to give others the
> occasion to comment.

Thanks! Happy Vacations!

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com



Re: Fail Fast In CTAS/CMV If Relation Already Exists To Avoid Unnecessary Rewrite, Planning Costs

От
Michael Paquier
Дата:
On Thu, Dec 24, 2020 at 09:10:22AM +0530, Bharath Rupireddy wrote:
> Since I tested that with all the formats manually here and it works,
> so I don't want to make the test cases complicated with adding
> explain_filter() function into matview.sql and select_into.sql and all
> that. I'm okay without those test cases.

Please note that I have added an entry in the CF app for the moment so
as we don't lose track of it:
https://commitfest.postgresql.org/31/2892/

>> What we have here looks rather committable.  Let's wait until the
>> period of vacations is over before wrapping this up to give others the
>> occasion to comment.
>
> Thanks! Happy Vacations!

You too!
--
Michael

Вложения

Re: Fail Fast In CTAS/CMV If Relation Already Exists To Avoid Unnecessary Rewrite, Planning Costs

От
Michael Paquier
Дата:
On Thu, Dec 24, 2020 at 01:23:40PM +0900, Michael Paquier wrote:
> Please note that I have added an entry in the CF app for the moment so
> as we don't lose track of it:
> https://commitfest.postgresql.org/31/2892/

I have been able to look at that again today, and applied it.  I have
tweaked a bit the comments, and added an elog(ERROR) as a safety net
for explain.c if the IFNE code path is taken for an object type that
is not expected with CreateTableAsStmt.
--
Michael

Вложения