Re: [bug?] Missed parallel safety checks, and wrong parallel safety

Поиск
Список
Период
Сортировка
От Amit Kapila
Тема Re: [bug?] Missed parallel safety checks, and wrong parallel safety
Дата
Msg-id CAA4eK1KAcBVJsiWd1JODdyv1aboRUCv=sMTWZu4H2E_ap7hrZw@mail.gmail.com
обсуждение исходный текст
Ответ на Re: [bug?] Missed parallel safety checks, and wrong parallel safety  (Robert Haas <robertmhaas@gmail.com>)
Ответы Re: [bug?] Missed parallel safety checks, and wrong parallel safety  (Robert Haas <robertmhaas@gmail.com>)
Список pgsql-hackers
On Mon, Jun 7, 2021 at 7:29 PM Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Fri, Jun 4, 2021 at 6:17 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > Thoughts?
>
> As far as I can see, trying to error out at function call time if the
> function is parallel-safe doesn't fix any problem we have, and just
> makes the design of this part of the system less consistent with what
> we've done elsewhere. For example, if you create a stable function
> that internally calls a volatile function, you don't get an error. You
> can use your stable function in an index definition if you wish. That
> may break, but if so, that's your problem. Also, when it breaks, it
> probably won't blow up the entire world; you'll just have a messed-up
> index. Currently, the parallel-safety stuff works the same way. If we
> notice that something is marked parallel-unsafe, we'll skip
> parallelism.
>

This is not true in all cases which is one of the reasons for this
thread. For example, we don't skip parallelism when I/O functions are
parallel-unsafe as is shown in the following case:

postgres=# CREATE FUNCTION text_w_default_in(cstring)   RETURNS
text_w_default   AS 'textin'   LANGUAGE internal STRICT IMMUTABLE;
NOTICE:  type "text_w_default" is not yet defined
DETAIL:  Creating a shell type definition.
CREATE FUNCTION

postgres=# CREATE FUNCTION text_w_default_out(text_w_default)
RETURNS cstring   AS 'textout'   LANGUAGE internal STRICT IMMUTABLE;
NOTICE:  argument type text_w_default is only a shell
CREATE FUNCTION
postgres=# CREATE TYPE text_w_default (   internallength = variable,
input = text_w_default_in,   output = text_w_default_out,   alignment
= int4,   default = 'zippo');
CREATE TYPE
postgres=# CREATE TABLE default_test (f1 text_w_default, f2 int);
CREATE TABLE
postgres=# INSERT INTO default_test DEFAULT VALUES;
INSERT 0 1
postgres=# SELECT * FROM default_test;
ERROR:  parallel-safety execution violation of function "text_w_default_out" (u)

Note the error is raised after applying the patch, without the patch,
the above won't show any error (error message could be improved here).
Such cases can lead to unpredictable behavior without a patch because
we won't be able to detect the execution of parallel-unsafe functions.
There are similar examples from regression tests. Now, one way to deal
with similar cases could be that we document them and say we don't
consider parallel-safety in such cases and the other way is to detect
such cases and error out. Yet another way could be that we somehow try
to check these cases as well before enabling parallelism but I thought
these cases fall in the similar category as aggregate's support
functions.

> But you can lie to us and claim that things are safe when
> they're not, and if you do, it may break, but that's your problem.
> Mostly likely your query will just error out, and there will be no
> worse consequences than that, though if your parallel-unsafe function
> is written in C, it could do horrible things like crash, which is
> unavoidable because C code can do anything.
>

That is true but I was worried for cases where users didn't lie to us
but we still allowed those to choose parallelism.

> Now, the reason for all of this work, as I understand it, is because
> we want to enable parallel inserts, and the problem there is that a
> parallel insert could involve a lot of different things: it might need
> to compute expressions, or fire triggers, or check constraints, and
> any of those things could be parallel-unsafe. If we enable parallelism
> and then find out that we need to do to one of those things, we have a
> problem. Something probably will error out. The thing is, with this
> proposal, that issue is not solved. Something will definitely error
> out. You'll probably get the error in a different place, but nobody
> fires off an INSERT hoping to get one error message rather than
> another. What they want is for it to work. So I'm kind of confused how
> we ended up going in this direction which seems to me at least to be a
> tangent from the real issue, and somewhat at odds with the way the
> rest of PostgreSQL is designed.
>
> It seems to me that we could simply add a flag to each relation saying
> whether or not we think that INSERT operations - or perhaps DML
> operations generally - are believed to be parallel-safe for that
> relation.
>

This is exactly the direction we are trying to pursue. The proposal
[1] has semantics like:
CREATE TABLE table_name (...) PARALLEL DML { UNSAFE | RESTRICTED | SAFE };
    ALTER TABLE table_name PARALLEL DML { UNSAFE | RESTRICTED | SAFE };

This property is recorded in pg_class's relparallel column as 'u',
'r', or 's', just like pg_proc's proparallel. The default is UNSAFE.
This might require some bike-shedding to decide how exactly we want to
expose it to the user but I think it is on the lines of what you have
described here.

> Like the marking on functions, it would be the user's
> responsibility to get that marking correct. If they don't, they might
> call a parallel-unsafe function in parallel mode, and that will
> probably error out. But that's no worse than what we already have in
> existing cases, so I don't see why it requires doing what's proposed
> here first.
>

I agree it is not necessarily required if we give the responsibility
to the user but this might give a better user experience, OTOH,
without this as well, as you said it won't be any worse than current
behavior. But that was not the sole motivation of this proposal as
explained above in the email by giving example.

> Now, it does have the advantage of being not very
> convenient for users, who, I'm sure, would prefer that the system
> figure out for them automatically whether or not parallel inserts are
> likely to be safe, rather than making them declare it, especially
> since presumably the default declaration would have to be "unsafe," as
> it is for functions.
>

To improve the user experience in this regard, the proposal [1]
provides a function pg_get_parallel_safety(oid) using which users can
determine whether it is safe to enable parallelism. Surely, after the
user has checked with that function, one can add some unsafe
constraints to the table by altering the table but it will still be an
aid to enable parallelism on a relation.

[1] -
https://www.postgresql.org/message-id/TYAPR01MB29905A9AB82CC8BA50AB0F80FE709@TYAPR01MB2990.jpnprd01.prod.outlook.com

-- 
With Regards,
Amit Kapila.



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Michael Paquier
Дата:
Сообщение: Re: Multiple hosts in connection string failed to failover in non-hot standby mode
Следующее
От: Michael Paquier
Дата:
Сообщение: Re: Make unlogged table resets detectable