Обсуждение: WARNING in parallel index creation.

Поиск
Список
Период
Сортировка

WARNING in parallel index creation.

От
Jeff Janes
Дата:
If i run:

pgbench -i -s30

And then create the function:

CREATE OR REPLACE FUNCTION foobar(text)
 RETURNS text
 LANGUAGE plperl
 IMMUTABLE PARALLEL SAFE STRICT COST 10000
AS $function$
  return scalar reverse($_[0]);
$function$;

Then when I create in index, I get a warning:

jjanes=# create index on pgbench_accounts (foobar(filler));
WARNING:  cannot set parameters during a parallel operation
WARNING:  cannot set parameters during a parallel operation

If I create the index again within the same session, there is no WARNING.

This only occurs if plperl.on_init is set in the postgresql.conf file.  It doesn't seem to matter what it is set to, 
even the empty string triggers the warning.

plperl.on_init=''

As far as I can tell the index is created correctly despite the warning.

Cheers,

Jeff

Re: WARNING in parallel index creation.

От
Peter Geoghegan
Дата:
On Sun, Mar 11, 2018 at 10:15 PM, Jeff Janes <jeff.janes@gmail.com> wrote:
> Then when I create in index, I get a warning:
>
> jjanes=# create index on pgbench_accounts (foobar(filler));
> WARNING:  cannot set parameters during a parallel operation
> WARNING:  cannot set parameters during a parallel operation
>
> If I create the index again within the same session, there is no WARNING.
>
> This only occurs if plperl.on_init is set in the postgresql.conf file.  It
> doesn't seem to matter what it is set to,
> even the empty string triggers the warning.

I wonder why DefineCustomStringVariable() does not set var->reset_val.
We see that within DefineCustomEnumVariable(),
DefineCustomRealVariable(), DefineCustomIntVariable(), etc.

-- 
Peter Geoghegan


Re: WARNING in parallel index creation.

От
Andres Freund
Дата:
On 2018-03-12 20:44:01 -0700, Peter Geoghegan wrote:
> On Sun, Mar 11, 2018 at 10:15 PM, Jeff Janes <jeff.janes@gmail.com> wrote:
> > Then when I create in index, I get a warning:
> >
> > jjanes=# create index on pgbench_accounts (foobar(filler));
> > WARNING:  cannot set parameters during a parallel operation
> > WARNING:  cannot set parameters during a parallel operation
> >
> > If I create the index again within the same session, there is no WARNING.
> >
> > This only occurs if plperl.on_init is set in the postgresql.conf file.  It
> > doesn't seem to matter what it is set to,
> > even the empty string triggers the warning.
> 
> I wonder why DefineCustomStringVariable() does not set var->reset_val.
> We see that within DefineCustomEnumVariable(),
> DefineCustomRealVariable(), DefineCustomIntVariable(), etc.

Peter, have you investigated this any further? This has been an open
item for a while.

Greetings,

Andres Freund


Re: WARNING in parallel index creation.

От
Tom Lane
Дата:
Andres Freund <andres@anarazel.de> writes:
> On 2018-03-12 20:44:01 -0700, Peter Geoghegan wrote:
>> I wonder why DefineCustomStringVariable() does not set var->reset_val.
>> We see that within DefineCustomEnumVariable(),
>> DefineCustomRealVariable(), DefineCustomIntVariable(), etc.

> Peter, have you investigated this any further? This has been an open
> item for a while.

It looks to me like the "oversight" in DefineCustomStringVariable is
not really one; rather, the reset_val assignments in the sibling
routines are dead code.  define_custom_variable will call
InitializeOneGUCOption which overwrites reset_val from the boot_val
in all cases.  We should probably clean this up by removing the dead
assignments.

The WARNING seems to indicate that the error check in set_config_option
is too aggressive.  I kind of wonder why it was placed there at all,
rather than in SQL-level operations like ExecSetVariableStmt.

            regards, tom lane


Re: WARNING in parallel index creation.

От
Tom Lane
Дата:
I wrote:
> The WARNING seems to indicate that the error check in set_config_option
> is too aggressive.  I kind of wonder why it was placed there at all,
> rather than in SQL-level operations like ExecSetVariableStmt.

BTW, looking back at the thread, nobody seems to have posted an analysis
of why this happens, but it's pretty simple.  During initial library load
of plperl.so, that module does DefineCustomStringVariable.  If there is
a pre-existing reset_val of the variable (taken from the postgresql.conf
file, in this example), then define_custom_variable uses set_config_option
to apply that value to the newly created variable struct.  So if plperl
library load happens during a parallel operation, the IsInParallelMode
check in set_config_option will bleat about it.

That test is, therefore, wrong.  Otherwise, no non-builtin function
could ever be marked parallel safe, for fear that the shlib it lives
in might try to set up a custom variable at load time.

I'm of the opinion that having such a test here at all is crummy design.
It implies that set_config_option is in charge of knowing about every
one of its callers and passing judgment on whether they represent
parallel-safe usages, which is the exact opposite of modularity,
even if set_config_option had enough information to make that judgment
which it does not.

In any case, the parallel index build patch is not at fault, it just
happens to be involved in this particular example.  I'd put the blame
on commit 924bcf4f1 which installed this test in the first place.

            regards, tom lane


Re: WARNING in parallel index creation.

От
Robert Haas
Дата:
On Wed, Apr 11, 2018 at 2:34 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> That test is, therefore, wrong.  Otherwise, no non-builtin function
> could ever be marked parallel safe, for fear that the shlib it lives
> in might try to set up a custom variable at load time.

I don't follow that logic.  If the check is more stringent than it
needs to be for safety, then we can relax it.  If the check is less
stringent than it needs to be for safety, we need to tighten it.  If
it's exactly right, then it has to stay the way it is, even if that
prohibits some things we wish we could allow.

> I'm of the opinion that having such a test here at all is crummy design.
> It implies that set_config_option is in charge of knowing about every
> one of its callers and passing judgment on whether they represent
> parallel-safe usages, which is the exact opposite of modularity,
> even if set_config_option had enough information to make that judgment
> which it does not.

The point of SerializeGUCState and RestoreGUCState is to enforce an
invariant that all cooperating processes have the same idea about the
values of every GUC.  If you let the GUCs be changed, then isn't that
invariant is violated regardless of why you let it happen?  For
example:

rhaas=# set pg_trgm.similarity_threshold = '002';
SET
rhaas=# select current_setting('pg_trgm.similarity_threshold');
 current_setting
-----------------
 002
(1 row)

rhaas=# load 'pg_trgm';
WARNING:  2 is outside the valid range for parameter
"pg_trgm.similarity_threshold" (0 .. 1)
LOAD
rhaas=# select current_setting('pg_trgm.similarity_threshold');
 current_setting
-----------------
 0.3
(1 row)

This shows that if you load a library that defines a custom variable
in process A but not process B, it's trivial to construct a query that
no longer returns the same answer in both backends, which isn't too
good, because the point of parallel query is to deliver the same
answer with parallel query that you would have gotten without it, or
at the very least, to make it look like the answer was generated by a
single process with a unified view of the world rather than multiple
uncoordinated processes.

I tried the following test with Jeff's example function and with
plperl.on_init='' and it does not produce a warning:

rhaas=# set force_parallel_mode = on;
SET
rhaas=# select foobar('dsgsdg');
 foobar
--------
 gdsgsd
(1 row)

I haven't tracked it down, but I think what must be going on here is
that, when the function is called via a query, something triggers the
shared library to get loaded before we enter parallel mode.  Then
things work OK, because the worker will load the shared library before
restoring the leader's GUC state, and so the state is actually
guaranteed to match.  Here it will *probably* end up matching because
most likely every process that loads the module will interpret the
existing textual setting of the newly-defined GUC in the same way, but
it's a bit less ironclad, and if they decide to emit warnings (as in
my first example, above) then you'll get multiple copies of them.  So
I don't think just drilling a hole through the prohibition in
set_config_option() is a particularly appealing solution.  It would
amount to suppressing a warning that is basically legitimate.

I wonder if there's an easy way to force the libraries that we're
going to need to be loaded before we reach CreateParallelContext().

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company