Обсуждение: Parallel safety of CURRENT_* family

Поиск
Список
Период
Сортировка

Parallel safety of CURRENT_* family

От
Дата:
Hello

How should I mark a function which calls CURRENT_DATE? Parallel safe or parallel restricted?

pg_proc shows that now() is marked as restricted, but transaction_timestamp() is marked as safe.

The manual (https://www.postgresql.org/docs/9.6/static/functions-datetime.html) says that "now() is a traditional
PostgreSQLequivalent to transaction_timestamp()" and "transaction_timestamp() is equivalent to CURRENT_TIMESTAMP". 

The code seems to confirm: GetSQLCurrentDate(), GetSQLCurrentTimestamp() and now() all invoke the same
GetCurrentTransactionStartTimestamp().

It looks like that functions which uses the CURRENT_* family of functions should be marked as restricted (something to
addto the docs) and that transaction_timestamp() is incorrectly marked as safe. 

Am I wrong?

Mark Kolar





----
Sent using Guerrillamail.com
Block or report abuse: https://www.guerrillamail.com/abuse/?a=UUZnFAFNS7UWgwum414PfQfTStiVwcZciatQew%3D%3D





Re: Parallel safety of CURRENT_* family

От
Tom Lane
Дата:
<5bih4k+4jfl6m39j23k@guerrillamail.com> writes:
> How should I mark a function which calls CURRENT_DATE? Parallel safe or parallel restricted?

> pg_proc shows that now() is marked as restricted, but transaction_timestamp() is marked as safe.

That's certainly silly, because they're equivalent.  I should think
they're both safe.  Robert?
        regards, tom lane



Re: Parallel safety of CURRENT_* family

От
Tom Lane
Дата:
I wrote:
> <5bih4k+4jfl6m39j23k@guerrillamail.com> writes:
>> pg_proc shows that now() is marked as restricted, but transaction_timestamp() is marked as safe.

> That's certainly silly, because they're equivalent.  I should think
> they're both safe.  Robert?

... well, they would be if we passed down xactStartTimestamp to parallel
workers, but I can't find any code that does that.  In view of the fact that
transaction_timestamp() is marked as parallel-safe, this is a bug in 9.6.
        regards, tom lane



Re: Parallel safety of CURRENT_* family

От
Robert Haas
Дата:
On Thu, Dec 1, 2016 at 2:32 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I wrote:
>> <5bih4k+4jfl6m39j23k@guerrillamail.com> writes:
>>> pg_proc shows that now() is marked as restricted, but transaction_timestamp() is marked as safe.
>
>> That's certainly silly, because they're equivalent.  I should think
>> they're both safe.  Robert?
>
> ... well, they would be if we passed down xactStartTimestamp to parallel
> workers, but I can't find any code that does that.  In view of the fact that
> transaction_timestamp() is marked as parallel-safe, this is a bug in 9.6.

Yeah.  Do you think we should arrange to pass that down, or change the marking?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Parallel safety of CURRENT_* family

От
Tom Lane
Дата:
Robert Haas <robertmhaas@gmail.com> writes:
> On Thu, Dec 1, 2016 at 2:32 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> ... well, they would be if we passed down xactStartTimestamp to parallel
>> workers, but I can't find any code that does that.  In view of the fact that
>> transaction_timestamp() is marked as parallel-safe, this is a bug in 9.6.

> Yeah.  Do you think we should arrange to pass that down, or change the marking?

We can't fix the marking in existing 9.6 installations, so I think we
have to pass it down.  (Which would be a better response anyway.)

Having said that, I find myself unable to reproduce a problem.
This should fail:

regression=# set parallel_setup_cost TO 0;
SET
regression=# set parallel_tuple_cost TO 0;
SET
regression=# set min_parallel_relation_size TO 0;
SET
regression=# set enable_indexscan TO 0;
SET
regression=# explain verbose select distinct transaction_timestamp() from tenk1;
QUERYPLAN                                       
--------------------------------------------------------------------------------------Unique  (cost=0.00..424.67 rows=1
width=8) Output: (transaction_timestamp())  ->  Gather  (cost=0.00..424.67 rows=10000 width=8)        Output:
(transaction_timestamp())       Workers Planned: 2        ->  Parallel Seq Scan on public.tenk1  (cost=0.00..410.08
rows=4167width=8)              Output: transaction_timestamp() 
(7 rows)

but it doesn't:

regression=# select distinct transaction_timestamp() from tenk1;    transaction_timestamp
-------------------------------2016-12-01 15:44:12.839417-05
(1 row)

How is that happening?
        regards, tom lane



Re: Parallel safety of CURRENT_* family

От
Robert Haas
Дата:
On Thu, Dec 1, 2016 at 3:46 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> On Thu, Dec 1, 2016 at 2:32 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> ... well, they would be if we passed down xactStartTimestamp to parallel
>>> workers, but I can't find any code that does that.  In view of the fact that
>>> transaction_timestamp() is marked as parallel-safe, this is a bug in 9.6.
>
>> Yeah.  Do you think we should arrange to pass that down, or change the marking?
>
> We can't fix the marking in existing 9.6 installations, so I think we
> have to pass it down.  (Which would be a better response anyway.)
>
> Having said that, I find myself unable to reproduce a problem.
> This should fail:
>
> regression=# set parallel_setup_cost TO 0;
> SET
> regression=# set parallel_tuple_cost TO 0;
> SET
> regression=# set min_parallel_relation_size TO 0;
> SET
> regression=# set enable_indexscan TO 0;
> SET
> regression=# explain verbose select distinct transaction_timestamp() from tenk1;
>                                       QUERY PLAN
> --------------------------------------------------------------------------------------
>  Unique  (cost=0.00..424.67 rows=1 width=8)
>    Output: (transaction_timestamp())
>    ->  Gather  (cost=0.00..424.67 rows=10000 width=8)
>          Output: (transaction_timestamp())
>          Workers Planned: 2
>          ->  Parallel Seq Scan on public.tenk1  (cost=0.00..410.08 rows=4167 width=8)
>                Output: transaction_timestamp()
> (7 rows)
>
> but it doesn't:
>
> regression=# select distinct transaction_timestamp() from tenk1;
>      transaction_timestamp
> -------------------------------
>  2016-12-01 15:44:12.839417-05
> (1 row)
>
> How is that happening?

Because the table is so small, the leader probably finishes running
the whole plan before the workers finish starting up.

You can see the problem like this, though:

rhaas=# begin;
BEGIN
rhaas=# select transaction_timestamp();    transaction_timestamp
-------------------------------2016-12-01 15:51:14.443116-05
(1 row)

rhaas=# select transaction_timestamp();    transaction_timestamp
-------------------------------2016-12-01 15:51:14.443116-05
(1 row)

rhaas=# select transaction_timestamp();    transaction_timestamp
-------------------------------2016-12-01 15:51:14.443116-05
(1 row)

rhaas=# set force_parallel_mode = true;
SET
rhaas=# select transaction_timestamp();    transaction_timestamp
-------------------------------2016-12-01 15:51:26.603302-05
(1 row)

rhaas=# select transaction_timestamp();    transaction_timestamp
-------------------------------2016-12-01 15:51:27.316032-05
(1 row)

force_parallel_mode causes the whole plan to be run by the worker,
without any participation by the leader.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Parallel safety of CURRENT_* family

От
Tom Lane
Дата:
Robert Haas <robertmhaas@gmail.com> writes:
> On Thu, Dec 1, 2016 at 3:46 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> but it doesn't:
>> 
>> regression=# select distinct transaction_timestamp() from tenk1;
>> transaction_timestamp
>> -------------------------------
>> 2016-12-01 15:44:12.839417-05
>> (1 row)
>> 
>> How is that happening?

> Because the table is so small, the leader probably finishes running
> the whole plan before the workers finish starting up.

Good try, but EXPLAIN ANALYZE says that the workers are processing
some of the rows.  Also, I see the same behavior with a much larger
test table.

> You can see the problem like this, though:

Yeah, I didn't have any doubt that it was real.  Still don't know
why my test case isn't doing what I expected, though.
        regards, tom lane



Re: Parallel safety of CURRENT_* family

От
Tom Lane
Дата:
I wrote:
> Yeah, I didn't have any doubt that it was real.  Still don't know
> why my test case isn't doing what I expected, though.

Doh: the planner knows that transaction_timestamp() is stable, so
it concludes that the DISTINCT condition is vacuous.  There is a
"Unique" node in the plan, but it has zero columns to compare, so
it thinks the tuple are all equivalent and emits only the first.

I had noticed that there was no "Sort" node, but failed to realize
that that implied the "Unique" node was degenerate.

Maybe this is over-optimization, but I think we'd be very sad if
the planner didn't do it; getting rid of useless sort columns is
critical in a lot of situations.
        regards, tom lane



Re: [HACKERS] Parallel safety of CURRENT_* family

От
Robert Haas
Дата:
On Thu, Dec 1, 2016 at 3:46 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> On Thu, Dec 1, 2016 at 2:32 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> ... well, they would be if we passed down xactStartTimestamp to parallel
>>> workers, but I can't find any code that does that.  In view of the fact that
>>> transaction_timestamp() is marked as parallel-safe, this is a bug in 9.6.
>
>> Yeah.  Do you think we should arrange to pass that down, or change the marking?
>
> We can't fix the marking in existing 9.6 installations, so I think we
> have to pass it down.  (Which would be a better response anyway.)

I happened across this thread today and took a look at what it would
take to fix this.  I quickly ran up against the fact that
SerializeTransactionState() and RestoreTransactionState() are not
exactly brilliantly designed, relying on the notion that each
individual value that we want to serialize will be no wider than a
TransactionId, which won't be true for timestamps.  Even apart from
that, the whole design of those functions is pretty lame, and I'm
pretty sure I wrote all of that code myself, so I have nobody to blame
but me.  Anyway, here's a proposed patch to refactor that code into
something a little more reasonable.  It doesn't fix the actual problem
here, but I think it's a good first step.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Вложения