Discussion: UPDATE runs slow in a transaction

UPDATE runs slow in a transaction

From
Viktor Rosenfeld
Date:
Hi,

the script below runs very fast when executed alone.  But when I call
it from within a transaction block it's so slow that I have to abort
it after a while.  Specifically the second-to-last UPDATE seems to
take forever within a transaction while it completes in about 3
seconds outside a transaction.

The Postgres manual states that statements should run faster within a
transaction.

I wonder if I have to raise a configuration parameter to give Postgres
more resources, or if the problem lies elsewhere.

Thanks for your help,
Viktor

The script:

-- add columns left_token, right_token and copy values from token_index
ALTER TABLE _struct ADD left_token integer;
ALTER TABLE _struct ADD right_token integer;
UPDATE _struct SET left_token = token_index;
UPDATE _struct SET right_token = token_index;

-- set left, right values for non-terminals
-- (use temporary table to get rid of joins between struct and rank)
CREATE TABLE tmp AS
  SELECT r.pre, r.post, s.id, s.left_token, s.right_token
  FROM _rank r, _struct s
  WHERE r.struct_ref = s.id;
CREATE INDEX idx_tmp_pre_post ON tmp (pre, post);
UPDATE tmp SET left_token = (SELECT min(t2.left_token) FROM tmp t2
  WHERE t2.pre >= tmp.pre AND t2.pre <= tmp.post);
UPDATE tmp SET right_token = (SELECT max(t2.right_token) FROM tmp t2
  WHERE t2.pre >= tmp.pre AND t2.pre <= tmp.post);

-- copy left, right values for everything
CREATE INDEX tmp_id ON tmp (id);
UPDATE _struct SET left_token = (SELECT DISTINCT left_token FROM tmp
  WHERE _struct.id = tmp.id);
-- the UPDATE above takes ages when called within a transaction
UPDATE _struct SET right_token = (SELECT DISTINCT right_token FROM tmp
  WHERE _struct.id = tmp.id);

-- clean up
DROP TABLE tmp;


Re: UPDATE runs slow in a transaction

From
Tom Lane
Date:
Viktor Rosenfeld <rosenfel@informatik.hu-berlin.de> writes:
> the script below runs very fast when executed alone.  But when I call
> it from within a transaction block it's so slow that I have to abort
> it after a while.  Specifically the second-to-last UPDATE seems to
> take forever within a transaction while it completes in about 3
> seconds outside a transaction.

Since the table you're working on was just created in the same
transaction, there's been no opportunity for autovacuum to run an
ANALYZE on it; that's probably preventing selection of a good plan.
Try throwing in an "ANALYZE tmp" after you load the table.

            regards, tom lane
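
Applied to the original script, that suggestion amounts to adding an ANALYZE right after the temporary table is populated, roughly like this (a sketch; the exact placement is not spelled out above):

CREATE TABLE tmp AS
  SELECT r.pre, r.post, s.id, s.left_token, s.right_token
  FROM _rank r, _struct s
  WHERE r.struct_ref = s.id;
-- collect row counts and column statistics for the brand-new table so the
-- planner has something better than default estimates to work with
ANALYZE tmp;
CREATE INDEX idx_tmp_pre_post ON tmp (pre, post);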

Re: UPDATE runs slow in a transaction

From
Viktor Rosenfeld
Date:
Hi Tom,

Postgres is indeed selecting a bad plan.  Turns out that the index I created to speed up the UPDATE isn't used inside a transaction block.

Here's the plan for "UPDATE _struct SET left_token = (SELECT DISTINCT left_token FROM tmp WHERE _struct.id = tmp.id)" outside of a transaction:

                                        QUERY PLAN                                         
-------------------------------------------------------------------------------------------
 Seq Scan on _struct  (cost=0.00..826643.13 rows=98149 width=1083)
   SubPlan
     ->  Unique  (cost=8.38..8.40 rows=1 width=4)
           ->  Sort  (cost=8.38..8.39 rows=4 width=4)
                 Sort Key: tmp.left_token
                 ->  Index Scan using idx_tmp__id on tmp  (cost=0.00..8.34 rows=4 width=4)
                       Index Cond: ($0 = id)

And inside a transaction:

                                      QUERY PLAN                                       
---------------------------------------------------------------------------------------
 Seq Scan on _struct  (cost=100000000.00..3230175260746.00 rows=32300 width=70)
   SubPlan
     ->  Unique  (cost=100002329.99..100002330.01 rows=1 width=4)
           ->  Sort  (cost=100002329.99..100002330.00 rows=4 width=4)
                 Sort Key: tmp.left_token
                 ->  Seq Scan on tmp  (cost=100000000.00..100002329.95 rows=4 width=4)
                       Filter: ($0 = id)

The high cost of the seqscan on tmp is because I had disabled sequential scans inside the transaction to force an index scan, which Postgres decided to ignore in this case.
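
Disabling sequential scans for a single statement is normally done with the enable_seqscan setting; presumably something along these lines was used (a sketch, since the exact commands are not shown):

SET enable_seqscan = off;
EXPLAIN UPDATE _struct SET left_token =
  (SELECT DISTINCT left_token FROM tmp WHERE _struct.id = tmp.id);
RESET enable_seqscan;

The 100000000.00 startup costs in the plan above are the penalty the planner adds to discourage, but not forbid, sequential scans while enable_seqscan is off.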

Putting an ANALYZE tmp and ANALYZE _struct right before the UPDATE didn't help either.  (Also, shouldn't the creation of an index on tmp (id) take care of analyzing that column?)
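
For what it's worth, CREATE INDEX does not gather per-column statistics; only ANALYZE does. A quick way to check whether the statistics collected inside the open transaction are actually visible to it is to look at pg_stats (a diagnostic sketch):

ANALYZE tmp;
-- the row for the "id" column should show up even before the transaction commits
SELECT tablename, attname, n_distinct, null_frac
FROM pg_stats
WHERE tablename = 'tmp' AND attname = 'id';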

Thanks,
Viktor

On 14.07.2008 at 20:52, Tom Lane wrote:

Viktor Rosenfeld <rosenfel@informatik.hu-berlin.de> writes:
the script below runs very fast when executed alone.  But when I call  
it from within a transaction block it's so slow that I have to abort  
it after a while.  Specifically the second-to-last UPDATE seems to  
take forever within a transaction while it completes in about 3  
seconds outside a transaction.

Since the table you're working on was just created in the same
transaction, there's been no opportunity for autovacuum to run an
ANALYZE on it; that's probably preventing selection of a good plan.
Try throwing in an "ANALYZE tmp" after you load the table.

regards, tom lane

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Re: UPDATE runs slow in a transaction

From
"Pavel Stehule"
Date:
Hello

My advice is a little bit off-topic, I am sorry. Why do you use a
correlated subquery? Your update statement should be:

update _struct set left_token = tmp.left_token from tmp where
_struct.id = tmp.id;

Please send the output of an EXPLAIN ANALYZE statement, e.g.:
explain analyze UPDATE _struct SET left_token =
  (SELECT DISTINCT left_token FROM tmp WHERE _struct.id = tmp.id);

regards
Pavel Stehule
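
Applied to both new columns of the original script, the uncorrelated form would look roughly like this (the right_token statement is an assumed parallel of the example above):

UPDATE _struct SET left_token  = tmp.left_token  FROM tmp WHERE _struct.id = tmp.id;
UPDATE _struct SET right_token = tmp.right_token FROM tmp WHERE _struct.id = tmp.id;

Note that when several tmp rows match one _struct.id, Postgres applies only one of them; that is harmless here because the original correlated subquery with DISTINCT could only ever have returned a single value per id without raising an error.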



2008/7/16 Viktor Rosenfeld <rosenfel@informatik.hu-berlin.de>:
> Hi Tom,
> Postgres is indeed selecting a bad plan.  Turns out that the index I created
> to speed up the UPDATE isn't used inside a transaction block.
> Here's the plan for "UPDATE _struct SET left_token = (SELECT DISTINCT
> left_token FROM tmp WHERE _struct.id = tmp.id)" outside of a transaction:
>                                         QUERY PLAN
>
> -------------------------------------------------------------------------------------------
>  Seq Scan on _struct  (cost=0.00..826643.13 rows=98149 width=1083)
>    SubPlan
>      ->  Unique  (cost=8.38..8.40 rows=1 width=4)
>            ->  Sort  (cost=8.38..8.39 rows=4 width=4)
>                  Sort Key: tmp.left_token
>                  ->  Index Scan using idx_tmp__id on tmp  (cost=0.00..8.34
> rows=4 width=4)
>                        Index Cond: ($0 = id)
> And inside a transaction:
>                                       QUERY PLAN
>
> ---------------------------------------------------------------------------------------
>  Seq Scan on _struct  (cost=100000000.00..3230175260746.00 rows=32300
> width=70)
>    SubPlan
>      ->  Unique  (cost=100002329.99..100002330.01 rows=1 width=4)
>            ->  Sort  (cost=100002329.99..100002330.00 rows=4 width=4)
>                  Sort Key: tmp.left_token
>                  ->  Seq Scan on tmp  (cost=100000000.00..100002329.95
> rows=4 width=4)
>                        Filter: ($0 = id)
> The high cost of the seqscan on tmp are because I tried disabling sequential
> scans inside the transaction to force an index scan, which Postgres decided
> to ignore in this case.
> Putting an ANALYZE tmp and ANALYZE _struct right before the UPDATE didn't
> help either.  (Also shouldn't the creation of an index on tmp (id) take care
> of analyzing that column?)
> Thanks,
> Viktor
> On 14.07.2008 at 20:52, Tom Lane wrote:
>
> Viktor Rosenfeld <rosenfel@informatik.hu-berlin.de> writes:
>
> the script below runs very fast when executed alone.  But when I call
>
> it from within a transaction block it's so slow that I have to abort
>
> it after a while.  Specifically the second-to-last UPDATE seems to
>
> take forever within a transaction while it completes in about 3
>
> seconds outside a transaction.
>
> Since the table you're working on was just created in the same
> transaction, there's been no opportunity for autovacuum to run an
> ANALYZE on it; that's probably preventing selection of a good plan.
> Try throwing in an "ANALYZE tmp" after you load the table.
>
> regards, tom lane
>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general
>
>

Re: UPDATE runs slow in a transaction

From
Viktor Rosenfeld
Date:
Hi Pavel,

thanks for the advice on how to uncorrelate the query.  I must admit I
didn't know about the UPDATE ... SET ... FROM ... syntax.

Now the UPDATE runs in an acceptable time inside a transaction;
however, the query plan still differs from the one I get when I run it outside.

Outside a transaction:


QUERY PLAN

--------------------------------------------------------------------------------------------------------------------------------------------
  Merge Join  (cost=0.00..11481.84 rows=65756 width=1087) (actual
time=0.151..323.856 rows=65756 loops=1)
    Merge Cond: (_struct.id = tmp.id)
    ->  Index Scan using "_PK_struct" on _struct  (cost=0.00..7084.50
rows=98149 width=1083) (actual time=0.028..137.463 rows=32300 loops=1)
    ->  Index Scan using idx_tmp__id on tmp  (cost=0.00..3330.02
rows=65756 width=12) (actual time=0.115..58.601 rows=65756 loops=1)
  Total runtime: 2905.580 ms

This looks like an optimal plan and average run time over 5 runs is
2660 ms.

Inside a transaction:


QUERY PLAN

--------------------------------------------------------------------------------------------------------------------------------------------
  Merge Join  (cost=7427.63..16159.84 rows=65756 width=1087) (actual
time=315.570..574.075 rows=65756 loops=1)
    Merge Cond: (_struct.id = tmp.id)
    ->  Index Scan using "_PK_struct" on _struct  (cost=0.00..7500.50
rows=98149 width=1083) (actual time=0.020..129.915 rows=32300 loops=1)
    ->  Sort  (cost=7427.63..7592.02 rows=65756 width=12) (actual
time=315.538..333.359 rows=65756 loops=1)
          Sort Key: tmp.id
          Sort Method:  quicksort  Memory: 4617kB
          ->  Seq Scan on tmp  (cost=0.00..2165.56 rows=65756
width=12) (actual time=10.070..37.411 rows=65756 loops=1)
  Trigger for constraint _FK_struct_2_collection: time=1105.892
calls=32300
  Trigger for constraint _FK_struct_2_text: time=1468.009 calls=32300
  Total runtime: 4955.784 ms

Again, the planner does not use the index on tmp (id) although I put
an "ANALYZE tmp" right before the UPDATE.  Average run time over 5
runs is 4610 ms.

Thanks,
Viktor

On 16.07.2008 at 15:33, Pavel Stehule wrote:

> Hello
>
> my advice is little bit offtopic, I am sorry. Why you use correlated
> subquery? Your update statement should be
>
> update _struct set left_token = tmp.left_token from tmp where
> _struct.id = tmp.id;
>
> send output of explain analyze statement, please. etc
> explain analyze UPDATE _struct SET left_token = (SELECT DISTINCT
> left_token FROM tmp WHERE _struct.id = tmp.id)
>
> regards
> Pavel Stehule
>
>
>
> 2008/7/16 Viktor Rosenfeld <rosenfel@informatik.hu-berlin.de>:
>> Hi Tom,
>> Postgres is indeed selecting a bad plan.  Turns out that the index
>> I created
>> to speed up the UPDATE isn't used inside a transaction block.
>> Here's the plan for "UPDATE _struct SET left_token = (SELECT DISTINCT
>> left_token FROM tmp WHERE _struct.id = tmp.id)" outside of a
>> transaction:
>>                                        QUERY PLAN
>>
>> -------------------------------------------------------------------------------------------
>> Seq Scan on _struct  (cost=0.00..826643.13 rows=98149 width=1083)
>>   SubPlan
>>     ->  Unique  (cost=8.38..8.40 rows=1 width=4)
>>           ->  Sort  (cost=8.38..8.39 rows=4 width=4)
>>                 Sort Key: tmp.left_token
>>                 ->  Index Scan using idx_tmp__id on tmp
>> (cost=0.00..8.34
>> rows=4 width=4)
>>                       Index Cond: ($0 = id)
>> And inside a transaction:
>>                                      QUERY PLAN
>>
>> ---------------------------------------------------------------------------------------
>> Seq Scan on _struct  (cost=100000000.00..3230175260746.00 rows=32300
>> width=70)
>>   SubPlan
>>     ->  Unique  (cost=100002329.99..100002330.01 rows=1 width=4)
>>           ->  Sort  (cost=100002329.99..100002330.00 rows=4 width=4)
>>                 Sort Key: tmp.left_token
>>                 ->  Seq Scan on tmp  (cost=100000000.00..100002329.95
>> rows=4 width=4)
>>                       Filter: ($0 = id)
>> The high cost of the seqscan on tmp are because I tried disabling
>> sequential
>> scans inside the transaction to force an index scan, which Postgres
>> decided
>> to ignore in this case.
>> Putting an ANALYZE tmp and ANALYZE _struct right before the UPDATE
>> didn't
>> help either.  (Also shouldn't the creation of an index on tmp (id)
>> take care
>> of analyzing that column?)
>> Thanks,
>> Viktor
>> On 14.07.2008 at 20:52, Tom Lane wrote:
>>
>> Viktor Rosenfeld <rosenfel@informatik.hu-berlin.de> writes:
>>
>> the script below runs very fast when executed alone.  But when I call
>>
>> it from within a transaction block it's so slow that I have to abort
>>
>> it after a while.  Specifically the second-to-last UPDATE seems to
>>
>> take forever within a transaction while it completes in about 3
>>
>> seconds outside a transaction.
>>
>> Since the table you're working on was just created in the same
>> transaction, there's been no opportunity for autovacuum to run an
>> ANALYZE on it; that's probably preventing selection of a good plan.
>> Try throwing in an "ANALYZE tmp" after you load the table.
>>
>> regards, tom lane
>>
>> --
>> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
>> To make changes to your subscription:
>> http://www.postgresql.org/mailpref/pgsql-general
>>
>>


Re: UPDATE runs slow in a transaction

From
"Pavel Stehule"
Date:
This is strange.

What exactly do you mean by "run under a transaction"?

Did you run exactly these statements in the psql console?
begin;
explain analyze select ...
commit;

regards
Pavel Stehule

2008/7/16 Viktor Rosenfeld <rosenfel@informatik.hu-berlin.de>:
> Hi Pavel,
>
> thanks for the advice on how to uncorrelate the query.  I must admit I
> didn't know about the UPDATE ... SET ... FROM ... syntax.
>
> Now the UPDATE runs in an acceptable time inside a transaction, however the
> query plan still differs when I run it outside.
>
> Outside a transaction:
>
>                                                                 QUERY PLAN
>
--------------------------------------------------------------------------------------------------------------------------------------------
>  Merge Join  (cost=0.00..11481.84 rows=65756 width=1087) (actual
> time=0.151..323.856 rows=65756 loops=1)
>   Merge Cond: (_struct.id = tmp.id)
>   ->  Index Scan using "_PK_struct" on _struct  (cost=0.00..7084.50
> rows=98149 width=1083) (actual time=0.028..137.463 rows=32300 loops=1)
>   ->  Index Scan using idx_tmp__id on tmp  (cost=0.00..3330.02 rows=65756
> width=12) (actual time=0.115..58.601 rows=65756 loops=1)
>  Total runtime: 2905.580 ms
>
> This looks like an optimal plan and average run time over 5 runs is 2660 ms.
>
> Inside a transaction:
>
>                                                                 QUERY PLAN
>
--------------------------------------------------------------------------------------------------------------------------------------------
>  Merge Join  (cost=7427.63..16159.84 rows=65756 width=1087) (actual
> time=315.570..574.075 rows=65756 loops=1)
>   Merge Cond: (_struct.id = tmp.id)
>   ->  Index Scan using "_PK_struct" on _struct  (cost=0.00..7500.50
> rows=98149 width=1083) (actual time=0.020..129.915 rows=32300 loops=1)
>   ->  Sort  (cost=7427.63..7592.02 rows=65756 width=12) (actual
> time=315.538..333.359 rows=65756 loops=1)
>         Sort Key: tmp.id
>         Sort Method:  quicksort  Memory: 4617kB
>         ->  Seq Scan on tmp  (cost=0.00..2165.56 rows=65756 width=12)
> (actual time=10.070..37.411 rows=65756 loops=1)
>  Trigger for constraint _FK_struct_2_collection: time=1105.892 calls=32300
>  Trigger for constraint _FK_struct_2_text: time=1468.009 calls=32300
>  Total runtime: 4955.784 ms
>
> Again, the planner does not use the index on tmp (id) although I put an
> "ANALYZE tmp" right before the UPDATE.  Average run time over 5 runs is 4610
> ms.
>
> Thanks,
> Viktor
>
> On 16.07.2008 at 15:33, Pavel Stehule wrote:
>
>> Hello
>>
>> my advice is little bit offtopic, I am sorry. Why you use correlated
>> subquery? Your update statement should be
>>
>> update _struct set left_token = tmp.left_token from tmp where
>> _struct.id = tmp.id;
>>
>> send output of explain analyze statement, please. etc
>> explain analyze UPDATE _struct SET left_token = (SELECT DISTINCT
>> left_token FROM tmp WHERE _struct.id = tmp.id)
>>
>> regards
>> Pavel Stehule
>>
>>
>>
>> 2008/7/16 Viktor Rosenfeld <rosenfel@informatik.hu-berlin.de>:
>>>
>>> Hi Tom,
>>> Postgres is indeed selecting a bad plan.  Turns out that the index I
>>> created
>>> to speed up the UPDATE isn't used inside a transaction block.
>>> Here's the plan for "UPDATE _struct SET left_token = (SELECT DISTINCT
>>> left_token FROM tmp WHERE _struct.id = tmp.id)" outside of a transaction:
>>>                                       QUERY PLAN
>>>
>>>
>>> -------------------------------------------------------------------------------------------
>>> Seq Scan on _struct  (cost=0.00..826643.13 rows=98149 width=1083)
>>>  SubPlan
>>>    ->  Unique  (cost=8.38..8.40 rows=1 width=4)
>>>          ->  Sort  (cost=8.38..8.39 rows=4 width=4)
>>>                Sort Key: tmp.left_token
>>>                ->  Index Scan using idx_tmp__id on tmp  (cost=0.00..8.34
>>> rows=4 width=4)
>>>                      Index Cond: ($0 = id)
>>> And inside a transaction:
>>>                                     QUERY PLAN
>>>
>>>
>>> ---------------------------------------------------------------------------------------
>>> Seq Scan on _struct  (cost=100000000.00..3230175260746.00 rows=32300
>>> width=70)
>>>  SubPlan
>>>    ->  Unique  (cost=100002329.99..100002330.01 rows=1 width=4)
>>>          ->  Sort  (cost=100002329.99..100002330.00 rows=4 width=4)
>>>                Sort Key: tmp.left_token
>>>                ->  Seq Scan on tmp  (cost=100000000.00..100002329.95
>>> rows=4 width=4)
>>>                      Filter: ($0 = id)
>>> The high cost of the seqscan on tmp are because I tried disabling
>>> sequential
>>> scans inside the transaction to force an index scan, which Postgres
>>> decided
>>> to ignore in this case.
>>> Putting an ANALYZE tmp and ANALYZE _struct right before the UPDATE didn't
>>> help either.  (Also shouldn't the creation of an index on tmp (id) take
>>> care
>>> of analyzing that column?)
>>> Thanks,
>>> Viktor
>>> On 14.07.2008 at 20:52, Tom Lane wrote:
>>>
>>> Viktor Rosenfeld <rosenfel@informatik.hu-berlin.de> writes:
>>>
>>> the script below runs very fast when executed alone.  But when I call
>>>
>>> it from within a transaction block it's so slow that I have to abort
>>>
>>> it after a while.  Specifically the second-to-last UPDATE seems to
>>>
>>> take forever within a transaction while it completes in about 3
>>>
>>> seconds outside a transaction.
>>>
>>> Since the table you're working on was just created in the same
>>> transaction, there's been no opportunity for autovacuum to run an
>>> ANALYZE on it; that's probably preventing selection of a good plan.
>>> Try throwing in an "ANALYZE tmp" after you load the table.
>>>
>>> regards, tom lane
>>>
>>> --
>>> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
>>> To make changes to your subscription:
>>> http://www.postgresql.org/mailpref/pgsql-general
>>>
>>>
>
>

Re: UPDATE runs slow in a transaction

From
"Pavel Stehule"
Date:
Hello,

about the second query:

why are the triggers for constraint _FK_struct_2_collection and
constraint _FK_struct_2_text not fired when you run without a transaction?

Regards
Pavel Stehule

2008/7/16 Viktor Rosenfeld <rosenfel@informatik.hu-berlin.de>:
> Hi Pavel,
>
> thanks for the advice on how to uncorrelate the query.  I must admit I
> didn't know about the UPDATE ... SET ... FROM ... syntax.
>
> Now the UPDATE runs in an acceptable time inside a transaction, however the
> query plan still differs when I run it outside.
>
> Outside a transaction:
>
>                                                                 QUERY PLAN
>
--------------------------------------------------------------------------------------------------------------------------------------------
>  Merge Join  (cost=0.00..11481.84 rows=65756 width=1087) (actual
> time=0.151..323.856 rows=65756 loops=1)
>   Merge Cond: (_struct.id = tmp.id)
>   ->  Index Scan using "_PK_struct" on _struct  (cost=0.00..7084.50
> rows=98149 width=1083) (actual time=0.028..137.463 rows=32300 loops=1)
>   ->  Index Scan using idx_tmp__id on tmp  (cost=0.00..3330.02 rows=65756
> width=12) (actual time=0.115..58.601 rows=65756 loops=1)
>  Total runtime: 2905.580 ms
>
> This looks like an optimal plan and average run time over 5 runs is 2660 ms.
>
> Inside a transaction:
>
>                                                                 QUERY PLAN
>
--------------------------------------------------------------------------------------------------------------------------------------------
>  Merge Join  (cost=7427.63..16159.84 rows=65756 width=1087) (actual
> time=315.570..574.075 rows=65756 loops=1)
>   Merge Cond: (_struct.id = tmp.id)
>   ->  Index Scan using "_PK_struct" on _struct  (cost=0.00..7500.50
> rows=98149 width=1083) (actual time=0.020..129.915 rows=32300 loops=1)
>   ->  Sort  (cost=7427.63..7592.02 rows=65756 width=12) (actual
> time=315.538..333.359 rows=65756 loops=1)
>         Sort Key: tmp.id
>         Sort Method:  quicksort  Memory: 4617kB
>         ->  Seq Scan on tmp  (cost=0.00..2165.56 rows=65756 width=12)
> (actual time=10.070..37.411 rows=65756 loops=1)
>  Trigger for constraint _FK_struct_2_collection: time=1105.892 calls=32300
>  Trigger for constraint _FK_struct_2_text: time=1468.009 calls=32300
>  Total runtime: 4955.784 ms
>
> Again, the planner does not use the index on tmp (id) although I put an
> "ANALYZE tmp" right before the UPDATE.  Average run time over 5 runs is 4610
> ms.
>
> Thanks,
> Viktor
>
> On 16.07.2008 at 15:33, Pavel Stehule wrote:
>
>> Hello
>>
>> my advice is little bit offtopic, I am sorry. Why you use correlated
>> subquery? Your update statement should be
>>
>> update _struct set left_token = tmp.left_token from tmp where
>> _struct.id = tmp.id;
>>
>> send output of explain analyze statement, please. etc
>> explain analyze UPDATE _struct SET left_token = (SELECT DISTINCT
>> left_token FROM tmp WHERE _struct.id = tmp.id)
>>
>> regards
>> Pavel Stehule
>>
>>
>>
>> 2008/7/16 Viktor Rosenfeld <rosenfel@informatik.hu-berlin.de>:
>>>
>>> Hi Tom,
>>> Postgres is indeed selecting a bad plan.  Turns out that the index I
>>> created
>>> to speed up the UPDATE isn't used inside a transaction block.
>>> Here's the plan for "UPDATE _struct SET left_token = (SELECT DISTINCT
>>> left_token FROM tmp WHERE _struct.id = tmp.id)" outside of a transaction:
>>>                                       QUERY PLAN
>>>
>>>
>>> -------------------------------------------------------------------------------------------
>>> Seq Scan on _struct  (cost=0.00..826643.13 rows=98149 width=1083)
>>>  SubPlan
>>>    ->  Unique  (cost=8.38..8.40 rows=1 width=4)
>>>          ->  Sort  (cost=8.38..8.39 rows=4 width=4)
>>>                Sort Key: tmp.left_token
>>>                ->  Index Scan using idx_tmp__id on tmp  (cost=0.00..8.34
>>> rows=4 width=4)
>>>                      Index Cond: ($0 = id)
>>> And inside a transaction:
>>>                                     QUERY PLAN
>>>
>>>
>>> ---------------------------------------------------------------------------------------
>>> Seq Scan on _struct  (cost=100000000.00..3230175260746.00 rows=32300
>>> width=70)
>>>  SubPlan
>>>    ->  Unique  (cost=100002329.99..100002330.01 rows=1 width=4)
>>>          ->  Sort  (cost=100002329.99..100002330.00 rows=4 width=4)
>>>                Sort Key: tmp.left_token
>>>                ->  Seq Scan on tmp  (cost=100000000.00..100002329.95
>>> rows=4 width=4)
>>>                      Filter: ($0 = id)
>>> The high cost of the seqscan on tmp are because I tried disabling
>>> sequential
>>> scans inside the transaction to force an index scan, which Postgres
>>> decided
>>> to ignore in this case.
>>> Putting an ANALYZE tmp and ANALYZE _struct right before the UPDATE didn't
>>> help either.  (Also shouldn't the creation of an index on tmp (id) take
>>> care
>>> of analyzing that column?)
>>> Thanks,
>>> Viktor
>>> On 14.07.2008 at 20:52, Tom Lane wrote:
>>>
>>> Viktor Rosenfeld <rosenfel@informatik.hu-berlin.de> writes:
>>>
>>> the script below runs very fast when executed alone.  But when I call
>>>
>>> it from within a transaction block it's so slow that I have to abort
>>>
>>> it after a while.  Specifically the second-to-last UPDATE seems to
>>>
>>> take forever within a transaction while it completes in about 3
>>>
>>> seconds outside a transaction.
>>>
>>> Since the table you're working on was just created in the same
>>> transaction, there's been no opportunity for autovacuum to run an
>>> ANALYZE on it; that's probably preventing selection of a good plan.
>>> Try throwing in an "ANALYZE tmp" after you load the table.
>>>
>>> regards, tom lane
>>>
>>> --
>>> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
>>> To make changes to your subscription:
>>> http://www.postgresql.org/mailpref/pgsql-general
>>>
>>>
>
>

Re: UPDATE runs slow in a transaction

From
Viktor Rosenfeld
Date:
Hi,

I have no idea why the constraint triggers are fired in the first
place, since the respective columns are not touched by the query.  They
were not fired with the old correlated subquery either.

Cheers,
Viktor

On 16.07.2008 at 17:01, Pavel Stehule wrote:

> hello
>
> second query:
>
> why without transactions are not called triggers constraint
> _FK_struct_2_collection and constraint _FK_struct_2_text?
>
> Regards
> Pavel Stehule
>
> 2008/7/16 Viktor Rosenfeld <rosenfel@informatik.hu-berlin.de>:
>> Hi Pavel,
>>
>> thanks for the advice on how to uncorrelate the query.  I must
>> admit I
>> didn't know about the UPDATE ... SET ... FROM ... syntax.
>>
>> Now the UPDATE runs in an acceptable time inside a transaction,
>> however the
>> query plan still differs when I run it outside.
>>
>> Outside a transaction:
>>
>>
>> QUERY PLAN
>>
--------------------------------------------------------------------------------------------------------------------------------------------
>> Merge Join  (cost=0.00..11481.84 rows=65756 width=1087) (actual
>> time=0.151..323.856 rows=65756 loops=1)
>>  Merge Cond: (_struct.id = tmp.id)
>>  ->  Index Scan using "_PK_struct" on _struct  (cost=0.00..7084.50
>> rows=98149 width=1083) (actual time=0.028..137.463 rows=32300
>> loops=1)
>>  ->  Index Scan using idx_tmp__id on tmp  (cost=0.00..3330.02
>> rows=65756
>> width=12) (actual time=0.115..58.601 rows=65756 loops=1)
>> Total runtime: 2905.580 ms
>>
>> This looks like an optimal plan and average run time over 5 runs is
>> 2660 ms.
>>
>> Inside a transaction:
>>
>>
>> QUERY PLAN
>>
--------------------------------------------------------------------------------------------------------------------------------------------
>> Merge Join  (cost=7427.63..16159.84 rows=65756 width=1087) (actual
>> time=315.570..574.075 rows=65756 loops=1)
>>  Merge Cond: (_struct.id = tmp.id)
>>  ->  Index Scan using "_PK_struct" on _struct  (cost=0.00..7500.50
>> rows=98149 width=1083) (actual time=0.020..129.915 rows=32300
>> loops=1)
>>  ->  Sort  (cost=7427.63..7592.02 rows=65756 width=12) (actual
>> time=315.538..333.359 rows=65756 loops=1)
>>        Sort Key: tmp.id
>>        Sort Method:  quicksort  Memory: 4617kB
>>        ->  Seq Scan on tmp  (cost=0.00..2165.56 rows=65756 width=12)
>> (actual time=10.070..37.411 rows=65756 loops=1)
>> Trigger for constraint _FK_struct_2_collection: time=1105.892
>> calls=32300
>> Trigger for constraint _FK_struct_2_text: time=1468.009 calls=32300
>> Total runtime: 4955.784 ms
>>
>> Again, the planner does not use the index on tmp (id) although I
>> put an
>> "ANALYZE tmp" right before the UPDATE.  Average run time over 5
>> runs is 4610
>> ms.
>>
>> Thanks,
>> Viktor
>>
>> On 16.07.2008 at 15:33, Pavel Stehule wrote:
>>
>>> Hello
>>>
>>> my advice is little bit offtopic, I am sorry. Why you use correlated
>>> subquery? Your update statement should be
>>>
>>> update _struct set left_token = tmp.left_token from tmp where
>>> _struct.id = tmp.id;
>>>
>>> send output of explain analyze statement, please. etc
>>> explain analyze UPDATE _struct SET left_token = (SELECT DISTINCT
>>> left_token FROM tmp WHERE _struct.id = tmp.id)
>>>
>>> regards
>>> Pavel Stehule
>>>
>>>
>>>
>>> 2008/7/16 Viktor Rosenfeld <rosenfel@informatik.hu-berlin.de>:
>>>>
>>>> Hi Tom,
>>>> Postgres is indeed selecting a bad plan.  Turns out that the
>>>> index I
>>>> created
>>>> to speed up the UPDATE isn't used inside a transaction block.
>>>> Here's the plan for "UPDATE _struct SET left_token = (SELECT
>>>> DISTINCT
>>>> left_token FROM tmp WHERE _struct.id = tmp.id)" outside of a
>>>> transaction:
>>>>                                      QUERY PLAN
>>>>
>>>>
>>>> -------------------------------------------------------------------------------------------
>>>> Seq Scan on _struct  (cost=0.00..826643.13 rows=98149 width=1083)
>>>> SubPlan
>>>>   ->  Unique  (cost=8.38..8.40 rows=1 width=4)
>>>>         ->  Sort  (cost=8.38..8.39 rows=4 width=4)
>>>>               Sort Key: tmp.left_token
>>>>               ->  Index Scan using idx_tmp__id on tmp
>>>> (cost=0.00..8.34
>>>> rows=4 width=4)
>>>>                     Index Cond: ($0 = id)
>>>> And inside a transaction:
>>>>                                    QUERY PLAN
>>>>
>>>>
>>>> ---------------------------------------------------------------------------------------
>>>> Seq Scan on _struct  (cost=100000000.00..3230175260746.00
>>>> rows=32300
>>>> width=70)
>>>> SubPlan
>>>>   ->  Unique  (cost=100002329.99..100002330.01 rows=1 width=4)
>>>>         ->  Sort  (cost=100002329.99..100002330.00 rows=4 width=4)
>>>>               Sort Key: tmp.left_token
>>>>               ->  Seq Scan on tmp  (cost=100000000.00..100002329.95
>>>> rows=4 width=4)
>>>>                     Filter: ($0 = id)
>>>> The high cost of the seqscan on tmp are because I tried disabling
>>>> sequential
>>>> scans inside the transaction to force an index scan, which Postgres
>>>> decided
>>>> to ignore in this case.
>>>> Putting an ANALYZE tmp and ANALYZE _struct right before the
>>>> UPDATE didn't
>>>> help either.  (Also shouldn't the creation of an index on tmp
>>>> (id) take
>>>> care
>>>> of analyzing that column?)
>>>> Thanks,
>>>> Viktor
>>>> On 14.07.2008 at 20:52, Tom Lane wrote:
>>>>
>>>> Viktor Rosenfeld <rosenfel@informatik.hu-berlin.de> writes:
>>>>
>>>> the script below runs very fast when executed alone.  But when I
>>>> call
>>>>
>>>> it from within a transaction block it's so slow that I have to
>>>> abort
>>>>
>>>> it after a while.  Specifically the second-to-last UPDATE seems to
>>>>
>>>> take forever within a transaction while it completes in about 3
>>>>
>>>> seconds outside a transaction.
>>>>
>>>> Since the table you're working on was just created in the same
>>>> transaction, there's been no opportunity for autovacuum to run an
>>>> ANALYZE on it; that's probably preventing selection of a good plan.
>>>> Try throwing in an "ANALYZE tmp" after you load the table.
>>>>
>>>> regards, tom lane
>>>>
>>>> --
>>>> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
>>>> To make changes to your subscription:
>>>> http://www.postgresql.org/mailpref/pgsql-general
>>>>
>>>>
>>
>>


Re: UPDATE runs slow in a transaction

From
Viktor Rosenfeld
Date:
Yes, I either run

begin;
... lots of other stuff in a script
explain analyze update ... (called from the same script)
rollback;

or

... lots of other stuff in a script (same as above)
explain analyze update ...

Cheers,
Viktor
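
Spelled out, the in-transaction variant boils down to something like the following (a sketch; the comment stands in for the rest of the loading script, and the UPDATE uses the rewritten form from Pavel's suggestion):

BEGIN;
-- ALTER TABLE ..., CREATE TABLE tmp AS ..., CREATE INDEX ..., ANALYZE tmp, as in the script
EXPLAIN ANALYZE UPDATE _struct SET left_token = tmp.left_token FROM tmp WHERE _struct.id = tmp.id;
ROLLBACK;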

On 16.07.2008 at 16:58, Pavel Stehule wrote:

> this is strange.
>
> what means "run under transaction"?
>
> you did exactly statements in psql console:
> begin;
> explain analyze select ...
> commit?
>
> regards
> Pavel Stehule
>
> 2008/7/16 Viktor Rosenfeld <rosenfel@informatik.hu-berlin.de>:
>> Hi Pavel,
>>
>> thanks for the advice on how to uncorrelate the query.  I must
>> admit I
>> didn't know about the UPDATE ... SET ... FROM ... syntax.
>>
>> Now the UPDATE runs in an acceptable time inside a transaction,
>> however the
>> query plan still differs when I run it outside.
>>
>> Outside a transaction:
>>
>>
>> QUERY PLAN
>>
--------------------------------------------------------------------------------------------------------------------------------------------
>> Merge Join  (cost=0.00..11481.84 rows=65756 width=1087) (actual
>> time=0.151..323.856 rows=65756 loops=1)
>>  Merge Cond: (_struct.id = tmp.id)
>>  ->  Index Scan using "_PK_struct" on _struct  (cost=0.00..7084.50
>> rows=98149 width=1083) (actual time=0.028..137.463 rows=32300
>> loops=1)
>>  ->  Index Scan using idx_tmp__id on tmp  (cost=0.00..3330.02
>> rows=65756
>> width=12) (actual time=0.115..58.601 rows=65756 loops=1)
>> Total runtime: 2905.580 ms
>>
>> This looks like an optimal plan and average run time over 5 runs is
>> 2660 ms.
>>
>> Inside a transaction:
>>
>>
>> QUERY PLAN
>>
--------------------------------------------------------------------------------------------------------------------------------------------
>> Merge Join  (cost=7427.63..16159.84 rows=65756 width=1087) (actual
>> time=315.570..574.075 rows=65756 loops=1)
>>  Merge Cond: (_struct.id = tmp.id)
>>  ->  Index Scan using "_PK_struct" on _struct  (cost=0.00..7500.50
>> rows=98149 width=1083) (actual time=0.020..129.915 rows=32300
>> loops=1)
>>  ->  Sort  (cost=7427.63..7592.02 rows=65756 width=12) (actual
>> time=315.538..333.359 rows=65756 loops=1)
>>        Sort Key: tmp.id
>>        Sort Method:  quicksort  Memory: 4617kB
>>        ->  Seq Scan on tmp  (cost=0.00..2165.56 rows=65756 width=12)
>> (actual time=10.070..37.411 rows=65756 loops=1)
>> Trigger for constraint _FK_struct_2_collection: time=1105.892
>> calls=32300
>> Trigger for constraint _FK_struct_2_text: time=1468.009 calls=32300
>> Total runtime: 4955.784 ms
>>
>> Again, the planner does not use the index on tmp (id) although I
>> put an
>> "ANALYZE tmp" right before the UPDATE.  Average run time over 5
>> runs is 4610
>> ms.
>>
>> Thanks,
>> Viktor
>>
>> On 16.07.2008 at 15:33, Pavel Stehule wrote:
>>
>>> Hello
>>>
>>> my advice is little bit offtopic, I am sorry. Why you use correlated
>>> subquery? Your update statement should be
>>>
>>> update _struct set left_token = tmp.left_token from tmp where
>>> _struct.id = tmp.id;
>>>
>>> send output of explain analyze statement, please. etc
>>> explain analyze UPDATE _struct SET left_token = (SELECT DISTINCT
>>> left_token FROM tmp WHERE _struct.id = tmp.id)
>>>
>>> regards
>>> Pavel Stehule
>>>
>>>
>>>
>>> 2008/7/16 Viktor Rosenfeld <rosenfel@informatik.hu-berlin.de>:
>>>>
>>>> Hi Tom,
>>>> Postgres is indeed selecting a bad plan.  Turns out that the
>>>> index I
>>>> created
>>>> to speed up the UPDATE isn't used inside a transaction block.
>>>> Here's the plan for "UPDATE _struct SET left_token = (SELECT
>>>> DISTINCT
>>>> left_token FROM tmp WHERE _struct.id = tmp.id)" outside of a
>>>> transaction:
>>>>                                      QUERY PLAN
>>>>
>>>>
>>>> -------------------------------------------------------------------------------------------
>>>> Seq Scan on _struct  (cost=0.00..826643.13 rows=98149 width=1083)
>>>> SubPlan
>>>>   ->  Unique  (cost=8.38..8.40 rows=1 width=4)
>>>>         ->  Sort  (cost=8.38..8.39 rows=4 width=4)
>>>>               Sort Key: tmp.left_token
>>>>               ->  Index Scan using idx_tmp__id on tmp
>>>> (cost=0.00..8.34
>>>> rows=4 width=4)
>>>>                     Index Cond: ($0 = id)
>>>> And inside a transaction:
>>>>                                    QUERY PLAN
>>>>
>>>>
>>>> ---------------------------------------------------------------------------------------
>>>> Seq Scan on _struct  (cost=100000000.00..3230175260746.00
>>>> rows=32300
>>>> width=70)
>>>> SubPlan
>>>>   ->  Unique  (cost=100002329.99..100002330.01 rows=1 width=4)
>>>>         ->  Sort  (cost=100002329.99..100002330.00 rows=4 width=4)
>>>>               Sort Key: tmp.left_token
>>>>               ->  Seq Scan on tmp  (cost=100000000.00..100002329.95
>>>> rows=4 width=4)
>>>>                     Filter: ($0 = id)
>>>> The high cost of the seqscan on tmp are because I tried disabling
>>>> sequential
>>>> scans inside the transaction to force an index scan, which Postgres
>>>> decided
>>>> to ignore in this case.
>>>> Putting an ANALYZE tmp and ANALYZE _struct right before the
>>>> UPDATE didn't
>>>> help either.  (Also shouldn't the creation of an index on tmp
>>>> (id) take
>>>> care
>>>> of analyzing that column?)
>>>> Thanks,
>>>> Viktor
>>>> On 14.07.2008 at 20:52, Tom Lane wrote:
>>>>
>>>> Viktor Rosenfeld <rosenfel@informatik.hu-berlin.de> writes:
>>>>
>>>> the script below runs very fast when executed alone.  But when I
>>>> call
>>>>
>>>> it from within a transaction block it's so slow that I have to
>>>> abort
>>>>
>>>> it after a while.  Specifically the second-to-last UPDATE seems to
>>>>
>>>> take forever within a transaction while it completes in about 3
>>>>
>>>> seconds outside a transaction.
>>>>
>>>> Since the table you're working on was just created in the same
>>>> transaction, there's been no opportunity for autovacuum to run an
>>>> ANALYZE on it; that's probably preventing selection of a good plan.
>>>> Try throwing in an "ANALYZE tmp" after you load the table.
>>>>
>>>> regards, tom lane
>>>>
>>>> --
>>>> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
>>>> To make changes to your subscription:
>>>> http://www.postgresql.org/mailpref/pgsql-general
>>>>
>>>>
>>
>>


Re: UPDATE runs slow in a transaction

From
Tom Lane
Date:
Viktor Rosenfeld <rosenfel@informatik.hu-berlin.de> writes:
> Postgres is indeed selecting a bad plan.  Turns out that the index I
> created to speed up the UPDATE isn't used inside a transaction block.

That doesn't make any sense to me, and in fact I cannot replicate any
such behavior here.  What PG version are you running, exactly?

The exact test case I tried is attached --- it's just your original
incomplete example with some dummy data created beforehand.  I get
a plan using the tmp_id index in all supported PG versions.

            regards, tom lane




drop table _struct, _rank;
create table _struct(token_index int, id int);
create table _rank(struct_ref int, pre int, post int);

insert into _struct select i, i from generate_series(1,1000) g(i);
insert into _rank select i, i, i from generate_series(1,1000) g(i);
analyze _struct;
analyze _rank;

begin;

-- add columns left_token, right_token and copy values from token_index
ALTER TABLE _struct ADD left_token integer;
ALTER TABLE _struct ADD right_token integer;
UPDATE _struct SET left_token = token_index;
UPDATE _struct SET right_token = token_index;

-- set left, right values for non-terminals
-- (use temporary table to get rid of joins between struct and rank)
CREATE TABLE tmp AS
  SELECT r.pre, r.post, s.id, s.left_token, s.right_token
  FROM _rank r, _struct s
  WHERE r.struct_ref = s.id;

CREATE INDEX idx_tmp_pre_post ON tmp (pre, post);

UPDATE tmp SET left_token = (SELECT min(t2.left_token) FROM tmp t2 WHERE t2.pre >= tmp.pre AND t2.pre <= tmp.post);
UPDATE tmp SET right_token = (SELECT max(t2.right_token) FROM tmp t2 WHERE t2.pre >= tmp.pre AND t2.pre <= tmp.post);

-- copy left, right values for everything
CREATE INDEX tmp_id ON tmp (id);
--analyze tmp;
explain UPDATE _struct SET left_token = (SELECT DISTINCT left_token FROM tmp WHERE _struct.id = tmp.id);
-- the UPDATE above takes ages when called within a transaction
UPDATE _struct SET right_token = (SELECT DISTINCT right_token FROM tmp WHERE _struct.id = tmp.id);

-- clean up
DROP TABLE tmp;

rollback;

Re: UPDATE runs slow in a transaction

From
Tom Lane
Date:
Viktor Rosenfeld <rosenfel@informatik.hu-berlin.de> writes:
> Postgres is indeed selecting a bad plan.  Turns out that the index I
> created to speed up the UPDATE isn't used inside a transaction block.

I was having a hard time believing that, but just noticed that there is
a case in which it could be expected to happen, in 8.3.  The HOT patch
has to prevent use of an index in its creating transaction in some
cases.  To quote from README.HOT:

: Practically, we prevent old transactions from using the new index by
: setting pg_index.indcheckxmin to TRUE.  Queries are allowed to use such an
: index only after pg_index.xmin is below their TransactionXmin horizon,
: thereby ensuring that any incompatible rows in HOT chains are dead to them.
: (pg_index.xmin will be the XID of the CREATE INDEX transaction.  The reason
: for using xmin rather than a normal column is that the regular vacuum
: freezing mechanism will take care of converting xmin to FrozenTransactionId
: before it can wrap around.)
:
: This means in particular that the transaction creating the index will be
: unable to use the index.  We alleviate that problem somewhat by not setting
: indcheckxmin unless the table actually contains HOT chains with
: RECENTLY_DEAD members.  (In 8.4 we may be able to improve the situation,
: at least for non-serializable transactions, because we expect to be able to
: advance TransactionXmin intratransaction.)

That "alleviation" could mask the behavior in simple test cases, if
you're testing in an otherwise-idle database.  But in a real workload
it wouldn't be surprising that a new index would fail to be used
immediately, if it were built on a table that had been recently UPDATEd.

I think also that I tried to duplicate the problem in HEAD rather than
8.3, which means that the TransactionXmin advance code also helped to
keep me from seeing it.

            regards, tom lane
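
Given that explanation, one way to see whether the freshly built indexes on tmp are affected is to look at pg_index from inside the open transaction (a diagnostic sketch; tmp is the table from the thread):

SELECT indexrelid::regclass AS index_name, indcheckxmin, xmin
FROM pg_index
WHERE indrelid = 'tmp'::regclass;
-- if indcheckxmin is true and xmin is the XID of the still-open CREATE INDEX
-- transaction, that transaction is not yet allowed to use the index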