Re: UPDATE runs slow in a transaction

Поиск
Список
Период
Сортировка
От Viktor Rosenfeld
Тема Re: UPDATE runs slow in a transaction
Дата
Msg-id 06F6D3BB-BAA3-4C7A-9B3C-99039F55E34B@informatik.hu-berlin.de
обсуждение исходный текст
Ответ на Re: UPDATE runs slow in a transaction  ("Pavel Stehule" <pavel.stehule@gmail.com>)
Ответы Re: UPDATE runs slow in a transaction  ("Pavel Stehule" <pavel.stehule@gmail.com>)
Re: UPDATE runs slow in a transaction  ("Pavel Stehule" <pavel.stehule@gmail.com>)
Список pgsql-general
Hi Pavel,

thanks for the advice on how to uncorrelate the query.  I must admit I
didn't know about the UPDATE ... SET ... FROM ... syntax.

Now the UPDATE runs in an acceptable time inside a transaction,
however the query plan still differs when I run it outside.

Outside a transaction:


QUERY PLAN

--------------------------------------------------------------------------------------------------------------------------------------------
  Merge Join  (cost=0.00..11481.84 rows=65756 width=1087) (actual
time=0.151..323.856 rows=65756 loops=1)
    Merge Cond: (_struct.id = tmp.id)
    ->  Index Scan using "_PK_struct" on _struct  (cost=0.00..7084.50
rows=98149 width=1083) (actual time=0.028..137.463 rows=32300 loops=1)
    ->  Index Scan using idx_tmp__id on tmp  (cost=0.00..3330.02
rows=65756 width=12) (actual time=0.115..58.601 rows=65756 loops=1)
  Total runtime: 2905.580 ms

This looks like an optimal plan and average run time over 5 runs is
2660 ms.

Inside a transaction:


QUERY PLAN

--------------------------------------------------------------------------------------------------------------------------------------------
  Merge Join  (cost=7427.63..16159.84 rows=65756 width=1087) (actual
time=315.570..574.075 rows=65756 loops=1)
    Merge Cond: (_struct.id = tmp.id)
    ->  Index Scan using "_PK_struct" on _struct  (cost=0.00..7500.50
rows=98149 width=1083) (actual time=0.020..129.915 rows=32300 loops=1)
    ->  Sort  (cost=7427.63..7592.02 rows=65756 width=12) (actual
time=315.538..333.359 rows=65756 loops=1)
          Sort Key: tmp.id
          Sort Method:  quicksort  Memory: 4617kB
          ->  Seq Scan on tmp  (cost=0.00..2165.56 rows=65756
width=12) (actual time=10.070..37.411 rows=65756 loops=1)
  Trigger for constraint _FK_struct_2_collection: time=1105.892
calls=32300
  Trigger for constraint _FK_struct_2_text: time=1468.009 calls=32300
  Total runtime: 4955.784 ms

Again, the planner does not use the index on tmp (id) although I put
an "ANALYZE tmp" right before the UPDATE.  Average run time over 5
runs is 4610 ms.

Thanks,
Viktor

Am 16.07.2008 um 15:33 schrieb Pavel Stehule:

> Hello
>
> my advice is little bit offtopic, I am sorry. Why you use correlated
> subquery? Your update statement should be
>
> update _struct set left_token = tmp.left_token from tmp where
> _struct.id = tmp.id;
>
> send output of explain analyze statement, please. etc
> explain analyze UPDATE _struct SET left_token = (SELECT DISTINCT
> left_token FROM tmp WHERE _struct.id = tmp.id)
>
> regards
> Pavel Stehule
>
>
>
> 2008/7/16 Viktor Rosenfeld <rosenfel@informatik.hu-berlin.de>:
>> Hi Tom,
>> Postgres is indeed selecting a bad plan.  Turns out that the index
>> I created
>> to speed up the UPDATE isn't used inside a transaction block.
>> Here's the plan for "UPDATE _struct SET left_token = (SELECT DISTINCT
>> left_token FROM tmp WHERE _struct.id = tmp.id)" outside of a
>> transaction:
>>                                        QUERY PLAN
>>
>> -------------------------------------------------------------------------------------------
>> Seq Scan on _struct  (cost=0.00..826643.13 rows=98149 width=1083)
>>   SubPlan
>>     ->  Unique  (cost=8.38..8.40 rows=1 width=4)
>>           ->  Sort  (cost=8.38..8.39 rows=4 width=4)
>>                 Sort Key: tmp.left_token
>>                 ->  Index Scan using idx_tmp__id on tmp
>> (cost=0.00..8.34
>> rows=4 width=4)
>>                       Index Cond: ($0 = id)
>> And inside a transaction:
>>                                      QUERY PLAN
>>
>> ---------------------------------------------------------------------------------------
>> Seq Scan on _struct  (cost=100000000.00..3230175260746.00 rows=32300
>> width=70)
>>   SubPlan
>>     ->  Unique  (cost=100002329.99..100002330.01 rows=1 width=4)
>>           ->  Sort  (cost=100002329.99..100002330.00 rows=4 width=4)
>>                 Sort Key: tmp.left_token
>>                 ->  Seq Scan on tmp  (cost=100000000.00..100002329.95
>> rows=4 width=4)
>>                       Filter: ($0 = id)
>> The high cost of the seqscan on tmp are because I tried disabling
>> sequential
>> scans inside the transaction to force an index scan, which Postgres
>> decided
>> to ignore in this case.
>> Putting an ANALYZE tmp and ANALYZE _struct right before the UPDATE
>> didn't
>> help either.  (Also shouldn't the creation of an index on tmp (id)
>> take care
>> of analyzing that column?)
>> Thanks,
>> Viktor
>> Am 14.07.2008 um 20:52 schrieb Tom Lane:
>>
>> Viktor Rosenfeld <rosenfel@informatik.hu-berlin.de> writes:
>>
>> the script below runs very fast when executed alone.  But when I call
>>
>> it from within a transaction block it's so slow that I have to abort
>>
>> it after a while.  Specifically the second-to-last UPDATE seems to
>>
>> take forever within a transaction while it completes in about 3
>>
>> seconds outside a transaction.
>>
>> Since the table you're working on was just created in the same
>> transaction, there's been no opportunity for autovacuum to run an
>> ANALYZE on it; that's probably preventing selection of a good plan.
>> Try throwing in an "ANALYZE tmp" after you load the table.
>>
>> regards, tom lane
>>
>> --
>> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
>> To make changes to your subscription:
>> http://www.postgresql.org/mailpref/pgsql-general
>>
>>


В списке pgsql-general по дате отправления:

Предыдущее
От: "Haim Ashkenazi"
Дата:
Сообщение: Which design would be faster ...
Следующее
От: Guillaume Lelarge
Дата:
Сообщение: Re: roll back to 8.1 for PyQt driver work-around