Re: Performance Concern

Поиск
Список
Период
Сортировка
От Greg Stark
Тема Re: Performance Concern
Дата
Msg-id 87wuattkx8.fsf@stark.dyndns.tv
обсуждение исходный текст
Ответ на Re: Performance Concern  ("John Pagakis" <thebfh@toolsmythe.com>)
Список pgsql-performance
"John Pagakis" <thebfh@toolsmythe.com> writes:

> UPDATE baz SET customer_id = '1234' WHERE baz_key IN( SELECT baz_key FROM
> baz WHERE customer_id IS NULL LIMIT 1000 );

Do an "explain analyze" on this query. I bet it's doing two sequential scans.
Unfortunately in 7.3 the WHERE IN type of clause is poorly handled. If you're
still in development perhaps you should move to the 7.4 beta as it should
handle this much better:

test74=> explain UPDATE test SET customer_id = 1 WHERE a IN (SELECT a FROM test WHERE customer_id IS NULL LIMIT 1000 );
                                   QUERY PLAN
---------------------------------------------------------------------------------
 Nested Loop  (cost=1447.26..2069.43 rows=201 width=10)
   ->  HashAggregate  (cost=1447.26..1447.26 rows=200 width=4)
         ->  Subquery Scan "IN_subquery"  (cost=0.00..1446.01 rows=501 width=4)
               ->  Limit  (cost=0.00..1441.00 rows=501 width=4)
                     ->  Seq Scan on test  (cost=0.00..1441.00 rows=501 width=4)
                           Filter: (customer_id IS NULL)
   ->  Index Scan using ii on test  (cost=0.00..3.10 rows=1 width=10)
         Index Cond: (test.a = "outer".a)
(8 rows)


However notice you still get at the one sequential scan. One way to help the
situation would be to create a partial index WHERE customer_id IS NULL. This
would especially help when things are almost completely sold out and available
slots are sparse.

slo=> explain UPDATE test SET customer_id = 1 WHERE a IN (SELECT a FROM test WHERE customer_id IS NULL LIMIT 1000 );
                                        QUERY PLAN
------------------------------------------------------------------------------------------
 Nested Loop  (cost=181.01..803.18 rows=201 width=10)
   ->  HashAggregate  (cost=181.01..181.01 rows=200 width=4)
         ->  Subquery Scan "IN_subquery"  (cost=0.00..179.76 rows=501 width=4)
               ->  Limit  (cost=0.00..174.75 rows=501 width=4)
                     ->  Index Scan using i on test  (cost=0.00..174.75 rows=501 width=4)
                           Filter: (customer_id IS NULL)
   ->  Index Scan using ii on test  (cost=0.00..3.10 rows=1 width=10)
         Index Cond: (test.a = "outer".a)
(8 rows)

Notice the both sequential scans are gone and replaced by index scans.


I kind of worry you might still have a race condition with the above query.
Two clients could do the subquery and pick up the same records, then both run
and update them. The database would lock the records until the first one
commits but I don't think that would stop the second one from updating them a
second time.

Perhaps moving to serializable transactions would help this, I'm not sure.

It's too bad the LIMIT clause doesn't work on UPDATEs.
Then you could simply do:

UPDATE baz SET customer_id = '1234' where customer_id IS NULL LIMIT 1000

Which shouldn't have to scan the table twice at all and I don't think suffer
from any race conditions.

--
greg

В списке pgsql-performance по дате отправления:

Предыдущее
От: Allen Landsidel
Дата:
Сообщение: Re: Performance Concern
Следующее
От: "John Pagakis"
Дата:
Сообщение: Re: Performance Concern