Re: CPU hogged by concurrent SELECT..FOR UPDATE SKIP LOCKED

From: Raj
Subject: Re: CPU hogged by concurrent SELECT..FOR UPDATE SKIP LOCKED
Date:
Msg-id: eec4db8f-336a-8bb1-fc82-086abadec9bd@gusw.net
In reply to: Re: CPU hogged by concurrent SELECT..FOR UPDATE SKIP LOCKED  (Jim Jarvie <jim@talentstack.to>)
Responses: Re: CPU hogged by concurrent SELECT..FOR UPDATE SKIP LOCKED  (Jim Jarvie <jim@talentstack.to>)
Re: CPU hogged by concurrent SELECT..FOR UPDATE SKIP LOCKED  (David Rowley <dgrowleyml@gmail.com>)
List: pgsql-performance
Hi all, and especially Jim Jarvie: I only now saw your email to me on my 
related issue. My issue remains this one:

> Well this got interesting  - the already moved error showed up: 
and I have already gone through all those index pruning and all that 
good stuff.

I remain with my original question from the 30th of June; to me it feels 
like a bug of some sort:

> "tuple to be locked was already moved to another partition due to 
> concurrent update"
>
> This would not exactly look like a bug, because the message says "to 
> be locked", so at least it's not allowing two workers to lock the same 
> tuple. But it seems that the skip-locked mode should not make an error 
> out of this, but rather treat it as if the tuple were already locked. 
> Why would it want to lock the tuple (representing the job) if another 
> worker has already finished its UPDATE of the job to mark it as "done" 
> (which is what makes the tuple move to the "completed" partition)?
>
> Either the SELECT for jobs to do returned a wrong tuple, which was 
> already updated, or there is some lapse in the locking.
>
> Either way, it seems a waste of time to throw all these errors when the 
> tuple should not even have been selected for update and locking.
>
> I wonder if anybody knows anything about that issue? Of course you'll 
> want to see the DDL and SQL queries, etc. but you can't really try it 
> out unless you do some massively parallel magic.
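
To make the scenario concrete, here is a minimal sketch of the kind of 
setup I mean. The names below (job_queue, job_state and so on) are made 
up for illustration only, not my real DDL; the point is just that the 
queue is list-partitioned by state, so marking a job done moves the row 
into another partition while workers claim batches with SKIP LOCKED:

    -- Made-up minimal schema: a queue list-partitioned by state, so that
    -- an UPDATE of state physically moves the row to another partition.
    CREATE TYPE job_state AS ENUM ('pending', 'done');

    CREATE TABLE job_queue (
        job_id  bigserial,
        state   job_state NOT NULL DEFAULT 'pending',
        payload text
    ) PARTITION BY LIST (state);

    CREATE TABLE job_queue_pending   PARTITION OF job_queue FOR VALUES IN ('pending');
    CREATE TABLE job_queue_completed PARTITION OF job_queue FOR VALUES IN ('done');

    -- What each worker runs to claim a batch, skipping rows other workers hold:
    SELECT job_id, payload
    FROM   job_queue
    WHERE  state = 'pending'
    LIMIT  250
    FOR UPDATE SKIP LOCKED;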

I still think that it should simply not happen: don't waste time on old 
tuples, trying to fetch and lock something that's no longer there. It's a 
waste of resources.
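
For what it's worth, the interleaving that I believe produces the error 
looks roughly like this (same made-up names as in the sketch above). It is 
a timing race, so you cannot reliably reproduce it by typing into two psql 
sessions, but it shows where SKIP LOCKED does not help:

    -- Session A (worker 1): claims a batch; the statement is still
    -- scanning and has not yet reached or locked job 42.
    BEGIN;
    SELECT job_id
    FROM   job_queue
    WHERE  state = 'pending'
    LIMIT  250
    FOR UPDATE SKIP LOCKED;

    -- Session B (worker 2): finishes job 42 and marks it done while A's
    -- statement is still running; the row moves from job_queue_pending
    -- into job_queue_completed.
    BEGIN;
    UPDATE job_queue SET state = 'done' WHERE job_id = 42;
    COMMIT;

    -- Session A then reaches the old version of job 42, tries to lock it,
    -- sees it was moved to another partition, and instead of skipping it fails:
    --   ERROR:  tuple to be locked was already moved to another partition
    --           due to concurrent update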

regards,
-Gunther

On 8/20/2020 6:39 PM, Jim Jarvie wrote:
>
>
> On 20-Aug.-2020 17:42, Michael Lewis wrote:
>> Can you share an explain analyze for the query that does the select for
>> update? I wouldn't assume that partition pruning is possible at all with
>> hash, and it would be interesting to see how it is finding those rows.
>
> Well this got interesting - the already moved error showed up. Note, 
> the actual process partitions are regular table partitions; these are 
> not hashed. Only the incoming and completed partitions are hashed, due 
> to row counts at either end of the processing; in-flight (where the 
> issue shows up) is quite small:
>
> [queuedb] # explain analyze select queueid,txobject,objectid,state from mq.queue where (state = 'tx_active' or state='tx_fail_retryable') and txobject = 'ticket' limit 250 for update skip locked;
> ERROR:  40001: tuple to be locked was already moved to another partition due to concurrent update
> LOCATION:  heapam_tuple_lock, heapam_handler.c:405
> Time: 579.131 ms
> [queuedb] # explain analyze select queueid,txobject,objectid,state from mq.queue where (state = 'tx_active' or state='tx_fail_retryable') and txobject = 'ticket' limit 250 for update skip locked;
> ERROR:  40001: tuple to be locked was already moved to another partition due to concurrent update
> LOCATION:  heapam_tuple_lock, heapam_handler.c:405
> Time: 568.008 ms
> [queuedb] # explain analyze select queueid,txobject,objectid,state from mq.queue where (state = 'tx_active' or state='tx_fail_retryable') and txobject = 'ticket' limit 250 for update skip locked;
>                                                                          QUERY PLAN
> ----------------------------------------------------------------------------------------------------------------------------------------------------------
>  Limit  (cost=0.00..25.71 rows=250 width=34) (actual time=1306.041..1306.338 rows=250 loops=1)
>    ->  LockRows  (cost=0.00..7934.38 rows=77150 width=34) (actual time=1306.040..1306.315 rows=250 loops=1)
>          ->  Append  (cost=0.00..7162.88 rows=77150 width=34) (actual time=520.685..1148.347 rows=31500 loops=1)
>                ->  Seq Scan on queue_tx_active  (cost=0.00..6764.50 rows=77000 width=34) (actual time=520.683..1145.258 rows=31500 loops=1)
>                      Filter: ((txobject = 'ticket'::mq.queue_object) AND ((state = 'tx_active'::mq.tx_state) OR (state = 'tx_fail_retryable'::mq.tx_state)))
>                ->  Seq Scan on queue_tx_fail_retryable  (cost=0.00..12.62 rows=150 width=34) (never executed)
>                      Filter: ((txobject = 'ticket'::mq.queue_object) AND ((state = 'tx_active'::mq.tx_state) OR (state = 'tx_fail_retryable'::mq.tx_state)))
>  Planning Time: 0.274 ms
>  Execution Time: 1306.380 ms
> (9 rows)
>
> Time: 1317.150 ms (00:01.317)
> [queuedb] #
>


