Re: CPU hogged by concurrent SELECT..FOR UPDATE SKIP LOCKED
From: Raj
Subject: Re: CPU hogged by concurrent SELECT..FOR UPDATE SKIP LOCKED
Msg-id: eec4db8f-336a-8bb1-fc82-086abadec9bd@gusw.net
In reply to: Re: CPU hogged by concurrent SELECT..FOR UPDATE SKIP LOCKED (Jim Jarvie <jim@talentstack.to>)
Responses: Re: CPU hogged by concurrent SELECT..FOR UPDATE SKIP LOCKED (Jim Jarvie <jim@talentstack.to>)
           Re: CPU hogged by concurrent SELECT..FOR UPDATE SKIP LOCKED (David Rowley <dgrowleyml@gmail.com>)
List: pgsql-performance
Hi all, especially Jim Jarvie,

I saw your email to me only now on my related issue. My issue remains this one:

> Well this got interesting - the already moved error showed up:

and I have already gone through all that index pruning and all that good stuff. I remain with my original question from June 30th; to me it feels like a bug of some sort:

> "tuple to be locked was already moved to another partition due to
> concurrent update"
>
> This would not exactly look like a bug, because the message says "to
> be locked", so at least it's not allowing two workers to lock the same
> tuple. But it seems that the skip-locked mode should not make an error
> out of this, but treat it as if the tuple were already locked. Why would
> it want to lock the tuple (representing the job) if another worker has
> already finished its UPDATE of the job to mark it as "done" (which is
> what makes the tuple move to the "completed" partition)?
>
> Either the SELECT for jobs to do returned a wrong tuple, which was
> already updated, or there is some lapse in the locking.
>
> Either way, it would seem to be a waste of time throwing all these
> errors when the tuple should not even have been selected for update
> and locking.
>
> I wonder if anybody knows anything about that issue? Of course you'll
> want to see the DDL and SQL queries, etc., but you can't really try it
> out unless you do some massively parallel magic.

I still think it should simply not happen. Don't waste time on old tuples, trying to fetch and lock something that's no longer there. It's a waste of resources.
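One way to live with it in the meantime would be to treat SQLSTATE 40001 (serialization_failure) the same as "nothing claimed yet, try again". Below is a minimal plpgsql sketch of that retry pattern - the function name mq.claim_tickets and the batch_size parameter are made up for illustration, not from anyone's actual system:

    CREATE OR REPLACE FUNCTION mq.claim_tickets(batch_size int)
    RETURNS SETOF mq.queue
    LANGUAGE plpgsql AS
    $$
    BEGIN
        LOOP
            BEGIN
                -- The claim itself; row locks taken here are kept by the
                -- outer transaction once the subtransaction commits.
                RETURN QUERY
                    SELECT *
                      FROM mq.queue
                     WHERE (state = 'tx_active' OR state = 'tx_fail_retryable')
                       AND txobject = 'ticket'
                     LIMIT batch_size
                       FOR UPDATE SKIP LOCKED;
                RETURN;
            EXCEPTION WHEN serialization_failure THEN
                -- SQLSTATE 40001: a tuple moved to another partition while
                -- we were locking it; roll back the subtransaction and retry.
                NULL;
            END;
        END LOOP;
    END;
    $$;

A worker would then just run "SELECT * FROM mq.claim_tickets(250);". Under READ COMMITTED each retry takes a fresh snapshot, so the re-run no longer sees the moved tuple. Of course this only hides the error from the application; it doesn't stop the wasted work on the server.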
regards,
-Gunther

On 8/20/2020 6:39 PM, Jim Jarvie wrote:
>
> On 20-Aug.-2020 17:42, Michael Lewis wrote:
>> Can you share an explain analyze for the query that does the select for
>> update? I wouldn't assume that partition pruning is possible at all with
>> hash, and it would be interesting to see how it is finding those rows.
>
> Well this got interesting - the already moved error showed up. Note,
> the actual process partitions are regular table partitions; these are
> not hashed. Only the incoming and completed are hashed, due to row
> counts at either end of the processing; in-flight (where the issue
> shows up) is quite small:
>
> [queuedb] # explain analyze select queueid,txobject,objectid,state
> from mq.queue where (state = 'tx_active' or state='tx_fail_retryable')
> and txobject = 'ticket' limit 250 for update skip locked;
> ERROR: 40001: tuple to be locked was already moved to another
> partition due to concurrent update
> LOCATION: heapam_tuple_lock, heapam_handler.c:405
> Time: 579.131 ms
>
> [queuedb] # explain analyze select queueid,txobject,objectid,state
> from mq.queue where (state = 'tx_active' or state='tx_fail_retryable')
> and txobject = 'ticket' limit 250 for update skip locked;
> ERROR: 40001: tuple to be locked was already moved to another
> partition due to concurrent update
> LOCATION: heapam_tuple_lock, heapam_handler.c:405
> Time: 568.008 ms
>
> [queuedb] # explain analyze select queueid,txobject,objectid,state
> from mq.queue where (state = 'tx_active' or state='tx_fail_retryable')
> and txobject = 'ticket' limit 250 for update skip locked;
>                               QUERY PLAN
> ----------------------------------------------------------------------------
>  Limit  (cost=0.00..25.71 rows=250 width=34)
>         (actual time=1306.041..1306.338 rows=250 loops=1)
>    ->  LockRows  (cost=0.00..7934.38 rows=77150 width=34)
>                  (actual time=1306.040..1306.315 rows=250 loops=1)
>          ->  Append  (cost=0.00..7162.88 rows=77150 width=34)
>                      (actual time=520.685..1148.347 rows=31500 loops=1)
>                ->  Seq Scan on queue_tx_active
>                      (cost=0.00..6764.50 rows=77000 width=34)
>                      (actual time=520.683..1145.258 rows=31500 loops=1)
>                      Filter: ((txobject = 'ticket'::mq.queue_object)
>                        AND ((state = 'tx_active'::mq.tx_state)
>                          OR (state = 'tx_fail_retryable'::mq.tx_state)))
>                ->  Seq Scan on queue_tx_fail_retryable
>                      (cost=0.00..12.62 rows=150 width=34) (never executed)
>                      Filter: ((txobject = 'ticket'::mq.queue_object)
>                        AND ((state = 'tx_active'::mq.tx_state)
>                          OR (state = 'tx_fail_retryable'::mq.tx_state)))
>  Planning Time: 0.274 ms
>  Execution Time: 1306.380 ms
> (9 rows)
>
> Time: 1317.150 ms (00:01.317)
> [queuedb] #
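For readers following along: from the plan and Jim's description, the layout is presumably list partitioning on state, with hash-partitioned subtrees only at the incoming and completed ends. A minimal sketch of that assumed layout follows - the column types, the enum values other than those visible in the plan, the 'tx_complete' state name, and the hash modulus are guesses, not Jim's actual DDL:

    CREATE TYPE mq.tx_state AS ENUM
        ('tx_incoming', 'tx_active', 'tx_fail_retryable', 'tx_complete');
    CREATE TYPE mq.queue_object AS ENUM ('ticket');

    CREATE TABLE mq.queue (
        queueid  bigint NOT NULL,            -- guessed type
        txobject mq.queue_object NOT NULL,
        objectid bigint,                     -- guessed type
        state    mq.tx_state NOT NULL
    ) PARTITION BY LIST (state);

    -- In-flight states are plain partitions; the error shows up here.
    CREATE TABLE mq.queue_tx_active PARTITION OF mq.queue
        FOR VALUES IN ('tx_active');
    CREATE TABLE mq.queue_tx_fail_retryable PARTITION OF mq.queue
        FOR VALUES IN ('tx_fail_retryable');

    -- Completed rows land in a hash-partitioned subtree (incoming likewise).
    CREATE TABLE mq.queue_tx_complete PARTITION OF mq.queue
        FOR VALUES IN ('tx_complete') PARTITION BY HASH (queueid);
    CREATE TABLE mq.queue_tx_complete_0 PARTITION OF mq.queue_tx_complete
        FOR VALUES WITH (MODULUS 4, REMAINDER 0);
    -- ... remaining hash partitions omitted

With a layout like this, an UPDATE that sets state to 'tx_complete' physically deletes the row from queue_tx_active and inserts it into the completed subtree; a concurrent FOR UPDATE SKIP LOCKED that has already fetched the old tuple then raises 40001 instead of skipping it, which is exactly the behavior Gunther is questioning.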
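Separately, the plan shows each claim seq-scanning queue_tx_active (31,500 rows read so that LockRows can return 250), so many concurrent workers all walk the same tuples and skip over each other's locks. If most of that partition is not 'ticket' rows, an index on txobject might let each scan stop sooner - the statement below is a hypothetical, untested suggestion, not something from the thread:

    CREATE INDEX CONCURRENTLY queue_tx_active_txobject_idx
        ON mq.queue_tx_active (txobject);

The state columns need no index here, since the list partition already guarantees them.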