CPU hogged by concurrent SELECT..FOR UPDATE SKIP LOCKED

From: Jim Jarvie
Subject: CPU hogged by concurrent SELECT..FOR UPDATE SKIP LOCKED
Date:
Msg-id: c192f8bf-a747-6ad9-c54d-1bd6febafc4f@talentstack.to
Replies: Re: CPU hogged by concurrent SELECT..FOR UPDATE SKIP LOCKED (Michael Lewis <mlewis@entrata.com>)
Re: CPU hogged by concurrent SELECT..FOR UPDATE SKIP LOCKED (Laurenz Albe <laurenz.albe@cybertec.at>)
Re: CPU hogged by concurrent SELECT..FOR UPDATE SKIP LOCKED (Thomas Munro <thomas.munro@gmail.com>)
List: pgsql-performance

Using PostgreSQL 12 on Linux [Ubuntu 16.04 LTS]

I have a system that implements a message queue with the following basic pattern: a process selects a group of rows (for example, 250) for processing via SELECT .. LIMIT 250 FOR UPDATE SKIP LOCKED. When only a small number of concurrent connections are processing the queue, this works as expected and each connection quickly obtains a unique block of 250 rows.
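As a rough, in-process illustration of the claiming pattern described above (a toy model, not PostgreSQL code), the Python sketch below gives every queue row its own lock. `claim_batch` imitates `SELECT ... ORDER BY id LIMIT n FOR UPDATE SKIP LOCKED` by walking rows in id order and taking only locks that are currently free, never waiting on a held one. The names `ToyQueue`, `claim_batch` and `run_workers` are hypothetical, and the `ORDER BY id` scan order is an assumption, since the original query's ordering is not shown.

```python
import threading


class ToyQueue:
    """In-memory stand-in for a queue table where every row has its own lock.

    Toy model only: it imitates the claiming semantics of
    SELECT ... ORDER BY id LIMIT n FOR UPDATE SKIP LOCKED,
    not how PostgreSQL implements row locks.
    """

    def __init__(self, n_rows):
        self.row_ids = list(range(n_rows))
        self.locks = {rid: threading.Lock() for rid in self.row_ids}

    def claim_batch(self, limit):
        """Walk rows in id order and lock up to `limit` of them.

        acquire(blocking=False) plays the role of SKIP LOCKED: a row whose
        lock is already held is skipped instead of waited on. Locks are never
        released here, as if each worker's transaction were still open.
        """
        claimed = []
        for rid in self.row_ids:
            if self.locks[rid].acquire(blocking=False):
                claimed.append(rid)
                if len(claimed) == limit:
                    break
        return claimed


def run_workers(n_workers, n_rows, limit):
    """Start n_workers threads that each claim one batch; return their batches."""
    queue = ToyQueue(n_rows)
    batches = [None] * n_workers

    def worker(i):
        batches[i] = queue.claim_batch(limit)

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return batches


if __name__ == "__main__":
    batches = run_workers(n_workers=8, n_rows=4000, limit=250)
    claimed = [rid for batch in batches for rid in batch]
    print(len(claimed), len(set(claimed)))  # 2000 2000: no row is claimed twice
```

With 8 workers and 250-row batches, every worker ends up with a disjoint block of 250 rows, which is the behaviour seen at low concurrency.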

However, as I scale up the number of concurrent connections, CPU usage spikes (to 100% across 80 cores) when the SELECT FOR UPDATE SKIP LOCKED executes, and the selecting processes wait for many minutes (10-20 minutes) before completing. My use case requires around 256 concurrent queue processors, but I've been unable to scale beyond 128 without everything grinding to a halt.
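One hypothesis worth testing (an assumption here, not a diagnosis) is that every backend starts scanning at the head of the queue and must step over rows already locked by other backends before it finds free ones, so the total number of rows examined grows roughly quadratically with the number of concurrent workers. A back-of-the-envelope Python model of that idea, assuming workers arrive one at a time while every earlier batch is still locked; `rows_examined` is a hypothetical helper:

```python
def rows_examined(n_workers, limit):
    """Toy cost model: total rows visited when n_workers each claim `limit` rows.

    Assumes workers arrive one at a time, every earlier batch is still locked
    (its transaction not yet committed), and every worker scans from the head
    of the queue, so worker k must step over k * limit locked rows first.
    """
    total = 0
    locked = 0
    for _ in range(n_workers):
        total += locked + limit  # skip every locked row, then claim `limit` free rows
        locked += limit
    return total


if __name__ == "__main__":
    for n in (32, 64, 128, 256):
        print(n, rows_examined(n, limit=250))
    # 32 132000
    # 64 520000
    # 128 2064000
    # 256 8224000
```

Under this model, going from 128 to 256 workers roughly quadruples the scanning work. It ignores lock-manager overhead, dead row versions and the actual query plan, so it only suggests where to look.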

The queue table itself fits in RAM (with 2MB hugepages), and during the wait all the performance counters drop to almost 0: no disk reads or writes (semi-expected, since the table fits in memory), a 100% buffer hit rate in pg_top, and rows read at around 100/s, which is far lower than expected.

After processes complete the select and the number of waiting selects starts to fall, CPU load falls, and then suddenly the remaining processes all complete within a few seconds. Things then perform normally until the next time a group of SELECT FOR UPDATE statements bunch together, and the cycle repeats.

I found that running VACUUM ANALYZE extremely frequently (every 30 minutes) helps a little, but not much; the problems are still very apparent.

I've exhausted all the performance tuning and analysis advice I can find that seems even slightly relevant, but I cannot get this cracked.

Can anyone on the list suggest how I can track down why this CPU hogging happens? It does seem to be the root of the problem.

Thanks in advance,

Jim

