[GENERAL] Causeless CPU load waves in backend, on windows, 9.5.5 (EDB binary).

From: Nikolai Zhubr
Subject: [GENERAL] Causeless CPU load waves in backend, on windows, 9.5.5 (EDB binary).
Msg-id: 588F0469.6090404@yandex.ru
List: pgsql-general

Hello all,

(Hopefully this is the right place to post on the subject; otherwise please
let me know.)

I'm observing a strange, inexplicable effect on a 9.5.5 server running
on x86 Windows (32-bit Windows XP SP3). CPU usage in the backend
process for the session in question starts to grow, going from approx.
0-1 to 8-15 percent and more, stays that high for several seconds, then
goes back to 0-1. The whole episode takes about 15-30 seconds and repeats
reliably every 10-20 minutes (as long as the respective client continues
to run the same queries). The pattern of requests going to the server
appears to be essential: in this case there is a continuous stream of
very small, cheap queries, but quite a lot of them per second. Trying to
understand the reason, I've managed to craft a purely artificial test
that triggers very similar CPU load waves without the need for any
specific database at all, and is easy to repeat:

1. "select localtimestamp" 40 times (As separate requests, one by one,
but no delay inserted in between)
2. wait 1/2 second.
3. goto 1

That's it. Just let it run for > 20 minutes in one session. (These
queries are so cheap that normally they consume approx zero resources)
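
For readers who want to reproduce this without the original Pascal client, the three steps above can be sketched roughly as follows. This is only an illustration, not the actual test program: the function name `run_burst_loop` and all of its parameters are hypothetical, the query is sent through whatever callable the caller supplies, and the loop is bounded by `bursts` instead of running endlessly.

```python
import time
from typing import Callable, List


def run_burst_loop(
    run_query: Callable[[str], object],
    bursts: int,
    queries_per_burst: int = 40,
    pause_seconds: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.perf_counter,
) -> List[float]:
    """Send bursts of a trivial query, pausing between bursts.

    run_query is a hypothetical hook: any callable that sends one SQL
    statement to the server and waits for the reply.
    Returns the wall-clock duration of each burst in seconds, so that
    client-side latency can be compared with backend CPU usage.
    """
    burst_durations = []
    for _ in range(bursts):
        started = clock()
        # Step 1: separate requests, back to back, no delay in between.
        for _ in range(queries_per_burst):
            run_query("select localtimestamp")
        burst_durations.append(clock() - started)
        # Step 2: wait, then start the next burst (step 3, "goto 1").
        sleep(pause_seconds)
    return burst_durations
```

With the default half-second pause, about 2400 bursts cover the 20 minutes mentioned above. The `run_query` callable is where a real client would go, for example a small wrapper that executes the statement on an autocommit connection over TCP/IP and fetches the single result row. Whether a client other than the original one triggers the same effect is untested.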

Screenshots: https://yadi.sk/i/J_yj_0t43BgdGw
(I can also send as file if this link does not work)

Other notes:
- the server instance in question is EDB 9.5.5-1 win32 binaries.
- the production machine is a Core 2 Duo 2600 MHz, 2 GB RAM; typical CPU
load is rather low, like 0% to 3%, therefore the effect is easily noticeable.
- no antivirus or other weird or abusive software involved.
- communication goes through libpq (tcp/ip only).
- turning SSL on/off does not matter.
- my test application issuing "select localtimestamp" in an endless
loop is written in Pascal, although this probably does not matter.
- query execution time as seen by the client is not affected (stays low).
- pausing the test in the client causes backend to drop CPU usage
immediately to 0, resuming causes it to go back to where it was (unless
pausing for too long), so excessive CPU load is tied to some normal
activity of backend (i.e. no activity == no load).
- the effect is NOT observed (yet?) when running the test on the server
machine itself (pointing it to 127.0.0.1)
- the effect looks more substantial in the SMP case (2 cores), compared to
the uniprocessor case (when testing in a VM, see below).
- nothing appears in the log.

To me it looks like some sort of wait/check/synchronize issue for a
socket/lock/signal or similar. Probably some rare corner case, probably
Windows-specific. However, looking through backend/port/win32/socket.c
and backend/port/win32_latch.c I cannot immediately see anything wrong
yet (but WaitForMultipleObjects is a tricky thing IIRC).

Luckily I've managed to construct a configuration very similar to the
production machine in a development VM (VirtualBox) and reproduced the
effect there, so now I'm able to safely and comfortably test whatever
ideas might appear (although I'm somewhat reluctant to rebuild and
hand-debug the server here myself, because preparing a proper build
environment on Windows is quite a lot of work...)

Any help/thoughts/instructions greatly appreciated.

Should I also report this to the bug tracker at postgresql.org?


Thank you,
Nikolai


