This machine hosts consolidated databases, and it runs 250+ WAL sender processes.
The top output shows high system CPU time:
%Cpu(s): 1.4 us, 49.7 sy, 0.0 ni, 48.8 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
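To put a number on "high system cpu", here is a minimal Python sketch that parses a top summary line like the one above and computes how much of the non-idle time is spent in the kernel. The function name parse_top_cpu_line is made up for illustration, and the sketch assumes the line uses the "value tag" layout shown here.

```python
import re

def parse_top_cpu_line(line):
    """Parse a top '%Cpu(s):' summary line into {tag: percent}."""
    # Each field looks like '49.7 sy': a number followed by a two-letter tag.
    pairs = re.findall(r"(\d+(?:\.\d+)?)\s+([a-z]{2})\b", line)
    return {tag: float(value) for value, tag in pairs}

line = "%Cpu(s): 1.4 us, 49.7 sy, 0.0 ni, 48.8 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st"
cpu = parse_top_cpu_line(line)

# Everything that is not idle counts as busy time.
busy = 100.0 - cpu["id"]
# Fraction of the busy time that was spent in kernel (system) mode.
sys_share = cpu["sy"] / busy

print(f"busy: {busy:.1f}%, system share of busy time: {sys_share:.0%}")
```

For the line above, roughly 97% of the non-idle time is system time, so the load is almost entirely kernel-side rather than in postgres user code.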
Profiling the CPU with perf:
perf top -e cpu-clock
Events: 142K cpu-clock
82.37% [kernel] [k] __mutex_lock_common.isra.5
4.49% [kernel] [k] do_raw_spin_lock
2.23% [kernel] [k] mutex_lock
2.16% [kernel] [k] mutex_unlock
2.12% [kernel] [k] arch_local_irq_restore
1.73% postgres [.] ValidXLogRecord
0.87% [kernel] [k] __mutex_unlock_slowpath
0.78% [kernel] [k] arch_local_irq_enable
0.63% [kernel] [k] sys_recvfrom
Finally, I narrowed it down to the processes that are hitting the mutexes, which are the WAL senders. Profiling one of them:
perf top -e task-clock -p 55382
Events: 697 task-clock
88.08% [kernel] [k] __mutex_lock_common.isra.5
3.27% [kernel] [k] do_raw_spin_lock
2.34% [kernel] [k] arch_local_irq_restore
2.10% postgres [.] ValidXLogRecord
1.87% [kernel] [k] mutex_unlock
1.87% [kernel] [k] mutex_lock
0.47% [kernel] [k] sys_recvfrom
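To compare the two profiles, a small Python sketch like the following can total the share of samples that land in kernel mutex symbols. It only assumes the four-column "percent, object, [k]/[.], symbol" layout pasted above; the helper names parse_perf_lines and mutex_share are hypothetical, not part of perf.

```python
def parse_perf_lines(text):
    """Turn pasted perf top rows into (percent, object, symbol) tuples."""
    rows = []
    for raw in text.strip().splitlines():
        parts = raw.split()
        # Expected columns: '<pct>%', '<object>', '[k]' or '[.]', '<symbol>'
        if len(parts) != 4 or not parts[0].endswith("%"):
            continue  # skip headers such as 'Events: ...'
        rows.append((float(parts[0].rstrip("%")), parts[1], parts[3]))
    return rows

def mutex_share(rows):
    """Sum the percentages of kernel symbols whose name contains 'mutex'."""
    return sum(pct for pct, obj, sym in rows
               if obj == "[kernel]" and "mutex" in sym)

SYSTEM_WIDE = """
82.37% [kernel] [k] __mutex_lock_common.isra.5
4.49% [kernel] [k] do_raw_spin_lock
2.23% [kernel] [k] mutex_lock
2.16% [kernel] [k] mutex_unlock
2.12% [kernel] [k] arch_local_irq_restore
1.73% postgres [.] ValidXLogRecord
0.87% [kernel] [k] __mutex_unlock_slowpath
0.78% [kernel] [k] arch_local_irq_enable
0.63% [kernel] [k] sys_recvfrom
"""

PER_PID = """
88.08% [kernel] [k] __mutex_lock_common.isra.5
3.27% [kernel] [k] do_raw_spin_lock
2.34% [kernel] [k] arch_local_irq_restore
2.10% postgres [.] ValidXLogRecord
1.87% [kernel] [k] mutex_unlock
1.87% [kernel] [k] mutex_lock
0.47% [kernel] [k] sys_recvfrom
"""

system_total = mutex_share(parse_perf_lines(SYSTEM_WIDE))
pid_total = mutex_share(parse_perf_lines(PER_PID))
print(f"system-wide mutex share: {system_total:.2f}%")
print(f"pid 55382 mutex share:   {pid_total:.2f}%")
```

On the numbers above, mutex symbols account for about 88% of the system-wide samples and about 92% of the samples for pid 55382, so the single process looks the same as the machine as a whole.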
I think BDR is only reading WAL files (currently we are behind the current WAL LSN),