Hello,
While running some tests, I encountered a situation where pgbench gets stuck in an infinite loop, consuming 100% CPU. The steps to reproduce were:
- Start postgres server from the master branch
- Initialise pgbench
- Run pgbench -c 10 -T 100
- Stop postgres with -m immediate
At this point pgbench gets stuck and its state machine does not advance. Attaching a debugger to it, I saw that one of the clients remains stuck in this loop forever:
    if (command->type == SQL_COMMAND)
    {
        if (!sendCommand(st, command))
        {
            /*
             * Failed. Stay in CSTATE_START_COMMAND state, to
             * retry. ??? What the point or retrying? Should
             * rather abort?
             */
            return;
        }
        else
            st->state = CSTATE_WAIT_RESULT;
    }
sendCommand() returns false because the underlying connection is bad and PQsendQuery() returns 0. Since the state stays at CSTATE_START_COMMAND, the same send is attempted again on the next pass and fails the same way. Reading the comment, it seems that the author thought about this situation but decided to retry instead of aborting. I am not sure what the rationale for that decision was; maybe it was meant to deal with transient failures?
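To make the difference between retrying and aborting concrete, here is a minimal standalone sketch. It is not pgbench code: the Client struct, send_command(), do_step(), run_client() and the abort_on_failure flag are all hypothetical stand-ins, and the "connection" is just a boolean rather than a libpq handle. It only models the shape of the problem, assuming a send failure on a dead connection is permanent.

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-ins for pgbench's client state. */
typedef enum
{
    CSTATE_START_COMMAND,
    CSTATE_WAIT_RESULT,
    CSTATE_ABORTED
} ClientState;

typedef struct
{
    ClientState state;
    bool        conn_ok;    /* stands in for a healthy connection */
} Client;

/* Stand-in for sendCommand(): fails whenever the connection is bad. */
static bool
send_command(const Client *st)
{
    return st->conn_ok;
}

/*
 * One step of the state machine. With abort_on_failure == false this
 * mirrors the behaviour quoted above: the state is left unchanged so
 * the caller retries. With abort_on_failure == true the client is
 * moved to CSTATE_ABORTED instead.
 */
static void
do_step(Client *st, bool abort_on_failure)
{
    if (st->state != CSTATE_START_COMMAND)
        return;

    if (!send_command(st))
    {
        if (abort_on_failure)
            st->state = CSTATE_ABORTED;
        return;
    }
    st->state = CSTATE_WAIT_RESULT;
}

/*
 * Drive the client until it leaves CSTATE_START_COMMAND or max_steps
 * is reached. Returns the number of steps taken. max_steps exists only
 * so that the retry case terminates in this sketch.
 */
static int
run_client(Client *st, bool abort_on_failure, int max_steps)
{
    int steps = 0;

    while (st->state == CSTATE_START_COMMAND && steps < max_steps)
    {
        do_step(st, abort_on_failure);
        steps++;
    }
    return steps;
}
```

With conn_ok set to false, run_client() in retry mode uses up every step it is given and the client never leaves CSTATE_START_COMMAND, which is the busy loop described above. In abort mode the client ends up in CSTATE_ABORTED after a single step.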
The commit that introduced this code is 12788ae49e1933f463bc, so I am copying Heikki.
Thanks,
Pavan