Re: [PATCH] pgbench --throttle (submission 7 - with lag measurement)

Поиск
Список
Период
Сортировка
От Greg Smith
Тема Re: [PATCH] pgbench --throttle (submission 7 - with lag measurement)
Дата
Msg-id 51CF6999.4060103@2ndQuadrant.com
обсуждение исходный текст
Ответ на Re: [PATCH] pgbench --throttle (submission 7 - with lag measurement)  (Fabien COELHO <coelho@cri.ensmp.fr>)
Ответы Re: [PATCH] pgbench --throttle (submission 7 - with lag measurement)  (Fabien COELHO <coelho@cri.ensmp.fr>)
Список pgsql-hackers
On 6/22/13 12:54 PM, Fabien COELHO wrote:
> After some poking around, and pursuing various red herrings, I resorted
> to measure the delay for calling "PQfinish()", which is really the only
> special thing going around at the end of pgbench run...

This wasn't what I was seeing, but it's related.  I've proved to myself 
the throttle change isn't reponsible for the weird stuff I'm seeing now.  I'd like to rearrange when PQfinish happens
nowbased on what I'm 
 
seeing, but that's not related to this review.

I duplicated the PQfinish problem you found too.  On my Linux system, 
calls to PQfinish are normally about 36 us long.  They will sometimes 
get lost for >15ms before they return.  That's a different problem 
though, because the ones I'm seeing on my Mac are sometimes >150ms. 
PQfinish never takes quite that long.

PQfinish doesn't pause for a long time on this platform.  But it does 
*something* that causes socket select() polling to stutter.  I have 
instrumented everything interesting in this part of the pgbench code, 
and here is the problem event.

1372531862.062236 select with no timeout sleeping=0
1372531862.109111 select returned 6 sockets latency 46875 us

Here select() is called with 0 sleeping processes, 11 that are done, and 
14 that are running.  The running ones have all sent SELECT statements 
to the server, and they are waiting for a response.  Some of them 
received some data from the server, but they haven't gotten the entire 
response back.  (The PQfinish calls could be involved in how that happened)

With that setup, select runs for 47 *ms* before it gets the next byte to 
a client.  During that time 6 clients get responses back to it, but it 
stays stuck in there for a long time anyway.  Why?  I don't know exactly 
why, but I am sure that pgbench isn't doing anything weird.  It's either 
libpq acting funny, or the OS.  When pgbench is waiting on a set of 
sockets, and none of them are returning anything, that's interesting. 
But there's nothing pgbench can do about it.

The cause/effect here is that the randomness to the throttling code 
spreads out when all the connections end a bit. There are more times 
during which you might have 20 connections finished while 5 still run.

I need to catch up with revisions done to this feature since I started 
instrumenting my copy more heavily.  I hope I can get this ready for 
commit by Monday.  I've certainly beaten on the feature for long enough now.

-- 
Greg Smith   2ndQuadrant US    greg@2ndQuadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Claudio Freire
Дата:
Сообщение: Re: New regression test time
Следующее
От: Stephen Frost
Дата:
Сообщение: Re: New regression test time