Re: [PATCH] pgbench --throttle (submission 7 - with lag measurement)

Поиск
Список
Период
Сортировка
От Greg Smith
Тема Re: [PATCH] pgbench --throttle (submission 7 - with lag measurement)
Дата
Msg-id 51BB7ACF.4000300@2ndQuadrant.com
обсуждение исходный текст
Ответ на Re: [PATCH] pgbench --throttle (submission 7 - with lag measurement)  (Fabien COELHO <coelho@cri.ensmp.fr>)
Ответы Re: [PATCH] pgbench --throttle (submission 7 - with lag measurement)  (Fabien COELHO <coelho@cri.ensmp.fr>)
Re: [PATCH] pgbench --throttle (submission 7 - with lag measurement)  (Fabien COELHO <coelho@cri.ensmp.fr>)
Список pgsql-hackers
On 6/14/13 3:50 PM, Fabien COELHO wrote:
> I think that the weirdness really comes from the way transactions times
> are measured, their interactions with throttling, and latent bugs in the
> code.

measurement times, no; interactions with throttling, no.  If it was 
either of those I'd have finished this off days ago.  Latent bugs, 
possibly.  We may discover there's nothing wrong with your code at the 
end here, that it just makes hitting this bug more likely. 
Unfortunately today is the day *some* bug is popping up, and I want to 
get it squashed before I'll be happy.

The lag is actually happening during a kernel call that isn't working as 
expected.  I'm not sure whether this bug was there all along if \sleep 
was used, or if it's specific to the throttle sleep.
> Also, flag st->listen is set to 1 but *never* set back to 0...

I noticed that st-listen was weird too, and that's on my short list of 
suspicious things I haven't figured out yet.

I added a bunch more logging as pgbench steps through its work to track 
down where it's stuck at.  Until the end all transactions look like this:

1371238832.084783 client 10 throttle lag 2 us
1371238832.084783 client 10 executing \setrandom aid 1 :naccounts
1371238832.084803 client 10 sending SELECT abalance FROM 
pgbench_accounts WHERE aid = 753099;
1371238832.084840 calling select
1371238832.086539 client 10 receiving
1371238832.086539 client 10 finished

All clients who hit lag spikes at the end are going through this 
sequence instead:

1371238832.085912 client 13 throttle lag 790 us
1371238832.085912 client 13 executing \setrandom aid 1 :naccounts
1371238832.085931 client 13 sending SELECT abalance FROM 
pgbench_accounts WHERE aid = 564894;
1371238832.086592 client 13 receiving
1371238832.086662 calling select
1371238832.235543 client 13 receiving
1371238832.235543 client 13 finished

Note the "calling select" here that wasn't in the normal length 
transaction before it.  The client is receiving something here, but 
rather than it finishing the transaction it falls through and ends up at 
the select() system call outside of doCustom.  All of the clients that 
are sleeping when the system slips into one of these long select() calls 
are getting stuck behind it.  I'm not 100% sure, but I think this only 
happens when all remaining clients are sleeping.

Here's another one, it hits the receive that doesn't finish the 
transaction earlier (1371238832.086587) but then falls into the same 
select() call at 1371238832.086662:

1371238832.085884 client 12 throttle lag 799 us
1371238832.085884 client 12 executing \setrandom aid 1 :naccounts
1371238832.085903 client 12 sending SELECT abalance FROM 
pgbench_accounts WHERE aid = 299080;
1371238832.086587 client 12 receiving
1371238832.086662 calling select
1371238832.231032 client 12 receiving
1371238832.231032 client 12 finished

Investigation is still going here...

-- 
Greg Smith   2ndQuadrant US    greg@2ndQuadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Fabien COELHO
Дата:
Сообщение: Re: [PATCH] pgbench --throttle (submission 7 - with lag measurement)
Следующее
От: Fabien COELHO
Дата:
Сообщение: Re: [PATCH] pgbench --throttle (submission 7 - with lag measurement)