Re: Load Distributed Checkpoints test results

From Greg Smith
Subject Re: Load Distributed Checkpoints test results
Date
Msg-id Pine.GSO.4.64.0706201512070.2198@westnet.com
In response to Re: Load Distributed Checkpoints test results  (Heikki Linnakangas <heikki@enterprisedb.com>)
Responses Re: Load Distributed Checkpoints test results  (Bruce Momjian <bruce@momjian.us>)
Re: Load Distributed Checkpoints test results  (Heikki Linnakangas <heikki@enterprisedb.com>)
List pgsql-hackers
On Wed, 20 Jun 2007, Heikki Linnakangas wrote:

> Another series with 150 warehouses is more interesting. At that # of 
> warehouses, the data disks are 100% busy according to iostat. The 90% 
> percentile response times are somewhat higher with LDC, though the 
> variability in both the baseline and LDC test runs seem to be pretty high.

Great, this is exactly the behavior I had observed and wanted someone 
else to independently run into.  When you're in 100% disk busy land, LDC 
can shift the distribution of bad transactions around in a way that some 
people may not be happy with, and that might represent a step backward 
from the current code for them.  I hope you can understand now why I've 
been so vocal that it must be possible to pull this new behavior out so 
the current form of checkpointing is still available.

While it shows up in the 90% figure, what happens is most obvious in the 
response time distribution graphs.  Someone who is currently getting a run 
like #295: http://community.enterprisedb.com/ldc/295/rt.html

might be really unhappy if they turn on LDC expecting to smooth out 
checkpoints and get the shift of #296 instead: 
http://community.enterprisedb.com/ldc/296/rt.html

That is of course cherry-picking the most extreme examples.  But it 
illustrates my concern about the possibility of LDC making things worse 
on a really overloaded system, which is kind of counter-intuitive because 
you might expect that would be the best case for its improvements.

When I summarize the percentile behavior from your results with 150 
warehouses in a table like this:

Test    LDC %    90th percentile response time
295     None     3.703
297     None     4.432
292     10       3.432
298     20       5.925
296     30       5.992
294     40       4.132
I think it does a better job of showing how LDC can shift the top 
percentile around under heavy load, even though there are runs where it's 
a clear improvement.  Since there is so much variability in results when 
you get into this territory, you really need to run a lot of these tests 
to get a feel for the spread of behavior.  I spent about a week of 
continuously running tests stalking this bugger before I felt I'd mapped 
out the boundaries with my app.  You've got your own priorities, but I'd 
suggest you try to find enough time for a more exhaustive look at this 
area before nailing down the final form for the patch.
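
As a rough illustration of looking at the spread rather than at any single 
run, here is a minimal Python sketch that uses only the six figures from 
the table above.  The split into baseline and LDC groups and the helper 
name spread() are assumptions made for this example, not part of the test 
kit, and two baseline runs are far too few to draw conclusions from.

```python
from statistics import mean

# (test run, LDC % setting, 90th percentile response time), copied from
# the table above. None marks a baseline run without LDC.
runs = [
    (295, None, 3.703),
    (297, None, 4.432),
    (292, 10, 3.432),
    (298, 20, 5.925),
    (296, 30, 5.992),
    (294, 40, 4.132),
]


def spread(values):
    """Return (min, max, mean) for a list of 90th percentile figures."""
    return min(values), max(values), mean(values)


baseline = [p90 for _, ldc, p90 in runs if ldc is None]
with_ldc = [p90 for _, ldc, p90 in runs if ldc is not None]

base_min, base_max, base_mean = spread(baseline)
ldc_min, ldc_max, ldc_mean = spread(with_ldc)

# Count LDC runs that land outside the range seen in the baseline runs.
worse_than_worst_baseline = sum(1 for v in with_ldc if v > base_max)
better_than_best_baseline = sum(1 for v in with_ldc if v < base_min)

print(f"baseline: min={base_min} max={base_max} mean={base_mean:.3f}")
print(f"LDC:      min={ldc_min} max={ldc_max} mean={ldc_mean:.3f}")
print(f"LDC runs worse than worst baseline: {worse_than_worst_baseline}")
print(f"LDC runs better than best baseline: {better_than_best_baseline}")
```

With more runs per setting, the same grouping could be done per LDC % 
value instead of lumping all LDC runs together.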

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD

