Re: pgbench - implement strict TPC-B benchmark

From: Fabien COELHO
Subject: Re: pgbench - implement strict TPC-B benchmark
Msg-id: alpine.DEB.2.21.1908010654590.2683@lancre
In reply to: Re: pgbench - implement strict TPC-B benchmark  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses: Re: pgbench - implement strict TPC-B benchmark  (Robert Haas <robertmhaas@gmail.com>)
           Re: pgbench - implement strict TPC-B benchmark  (Andres Freund <andres@anarazel.de>)
List: pgsql-hackers

Hello Tom,

> [ shrug... ]  TBH, the proposed patch does not look to me like actual
> benchmark kit; it looks like a toy.  Nobody who was intent on making their
> benchmark numbers look good would do a significant amount of work in a
> slow, ad-hoc interpreted language.  I also wonder to what extent the
> numbers would reflect pgbench itself being the bottleneck.


> Which is really the fundamental problem I've got with all the stuff 
> that's been crammed into pgbench of late --- the more computation you're 
> doing there, the less your results measure the server's capabilities 
> rather than pgbench's implementation details.

That is a very good question. It is easy to measure the overhead, for 
instance:

   sh> time pgbench -r -T 30 -M prepared
   ...
   latency average = 2.425 ms
   tps = 412.394420 (including connections establishing)
   statement latencies in milliseconds:
     0.001  \set aid random(1, 100000 * :scale)
     0.000  \set bid random(1, 1 * :scale)
     0.000  \set tid random(1, 10 * :scale)
     0.000  \set delta random(-5000, 5000)
     0.022  BEGIN;
     0.061  UPDATE pgbench_accounts SET abalance = abalance + :delta WHERE aid = :aid;
     0.038  SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
     0.046  UPDATE pgbench_tellers SET tbalance = tbalance + :delta WHERE tid = :tid;
     0.042  UPDATE pgbench_branches SET bbalance = bbalance + :delta WHERE bid = :bid;
     0.036  INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (:tid, :bid, :aid, :delta, CURRENT_TIMESTAMP);
     2.178  END;
   real    0m30.080s, user    0m0.406s, sys     0m0.689s

The cost of the interpreted part of pgbench (\set) is under 1/1000 of the 
transaction latency. The user CPU time of the pgbench process itself 
accounts for about 1.4% of the elapsed time, below the inevitable system 
time, which is 2.3%. pgbench overheads are pretty small compared to 
postgres connection and command execution, and system time. The above used 
a local socket; if it were an actual remote network connection, the gap 
would be larger. A profile run could collect more data, but that does not 
seem necessary.
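
For anyone who wants to redo the arithmetic, here is a minimal sketch in 
Python. It only re-derives the ratios from the figures in the run above; 
the variable names and the share() helper are mine, and the \set total is 
approximate because pgbench reports each line rounded to 0.001 ms.

```python
# Figures copied from the pgbench run quoted above.
wall_s = 30.080      # "real": elapsed wall-clock time, in seconds
user_s = 0.406       # "user": CPU time spent in the pgbench process itself
sys_s = 0.689        # "sys": CPU time spent in the kernel
latency_ms = 2.425   # average transaction latency

# The four \set lines as reported (each one is rounded to 0.001 ms).
set_ms = 0.001 + 0.000 + 0.000 + 0.000


def share(part, whole):
    """Return part/whole expressed as a percentage."""
    return 100.0 * part / whole


user_pct = share(user_s, wall_s)
sys_pct = share(sys_s, wall_s)
set_pct = share(set_ms, latency_ms)

print(f"user CPU: {user_pct:.2f}% of elapsed time")
print(f"sys CPU:  {sys_pct:.2f}% of elapsed time")
print(f"\\set:     {set_pct:.3f}% of average latency")
```

The user share comes out between 1.3% and 1.4%, the system share close to 
2.3%, and the \set share well below 0.1%, i.e. under 1/1000.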

Some parts of pgbench could be optimized, e.g. for expressions the large 
switch could be avoided with precomputed function calls, some static 
analysis could infer some types and avoid calls to generic functions which 
have to test types, and so on. But frankly I do not think that this is 
currently needed, so I would not bother unless an actual issue is proven.
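
To make the "precomputed function call" idea concrete, here is a small 
sketch in Python rather than C. It is not pgbench code: the tuple-based 
expression tree, the OPS table and both function names are hypothetical, 
chosen only to contrast re-testing the operator on every evaluation with 
resolving it once up front.

```python
import operator

# Hypothetical expression tree: ("const", value) or (op_name, left, right).
# This encoding is an assumption for the sketch, not pgbench's own structures.


def eval_switch(node):
    """Interpret the tree, re-testing the operator name on every evaluation."""
    kind = node[0]
    if kind == "const":
        return node[1]
    left, right = eval_switch(node[1]), eval_switch(node[2])
    if kind == "add":
        return left + right
    elif kind == "sub":
        return left - right
    elif kind == "mul":
        return left * right
    raise ValueError(f"unknown operator: {kind}")


# Operator table, consulted once per node when the expression is prepared.
OPS = {"add": operator.add, "sub": operator.sub, "mul": operator.mul}


def compile_expr(node):
    """Resolve each operator once and return a closure that only calls it."""
    kind = node[0]
    if kind == "const":
        value = node[1]
        return lambda: value
    fn = OPS[kind]  # the lookup happens here, not on each evaluation
    left, right = compile_expr(node[1]), compile_expr(node[2])
    return lambda: fn(left(), right())


# 100000 * 3 + 7, loosely modeled on an expression such as 100000 * :scale
tree = ("add", ("mul", ("const", 100000), ("const", 3)), ("const", 7))
run = compile_expr(tree)
result_switch = eval_switch(tree)
result_compiled = run()
print(result_switch, result_compiled)
```

Both paths compute the same value; the second one pays the dispatch cost 
once per script instead of once per transaction. Whether that matters in 
practice is exactly what the measurement above puts in doubt.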

Also, pgbench overheads must be compared to those of an actual client 
application, which deals with a database through some language (PHP, 
Python, JS, Java…), the interpreter of which would be written in C/C++ 
just like pgbench, and some library (ORM, DBI, JDBC…), possibly written in 
the initial language and relying on libpq under the hood. OK, there could 
be some JIT involved, but it will not change that there are costs there 
too, and it would have to do pretty much the same things that pgbench is 
doing, plus what the application has to do with the data.

All in all, pgbench overheads are small compared to postgres processing 
times and representative of a reasonably optimized client application.

> In any case, even if I were in love with the script itself,

Love is probably not required for a feature demonstration:-)

> we cannot commit something that claims to be "standard TPC-B".

Yep, I clearly underestimated this legal aspect.

> It needs weasel wording that makes it clear that it isn't TPC-B, and 
> then you have a problem of user confusion about why we have both 
> not-quite-TPC-B-1 and not-quite-TPC-B-2, and which one to use, or which 
> one was used in somebody else's tests.

I agree that confusion is no good either.

> I think if you want to show off what these pgbench features are good
> for, it'd be better to find some other example that's not in the
> midst of a legal minefield.

Yep, I got that.

To try to salvage my illustration idea: I could change the name to "demo", 
i.e. quite far from "TPC-B", do some extensions to make it differ, e.g. use 
a non-uniform random generator, and then explicitly say that it is 
vaguely inspired by "TPC-B" and intended as a demo script likely to be 
updated to illustrate new features (e.g. if using a non-uniform generator 
I'd really like to add a permutation layer if available some day).

This way, the real intention of the "demo" would be very clear.
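
To illustrate what "non-uniform generator plus permutation layer" could 
mean, here is a rough sketch in Python. It is an assumption-laden 
illustration, not a proposal for the script: the truncated exponential is 
only in the spirit of a skewed pgbench generator, the affine map is a 
deliberately simple (and weak) stand-in for a real permutation, and all 
names and constants (7919, 12345, the skew of 5.0) are made up.

```python
import math
import random


def truncated_exponential(rng, lo, hi, parameter):
    """Draw an integer in [lo, hi], with values near lo the most likely.

    Inverse-CDF sampling of an exponential density truncated to the
    interval; a larger `parameter` gives a more skewed distribution.
    """
    n = hi - lo + 1
    cut = math.exp(-parameter)
    u = 1.0 - rng.random()  # uniform in (0, 1]
    x = -math.log(cut + (1.0 - cut) * u) / parameter  # in [0, 1)
    return lo + min(int(x * n), n - 1)


def affine_permute(i, n, multiplier, offset):
    """Map i in [0, n) to a value in [0, n), bijectively.

    Requires gcd(multiplier, n) == 1, otherwise the map is not a bijection.
    """
    if math.gcd(multiplier, n) != 1:
        raise ValueError("multiplier must be coprime with n")
    return (i * multiplier + offset) % n


rng = random.Random(42)  # fixed seed so the sketch is repeatable
n = 100000               # number of accounts at scale 1
draws = [truncated_exponential(rng, 1, n, 5.0) for _ in range(1000)]

# Shift to 0-based, permute, shift back to 1-based account ids, so that the
# frequently drawn values are no longer clustered at the low end of the range.
aids = [1 + affine_permute(d - 1, n, 7919, 12345) for d in draws]
print(min(draws), max(draws), min(aids), max(aids))
```

Without the permutation, the hot accounts are all neighbours at the start 
of the table; with it, the same skew is spread over the key space.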

-- 
Fabien.
