Обсуждение: Released updated Tcl interfaces: pgin.tcl-2.1.0, pgtclng-1.5.1

Поиск
Список
Период
Сортировка

Released updated Tcl interfaces: pgin.tcl-2.1.0, pgtclng-1.5.1

От
L J Bayuk
Дата:
Updated versions of these two implementations of the Tcl interface to
PostgreSQL were released recently:
 pgin.tcl, the "pure-TCL" implementation, at:      http://gborg.postgresql.org/project/pgintcl/    Currently available
fordownload:    + pgintcl-2.1.0.tgz
 
 pgtcl-ng, the libpq-based "next generation" implementation, at:      http://gborg.postgresql.org/project/pgtclng/
Currentlyavailable for download:    + pgtcl1.5.1.tar.gz            (Source distribution)    + pgtcl1.5.1-win32.zip
  (Binary distribution for Windows)    + pgtcldocs-20040609.zip       (A Pgtcl Reference Manual)      You can look at
themanual online, until it finds a better home, at:          http://lbayuk.home.mindspring.com/pgtcldocs/
 

Pgtclng is the "next generation" libpq-based Tcl interface to PostgreSQL.
It adds many new commands and features which take advantage of new
capabilities in the PostgreSQL-7.4 protocol and libpq client library.
Pgin.tcl is a Tcl interface to PostgreSQL which is written entirely in Tcl.
It is mostly but not entirely command-compatible with pgtclng.  Both of
these implementations are only for use with PostgreSQL 7.4 and up.

New features in these releases include: * Get command status tag with: pg_result -cmdStatus (suggested by levanj) *
Parse,bind, and execute SQL statement with: pg_exec_params,   complements pg_exec_prepared; both are binary safe. *
Extendedpg_exec to support optional parameters (idea from karl's   implementation in Gborg pgtcl CVS); this is a
text-format-onlyversion   of pg_exec_params. * Bug fix (pgin.tcl) and improvement (pgtclng) to pg_execute error
handling.


Re: Released updated Tcl interfaces: pgin.tcl-2.1.0,

От
"Nigel J. Andrews"
Дата:
Whoa, I'll be off getting that [pgin.tcl] soon, I've been playing with the
previous version a little.

However, since making some changes to my code I've not had time to do the
investigation so I just thought I'd ask if there's been any reports on the
relative speed of normal sql execution against prepared statements.

The situation is that I had a proof of concept type implementation of a db
interface in an application that used plain SQL execution with pg_execute (I
think that's the one) and taking it closer to a production level system I
restructured my code a little and switched to using prepared statements. As I
say I haven't had time to investigate if it's my code changes or the use of
prepared statements but I think my newer version runs noticibly slower. I've
got this idea in my head that it's the binary packing that's slowing things,
coupled with the fact that the statements are pretty simple insert
commands. Transactions shouldn't be an issue as at worst this newer version
uses the same number of inserts per transaction as the older version and at
best uses twice the number (basically I can't remember if the earlier version
used transactions at all).

So to summarise, I'm thinking the packing into the prepared statement execution
protocol is perhaps taking significantly longer than the db server is to parse
the plain SQL for such small, simple inserts but do not have any experimental
results to actually show that yet.

Other information: the db server runs on a separate box on a gigabit network,
the client is a TCL app with some extensions, which may have an effect on the
main TCL loop (takes 5 minutes to start with 100% (near enough) cpu usage to
load data from another remote system (probably not more than 100MB) and this
data load definitely could not be IO bound.

BTW, thanks a bunch for this interface. Because of the above extensions I
really wasn't happy about pluging in another extension for PG but knew of this
TCL only implementation so went straight for it. It proved the concept and it's
only not in a financial data system now because my interface was under-designed
for a production system and there's a perception that the PgSQL idea needs to
be proved capable of handling the load. Which is fair enough, although I do
keep saying that I see no reason for it not to handle a couple of dozen
clients logging user interactions, even running on the play linux box it's
currently running on. Trouble with that is this noticible slow down (noticed
when doing quite a few inserts in response to a single user action).

Hmmm...someone should shoot me for writing long rambling emails to these lists.

--
Nigel Andrews


On Sun, 20 Jun 2004, L J Bayuk wrote:

> Updated versions of these two implementations of the Tcl interface to
> PostgreSQL were released recently:
> 
>   pgin.tcl, the "pure-TCL" implementation, at:
>        http://gborg.postgresql.org/project/pgintcl/
>      Currently available for download:
>      + pgintcl-2.1.0.tgz
> 
>   pgtcl-ng, the libpq-based "next generation" implementation, at:
>        http://gborg.postgresql.org/project/pgtclng/
>      Currently available for download:
>      + pgtcl1.5.1.tar.gz            (Source distribution)
>      + pgtcl1.5.1-win32.zip         (Binary distribution for Windows)
>      + pgtcldocs-20040609.zip       (A Pgtcl Reference Manual)
>        You can look at the manual online, until it finds a better home, at:
>            http://lbayuk.home.mindspring.com/pgtcldocs/
> 
> Pgtclng is the "next generation" libpq-based Tcl interface to PostgreSQL.
> It adds many new commands and features which take advantage of new
> capabilities in the PostgreSQL-7.4 protocol and libpq client library.
> Pgin.tcl is a Tcl interface to PostgreSQL which is written entirely in Tcl.
> It is mostly but not entirely command-compatible with pgtclng.  Both of
> these implementations are only for use with PostgreSQL 7.4 and up.
> 
> New features in these releases include:
>   * Get command status tag with: pg_result -cmdStatus (suggested by levanj)
>   * Parse, bind, and execute SQL statement with: pg_exec_params,
>     complements pg_exec_prepared; both are binary safe.
>   * Extended pg_exec to support optional parameters (idea from karl's
>     implementation in Gborg pgtcl CVS); this is a text-format-only version
>     of pg_exec_params.
>   * Bug fix (pgin.tcl) and improvement (pgtclng) to pg_execute error handling.



Re: Released updated Tcl interfaces: pgin.tcl-2.1.0,

От
Tom Lane
Дата:
"Nigel J. Andrews" <nandrews@investsystems.co.uk> writes:
> However, since making some changes to my code I've not had time to do the
> investigation so I just thought I'd ask if there's been any reports on the
> relative speed of normal sql execution against prepared statements.

You can lose if the prepared statements are parameterized in such a way
that the planner can't see critical information.  This has been hashed
over many a time in the pgsql-performance archives (though mostly in the
context of queries in plpgsql functions).  Try EXPLAIN EXECUTE and see
if you get a materially different plan than with the straight query.
        regards, tom lane


Re: Released updated Tcl interfaces: pgin.tcl-2.1.0,

От
"Nigel J. Andrews"
Дата:
On Tue, 22 Jun 2004, Tom Lane wrote:

> "Nigel J. Andrews" <nandrews@investsystems.co.uk> writes:
> > However, since making some changes to my code I've not had time to do the
> > investigation so I just thought I'd ask if there's been any reports on the
> > relative speed of normal sql execution against prepared statements.
> 
> You can lose if the prepared statements are parameterized in such a way
> that the planner can't see critical information.  This has been hashed
> over many a time in the pgsql-performance archives (though mostly in the
> context of queries in plpgsql functions).  Try EXPLAIN EXECUTE and see
> if you get a materially different plan than with the straight query.

Cheeers Tom, I must admit that it only fleetingly crossed my mind that it could
be a server/planning issue. However, I did go away and check; eventually. As
the two plans are boringly similar, only difference being the param vs const
parts in verbose info with both showing type 25 (text I hope), I'll not even
attach them here. Also the timings were much the same.

I did then go and do some checks running from the TCL client
app; eventually. These results I show below:


% DBCore::_Prepare contriblog aaa {text} {select current_timestamp, $1}
0
% time {DBCore::_ExecPrepared contriblog aaa hello} 100
39370 microseconds per iteration

% time {DBCore::_Execute contriblog {select current_timestamp, 'hello'}} 100
1250 microseconds per iteration

% time {pg_exec_prepared $DBCore::_contriblog_conn aaa T {T} {hello}} 100
39220 microseconds per iteration

[I used only 100 iterations since using 1000 for the prepared statements tends
to block the application main loop long enough for the other components to be
affected, which means there's a big nasty delay as they reinitialise after the
test finishes and I wanted to sure of not including that time in the test
results.]


This is not one of the queries the app. is using, surprisingly, but the results
are in line with the results from a real query. The _ExecPrepared behaves a
little like pg_execute and _Execute is just a small wrapper around
pg_execute. Just to check that the extra code in _ExecPrepared isn't
significant I ran the pg_exec_prepared directly, and although it seems it adds
200usec to the execution time it's not really significant.

Obviously the prepared query execution is an order of magnitude slower and it's
the TCL interface that's making it so.

This is still the older version of pgintcl I hasten to add, took me this long
to run these few simple tests no chance of updating the package yet! Also, it
doesn't detract from the usefulness of this pure TCL interface for me, I've
switched the code path taken back to using plain SQL via pg_execute and it's
not noticable in the application once more.


--
Nigel Andrews




Re: pgin.tcl pg_exec_prepared slow (was: Released...)

От
L J Bayuk
Дата:
On Wed, Jun 30, 2004 at 11:29:32PM +0100, Nigel J. Andrews wrote:
> ... (description of prepared query execution in pgin.tcl taking 10 times
>      longer than pg_exec doing the same query)
> Obviously the prepared query execution is an order of magnitude slower and
> it's the TCL interface that's making it so.

My own tests show equivalent performance of pg_exec and pg_exec_prepared,
but that is for medium-to-large result sets. With tiny result sets - one
row, 2 values - I also see an unexpected 40 ms delay in pg_exec_prepared
when using pgin.tcl.  (Very interesting that it's the same delay on different
hardware.)

After looking at it for a while, I know where the problem is, but not
what is causing it. I also know how to fix it, but not why the fix works.
I'm going to explain what I know, hoping someone can fill in some missing
parts.

pg_exec sends 1 message: Query.  pg_exec_prepared sends 4 messages: Bind,
DescribePortal, Execute, and Sync. After that, they are identical.

The delay we are seeing (about 40 ms) is from the time the client sends the
Bind message to PostgreSQL until it gets a TCP ACK back. (It should get
a BindComplete message back, but the backend does't flush after sending
this - no pq_flush() at the end of exec_bind_message(). Hello, Tom, is this
a bug? Changing it just moves the delay around, though.) Other messages
from the client are typically replied to and/or ACK'd in under 1 ms,
according to Ethereal.  I don't know why this message takes so much longer
to ACK, nor why the client waits for the ACK. I'm sure it isn't the
backend; I don't think it is Tcl; it could be something in the TCP stack.

The fix: pgin.tcl sets its Tcl connection socket channel to "unbuffered",
which means each PostgreSQL message it sends will be go into the TCP
buffers immediately, since each message is written in a single "puts".  And
it seems to work fine with all messages except that Bind. My fix is to
change it to use Tcl "buffering full" mode; then I have it flush after each
sent message, except in pg_exec_prepared (and pg_exec_params), where it
flushes only after sending the final Sync.  The result is to combine the 4
messages from pg_exec_prepared so they get dumped into the TCP buffers at
once. The problem goes away, but I don't know why it works, and that
bothers me. Before:   39987 microseconds per iteration - pg_exec_prepared   39992 microseconds per iteration -
pg_exec_params   4133 microseconds per iteration - pg_exec After:    4483 microseconds per iteration - pg_exec_prepared
  5214 microseconds per iteration - pg_exec_params    4150 microseconds per iteration - pg_exec
 


Re: pgin.tcl pg_exec_prepared slow (was: Released...)

От
Tom Lane
Дата:
L J Bayuk <ljb220@mindspring.com> writes:
> The delay we are seeing (about 40 ms) is from the time the client sends the
> Bind message to PostgreSQL until it gets a TCP ACK back. (It should get
> a BindComplete message back, but the backend does't flush after sending
> this - no pq_flush() at the end of exec_bind_message(). Hello, Tom, is this
> a bug?

No.  If you wanted a flush just there, send a Flush message.  But AFAICS
the only reason why you'd do that is if you needed to look at the
BindComplete before sending your next message --- which you do not.

> pgin.tcl sets its Tcl connection socket channel to "unbuffered",
> which means each PostgreSQL message it sends will be go into the TCP
> buffers immediately, since each message is written in a single "puts".

This is certainly bogus.  A good implementation would arrange to flush
the TCP outbound queue immediately after sending either a Flush or a
Sync message --- and *no place else*.  Those are exactly the points
where you are starting to wait for a backend reply.  More frequent
flushing simply results in more separate network packets, which is not
a win.  Also, if you haven't sent one of these message types, you are
not entitled to expect any immediate reply (which is essentially what's
happening with pgin.tcl, evidently).

The V3 protocol is designed around the assumption that each side will
batch multiple logical messages into single network packets whenever
possible.  If you don't do this, it's not surprising that performance
sucks :-(

The reason error recovery is defined as doing skip-to-Sync is exactly
so that you can shove off a batch of messages without having first
checked the results of the earlier messages.
        regards, tom lane


Re: pgin.tcl pg_exec_prepared slow (was: Released...)

От
Greg Stark
Дата:
L J Bayuk <ljb220@mindspring.com> writes:

> The delay we are seeing (about 40 ms) is from the time the client sends the
> Bind message to PostgreSQL until it gets a TCP ACK back.

You might consider disabling the NAGLE algorithm. It can delay acks like this.
You could do this with setsockopt(SOL_TCP, TCP_NODELAY,...)

The problem is that while this would be perfectly appropriate for things like
bind messages, it's entirely inappropriate for the bulk data transfer
of the results of a query.

I would not suggest enabling TCP_NODELAY for all data. And I'm not sure how
implementations behave if you turn it on and off.

-- 
greg



Re: pgin.tcl pg_exec_prepared slow (was: Released...)

От
Greg Stark
Дата:
L J Bayuk <ljb220@mindspring.com> writes:

> Other messages from the client are typically replied to and/or ACK'd in
> under 1 ms, according to Ethereal. I don't know why this message takes so
> much longer to ACK, nor why the client waits for the ACK. I'm sure it isn't
> the backend; I don't think it is Tcl; it could be something in the TCP
> stack.

Ahah, found a good explanation from a pretty authoritative source:

http://groups.google.com/groups?oi=djq&selm=an_477899304

Nothing like getting it straight from the horse's mouth. 

(Note for any non-native-english-speakers: that's an expression, I've never
seen Mr Nagle's mouth personally)

-- 
greg



Re: pgin.tcl pg_exec_prepared slow (was: Released...)

От
L J Bayuk
Дата:
On Mon, Jul 05, 2004 at 10:43:47PM -0400, Tom Lane wrote:
> L J Bayuk <ljb220@mindspring.com> writes:
> > The delay we are seeing (about 40 ms) is from the time the client sends the
> > Bind message to PostgreSQL until it gets a TCP ACK back. (It should get
> > a BindComplete message back, but the backend doesn't flush after sending
> > this - no pq_flush() at the end of exec_bind_message(). Hello, Tom, is this
> > a bug?
> 
> No.  If you wanted a flush just there, send a Flush message.  But AFAICS
> the only reason why you'd do that is if you needed to look at the
> BindComplete before sending your next message --- which you do not.


I agree, it works OK as is. I only raised the question because of two
things. First, the protocol documentation says: "The response (to the Bind message) is either BindComplete  or
ErrorResponse."
From which one might wrongly conclude that you get BindComplete (or
ErrorResponse) back when you send Bind. (Pgin.tcl doesn't assume this.)

Second, this comment in backend/utils/error/elog.c:  "This flush is normally not necessary, since postgres.c will flush
 out waiting data when control returns to the main loop. But it..."
 
I could not find the described flush "when control returns to the main
loop". Either I'm missing it, or it isn't there, and the comment implies
that it may have been there at one time.


> > pgin.tcl sets its Tcl connection socket channel to "unbuffered",
> > which means each PostgreSQL message it sends will be go into the TCP
> > buffers immediately, since each message is written in a single "puts".
> 
> This is certainly bogus.  A good implementation would arrange to flush
> the TCP outbound queue immediately after sending either a Flush or a
> Sync message --- and *no place else*.  Those are exactly the points
> where you are starting to wait for a backend reply.  More frequent
> flushing simply results in more separate network packets, which is not
> a win.  Also, if you haven't sent one of these message types, you are
> not entitled to expect any immediate reply (which is essentially what's
> happening with pgin.tcl, evidently).
> ...

No, I didn't explain it well.  Pgin.tcl itself isn't waiting for a reply to
Bind. It sends Bind, DescribePortal, Execute, and Sync without waiting in
between.  Currently it flushes after each, but this is Tcl flushing to TCP,
not to the wire. I'm going to change that, although in Tcl I can't force
TCP to flush its outbound queue.  But for now, I seem to end up with Bind
in one packet, and the other three messages combined into another.  That
delay is pgin.tcl blocked while the TCP stack waits for an ACK to the Bind
message's packet. I don't know why this happens but it seems to be a case
where TCP performs "less than optimally" in deciding when to send what it's
got, when to ACK, and when to wait. I will work around it by exercising
more control on when Tcl flushes the data to TCP.  I just haven't decided
whether to flush before reading, or flush after all messages that need a
response (I think: Startup, PasswordMessage, Query, Sync, CopyDone, and
FunctionCall are the ones I use).


Re: pgin.tcl pg_exec_prepared slow (was: Released...)

От
Tom Lane
Дата:
L J Bayuk <ljb220@mindspring.com> writes:
> I just haven't decided
> whether to flush before reading, or flush after all messages that need a
> response (I think: Startup, PasswordMessage, Query, Sync, CopyDone, and
> FunctionCall are the ones I use).

You seem to have quite missed my point.  If you decide to flush on the
basis of flush-after-certain-message-types-are-sent, then the types to
flush after are precisely Sync and Flush.  If you think you want
something different, you are wrong and should think again (or more
likely, insert Flush messages into the outgoing stream at the places
where you want a flush to occur).  There is no point in flushing after
any other message type because the backend won't flush its response.

Of course this is all moot if you choose to flush just before waiting
for input (though you'd still better be sure that Flush or Sync was the
last message sent before you do so).

> I could not find the described flush "when control returns to the main
> loop". Either I'm missing it, or it isn't there, and the comment implies
> that it may have been there at one time.

It's actually in ReadyForQuery() these days.
        regards, tom lane


Re: pgin.tcl pg_exec_prepared slow (was: Released...)

От
L J Bayuk
Дата:
On Fri, Jul 09, 2004 at 08:07:56PM -0400, Tom Lane wrote:
> L J Bayuk <ljb220@mindspring.com> writes:
> > I just haven't decided
> > whether to flush before reading, or flush after all messages that need a
> > response (I think: Startup, PasswordMessage, Query, Sync, CopyDone, and
> > FunctionCall are the ones I use).
> 
> You seem to have quite missed my point.  If you decide to flush on the
> basis of flush-after-certain-message-types-are-sent, then the types to
> flush after are precisely Sync and Flush.  If you think you want
> something different, you are wrong and should think again (or more
> likely, insert Flush messages into the outgoing stream at the places
> where you want a flush to occur).  There is no point in flushing after
> any other message type because the backend won't flush its response.
>...

Yes, I must be missing your point, unless you are talking only about the
Extended Query mode. Surely outside of Extended Query mode (Startup, Basic
Query mode, Function Call) the client must flush output after (at least)
the messages I listed above. That's what I get from the protocol
documentation, what works in practice, and network traces.


Re: pgin.tcl pg_exec_prepared slow (was: Released...)

От
Tom Lane
Дата:
L J Bayuk <ljb220@mindspring.com> writes:
> Yes, I must be missing your point, unless you are talking only about the
> Extended Query mode.

Sorry, you're right, that was what I was thinking about.
        regards, tom lane