Обсуждение: performance for high-volume log insertion

От:
david@lang.hm
Дата:

I am working with the rsyslog developers to improve it's performance in
inserting log messages to databases.

currently they have a postgres interface that works like all the other
ones, where rsyslog formats an insert statement, passes that the the
interface module, that sends it to postgres (yes, each log as a seperate
transaction)

the big win is going to be in changing the core of rsyslog so that it can
process multiple messages at a time (bundling them into a single
transaction)

but then we run into confusion.

off the top of my head I know of several different ways to get the data
into postgres

1. begin; insert; insert;...;end

2. insert into table values (),(),(),()

3. copy from stdin
   (how do you tell it how many records to read from stdin, or that you
have given it everything without disconnecting)

4. copy from stdin in binary mode

and each of the options above can be done with prepared statements, stored
procedures, or functions.

I know that using procedures or functions can let you do fancy things like
inserting the row(s) into the appropriate section of a partitioned table

other than this sort of capability, what sort of differences should be
expected between the various approaches (including prepared statements vs
unprepared)

since the changes that rsyslog is making will affect all the other
database interfaces as well, any comments about big wins or things to
avoid for other databases would be appriciated.

David Lang

От:
Stephen Frost
Дата:

David,

*  () wrote:
> I am working with the rsyslog developers to improve it's performance in
> inserting log messages to databases.

Great!

> currently they have a postgres interface that works like all the other
> ones, where rsyslog formats an insert statement, passes that the the
> interface module, that sends it to postgres (yes, each log as a seperate
> transaction)

Ouch.

> the big win is going to be in changing the core of rsyslog so that it can
> process multiple messages at a time (bundling them into a single
> transaction)

Yup.

> 1. begin; insert; insert;...;end

Doing the insert in a transaction should definitely improve your
performance.  Doing them as prepared statements would be good too, and
using binary mode would very likely help.

> 2. insert into table values (),(),(),()

Using this structure would be more database agnostic, but won't perform
as well as the COPY options I don't believe.  It might be interesting to
do a large "insert into table values (),(),()" as a prepared statement,
but then you'd have to have different sizes for each different number of
items you want inserted.

> 3. copy from stdin
>   (how do you tell it how many records to read from stdin, or that you
> have given it everything without disconnecting)

Assuming you're using libpq, you just call PQputCopyEnd().  Then you
call PQgetResult() to check that everything worked ok.

> 4. copy from stdin in binary mode

Binary mode, in general, should be faster.  You should consider what
format the data is inside your application though (it's less useful to
use binary copy if you're having to convert from text to binary in your
application).

> and each of the options above can be done with prepared statements,
> stored procedures, or functions.
>
> I know that using procedures or functions can let you do fancy things
> like inserting the row(s) into the appropriate section of a partitioned
> table

We would normally recommend having the database handle the partitioning
by using a trigger on the base table to call a stored procedure.  The
application really doesn't need to know about this.

> other than this sort of capability, what sort of differences should be
> expected between the various approaches (including prepared statements vs
> unprepared)
>
> since the changes that rsyslog is making will affect all the other
> database interfaces as well, any comments about big wins or things to
> avoid for other databases would be appriciated.

Hope this helps.

    Thanks,

        Stephen

От:
david@lang.hm
Дата:

On Mon, 20 Apr 2009, Stephen Frost wrote:

> David,
>
> *  () wrote:
>> I am working with the rsyslog developers to improve it's performance in
>> inserting log messages to databases.
>
> Great!
>
>> currently they have a postgres interface that works like all the other
>> ones, where rsyslog formats an insert statement, passes that the the
>> interface module, that sends it to postgres (yes, each log as a seperate
>> transaction)
>
> Ouch.

yep

>> the big win is going to be in changing the core of rsyslog so that it can
>> process multiple messages at a time (bundling them into a single
>> transaction)
>
> Yup.
>
>> 1. begin; insert; insert;...;end
>
> Doing the insert in a transaction should definitely improve your
> performance.  Doing them as prepared statements would be good too, and
> using binary mode would very likely help.
>
>> 2. insert into table values (),(),(),()
>
> Using this structure would be more database agnostic, but won't perform
> as well as the COPY options I don't believe.  It might be interesting to
> do a large "insert into table values (),(),()" as a prepared statement,
> but then you'd have to have different sizes for each different number of
> items you want inserted.

on the other hand, when you have a full queue (lots of stuff to insert) is
when you need the performance the most. if it's enough of a win on the
database side, it could be worth more effort on the applicaiton side.

>> 3. copy from stdin
>>   (how do you tell it how many records to read from stdin, or that you
>> have given it everything without disconnecting)
>
> Assuming you're using libpq, you just call PQputCopyEnd().  Then you
> call PQgetResult() to check that everything worked ok.

one of the big questions is what value we will get by making things
database specififc (more below)

>> 4. copy from stdin in binary mode
>
> Binary mode, in general, should be faster.  You should consider what
> format the data is inside your application though (it's less useful to
> use binary copy if you're having to convert from text to binary in your
> application).

any idea what sort of difference binary mode would result in?

>> and each of the options above can be done with prepared statements,
>> stored procedures, or functions.
>>
>> I know that using procedures or functions can let you do fancy things
>> like inserting the row(s) into the appropriate section of a partitioned
>> table
>
> We would normally recommend having the database handle the partitioning
> by using a trigger on the base table to call a stored procedure.  The
> application really doesn't need to know about this.

well, the trigger or stored procedure/funcion can be part of the database
config, or loaded by the app when it starts.



one very big question is how much of a gain there is in moving from a
database agnostic approach to a database specific approach.

currently rsyslog makes use of it's extensive formatting capabilities to
format a string along the lines of
$DBformat="insert into table X values ('$timestamp','$msg');"
then it hands the resulting string to the database interface module. This
is a bit of a pain to setup, and not especially efficiant, but it has the
ability to insert data into whatever schema you want to use (unlike a lot
of apps that try to force you to use their schema)

I proposed a 5 variable replacement for this to allow for N log entries to
be combined into one string to be sent to the database:

DBinit (one-time things like initialinzing prepared statements, etc)
DBstart (string for the start of a transaction)
DBjoin (tring to use to join multiple DBitems togeather)
DBend (string for the end of a transaction)
DBitem (formatting of a single action )

so you could do something like

DBstart = "insert into table X values"
DBjoin = ","
DBend = ";"
DBitem = "('$timestampe','$msg')"

and it would create a string like #2

this is extremely flexible. I think it can do everything except binary
mode operations, including copy. It is also pretty database agnostic.

but people are asking about how to do binary mode, and some were thinking
that you couldn't do prepared statements in Oracle with a string-based
interface.

so I decided to post here to try and get an idea of (1) how much
performance would be lost by sticking with strings, and (2) of all the
various ways of inserting the data, what sort of performance differences
are we talking about

David Lang

От:
Stephen Frost
Дата:

David,

*  () wrote:
> any idea what sort of difference binary mode would result in?

It depends a great deal on your application..

> currently rsyslog makes use of it's extensive formatting capabilities to
> format a string along the lines of
> $DBformat="insert into table X values ('$timestamp','$msg');"

Is this primairly the commands sent to the database?  If so, I don't
think you'll get much by going to binary-mode.  The text '$msg' isn't
going to be any different in binary.  The '$timestamp' would be, but I'm
guessing you'd have to restructure it some to match the PG binary
timestamp format and while that *would* be a win, I don't think it would
end up being all that much of a win.

> I proposed a 5 variable replacement for this to allow for N log entries
> to be combined into one string to be sent to the database:
>
> DBinit (one-time things like initialinzing prepared statements, etc)
> DBstart (string for the start of a transaction)
> DBjoin (tring to use to join multiple DBitems togeather)
> DBend (string for the end of a transaction)
> DBitem (formatting of a single action )
>
> so you could do something like
>
> DBstart = "insert into table X values"
> DBjoin = ","
> DBend = ";"
> DBitem = "('$timestampe','$msg')"
>
> and it would create a string like #2

Using this textual representation for the DBitem would cause difficulty
for any kind of prepared statement usage (Oracle or PG), and so I would
really recommend getting away from it if possible.  Instead, I would
encourage going with the PG (and Oracle, as I recall) structure of
having an array of pointers to the values.

Take a look at the documentation for PQexecParams here:
http://www.postgresql.org/docs/8.3/interactive/libpq-exec.html

(note that you'll want to use PQprepare and PQexecPrepared in the end,
but the detailed documentation is under PQexecParams)

Basically, you would have:

DBnParams = 2;
DBparamValues[0] = ptr to $timestamp
DBparamValues[1] = ptr to $msg

If you just use the text format, you don't actually need anything else
for PG, just pass in NULL for paramTypes, paramLengths, and
paramFormats, and 0 for resultFormat.

Of course, if that's your only structure, then you can just make a C
struct that has those two pointers in it and simplify your API by
passing the struct around.

> this is extremely flexible. I think it can do everything except binary
> mode operations, including copy. It is also pretty database agnostic.

With that DBitem, I'm not sure how you would do copy easily.  You'd have
to strip out the params and possibly the comma depending on what you're
doing, and you might have to adjust your escaping (how is that done
today in $msg?).  All-in-all, not using prepared queries is just messy
and I would recommend avoiding that, regardless of anything else.

> but people are asking about how to do binary mode, and some were thinking
> that you couldn't do prepared statements in Oracle with a string-based
> interface.

Prepared statements pretty much require that you are able to pass in the
items in a non-string-based way (I don't mean to imply that you can't
use *strings*, you can, but it's 1 string per column).  Otherwise,
you've got the whole issue of figuring out where one column ends and the
next begins again, which is half the point of prepared statements.

> so I decided to post here to try and get an idea of (1) how much
> performance would be lost by sticking with strings, and (2) of all the
> various ways of inserting the data, what sort of performance differences
> are we talking about

Sticking with strings, if that's the format that's going to end up in
the database, is fine.  That's an orthogonal issue to using prepared
statements though, which you should really do.  Once you've converted to
using prepared statements of some kind, and batching together inserts in
larger transactions instead of one insert per transactions, then you can
come back to the question of passing things-which-can-be-binary as
binary (eg, timestamps, integers, floats, doubles, etc) and do some
performance testing to see how much an improvment it will get you.

    Thanks,

        Stephen

От:
Greg Smith
Дата:

On Mon, 20 Apr 2009,  wrote:

> any idea what sort of difference binary mode would result in?

The win from switching from INSERT to COPY can be pretty big, further
optimizing to BINARY you'd really need to profile to justify.  I haven't
found any significant difference in binary mode compared to overhead of
the commit itself in most cases.  The only thing I consistently run into
is that timestamps can bog things down considerably in text mode, but you
have to be pretty efficient in your app to do any better generating those
those in the PostgreSQL binary format yourself.  If you had a lot of
difficult to parse data types like that, binary might be a plus, but it
doesn't sound like that will be the case for what you're doing.

But you don't have to believe me, it's easy to generate a test case here
yourself.  Copy some typical data into the database, export it both ways:

COPY t to 'f';
COPY t to 'f' WITH BINARY;

And then compare copying them both in again with "\timing".  That should
let you definitively answer whether it's really worth the trouble.

--
* Greg Smith  http://www.gregsmith.com Baltimore, MD

От:
Stephen Frost
Дата:

Greg,

* Greg Smith () wrote:
> The win from switching from INSERT to COPY can be pretty big, further
> optimizing to BINARY you'd really need to profile to justify.

Have you done any testing to compare COPY vs. INSERT using prepared
statements?  I'd be curious to know how those compare and against
multi-value INSERTS, prepared and unprepared.

    Thanks,

        Stephen

От:
david@lang.hm
Дата:

On Mon, 20 Apr 2009, Stephen Frost wrote:

> David,
>
> *  () wrote:
>> any idea what sort of difference binary mode would result in?
>
> It depends a great deal on your application..
>
>> currently rsyslog makes use of it's extensive formatting capabilities to
>> format a string along the lines of
>> $DBformat="insert into table X values ('$timestamp','$msg');"
>
> Is this primairly the commands sent to the database?  If so, I don't
> think you'll get much by going to binary-mode.  The text '$msg' isn't
> going to be any different in binary.  The '$timestamp' would be, but I'm
> guessing you'd have to restructure it some to match the PG binary
> timestamp format and while that *would* be a win, I don't think it would
> end up being all that much of a win.

the applicaiton is the log server pulling apart messages, reformatting
them to whatever is appropriate for the database schema, and then
inserting them into the database (for other applications to access, ones
that rsyslog knows nothing about)

I used the example of a trivial table with timestamp and log message, but
in most cases you will break out sending host and application as well, and
in some cases may parse apart the log message itself. I have a use case
where the message itself if pipe delimited, and I will want to do make use
of the first four fields of the message (probably as seperate columns)
before dumping the rest of the message into a text field.

>> I proposed a 5 variable replacement for this to allow for N log entries
>> to be combined into one string to be sent to the database:
>>
>> DBinit (one-time things like initialinzing prepared statements, etc)
>> DBstart (string for the start of a transaction)
>> DBjoin (tring to use to join multiple DBitems togeather)
>> DBend (string for the end of a transaction)
>> DBitem (formatting of a single action )
>>
>> so you could do something like
>>
>> DBstart = "insert into table X values"
>> DBjoin = ","
>> DBend = ";"
>> DBitem = "('$timestampe','$msg')"
>>
>> and it would create a string like #2
>
> Using this textual representation for the DBitem would cause difficulty
> for any kind of prepared statement usage (Oracle or PG), and so I would
> really recommend getting away from it if possible.

that example would be, but the same mechanism would let you do


DBinit="PREPARE rsyslog_insert(date, text) AS\nINSERT INTO foo VALUES(\$1,
\$2);"
DBstart = "begini;B\n"
DBjoin = ""
DBend = "end;"
DBitem = "EXECUTE rsyslog_insert('$timestamp','$msg');\n"

which would become

PREPARE rsyslog_insert(date, text) AS
     INSERT INTO foo VALUES($1, $2);
begin;
EXECUTE rsyslog_insert('20090420-06:00', "log1");
EXECUTE rsyslog_insert('20090420-06:00', "log2");
EXECUTE rsyslog_insert('20090420-06:00', "log3");
end;

which I think makes good use of prepared statements.

> Instead, I would
> encourage going with the PG (and Oracle, as I recall) structure of
> having an array of pointers to the values.
>
> Take a look at the documentation for PQexecParams here:
> http://www.postgresql.org/docs/8.3/interactive/libpq-exec.html
>
> (note that you'll want to use PQprepare and PQexecPrepared in the end,
> but the detailed documentation is under PQexecParams)
>
> Basically, you would have:
>
> DBnParams = 2;
> DBparamValues[0] = ptr to $timestamp
> DBparamValues[1] = ptr to $msg
>
> If you just use the text format, you don't actually need anything else
> for PG, just pass in NULL for paramTypes, paramLengths, and
> paramFormats, and 0 for resultFormat.
>
> Of course, if that's your only structure, then you can just make a C
> struct that has those two pointers in it and simplify your API by
> passing the struct around.

the database structure is not being defined by (or specificly for)
rsyslog. so at compile time we have _no_ idea how many variables of what
type there are going to be. my example of ($timestamp,$msg) was intended
to just be a sample (avoiding typing out some elaberate set of parameters)

rsyslog provides the following items, which can be sliced and diced with
substatutions, substrings, and additional inserted text.

msg     the MSG part of the message (aka "the message" ;))

rawmsg     the message excactly as it was received from the socket. Should be
useful for debugging.

uxtradmsg     will disappear soon - do NOT use!

hostname     hostname from the message

source     alias for HOSTNAME

fromhost     hostname of the system the message was received from (in a
relay chain, this is the system immediately in front of us and not
necessarily the original sender). This is a DNS-resolved name, except if
that is not possible or DNS resolution has been disabled.

fromhost-ip     The same as fromhost, but alsways as an IP address. Local
inputs (like imklog) use 127.0.0.1 in this property.

syslogtag     TAG from the message

programname     the "static" part of the tag, as defined by BSD syslogd.
For example, when TAG is "named[12345]", programname is "named".

pri     PRI part of the message - undecoded (single value)

pri-text     the PRI part of the message in a textual form (e.g.
"syslog.info")

iut     the monitorware InfoUnitType - used when talking to a MonitorWare

backend (also for phpLogCon)

syslogfacility     the facility from the message - in numerical form

syslogfacility-text     the facility from the message - in text form

syslogseverity     severity from the message - in numerical form

syslogseverity-text     severity from the message - in text form

syslogpriority     an alias for syslogseverity - included for historical
reasons (be careful: it still is the severity, not PRI!)

syslogpriority-text     an alias for syslogseverity-text

timegenerated     timestamp when the message was RECEIVED. Always in high
resolution

timereported     timestamp from the message. Resolution depends on what was
provided in the message (in most cases, only seconds)

timestamp     alias for timereported

protocol-version     The contents of the PROTCOL-VERSION field from
IETF draft draft-ietf-syslog-protcol

structured-data     The contents of the STRUCTURED-DATA field from
IETF draft draft-ietf-syslog-protocol

app-name     The contents of the APP-NAME field from IETF draft
draft-ietf-syslog-protocol

procid     The contents of the PROCID field from IETF draft
draft-ietf-syslog-protocol

msgid     The contents of the MSGID field from IETF draft
draft-ietf-syslog-protocol

inputname     The name of the input module that generated the message
(e.g. "imuxsock", "imudp"). Note that not all modules necessarily provide
this property. If not provided, it is an empty string. Also note that the
input module may provide any value of its liking. Most importantly, it is
not necessarily the module input name. Internal sources can also provide
inputnames. Currently, "rsyslogd" is defined as inputname for messages
internally generated by rsyslogd, for example startup and shutdown and
error messages. This property is considered useful when trying to filter
messages based on where they originated - e.g. locally generated messages
("rsyslogd", "imuxsock", "imklog") should go to a different place than
messages generated somewhere.

$now     The current date stamp in the format YYYY-MM-DD

$year     The current year (4-digit)

$month     The current month (2-digit)

$day     The current day of the month (2-digit)

$hour     The current hour in military (24 hour) time (2-digit)

$hhour     The current half hour we are in. From minute 0 to 29, this is
always 0 while from 30 to 59 it is always 1.

$qhour     The current quarter hour we are in. Much like $HHOUR, but values
range from 0 to 3 (for the four quater hours that are in each hour)

$minute     The current minute (2-digit)

$myhostname     The name of the current host as it knows itself (probably
useful for filtering in a generic way)

>> this is extremely flexible. I think it can do everything except binary
>> mode operations, including copy. It is also pretty database agnostic.
>
> With that DBitem, I'm not sure how you would do copy easily.  You'd have
> to strip out the params and possibly the comma depending on what you're
> doing, and you might have to adjust your escaping (how is that done
> today in $msg?).  All-in-all, not using prepared queries is just messy
> and I would recommend avoiding that, regardless of anything else.

rsyslog message formatting provides tools for doing the nessasary escaping
(and is using it for the single insert messages today)

prepared statements in text mode have similar problems (although they
_are_ better in defending against sql injection attacks, so a bit safer).
I don't see how you would easily use the API that you pointed me at above
without having to know the database layout at compile time.

>> but people are asking about how to do binary mode, and some were thinking
>> that you couldn't do prepared statements in Oracle with a string-based
>> interface.
>
> Prepared statements pretty much require that you are able to pass in the
> items in a non-string-based way (I don't mean to imply that you can't
> use *strings*, you can, but it's 1 string per column).  Otherwise,
> you've got the whole issue of figuring out where one column ends and the
> next begins again, which is half the point of prepared statements.

if you had said figuring out where the column data ends and the SQL
command begins I would have agreed with you fully.

I agree that defining a fixed table layout and compiling that knowledge
into rsyslog is the safest (and probably most efficiant) way to do things,
but there is no standard for log messages in a database, and different
people will want to do different things with the logs, so I don't see how
a fixed definition could work.

>> so I decided to post here to try and get an idea of (1) how much
>> performance would be lost by sticking with strings, and (2) of all the
>> various ways of inserting the data, what sort of performance differences
>> are we talking about
>
> Sticking with strings, if that's the format that's going to end up in
> the database, is fine.  That's an orthogonal issue to using prepared
> statements though, which you should really do.  Once you've converted to
> using prepared statements of some kind, and batching together inserts in
> larger transactions instead of one insert per transactions, then you can
> come back to the question of passing things-which-can-be-binary as
> binary (eg, timestamps, integers, floats, doubles, etc) and do some
> performance testing to see how much an improvment it will get you.

so the binary mode only makes a difference on things like timestamps and
numbers? (i.e. no significant added efficiancy in processing the command
itself?)

thanks for taking the time to answer, I was trying to keep the problem
definition small and simple, and from your reply it looks like I made it
too simple.

I think the huge complication is that when RedHat compiles rsyslog to ship
it in the distro, they have no idea how it is going to be used (if it will
go to a database, what database engine it will interface with, or what the
schema of that database would look like). Only the sysadmin(s)/dba(s) know
that and they need to be able to tell rsyslog what to do to get the data
where they want it to be, and in the format they want it to be in.

David Lang

От:
Stephen Frost
Дата:

David,

*  () wrote:
> the database structure is not being defined by (or specificly for)
> rsyslog. so at compile time we have _no_ idea how many variables of what
> type there are going to be. my example of ($timestamp,$msg) was intended
> to just be a sample (avoiding typing out some elaberate set of
> parameters)

That's fine.  I don't see any reason that the API I suggested be a
compile-time option.  Certainly, libpq has no idea how many, or what
kind, of types are being passed to it by the caller, that's why it *has*
the API that it does.  You just need to work out what each prepared
queries' parameters are and construct the necessary arrays.

> rsyslog provides the following items, which can be sliced and diced with
> substatutions, substrings, and additional inserted text.

[...]

Looks like mainly text fields still, so you might want to just stick
with text for now.

> rsyslog message formatting provides tools for doing the nessasary
> escaping (and is using it for the single insert messages today)
>
> prepared statements in text mode have similar problems (although they
> _are_ better in defending against sql injection attacks, so a bit safer).

Uhh, if you use prepared statements with PQexecPrepared, there *is no*
escaping necessary.  I'm not sure what you mean by 'similar problems'.
Can you elaborate on that?  If you mean doing 'prepared queries' by
using creating a string and then using PQexec with
'EXECUTE blah (1,2,3);' then you really need to go read the
documentation I suggested.  That's *not* what I'm getting at when I say
'prepared queries', I'm talking about a protocol-level well-defined
format for passing arguments independently of commands.  A call to
PQexecPrepared looks like this:

PQprepare(conn, "myquery", "INSERT INTO TAB1 VALUES ($1, $2);", 0, NULL);

values[0] = "a";
values[1] = "b";
PQexecPrepared(conn, "myquery", 2, values, NULL, NULL, 0);

Note that we don't ever send an 'EXECUTE myquery (1,2,3);' type of thing
to libpq.  libpq will handle the execute and the parameters and whatnot
as part of the PG 3.0 protocol.

> I don't see how you would easily use the API that you pointed me at above
> without having to know the database layout at compile time.

The arrays don't have to be of fixed length..  You can malloc() them at
runtime based on the number of parameters which are being used in a
particular message.  Perhaps what I'm missing here is exactly what
you're expecting the user to provide you with versus what you're going
to be giving to libpq.  I have been assuming that you have a format
definition system already in place that looks something like:

log_filename = 'postgresql-%Y-%m-%d_%H%M%S.log'

Which you then parse, figure out what the escape codes mean, and build
some structure which understands the whole thing at an abstract level.
For example, you should know that '%Y' is the first variable, is an
integer, etc.

From this, you can determine that there are 6 parameters, at runtime.
You can then malloc() an array with 6 char* pointers.  Then, when you
have those 6 strings somewhere, you just set each of your array
parameters to the appropriate in-memory string address, eg:

array = malloc(sizeof(char*) * num_of_params);
for (int i = 0; i < num_of_params; i++) {
    switch(params[i]) {
        case 'Y':    array[i] = my_year_string; break;
        case 'm':    array[i] = my_month_string; break;
    }
}

etc, until you eventually have your array of pointers, with a valid
in-memory string somewhere for each pointer, that you can then turn
around and pass to PQexecPrepared.  Obviously, you don't have to
malloc() every time, if you keep track of each type of message.

> I agree that defining a fixed table layout and compiling that knowledge
> into rsyslog is the safest (and probably most efficiant) way to do
> things, but there is no standard for log messages in a database, and
> different people will want to do different things with the logs, so I
> don't see how a fixed definition could work.

I didn't intend to imply that you have to use a fixed definition, just
that if you currently only have 1 then you might as well.  It's entirely
possible to support any definition using the API I suggested.

> so the binary mode only makes a difference on things like timestamps and
> numbers? (i.e. no significant added efficiancy in processing the command
> itself?)

I'm slightly confused by what you mean by this?  Binary mode is for
parameters only, commands are never 'binary'.  Binary mode just means
that the application gives the value to the database in the format which
it expects, and so the database doesn't have to translate a textual
representation of a value into the binary format the database needs.

> thanks for taking the time to answer, I was trying to keep the problem
> definition small and simple, and from your reply it looks like I made it
> too simple.

Yes, probably.  You likely assumed that I knew something about how
rsyslog works with databases, and to be honest, I have no idea. :)

> I think the huge complication is that when RedHat compiles rsyslog to
> ship it in the distro, they have no idea how it is going to be used (if
> it will go to a database, what database engine it will interface with, or
> what the schema of that database would look like). Only the
> sysadmin(s)/dba(s) know that and they need to be able to tell rsyslog
> what to do to get the data where they want it to be, and in the format
> they want it to be in.

That's really all fine.  You just need to get from the user, at runtime,
what they want their commands to look like.  Once you have that, it
should be entirely possible to dynamically construct the prepared
queries, most likely without the user having to know anything about COPY
or prepared statements or anything.  For my part, I'd want something
like:

table = "mytable";
data = "$Y, $m, $d, $H, $msg";

I'd avoid having the user provide actual SQL, because that becomes
difficult to deal with unless you embed an SQL parser in rsyslog, and
I don't really see the value in that.  If the user wants to do
something fancy with the data in the database, I would encourage them
to put an 'ON INSERT' trigger on 'mytable' to do whatever they want with
the data that's coming in.  This gives you the freedom necessary to
build an appropriate statement for any database you're connecting to,
dynamically, using prepared queries, and even binary mode if you want.

    Thanks,

        Stephen

От:
david@lang.hm
Дата:

On Tue, 21 Apr 2009, Stephen Frost wrote:

> David,
>
> *  () wrote:
>> the database structure is not being defined by (or specificly for)
>> rsyslog. so at compile time we have _no_ idea how many variables of what
>> type there are going to be. my example of ($timestamp,$msg) was intended
>> to just be a sample (avoiding typing out some elaberate set of
>> parameters)
>
> That's fine.  I don't see any reason that the API I suggested be a
> compile-time option.  Certainly, libpq has no idea how many, or what
> kind, of types are being passed to it by the caller, that's why it *has*
> the API that it does.  You just need to work out what each prepared
> queries' parameters are and construct the necessary arrays.

I misunderstood what you were saying (more below)

>> rsyslog provides the following items, which can be sliced and diced with
>> substatutions, substrings, and additional inserted text.
>
> [...]
>
> Looks like mainly text fields still, so you might want to just stick
> with text for now.

yes, almost exclusivly text fields, and the fields that could be numbers
(or dates) are in text formats when we get them anyway.

>> rsyslog message formatting provides tools for doing the nessasary
>> escaping (and is using it for the single insert messages today)
>>
>> prepared statements in text mode have similar problems (although they
>> _are_ better in defending against sql injection attacks, so a bit safer).
>
> Uhh, if you use prepared statements with PQexecPrepared, there *is no*
> escaping necessary.  I'm not sure what you mean by 'similar problems'.
> Can you elaborate on that?  If you mean doing 'prepared queries' by
> using creating a string and then using PQexec with
> 'EXECUTE blah (1,2,3);' then you really need to go read the
> documentation I suggested.  That's *not* what I'm getting at when I say
> 'prepared queries', I'm talking about a protocol-level well-defined
> format for passing arguments independently of commands.  A call to
> PQexecPrepared looks like this:
>
> PQprepare(conn, "myquery", "INSERT INTO TAB1 VALUES ($1, $2);", 0, NULL);
>
> values[0] = "a";
> values[1] = "b";
> PQexecPrepared(conn, "myquery", 2, values, NULL, NULL, 0);
>
> Note that we don't ever send an 'EXECUTE myquery (1,2,3);' type of thing
> to libpq.  libpq will handle the execute and the parameters and whatnot
> as part of the PG 3.0 protocol.

when you said to stick with text mode, I thought you were meaning that we
would create a string with EXECUTE.... in it and send that. it would have
similar escaping issues (although with fewer vunerabilities if they mess
up)


>> so the binary mode only makes a difference on things like timestamps and
>> numbers? (i.e. no significant added efficiancy in processing the command
>> itself?)
>
> I'm slightly confused by what you mean by this?  Binary mode is for
> parameters only, commands are never 'binary'.  Binary mode just means
> that the application gives the value to the database in the format which
> it expects, and so the database doesn't have to translate a textual
> representation of a value into the binary format the database needs.

I thought that part of the 'efficiancy' and 'performance' to be gained
from binary modes were avoiding the need to parse commands, if it's only
the savings in converting column contents from text to specific types,
it's much less important.

>> I think the huge complication is that when RedHat compiles rsyslog to
>> ship it in the distro, they have no idea how it is going to be used (if
>> it will go to a database, what database engine it will interface with, or
>> what the schema of that database would look like). Only the
>> sysadmin(s)/dba(s) know that and they need to be able to tell rsyslog
>> what to do to get the data where they want it to be, and in the format
>> they want it to be in.
>
> That's really all fine.  You just need to get from the user, at runtime,
> what they want their commands to look like.  Once you have that, it
> should be entirely possible to dynamically construct the prepared
> queries, most likely without the user having to know anything about COPY
> or prepared statements or anything.  For my part, I'd want something
> like:
>
> table = "mytable";
> data = "$Y, $m, $d, $H, $msg";

if the user creates the data this way, you just reintroduced the escaping
problem. they would have to do something like

data = "$Y"
data = "$m"
data = "$d"
data = "$H"
data = "$msg"

one key thing is that it's very probable that the user will want to
manipulate the string, not just send a single variable as-is

> I'd avoid having the user provide actual SQL, because that becomes
> difficult to deal with unless you embed an SQL parser in rsyslog, and
> I don't really see the value in that.

there's no need for rsyslog to parse the SQL, just to be able to escape it
appropriately and then pass it to the database for execution


> If the user wants to do
> something fancy with the data in the database, I would encourage them
> to put an 'ON INSERT' trigger on 'mytable' to do whatever they want with
> the data that's coming in.  This gives you the freedom necessary to
> build an appropriate statement for any database you're connecting to,
> dynamically, using prepared queries, and even binary mode if you want.



one huge advantage of putting the sql into the configuration is the
ability to work around other users of the database.

for example, what if the database has additional columns that you don't
want to touch (say an item# sequence), if the SQL is in the config this is
easy to work around, if it's seperate (or created by the module), this is
very hard to do.

I guess you could give examples of the SQL in the documentation for how to
create the prepared statement etc in the databases, but how is that much
better than having it in the config file?

for many users it's easier to do middlein -fancy stuff in the SQL than
loading things into the database (can you pre-load prepared statements in
the database? or are they a per-connection thing?)


so back to the main questions of the advantages

prepared statements avoid needing to escape things, but at the
complication of a more complex API.

there's still the question of the performance difference. I have been
thinking that the overhead of doing the work itself would overwelm the
performance benifits of prepared statements.

as I understand it, the primary performance benifit is the ability to
avoid the parsing and planning stages of the command. for simple commands
(which I assume inserts to be, even if inserting a lot of stuff), the
planning would seem to be cheap compared to the work of doing the inserts

on a fully tuned database are we talking about 10% performance? 1%? 0.01%?

any ideas

От:
david@lang.hm
Дата:

On Mon, 20 Apr 2009, Greg Smith wrote:

> On Mon, 20 Apr 2009,  wrote:
>
>> any idea what sort of difference binary mode would result in?
>
> The win from switching from INSERT to COPY can be pretty big, further
> optimizing to BINARY you'd really need to profile to justify.  I haven't
> found any significant difference in binary mode compared to overhead of the
> commit itself in most cases.  The only thing I consistently run into is that
> timestamps can bog things down considerably in text mode, but you have to be
> pretty efficient in your app to do any better generating those those in the
> PostgreSQL binary format yourself.  If you had a lot of difficult to parse
> data types like that, binary might be a plus, but it doesn't sound like that
> will be the case for what you're doing.
>
> But you don't have to believe me, it's easy to generate a test case here
> yourself.  Copy some typical data into the database, export it both ways:
>
> COPY t to 'f';
> COPY t to 'f' WITH BINARY;
>
> And then compare copying them both in again with "\timing".  That should let
> you definitively answer whether it's really worth the trouble.

while I fully understand the 'benchmark your situation' need, this isn't
that simple.

in this case we are trying to decide what API/interface to use in a
infrastructure tool that will be distributed in common distros (it's now
the default syslog package of debian and fedora), there are so many
variables in hardware, software, and load that trying to benchmark it
becomes effectivly impossible.

based on Stephan's explination of where binary could help, I think the
easy answer is to decide not to bother with it (the postgres text to X
converters get far more optimization attention than anything rsyslog could
deploy)

David Lang

От:
david@lang.hm
Дата:

On Mon, 20 Apr 2009, Stephen Frost wrote:

> Greg,
>
> * Greg Smith () wrote:
>> The win from switching from INSERT to COPY can be pretty big, further
>> optimizing to BINARY you'd really need to profile to justify.
>
> Have you done any testing to compare COPY vs. INSERT using prepared
> statements?  I'd be curious to know how those compare and against
> multi-value INSERTS, prepared and unprepared.

and this is the rest of the question that I was trying (unsucessfully) to
ask.

is this as simple as creating a database and doing an explain on each of
these? or do I need to actually measure the time (at which point the
specific hardware and tuning settings become an issue again)

David Lang

От:
Stephen Frost
Дата:

David,

*  () wrote:
> I thought that part of the 'efficiancy' and 'performance' to be gained
> from binary modes were avoiding the need to parse commands, if it's only
> the savings in converting column contents from text to specific types,
> it's much less important.

No, binary mode is about the column contents.  Prepared queries is about
avoiding having to parse commands (which is why the command and the data
elements are seperately done).

>> table = "mytable";
>> data = "$Y, $m, $d, $H, $msg";
>
> if the user creates the data this way, you just reintroduced the escaping
> problem. they would have to do something like
>
> data = "$Y"
> data = "$m"
> data = "$d"
> data = "$H"
> data = "$msg"

Yes, there is a bit of escaping that the admins will have to deal with
in the config file.  There's no way around that though, regardless of
what you do.  If you let them put SQL in, then they may have to escape
stuff there too.  In the end, an escape problem in the config file is
something which you should really catch when you read the file in, and
even if you don't catch it there, it's less problematic, not really a
performance problem, and much less of a security risk, than having the
escaping done on data from an untrusted source.

> one key thing is that it's very probable that the user will want to
> manipulate the string, not just send a single variable as-is

You could probably let them do some manipulation, add extra
non-escape-code fields, maybe tack something on the beginning and end,
basically, anything that can be done in the application prior to it
hitting the database should be fine.

>> I'd avoid having the user provide actual SQL, because that becomes
>> difficult to deal with unless you embed an SQL parser in rsyslog, and
>> I don't really see the value in that.
>
> there's no need for rsyslog to parse the SQL, just to be able to escape
> it appropriately and then pass it to the database for execution

If the user is providing SQL, then you need to be able to parse that SQL
if you're going to do prepared queries.  It might not require you to be
able to fully parse SQL the way the back-end does, but anything you do
that's not a full SQL parser is going to end up having limitations that
won't be easy to find or document.

For example, you could ask users to provide the prepared statement the
way the database wants it, and then list the data elements seperately
from it somehow, eg:

myquery = "INSERT INTO blah (col1, col2, col3) VALUES ($1, $2, $3);"
myvals[1] = "$Y"
myvals[2] = "$M"
myvals[3] = "$msg"

The user could then do:

myquery = "INSERT INTO blah (col1, col2) SELECT substring($1), $2;"
myvals[1] = "$M"
myvals[2] = "$msg"

Both of these will work just fine as prepared queries.

You can then parse that string by just looking for the $'s, but of
course, if the user wants to put an *actual* dollar sign in, then you
have to support that somehow ($$?).  Then you have to deal with whatever
other quoting requirements you have in your config file (how do you deal
with double quotes?  What about single quotes? etc, etc).

You could possibly even put your escape codes into myquery and just try
to figure out how to do the substitutions with the $NUMs and build your
prepared query string.  It gets uglier and uglier if you ask me though.

In the end, I'd still worry about users coming up with new and different
ways to break your sql 'parser'.

> one huge advantage of putting the sql into the configuration is the
> ability to work around other users of the database.

See, I just don't see that.

> for example, what if the database has additional columns that you don't
> want to touch (say an item# sequence), if the SQL is in the config this
> is easy to work around, if it's seperate (or created by the module), this
> is very hard to do.

You can do that with a trigger trivially..  That could also be supported
through other mechanisms (for instance, let the user provide a list of
columns to fill with DEFAULT in the prepared query).

> I guess you could give examples of the SQL in the documentation for how
> to create the prepared statement etc in the databases, but how is that
> much better than having it in the config file?
>
> for many users it's easier to do middlein -fancy stuff in the SQL than
> loading things into the database (can you pre-load prepared statements in
> the database? or are they a per-connection thing?)

Prepared statements, at least under PG, are a per-connection thing.
Triggers aren't the same, those are attached to tables and get called
whenever a particular action is done on those tables (defined in the
trigger definition).  The trigger is then called with the row which is
being inserted, etc, and can do whatever it wants with that row (put it
in a different table, ignore it, etc).

> so back to the main questions of the advantages
>
> prepared statements avoid needing to escape things, but at the
> complication of a more complex API.
>
> there's still the question of the performance difference. I have been
> thinking that the overhead of doing the work itself would overwelm the
> performance benifits of prepared statements.

What work is it that you're referring to here?  Based on what you've
said about your application so far, I would expect that the run-time
cost to prepare the statement (which you do only once) to be a bit of a
cost, but not much, and that the actual inserts would be almost free
from the application side, and much easier for the database to
parse/use.

> as I understand it, the primary performance benifit is the ability to
> avoid the parsing and planning stages of the command. for simple commands
> (which I assume inserts to be, even if inserting a lot of stuff), the
> planning would seem to be cheap compared to the work of doing the inserts

The planning isn't very expensive, no, but it's also not free.  The
parsing is more expensive.

> on a fully tuned database are we talking about 10% performance? 1%? 0.01%?
>
> any ideas

It depends a great deal on your application..  Do you have some example
data that we could use to test with?  Some default templates that you
think most people end up using which we could create dummy data for?

Of course, in the end, batching your inserts into fewer transactions
than 1-per-insert should give you a huge benefit right off the bat.
What we're talking about is really the steps after that, which might not
ever be necessary for your particular application.  On the other hand,
I'd never let production code go out that isn't using prepared queries
wherever possible.

:)

    Thanks,

        Stephen

От:
Stephen Frost
Дата:

*  () wrote:
> while I fully understand the 'benchmark your situation' need, this isn't
> that simple.

It really is.  You know your application, you know it's primary use
cases, and probably have some data to play with.  You're certainly in a
much better situation to at least *try* and benchmark it than we are.

> in this case we are trying to decide what API/interface to use in a
> infrastructure tool that will be distributed in common distros (it's now
> the default syslog package of debian and fedora), there are so many
> variables in hardware, software, and load that trying to benchmark it
> becomes effectivly impossible.

You don't need to know how it will perform in every situation.  The main
question you have is if using prepared queries is faster or not, so pick
a common structure, create a table, get some data, and test.  I can say
that prepared queries will be more likely to give you a performance
boost with wider tables (more columns).

> based on Stephan's explination of where binary could help, I think the
> easy answer is to decide not to bother with it (the postgres text to X
> converters get far more optimization attention than anything rsyslog
> could deploy)

While that's true, there's no substitute for not having to do a
conversion at all.  After all, it's alot cheaper to do a bit of
byte-swapping on an integer value that's already an integer in memory
than to sprintf and atoi it.

    Thanks,

        Stephen

От:
Stephen Frost
Дата:

David,

*  () wrote:
> is this as simple as creating a database and doing an explain on each of
> these? or do I need to actually measure the time (at which point the
> specific hardware and tuning settings become an issue again)

No, you need to measure the time.  An explain isn't going to tell you
much.  However, I think the point here is that if you see a 10%
performance improvment on some given hardware for a particular test,
then chances are pretty good most people will see a performance
benefit.  Some more, some less, but it's unlikely anyone will have worse
performance for it.  There are some edge cases where a prepared
statement can reduce performance, but that's almost always on SELECT
queries, I can't think of a reason off-hand why it'd ever be slower for
INSERTs unless you're already doing things you shouldn't be if you care
about performance (like doing a join against some other table with each
insert..).

Additionally, there's really no way for us to know what an acceptable
performance improvment is for you to justify the added code maintenance
and whatnot for your project.  If you're really just looking for the
low-hanging fruit, then batch your inserts into transactions and go from
there.

    Thanks,

        Stephen

От:
david@lang.hm
Дата:

On Tue, 21 Apr 2009, Stephen Frost wrote:

> *  () wrote:
>> while I fully understand the 'benchmark your situation' need, this isn't
>> that simple.
>
> It really is.  You know your application, you know it's primary use
> cases, and probably have some data to play with.  You're certainly in a
> much better situation to at least *try* and benchmark it than we are.

rsyslog is a syslog server. it replaces (or for debian and fedora, has
replaced) your standard syslog daemon. it recieves log messages from every
app on your system (and possibly others), filters, maniulates them, and
then stores them somewhere. among the places that it can store the logs
are database servers (native support for MySQL, PostgreSQL, and Oracle.
plus libdbi for others)

other apps then search and report on the data after it is stored. what
apps?, I don't know either. pick your favorite reporting tool and you'll
be a step ahead of me (I don't know a really good reporting tool)

as for sample data, you have syslog messages, just like I do. so you have
the same access to data that I have.

how would you want to query them? how would people far less experianced
that you want to query them?


I can speculate that some people would do two columns (time, everything
else), others will do three (time, server, everything else), and others
will go further (I know some who would like to extract IP addresses
embedded in a message into their own column). some people will index on
the time and host, others will want to do full-text searches of
everything.


I can talk about the particular use case I have at work, but that would be
far from normal (full text searches on 10s of TB of data, plus reports,
etc) but we don't (currently) use postgres to do that, and I'm not sure
how I would configure postgres for that sort of load. so I don't think
that my personal situation is a good fit. I looked at bizgres a few years
ago, but I didn't know enough about what I was trying to do or how much
data I was trying to deal with to go forward with it at that time.


do I do the benchmark on the type of hardware that I use for the system
above (after spending how much time experimenting to find corret tuning)?
or on a stock, untuned postgres running on a desktop-type system (we all
know how horrible the defaults are), how am I supposed to know if the
differences that I will see in my 'benchmarks' are the result of the
differences between the commands, and not that I missed some critical knob
to turn?

benchmarking is absolutly the right answer for some cases, especially when
someone is asking exactly how something will work for them. but in this
case I don't have the specific use case. I am trying to find out where the
throretical advantages are for these things that 'everybody knows you
should do with a database' to understand the probability that they will
make a difference in this case.

>> in this case we are trying to decide what API/interface to use in a
>> infrastructure tool that will be distributed in common distros (it's now
>> the default syslog package of debian and fedora), there are so many
>> variables in hardware, software, and load that trying to benchmark it
>> becomes effectivly impossible.
>
> You don't need to know how it will perform in every situation.  The main
> question you have is if using prepared queries is faster or not, so pick
> a common structure, create a table, get some data, and test.  I can say
> that prepared queries will be more likely to give you a performance
> boost with wider tables (more columns).

this is very helpful, I can't say what the schema would look like, but I
would guess that it will tend towards the narrow side (or at least not
update very many columns explicitly)

>> based on Stephan's explination of where binary could help, I think the
>> easy answer is to decide not to bother with it (the postgres text to X
>> converters get far more optimization attention than anything rsyslog
>> could deploy)
>
> While that's true, there's no substitute for not having to do a
> conversion at all.  After all, it's alot cheaper to do a bit of
> byte-swapping on an integer value that's already an integer in memory
> than to sprintf and atoi it.

but it's not a integer in memory, it's text that arrived over the network
or through a socket as a log message from another application.

David Lang

От:
david@lang.hm
Дата:

On Tue, 21 Apr 2009, Stephen Frost wrote:

> David,
>
> *  () wrote:
>> I thought that part of the 'efficiancy' and 'performance' to be gained
>> from binary modes were avoiding the need to parse commands, if it's only
>> the savings in converting column contents from text to specific types,
>> it's much less important.
>
> No, binary mode is about the column contents.  Prepared queries is about
> avoiding having to parse commands (which is why the command and the data
> elements are seperately done).

thanks for the clarification.

>>> I'd avoid having the user provide actual SQL, because that becomes
>>> difficult to deal with unless you embed an SQL parser in rsyslog, and
>>> I don't really see the value in that.
>>
>> there's no need for rsyslog to parse the SQL, just to be able to escape
>> it appropriately and then pass it to the database for execution
>
> If the user is providing SQL, then you need to be able to parse that SQL
> if you're going to do prepared queries.  It might not require you to be
> able to fully parse SQL the way the back-end does, but anything you do
> that's not a full SQL parser is going to end up having limitations that
> won't be easy to find or document.

the current situation is that rsyslog never parses the SQL (other than as
text for a template, just like if you were going to write the log message
to disk)

if we stick with the string based API we never need to

the user gives us one string 'prepare...' that we send to the database.
the user then gives us another string 'execute...' that we send to the
database. at no point do we ever need to parse the SQL, or even really
know that it is SQL (the one exception is an escapeing routine that
replace ' with '' in the strings comeing from the outside world), it's
just strings assembled using the same string assembly logic that is used
for writing files to disk, crafting the payload of network packets to
other servers, etc.

I do agree that there is a reduction in security risk. but since rsyslog
is rather draconian about forcing the escaping, I'm not sure this is
enough to tip the scales.


>> one huge advantage of putting the sql into the configuration is the
>> ability to work around other users of the database.
>
> See, I just don't see that.

moving a bit away from the traditional syslog use case for a moment. with
the ability to accept messages from many different types of sources (some
unreliable like UDP syslog, others very reliably with full
application-level acknowledgements), the ability to filter messages to
different destination, and the ability to configure it to craft arbatrary
SQL statements, rsyslog can be useful as an 'impedance match' between
different applications. you can coherse just about any app to write some
sort of message to a file/pipe, and rsyslog can take that and get it into
a database elsewhere. yes, you or I could write a quick program that would
reformat the message and submit it (in perl/python/etc, but extending that
to handle outages, high-volume bursts of traffic, etc starts to be hard.

this is very much _not_ a common use case, but it's a useful side-use of
rsyslog today.

>> I guess you could give examples of the SQL in the documentation for how
>> to create the prepared statement etc in the databases, but how is that
>> much better than having it in the config file?
>>
>> for many users it's easier to do middlein -fancy stuff in the SQL than
>> loading things into the database (can you pre-load prepared statements in
>> the database? or are they a per-connection thing?)
>
> Prepared statements, at least under PG, are a per-connection thing.
> Triggers aren't the same, those are attached to tables and get called
> whenever a particular action is done on those tables (defined in the
> trigger definition).  The trigger is then called with the row which is
> being inserted, etc, and can do whatever it wants with that row (put it
> in a different table, ignore it, etc).

that sounds like a lot of work at the database level to avoid some
complexity on the app side (and it seems that the need to fire a trigger
probably cost more than the prepared statement ever hoped to gain.)

>> so back to the main questions of the advantages
>>
>> prepared statements avoid needing to escape things, but at the
>> complication of a more complex API.
>>
>> there's still the question of the performance difference. I have been
>> thinking that the overhead of doing the work itself would overwelm the
>> performance benifits of prepared statements.
>
> What work is it that you're referring to here?

doing the inserts themselves (putting the data in the tables, updating
indexes, issuing a fsync)

> Based on what you've
> said about your application so far, I would expect that the run-time
> cost to prepare the statement (which you do only once) to be a bit of a
> cost, but not much, and that the actual inserts would be almost free
> from the application side, and much easier for the database to
> parse/use.

the inserts are far from free ;-)

but I agree that with prepared statements, the overhead of the insert is
small. I'm trying to get a guess as to how small.

>> as I understand it, the primary performance benifit is the ability to
>> avoid the parsing and planning stages of the command. for simple commands
>> (which I assume inserts to be, even if inserting a lot of stuff), the
>> planning would seem to be cheap compared to the work of doing the inserts
>
> The planning isn't very expensive, no, but it's also not free.  The
> parsing is more expensive.

ok, that makes the multi-value insert process (or copy) sound more
attractive (less parsing)

>> on a fully tuned database are we talking about 10% performance? 1%? 0.01%?
>>
>> any ideas
>
> It depends a great deal on your application..  Do you have some example
> data that we could use to test with?  Some default templates that you
> think most people end up using which we could create dummy data for?

take the contents of /var/log/messages on your system, split it into
timestamp, server, log and you have at least a reasonable case to work
with.

> Of course, in the end, batching your inserts into fewer transactions
> than 1-per-insert should give you a huge benefit right off the bat.

agreed, that was my expectation, but then people started saying how
important it was to use prepared statements.

> What we're talking about is really the steps after that, which might not
> ever be necessary for your particular application.  On the other hand,
> I'd never let production code go out that isn't using prepared queries
> wherever possible.

I do understand the sentiment, really I do. I'm just trying to understand
the why.

in this case, moving from being able to insert one record per transaction
to inserting several hundred per transaction will _probably_ swamp
everything else. but I really hate the word 'probably', I've been
surprised to many times.

one of the more recent surprises was on my multi-TB (4-core, 32G ram, 16
spindles of data) log query server I discovered that there was no
noticable difference between raid 10 and raid 6 for the bulk of my query
workload (an example is finding 14 log entries out of 12B). it turns out
that if you are read-only with a seek-heavy workload you can keep all
spindles busy with either approach. so in spite of the fact that
'everybody knows' that raid 10 is _far_ better than raid 6, especially for
databases, I discovered that that really only applies to writes.


so I really do want to understand the 'why' if possible.

you have helped a lot.

David Lang

От:
Richard Huxton
Дата:

 wrote:
> On Tue, 21 Apr 2009, Stephen Frost wrote:
>
>> *  () wrote:
>>> while I fully understand the 'benchmark your situation' need, this isn't
>>> that simple.
>>
>> It really is.  You know your application, you know it's primary use
>> cases, and probably have some data to play with.  You're certainly in a
>> much better situation to at least *try* and benchmark it than we are.
>
> rsyslog is a syslog server. it replaces (or for debian and fedora, has
> replaced) your standard syslog daemon. it recieves log messages from
> every app on your system (and possibly others), filters, maniulates
> them, and then stores them somewhere. among the places that it can store
> the logs are database servers (native support for MySQL, PostgreSQL, and
> Oracle. plus libdbi for others)

Well, from a performance standpoint the obvious things to do are:
1. Keep a connection open, do NOT reconnect for each log-statement
2. Batch log statements together where possible
3. Use prepared statements
4. Partition the tables by day/week/month/year (configurable I suppose)

The first two are vital, the third takes you a step further. The fourth
is a long-term admin thing.

And possibly
5. Have two connections, one for fatal/error etc and one for info/debug
level log statements (configurable split?). Then you can use the
synchronous_commit setting on the less important ones. Might buy you
some performance on a busy system.

http://www.postgresql.org/docs/8.3/interactive/runtime-config-wal.html#RUNTIME-CONFIG-WAL-SETTINGS

> other apps then search and report on the data after it is stored. what
> apps?, I don't know either. pick your favorite reporting tool and you'll
> be a step ahead of me (I don't know a really good reporting tool)
>
> as for sample data, you have syslog messages, just like I do. so you
> have the same access to data that I have.
>
> how would you want to query them? how would people far less experianced
> that you want to query them?
>
> I can speculate that some people would do two columns (time, everything
> else), others will do three (time, server, everything else), and others
> will go further (I know some who would like to extract IP addresses
> embedded in a message into their own column). some people will index on
> the time and host, others will want to do full-text searches of everything.

Well, assuming it looks much like traditional syslog, I would do
something like: (timestamp, host, facility, priority, message). It's
easy enough to stitch back together if people want that.

PostgreSQL's full-text indexing is quite well suited to logfiles I'd
have thought, since it knows about filenames, urls etc already.

If you want to get fancy, add a msg_type column and one subsidiary table
for each msg_type. So - you might have smtp_connect_from (hostname,
ip_addr). A set of perl regexps can match and extract the fields for
these extra tables, or you could do it with triggers inside the
database. I think it makes sense to do it in the application. Easier for
users to contribute new patterns/extractions. Meanwhile, the core table
is untouched so you don't *need* to know about these extra tables.

If you have subsidiary tables, you'll want to partition those too and
perhaps stick them in their own schema (logs200901, logs200902 etc).

--
   Richard Huxton
   Archonet Ltd

От:
Kenneth Marshall
Дата:

Hi,

I just finished reading this thread. We are currently working on
setting up a central log system using rsyslog and PostgreSQL. It
works well once we patched the memory leak. We also looked at what
could be done to improve the efficiency of the DB interface. On the
rsyslog side, moving to prepared queries allows you to remove the
escaping that needs to be done currently before attempting to
insert the data into the SQL backend as well as removing the parsing
and planning time from the insert. This is a big win for high insert
rates, which is what we are talking about. The escaping process is
also a big CPU user in rsyslog which then hands the escaped string
to the backend which then has to undo everything that had been done
and parse/plan the resulting query. This can use a surprising amount
of additional CPU. Even if you cannot support a general prepared
query interface, by specifying what the query should look like you
can handle much of the low-hanging fruit query-wise.

We are currently using a date based trigger to use a new partition
each day and keep 2 months of logs currently. This can be usefully
managed on the backend database, but if rsyslog supported changing
the insert to the new table on a time basis, the CPU used by the
trigger to support this on the backend could be reclaimed. This
would be a win for any DB backend. As you move to the new partition,
issuing a truncate to clear the table would simplify the DB interfaces.

Another performance enhancement already mentioned, would be to
allow certain extra fields in the DB to be automatically populated
as a function of the log messages. For example, logging the mail queue
id for messages from mail systems would make it much easier to locate
particular mail transactions in large amounts of data.

To sum up, eliminating the escaping in rsyslog through the use of
prepared queries would reduce the CPU load on the DB backend. Batching
the inserts will also net you a big performance increase. Some DB-based
applications allow for the specification of several types of queries,
one for single inserts and then a second to support multiple inserts
(copy). Rsyslog already supports the queuing pieces to allow you to
batch inserts. Just some ideas.

Regards,
Ken


On Tue, Apr 21, 2009 at 09:56:23AM +0100, Richard Huxton wrote:
>  wrote:
>> On Tue, 21 Apr 2009, Stephen Frost wrote:
>>> *  () wrote:
>>>> while I fully understand the 'benchmark your situation' need, this isn't
>>>> that simple.
>>>
>>> It really is.  You know your application, you know it's primary use
>>> cases, and probably have some data to play with.  You're certainly in a
>>> much better situation to at least *try* and benchmark it than we are.
>> rsyslog is a syslog server. it replaces (or for debian and fedora, has
>> replaced) your standard syslog daemon. it recieves log messages from every
>> app on your system (and possibly others), filters, maniulates them, and
>> then stores them somewhere. among the places that it can store the logs
>> are database servers (native support for MySQL, PostgreSQL, and Oracle.
>> plus libdbi for others)
>
> Well, from a performance standpoint the obvious things to do are:
> 1. Keep a connection open, do NOT reconnect for each log-statement
> 2. Batch log statements together where possible
> 3. Use prepared statements
> 4. Partition the tables by day/week/month/year (configurable I suppose)
>
> The first two are vital, the third takes you a step further. The fourth is
> a long-term admin thing.
>
> And possibly
> 5. Have two connections, one for fatal/error etc and one for info/debug
> level log statements (configurable split?). Then you can use the
> synchronous_commit setting on the less important ones. Might buy you some
> performance on a busy system.
>
> http://www.postgresql.org/docs/8.3/interactive/runtime-config-wal.html#RUNTIME-CONFIG-WAL-SETTINGS
>
>> other apps then search and report on the data after it is stored. what
>> apps?, I don't know either. pick your favorite reporting tool and you'll
>> be a step ahead of me (I don't know a really good reporting tool)
>> as for sample data, you have syslog messages, just like I do. so you have
>> the same access to data that I have.
>> how would you want to query them? how would people far less experianced
>> that you want to query them?
>> I can speculate that some people would do two columns (time, everything
>> else), others will do three (time, server, everything else), and others
>> will go further (I know some who would like to extract IP addresses
>> embedded in a message into their own column). some people will index on
>> the time and host, others will want to do full-text searches of
>> everything.
>
> Well, assuming it looks much like traditional syslog, I would do something
> like: (timestamp, host, facility, priority, message). It's easy enough to
> stitch back together if people want that.
>
> PostgreSQL's full-text indexing is quite well suited to logfiles I'd have
> thought, since it knows about filenames, urls etc already.
>
> If you want to get fancy, add a msg_type column and one subsidiary table
> for each msg_type. So - you might have smtp_connect_from (hostname,
> ip_addr). A set of perl regexps can match and extract the fields for these
> extra tables, or you could do it with triggers inside the database. I think
> it makes sense to do it in the application. Easier for users to contribute
> new patterns/extractions. Meanwhile, the core table is untouched so you
> don't *need* to know about these extra tables.
>
> If you have subsidiary tables, you'll want to partition those too and
> perhaps stick them in their own schema (logs200901, logs200902 etc).
>
> --
>   Richard Huxton
>   Archonet Ltd
>
> --
> Sent via pgsql-performance mailing list ()
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance
>

От:
david@lang.hm
Дата:

Kenneth,
   could you join the discussion on the rsyslog mailing list?
rsyslog-users <>

I'm surprised to hear you say that rsyslog can already do batch inserts
and am interested in how you did that.

what sort of insert rate did you mange to get?

David Lang

On Tue, 21 Apr 2009, Kenneth Marshall wrote:

> Date: Tue, 21 Apr 2009 08:33:30 -0500
> From: Kenneth Marshall <>
> To: Richard Huxton <>
> Cc: , Stephen Frost <>,
>     Greg Smith <>, 
> Subject: Re: [PERFORM] performance for high-volume log insertion
>
> Hi,
>
> I just finished reading this thread. We are currently working on
> setting up a central log system using rsyslog and PostgreSQL. It
> works well once we patched the memory leak. We also looked at what
> could be done to improve the efficiency of the DB interface. On the
> rsyslog side, moving to prepared queries allows you to remove the
> escaping that needs to be done currently before attempting to
> insert the data into the SQL backend as well as removing the parsing
> and planning time from the insert. This is a big win for high insert
> rates, which is what we are talking about. The escaping process is
> also a big CPU user in rsyslog which then hands the escaped string
> to the backend which then has to undo everything that had been done
> and parse/plan the resulting query. This can use a surprising amount
> of additional CPU. Even if you cannot support a general prepared
> query interface, by specifying what the query should look like you
> can handle much of the low-hanging fruit query-wise.
>
> We are currently using a date based trigger to use a new partition
> each day and keep 2 months of logs currently. This can be usefully
> managed on the backend database, but if rsyslog supported changing
> the insert to the new table on a time basis, the CPU used by the
> trigger to support this on the backend could be reclaimed. This
> would be a win for any DB backend. As you move to the new partition,
> issuing a truncate to clear the table would simplify the DB interfaces.
>
> Another performance enhancement already mentioned, would be to
> allow certain extra fields in the DB to be automatically populated
> as a function of the log messages. For example, logging the mail queue
> id for messages from mail systems would make it much easier to locate
> particular mail transactions in large amounts of data.
>
> To sum up, eliminating the escaping in rsyslog through the use of
> prepared queries would reduce the CPU load on the DB backend. Batching
> the inserts will also net you a big performance increase. Some DB-based
> applications allow for the specification of several types of queries,
> one for single inserts and then a second to support multiple inserts
> (copy). Rsyslog already supports the queuing pieces to allow you to
> batch inserts. Just some ideas.
>
> Regards,
> Ken
>
>
> On Tue, Apr 21, 2009 at 09:56:23AM +0100, Richard Huxton wrote:
>>  wrote:
>>> On Tue, 21 Apr 2009, Stephen Frost wrote:
>>>> *  () wrote:
>>>>> while I fully understand the 'benchmark your situation' need, this isn't
>>>>> that simple.
>>>>
>>>> It really is.  You know your application, you know it's primary use
>>>> cases, and probably have some data to play with.  You're certainly in a
>>>> much better situation to at least *try* and benchmark it than we are.
>>> rsyslog is a syslog server. it replaces (or for debian and fedora, has
>>> replaced) your standard syslog daemon. it recieves log messages from every
>>> app on your system (and possibly others), filters, maniulates them, and
>>> then stores them somewhere. among the places that it can store the logs
>>> are database servers (native support for MySQL, PostgreSQL, and Oracle.
>>> plus libdbi for others)
>>
>> Well, from a performance standpoint the obvious things to do are:
>> 1. Keep a connection open, do NOT reconnect for each log-statement
>> 2. Batch log statements together where possible
>> 3. Use prepared statements
>> 4. Partition the tables by day/week/month/year (configurable I suppose)
>>
>> The first two are vital, the third takes you a step further. The fourth is
>> a long-term admin thing.
>>
>> And possibly
>> 5. Have two connections, one for fatal/error etc and one for info/debug
>> level log statements (configurable split?). Then you can use the
>> synchronous_commit setting on the less important ones. Might buy you some
>> performance on a busy system.
>>
>> http://www.postgresql.org/docs/8.3/interactive/runtime-config-wal.html#RUNTIME-CONFIG-WAL-SETTINGS
>>
>>> other apps then search and report on the data after it is stored. what
>>> apps?, I don't know either. pick your favorite reporting tool and you'll
>>> be a step ahead of me (I don't know a really good reporting tool)
>>> as for sample data, you have syslog messages, just like I do. so you have
>>> the same access to data that I have.
>>> how would you want to query them? how would people far less experianced
>>> that you want to query them?
>>> I can speculate that some people would do two columns (time, everything
>>> else), others will do three (time, server, everything else), and others
>>> will go further (I know some who would like to extract IP addresses
>>> embedded in a message into their own column). some people will index on
>>> the time and host, others will want to do full-text searches of
>>> everything.
>>
>> Well, assuming it looks much like traditional syslog, I would do something
>> like: (timestamp, host, facility, priority, message). It's easy enough to
>> stitch back together if people want that.
>>
>> PostgreSQL's full-text indexing is quite well suited to logfiles I'd have
>> thought, since it knows about filenames, urls etc already.
>>
>> If you want to get fancy, add a msg_type column and one subsidiary table
>> for each msg_type. So - you might have smtp_connect_from (hostname,
>> ip_addr). A set of perl regexps can match and extract the fields for these
>> extra tables, or you could do it with triggers inside the database. I think
>> it makes sense to do it in the application. Easier for users to contribute
>> new patterns/extractions. Meanwhile, the core table is untouched so you
>> don't *need* to know about these extra tables.
>>
>> If you have subsidiary tables, you'll want to partition those too and
>> perhaps stick them in their own schema (logs200901, logs200902 etc).
>>
>> --
>>   Richard Huxton
>>   Archonet Ltd
>>
>> --
>> Sent via pgsql-performance mailing list ()
>> To make changes to your subscription:
>> http://www.postgresql.org/mailpref/pgsql-performance
>>
>

От:
Kenneth Marshall
Дата:

On Tue, Apr 21, 2009 at 08:37:54AM -0700,  wrote:
> Kenneth,
>   could you join the discussion on the rsyslog mailing list?
> rsyslog-users <>
>
> I'm surprised to hear you say that rsyslog can already do batch inserts and
> am interested in how you did that.
>
> what sort of insert rate did you mange to get?
>
> David Lang
>
David,

I would be happy to join the discussion. I did not mean to say
that rsyslog currently supported batch inserts, just that the
pieces that provide "stand-by queuing" could be used to manage
batching inserts.

Cheers,
Ken

> On Tue, 21 Apr 2009, Kenneth Marshall wrote:
>
>> Date: Tue, 21 Apr 2009 08:33:30 -0500
>> From: Kenneth Marshall <>
>> To: Richard Huxton <>
>> Cc: , Stephen Frost <>,
>>     Greg Smith <>, 
>> Subject: Re: [PERFORM] performance for high-volume log insertion
>> Hi,
>>
>> I just finished reading this thread. We are currently working on
>> setting up a central log system using rsyslog and PostgreSQL. It
>> works well once we patched the memory leak. We also looked at what
>> could be done to improve the efficiency of the DB interface. On the
>> rsyslog side, moving to prepared queries allows you to remove the
>> escaping that needs to be done currently before attempting to
>> insert the data into the SQL backend as well as removing the parsing
>> and planning time from the insert. This is a big win for high insert
>> rates, which is what we are talking about. The escaping process is
>> also a big CPU user in rsyslog which then hands the escaped string
>> to the backend which then has to undo everything that had been done
>> and parse/plan the resulting query. This can use a surprising amount
>> of additional CPU. Even if you cannot support a general prepared
>> query interface, by specifying what the query should look like you
>> can handle much of the low-hanging fruit query-wise.
>>
>> We are currently using a date based trigger to use a new partition
>> each day and keep 2 months of logs currently. This can be usefully
>> managed on the backend database, but if rsyslog supported changing
>> the insert to the new table on a time basis, the CPU used by the
>> trigger to support this on the backend could be reclaimed. This
>> would be a win for any DB backend. As you move to the new partition,
>> issuing a truncate to clear the table would simplify the DB interfaces.
>>
>> Another performance enhancement already mentioned, would be to
>> allow certain extra fields in the DB to be automatically populated
>> as a function of the log messages. For example, logging the mail queue
>> id for messages from mail systems would make it much easier to locate
>> particular mail transactions in large amounts of data.
>>
>> To sum up, eliminating the escaping in rsyslog through the use of
>> prepared queries would reduce the CPU load on the DB backend. Batching
>> the inserts will also net you a big performance increase. Some DB-based
>> applications allow for the specification of several types of queries,
>> one for single inserts and then a second to support multiple inserts
>> (copy). Rsyslog already supports the queuing pieces to allow you to
>> batch inserts. Just some ideas.
>>
>> Regards,
>> Ken
>>
>>
>> On Tue, Apr 21, 2009 at 09:56:23AM +0100, Richard Huxton wrote:
>>>  wrote:
>>>> On Tue, 21 Apr 2009, Stephen Frost wrote:
>>>>> *  () wrote:
>>>>>> while I fully understand the 'benchmark your situation' need, this
>>>>>> isn't
>>>>>> that simple.
>>>>>
>>>>> It really is.  You know your application, you know it's primary use
>>>>> cases, and probably have some data to play with.  You're certainly in a
>>>>> much better situation to at least *try* and benchmark it than we are.
>>>> rsyslog is a syslog server. it replaces (or for debian and fedora, has
>>>> replaced) your standard syslog daemon. it recieves log messages from
>>>> every
>>>> app on your system (and possibly others), filters, maniulates them, and
>>>> then stores them somewhere. among the places that it can store the logs
>>>> are database servers (native support for MySQL, PostgreSQL, and Oracle.
>>>> plus libdbi for others)
>>>
>>> Well, from a performance standpoint the obvious things to do are:
>>> 1. Keep a connection open, do NOT reconnect for each log-statement
>>> 2. Batch log statements together where possible
>>> 3. Use prepared statements
>>> 4. Partition the tables by day/week/month/year (configurable I suppose)
>>>
>>> The first two are vital, the third takes you a step further. The fourth
>>> is
>>> a long-term admin thing.
>>>
>>> And possibly
>>> 5. Have two connections, one for fatal/error etc and one for info/debug
>>> level log statements (configurable split?). Then you can use the
>>> synchronous_commit setting on the less important ones. Might buy you some
>>> performance on a busy system.
>>>
>>> http://www.postgresql.org/docs/8.3/interactive/runtime-config-wal.html#RUNTIME-CONFIG-WAL-SETTINGS
>>>
>>>> other apps then search and report on the data after it is stored. what
>>>> apps?, I don't know either. pick your favorite reporting tool and you'll
>>>> be a step ahead of me (I don't know a really good reporting tool)
>>>> as for sample data, you have syslog messages, just like I do. so you
>>>> have
>>>> the same access to data that I have.
>>>> how would you want to query them? how would people far less experianced
>>>> that you want to query them?
>>>> I can speculate that some people would do two columns (time, everything
>>>> else), others will do three (time, server, everything else), and others
>>>> will go further (I know some who would like to extract IP addresses
>>>> embedded in a message into their own column). some people will index on
>>>> the time and host, others will want to do full-text searches of
>>>> everything.
>>>
>>> Well, assuming it looks much like traditional syslog, I would do
>>> something
>>> like: (timestamp, host, facility, priority, message). It's easy enough to
>>> stitch back together if people want that.
>>>
>>> PostgreSQL's full-text indexing is quite well suited to logfiles I'd have
>>> thought, since it knows about filenames, urls etc already.
>>>
>>> If you want to get fancy, add a msg_type column and one subsidiary table
>>> for each msg_type. So - you might have smtp_connect_from (hostname,
>>> ip_addr). A set of perl regexps can match and extract the fields for
>>> these
>>> extra tables, or you could do it with triggers inside the database. I
>>> think
>>> it makes sense to do it in the application. Easier for users to
>>> contribute
>>> new patterns/extractions. Meanwhile, the core table is untouched so you
>>> don't *need* to know about these extra tables.
>>>
>>> If you have subsidiary tables, you'll want to partition those too and
>>> perhaps stick them in their own schema (logs200901, logs200902 etc).
>>>
>>> --
>>>   Richard Huxton
>>>   Archonet Ltd
>>>
>>> --
>>> Sent via pgsql-performance mailing list
>>> ()
>>> To make changes to your subscription:
>>> http://www.postgresql.org/mailpref/pgsql-performance
>>>
>>
>
> --
> Sent via pgsql-performance mailing list ()
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance
>

От:
Ben Chobot
Дата:

On Mon, 20 Apr 2009,  wrote:

> one huge advantage of putting the sql into the configuration is the ability
> to work around other users of the database.

+1 on this. We've always found tools much easier to work with when they
could be adapted to our schema, as opposed to changing our process for the
tool.

От:
Stephen Frost
Дата:

* Ben Chobot () wrote:
> On Mon, 20 Apr 2009,  wrote:
>> one huge advantage of putting the sql into the configuration is the
>> ability to work around other users of the database.
>
> +1 on this. We've always found tools much easier to work with when they
> could be adapted to our schema, as opposed to changing our process for
> the tool.

I think we're all in agreement that we should allow the user to define
their schema and support loading the data into it.  The question has
been if the user really needs the flexibility to define arbitrary SQL to
be used to do the inserts.

Something I'm a bit confused about, still, is if this is really even a
problem.  It sounds like rsyslog already allows arbitrary SQL in the
config file with some kind of escape syntax for the variables.  Why not
just keep that, but split it into a prepared query (where you change the
variables to $NUM vars for the prepared statement) and an array of
values (to pass to PQexecPrepared)?

If you already know how to figure out what the variables in the
arbitrary SQL statement are, this shouldn't be any more limiting than
today, except where a prepared query can't have a variable argument but
a non-prepared query can (eg, table name).  You could deal with that
with some kind of configuration variable that lets the user choose to
use prepared queries or not though, or some additional syntax that
indicates certain variables shouldn't be translated to $NUM vars (eg:
$*blah instead of $blah).

    Thanks,

        Stephen

От:
david@lang.hm
Дата:

On Tue, 21 Apr 2009, Stephen Frost wrote:

> * Ben Chobot () wrote:
>> On Mon, 20 Apr 2009,  wrote:
>>> one huge advantage of putting the sql into the configuration is the
>>> ability to work around other users of the database.
>>
>> +1 on this. We've always found tools much easier to work with when they
>> could be adapted to our schema, as opposed to changing our process for
>> the tool.
>
> I think we're all in agreement that we should allow the user to define
> their schema and support loading the data into it.  The question has
> been if the user really needs the flexibility to define arbitrary SQL to
> be used to do the inserts.
>
> Something I'm a bit confused about, still, is if this is really even a
> problem.  It sounds like rsyslog already allows arbitrary SQL in the
> config file with some kind of escape syntax for the variables.  Why not
> just keep that, but split it into a prepared query (where you change the
> variables to $NUM vars for the prepared statement) and an array of
> values (to pass to PQexecPrepared)?
>
> If you already know how to figure out what the variables in the
> arbitrary SQL statement are, this shouldn't be any more limiting than
> today, except where a prepared query can't have a variable argument but
> a non-prepared query can (eg, table name).  You could deal with that
> with some kind of configuration variable that lets the user choose to
> use prepared queries or not though, or some additional syntax that
> indicates certain variables shouldn't be translated to $NUM vars (eg:
> $*blah instead of $blah).

I think the key thing is that rsyslog today doesn't know anything about
SQL variables, it just creates a string that the user and the database say
looks like a SQL statement.

an added headache is that the rsyslog config does not have the concept of
arrays (the closest that it has is one special-case hack to let you
specify one variable multiple times)

if the performance win of the prepared statement is significant, then it's
probably worth the complication of changing these things.

David Lang

От:
Greg Smith
Дата:

On Mon, 20 Apr 2009,  wrote:

> while I fully understand the 'benchmark your situation' need, this isn't
> that simple.  in this case we are trying to decide what API/interface to
> use in a infrastructure tool that will be distributed in common distros
> (it's now the default syslog package of debian and fedora), there are so
> many variables in hardware, software, and load that trying to benchmark
> it becomes effectivly impossible.

From your later comments, you're wandering a bit outside of what you were
asking about here.  Benchmarking the *query* side of things can be
extremely complicated.  You have to worry about memory allocation, cold
vs. warm cache, scale of database relative to RAM, etc.

You were asking specifically about *insert* performance, which isn't
nearly as complicated.  There are basically three setups:

1) Disk/controller has a proper write cache.  Writes and fsync will be
fast.  You can insert a few thousand individual transactions per second.

2) Disk/controller has a "lying" write cache.  Writes and fsync will be
fast, but it's not safe for database use.  But since (1) is expensive and
this one you can get for free jut by using a regular SATA drive with its
write cache enabled, you can use this case as a proxy for approximately
how (1) would act.  You'll still get a few thousand transactions per
second, sustained writes may slow down relative to (1) if you insert
enough that you hit a checkpoint (triggering lots of random I/O).

3) All write caches have been disabled because they were not
battery-backed.  This is the case if you have a regular SATA drive and you
disable its write cache because you care about write durability.  You'll
get a bit less than RPM/60 writes/second, so <120 inserts/second with a
typical 7200RPM drive.  Here batching multiple INSERTs together is
critical to get any sort of reasonable performance.

In (3), I'd expect that trivia like INSERT vs. COPY and COPY BINARY vs.
COPY TEXT would be overwhelmed by the overhead of the commit itself.
Therefore you probably want to test with case (2) instead, as it doesn't
require any additional hardware but has similar performance to a
production-worthy (1).  All of the other things you're worried about
really don't matter here; you can get an approximate measure of what the
performance of the various INSERT/COPY schemes are that is somewhat
platform dependant, but the results should be good enough to give you some
rule of thumb suggestions for whether optimizations are significant enough
to justify the coding effort to implement them or not.

I'm not sure whether you're familiar with all the fsync trivia here.  In
normal syslog use, there's an fsync call after every write.  You can
disable that by placing a "-" before the file name in /etc/syslog.conf The
thing that is going to make database-based writes very different is that
syslog's fsync'd writes are unlikely to leave you in a bad state if the
drive lies about them, while database writes can.  So someone using syslog
on a standard SATA drive isn't getting the write guarantee they think they
are, but the downside on a crash is minimal.  If you've got a high-volume
syslog environment (>100 lines/second), you can't support those as
individual database writes unless you've got a battery-backed write
controller.  A regular disk just can't process genuine fsync calls any
faster than that.  A serious syslog deployment that turns fsync on and
expects it to really do its thing is already exposed to this issue though.
I think it may be a the case that a lot of people think they have durable
writes in their configuration but really don't.

--
* Greg Smith  http://www.gregsmith.com Baltimore, MD

От:
Stephen Frost
Дата:

*  () wrote:
> I think the key thing is that rsyslog today doesn't know anything about
> SQL variables, it just creates a string that the user and the database
> say looks like a SQL statement.

err, what SQL variables?  You mean the $NUM stuff?  They're just
placeholders..  You don't really need to *do* anything with them..  Or
are you worried that users would provide something that would break as a
prepared query?  If so, you just need to figure out how to handle that
cleanly..

> an added headache is that the rsyslog config does not have the concept of
> arrays (the closest that it has is one special-case hack to let you
> specify one variable multiple times)

Argh.  The array I'm talking about is a C array, and has nothing to do
with the actual config syntax..  I swear, I think you're making this
more difficult by half.

Alright, looking at the documentation on rsyslog.com, I see something
like:

$template MySQLInsert,"insert iut, message, receivedat values
('%iut%', '%msg:::UPPERCASE%', '%timegenerated:::date-mysql%')
into systemevents\r\n", SQL

Ignoring the fact that this is horrible, horrible non-SQL, I see that
you use %blah% to define variables inside your string.  That's fine.
There's no reason why you can't use this exact syntax to build a
prepared query.  No user-impact changes are necessary.  Here's what you
do:

    build your prepared query by doing:
    copy user string
    newstring = replace all %blah% strings with $1, $2, $3, etc.
    myvars = dynamically created C array with the %blah%'s in it
    call PQprepare(newstring)

    when a record comes in:
    allocate a myvalues array of pointers
    loop through myvars
      for each myvar
          set the corresponding pointer in myvalues to the string which
              it corresponds to from the record
    call PQexecPrepared(myvalues)

That's pretty much it.  I assume you already deal with escaping %'s
somehow during the config load so that the prepared statement will be
what the user expects.  As I mentioned before, the only obvious issue I
see with doing this implicitly is that the user might want to put
variables in places that you can't have variables in prepared queries.
You could deal with that by having the user indicate per template, using
another template option, if the query can be prepared or not.  Another
options is adding to your syntax something like '%*blah%' which would
tell the system to pre-populate that variable before issuing PQprepare
on the resultant string.  Of course, you might just use PQexecParams
there, unless you want to be gung-ho and actually keep a hash around of
prepared queries on the assumption that the variable the user gave you
doesn't change very often (eg, '%*month%') and it's cheap to keep a
small list of them around to use when they do match up.

    Thanks,

        Stephen

От:
david@lang.hm
Дата:

On Tue, 21 Apr 2009, Greg Smith wrote:

> On Mon, 20 Apr 2009,  wrote:
>
>> while I fully understand the 'benchmark your situation' need, this isn't
>> that simple.  in this case we are trying to decide what API/interface to
>> use in a infrastructure tool that will be distributed in common distros
>> (it's now the default syslog package of debian and fedora), there are so
>> many variables in hardware, software, and load that trying to benchmark it
>> becomes effectivly impossible.
>
> From your later comments, you're wandering a bit outside of what you were
> asking about here.  Benchmarking the *query* side of things can be extremely
> complicated.  You have to worry about memory allocation, cold vs. warm cache,
> scale of database relative to RAM, etc.
>
> You were asking specifically about *insert* performance, which isn't nearly
> as complicated.  There are basically three setups:
>
> 1) Disk/controller has a proper write cache.  Writes and fsync will be fast.
> You can insert a few thousand individual transactions per second.
>
> 2) Disk/controller has a "lying" write cache.  Writes and fsync will be fast,
> but it's not safe for database use.  But since (1) is expensive and this one
> you can get for free jut by using a regular SATA drive with its write cache
> enabled, you can use this case as a proxy for approximately how (1) would
> act.  You'll still get a few thousand transactions per second, sustained
> writes may slow down relative to (1) if you insert enough that you hit a
> checkpoint (triggering lots of random I/O).
>
> 3) All write caches have been disabled because they were not battery-backed.
> This is the case if you have a regular SATA drive and you disable its write
> cache because you care about write durability.  You'll get a bit less than
> RPM/60 writes/second, so <120 inserts/second with a typical 7200RPM drive.
> Here batching multiple INSERTs together is critical to get any sort of
> reasonable performance.

in case #1 would you expect to get significant gains from batching?
doesn't it suffer from problems similar to #2 when checkpoints hit?

> In (3), I'd expect that trivia like INSERT vs. COPY and COPY BINARY vs. COPY
> TEXT would be overwhelmed by the overhead of the commit itself. Therefore you
> probably want to test with case (2) instead, as it doesn't require any
> additional hardware but has similar performance to a production-worthy (1).
> All of the other things you're worried about really don't matter here; you
> can get an approximate measure of what the performance of the various
> INSERT/COPY schemes are that is somewhat platform dependant, but the results
> should be good enough to give you some rule of thumb suggestions for whether
> optimizations are significant enough to justify the coding effort to
> implement them or not.

I'll see about setting up a test in the next day or so. should I be able
to script this through psql? or do I need to write a C program to test
this?

> I'm not sure whether you're familiar with all the fsync trivia here.  In
> normal syslog use, there's an fsync call after every write.  You can disable
> that by placing a "-" before the file name in /etc/syslog.conf The thing that
> is going to make database-based writes very different is that syslog's
> fsync'd writes are unlikely to leave you in a bad state if the drive lies
> about them, while database writes can.  So someone using syslog on a standard
> SATA drive isn't getting the write guarantee they think they are, but the
> downside on a crash is minimal.  If you've got a high-volume syslog
> environment (>100 lines/second), you can't support those as individual
> database writes unless you've got a battery-backed write controller.  A
> regular disk just can't process genuine fsync calls any faster than that.  A
> serious syslog deployment that turns fsync on and expects it to really do its
> thing is already exposed to this issue though. I think it may be a the case
> that a lot of people think they have durable writes in their configuration
> but really don't.

rsyslog is a little different, instead of just input -> disk it does input
-> queue -> output (where output can be many things, including disk or
database)

it's default is to use memory-based queues (and no fsync), but has config
options to do disk based queues with a fsync after each update

David Lang

От:
Kenneth Marshall
Дата:

On Tue, Apr 21, 2009 at 11:09:18AM -0700,  wrote:
> On Tue, 21 Apr 2009, Greg Smith wrote:
>
>> On Mon, 20 Apr 2009,  wrote:
>>
>>> while I fully understand the 'benchmark your situation' need, this isn't
>>> that simple.  in this case we are trying to decide what API/interface to
>>> use in a infrastructure tool that will be distributed in common distros
>>> (it's now the default syslog package of debian and fedora), there are so
>>> many variables in hardware, software, and load that trying to benchmark
>>> it becomes effectivly impossible.
>>
>> From your later comments, you're wandering a bit outside of what you were
>> asking about here.  Benchmarking the *query* side of things can be
>> extremely complicated.  You have to worry about memory allocation, cold
>> vs. warm cache, scale of database relative to RAM, etc.
>>
>> You were asking specifically about *insert* performance, which isn't
>> nearly as complicated.  There are basically three setups:
>>
>> 1) Disk/controller has a proper write cache.  Writes and fsync will be
>> fast. You can insert a few thousand individual transactions per second.
>>
>> 2) Disk/controller has a "lying" write cache.  Writes and fsync will be
>> fast, but it's not safe for database use.  But since (1) is expensive and
>> this one you can get for free jut by using a regular SATA drive with its
>> write cache enabled, you can use this case as a proxy for approximately
>> how (1) would act.  You'll still get a few thousand transactions per
>> second, sustained writes may slow down relative to (1) if you insert
>> enough that you hit a checkpoint (triggering lots of random I/O).
>>
>> 3) All write caches have been disabled because they were not
>> battery-backed. This is the case if you have a regular SATA drive and you
>> disable its write cache because you care about write durability.  You'll
>> get a bit less than RPM/60 writes/second, so <120 inserts/second with a
>> typical 7200RPM drive. Here batching multiple INSERTs together is critical
>> to get any sort of reasonable performance.
>
> in case #1 would you expect to get significant gains from batching? doesn't
> it suffer from problems similar to #2 when checkpoints hit?
>
Even with a disk controller with a proper write cache, the latency for
single-insert-at-a-time will keep the number of updates to the low
thousands per second (on the controllers I have used). If you can batch
them, it would not be unreasonable to increase performance by an order
of magnitude or more. At the high end, other issues like CPU usage can
restrict performance.

Ken
>> In (3), I'd expect that trivia like INSERT vs. COPY and COPY BINARY vs.
>> COPY TEXT would be overwhelmed by the overhead of the commit itself.
>> Therefore you probably want to test with case (2) instead, as it doesn't
>> require any additional hardware but has similar performance to a
>> production-worthy (1). All of the other things you're worried about really
>> don't matter here; you can get an approximate measure of what the
>> performance of the various INSERT/COPY schemes are that is somewhat
>> platform dependant, but the results should be good enough to give you some
>> rule of thumb suggestions for whether optimizations are significant enough
>> to justify the coding effort to implement them or not.
>
> I'll see about setting up a test in the next day or so. should I be able to
> script this through psql? or do I need to write a C program to test this?
>
>> I'm not sure whether you're familiar with all the fsync trivia here.  In
>> normal syslog use, there's an fsync call after every write.  You can
>> disable that by placing a "-" before the file name in /etc/syslog.conf The
>> thing that is going to make database-based writes very different is that
>> syslog's fsync'd writes are unlikely to leave you in a bad state if the
>> drive lies about them, while database writes can.  So someone using syslog
>> on a standard SATA drive isn't getting the write guarantee they think they
>> are, but the downside on a crash is minimal.  If you've got a high-volume
>> syslog environment (>100 lines/second), you can't support those as
>> individual database writes unless you've got a battery-backed write
>> controller.  A regular disk just can't process genuine fsync calls any
>> faster than that.  A serious syslog deployment that turns fsync on and
>> expects it to really do its thing is already exposed to this issue though.
>> I think it may be a the case that a lot of people think they have durable
>> writes in their configuration but really don't.
>
> rsyslog is a little different, instead of just input -> disk it does input
> -> queue -> output (where output can be many things, including disk or
> database)
>
> it's default is to use memory-based queues (and no fsync), but has config
> options to do disk based queues with a fsync after each update
>
> David Lang
>
> --
> Sent via pgsql-performance mailing list ()
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance
>

От:
david@lang.hm
Дата:

On Tue, 21 Apr 2009, Stephen Frost wrote:

> *  () wrote:
>> I think the key thing is that rsyslog today doesn't know anything about
>> SQL variables, it just creates a string that the user and the database
>> say looks like a SQL statement.
>
> err, what SQL variables?  You mean the $NUM stuff?  They're just
> placeholders..  You don't really need to *do* anything with them..  Or
> are you worried that users would provide something that would break as a
> prepared query?  If so, you just need to figure out how to handle that
> cleanly..
>
>> an added headache is that the rsyslog config does not have the concept of
>> arrays (the closest that it has is one special-case hack to let you
>> specify one variable multiple times)
>
> Argh.  The array I'm talking about is a C array, and has nothing to do
> with the actual config syntax..  I swear, I think you're making this
> more difficult by half.

not intentinally, but you may be right.

> Alright, looking at the documentation on rsyslog.com, I see something
> like:
>
> $template MySQLInsert,"insert iut, message, receivedat values
> ('%iut%', '%msg:::UPPERCASE%', '%timegenerated:::date-mysql%')
> into systemevents\r\n", SQL
>
> Ignoring the fact that this is horrible, horrible non-SQL,

that example is for MySQL, nuff said ;-) or are you referring to the
modifiers that rsyslog has to manipulate the strings before inserting
them? (as opposed to using sql to manipulate the strings)

> I see that
> you use %blah% to define variables inside your string.  That's fine.
> There's no reason why you can't use this exact syntax to build a
> prepared query.  No user-impact changes are necessary.  Here's what you
> do:

<snip psudocode to replace %blah% with $num>

for some reason I was stuck on the idea of the config specifying the
statement and variables seperatly, so I wasn't thinking this way, however
there are headaches

doing this will require changes to the structure of rsyslog, today the
string manipulation is done before calling the output (database) module,
so all the database module currently gets is a string. in a (IMHO
misguided) attempt at security in a multi-threaded program, the output
modules are not given access to the full data, only to the distiled
result.

also, this approach won't work if the user wants to combine fixed text
with the variable into a column. an example of doing that would be to have
a filter to match specific lines, and then use a slightly different
template for those lines. I guess that could be done in SQL instead of in
the rsyslog string manipulation (i.e. instead of 'blah-%host%' do
'blah-'||'%host')

> As I mentioned before, the only obvious issue I
> see with doing this implicitly is that the user might want to put
> variables in places that you can't have variables in prepared queries.

this problem space would be anywhere except the column contents, right?

> You could deal with that by having the user indicate per template, using
> another template option, if the query can be prepared or not.  Another
> options is adding to your syntax something like '%*blah%' which would
> tell the system to pre-populate that variable before issuing PQprepare
> on the resultant string.  Of course, you might just use PQexecParams
> there, unless you want to be gung-ho and actually keep a hash around of
> prepared queries on the assumption that the variable the user gave you
> doesn't change very often (eg, '%*month%') and it's cheap to keep a
> small list of them around to use when they do match up.

rsyslog supports something similar for writing to disk where you can use
variables as part of the filename/path (referred to as 'dynafiles' in the
documentation). that's a little easier to deal with as the filename is
specified seperatly from the format of the data to write. If we end up
doing prepared statements I suspect they initially won't support variables
outside of the columns.

David Lang

От:
david@lang.hm
Дата:

On Tue, 21 Apr 2009,  wrote:

>> I see that
>> you use %blah% to define variables inside your string.  That's fine.
>> There's no reason why you can't use this exact syntax to build a
>> prepared query.  No user-impact changes are necessary.  Here's what you
>> do:
>
> <snip psudocode to replace %blah% with $num>
>
> for some reason I was stuck on the idea of the config specifying the
> statement and variables seperatly, so I wasn't thinking this way, however
> there are headaches
>
> doing this will require changes to the structure of rsyslog, today the string
> manipulation is done before calling the output (database) module, so all the
> database module currently gets is a string. in a (IMHO misguided) attempt at
> security in a multi-threaded program, the output modules are not given access
> to the full data, only to the distiled result.
>
> also, this approach won't work if the user wants to combine fixed text with
> the variable into a column. an example of doing that would be to have a
> filter to match specific lines, and then use a slightly different template
> for those lines. I guess that could be done in SQL instead of in the rsyslog
> string manipulation (i.e. instead of 'blah-%host%' do 'blah-'||'%host')

by the way, now that I understand how you were viewing this, I see why you
were saying that there would need to be a SQL parser. I was missing that
headache, by going the direction of having the user specify the individual
components (which has it's own headache)

David Lang

От:
Stephen Frost
Дата:

*  () wrote:
>> Ignoring the fact that this is horrible, horrible non-SQL,
>
> that example is for MySQL, nuff said ;-)

indeed.

> for some reason I was stuck on the idea of the config specifying the
> statement and variables seperatly, so I wasn't thinking this way, however
> there are headaches

Nothing worth doing is ever completely without complications. :)

> doing this will require changes to the structure of rsyslog, today the
> string manipulation is done before calling the output (database) module,
> so all the database module currently gets is a string. in a (IMHO
> misguided) attempt at security in a multi-threaded program, the output
> modules are not given access to the full data, only to the distiled
> result.

Ah, yes, that's definitely a problem and I agree- a very misguided
approach to doing things.  Certainly, to use prepared queries, you will
have to pass the data to whatever is talking to the database in some
kind of structured way.  There's not much advantage if you're getting it
as a string and having to parse it out yourself before using a prepared
query with the database.

In a multi-threaded program, I think it would at least be reasonably
easy/cheap to provide the output modules with the full data?  Of course,
you would need to teach the string manipulation logic to not do its
escaping and other related work for prepared queries which are just
going to use the full data anyway.

> also, this approach won't work if the user wants to combine fixed text
> with the variable into a column. an example of doing that would be to
> have a filter to match specific lines, and then use a slightly different
> template for those lines. I guess that could be done in SQL instead of in
> the rsyslog string manipulation (i.e. instead of 'blah-%host%' do
> 'blah-'||'%host')

It would be more like: 'blah-' || %host%

Just to be clear (if you put the %host% in quotes, and then convert that
to '$1', it won't be considered a variable, at least in PG).  That might
be an issue going forward, but on the flip side, I can see some reasons
for supporting both prepared and unprepared queries, so if you implement
that through an additional template option, you can document that the
user needs to ensure the prepared query is structured correctly with the
correct quoting.  This gives you the flexibility of the unprepared query
for users who don't care about performance, and the benefits of prepared
queries, where they can be used, for users who do need that performance.

Or you could just force your users to move everything to prepared
queries but it's probably too late for that. :)  Maybe if things had
started out that way..

>> As I mentioned before, the only obvious issue I
>> see with doing this implicitly is that the user might want to put
>> variables in places that you can't have variables in prepared queries.
>
> this problem space would be anywhere except the column contents, right?

Well, it depends on the query..  You can have variables in the column
contents, sure, but you can also have them in where clauses if you're
doing something like:

insert into blah select $1,$2,$3,b from mytable where $2 = c;

I believe, in PG at least, you can use them pretty much anywhere you can
use a constant.

> rsyslog supports something similar for writing to disk where you can use
> variables as part of the filename/path (referred to as 'dynafiles' in the
> documentation). that's a little easier to deal with as the filename is
> specified seperatly from the format of the data to write. If we end up
> doing prepared statements I suspect they initially won't support
> variables outside of the columns.

That sounds reasonable, to me at least.

    Thanks,

        Stephen

От:
Stephen Frost
Дата:

*  () wrote:
> by the way, now that I understand how you were viewing this, I see why
> you were saying that there would need to be a SQL parser. I was missing
> that headache, by going the direction of having the user specify the
> individual components (which has it's own headache)

Right, but really, you're already parsing the SQL to the extent that you
need to, and whatever limitations and headaches that causes you've
already had to deal with through proper escaping and whatnot of your
variables..  So I'm not sure that it'll be all that bad in the end.

If you add this as a new feature that users essentially have to opt-in
to, then I think you can offload alot of the work on to the users for
doing things like fixing quoting (so the $NUM vars aren't quoted).

    Thanks,

        Stephen

От:
James Mansion
Дата:

 wrote:
>>> 2. insert into table values (),(),(),()
>>
>> Using this structure would be more database agnostic, but won't perform
>> as well as the COPY options I don't believe.  It might be interesting to
>> do a large "insert into table values (),(),()" as a prepared statement,
>> but then you'd have to have different sizes for each different number of
>> items you want inserted.
>
> on the other hand, when you have a full queue (lots of stuff to
> insert) is when you need the performance the most. if it's enough of a
> win on the database side, it could be worth more effort on the
> applicaiton side.
Are you sure preparing a simple insert is really worthwhile?

I'd check if I were you.  It shouldn't take long to plan.

Note that this structure (above) is handy but not universal.

You might want to try:

insert into table
select (...)
union
select (...)
union
select (...)
...

as well, since its more univeral.  Works on Sybase and SQLServer for
example (and v.quickly too - much more so than a TSQL batch with lots of
inserts or execs of stored procs)

James



От:
Stephen Frost
Дата:

* James Mansion () wrote:
>  wrote:
>> on the other hand, when you have a full queue (lots of stuff to
>> insert) is when you need the performance the most. if it's enough of a
>> win on the database side, it could be worth more effort on the
>> applicaiton side.
> Are you sure preparing a simple insert is really worthwhile?
>
> I'd check if I were you.  It shouldn't take long to plan.

Using prepared queries, at least if you use PQexecPrepared or
PQexecParams, also reduces the work required on the client to build the
whole string, and the parsing overhead on the database side to pull it
apart again.  That's where the performance is going to be improved by
going that route, not so much in eliminating the planning.

    Thanks,

        Stephen

От:
Greg Smith
Дата:

On Tue, 21 Apr 2009,  wrote:

>> 1) Disk/controller has a proper write cache.  Writes and fsync will be
>> fast. You can insert a few thousand individual transactions per second.
>>
> in case #1 would you expect to get significant gains from batching? doesn't
> it suffer from problems similar to #2 when checkpoints hit?

Typically controllers with a write cache are doing elevator sorting across
a much larger chunk of working memory (typically >=256MB instead of <32MB
on the disk itself) which means a mix of random writes will average better
performance--on top of being able to aborb a larger chunk of them before
blocking on writes.  You get some useful sorting in the OS itself, but
every layer of useful additional cache helps significantly here.

Batching is always a win because even a write-cached commit is still
pretty expensive, from the server on down the chain.

> I'll see about setting up a test in the next day or so. should I be able to
> script this through psql? or do I need to write a C program to test this?

You can easily compare things with psql, like in the COPY BINARY vs. TEXT
example I gave earlier, that's why I was suggesting you run your own tests
here just to get a feel for things on your data set.

--
* Greg Smith  http://www.gregsmith.com Baltimore, MD

От:
david@lang.hm
Дата:

On Tue, 21 Apr 2009, Stephen Frost wrote:

> * James Mansion () wrote:
>>  wrote:
>>> on the other hand, when you have a full queue (lots of stuff to
>>> insert) is when you need the performance the most. if it's enough of a
>>> win on the database side, it could be worth more effort on the
>>> applicaiton side.
>> Are you sure preparing a simple insert is really worthwhile?
>>
>> I'd check if I were you.  It shouldn't take long to plan.
>
> Using prepared queries, at least if you use PQexecPrepared or
> PQexecParams, also reduces the work required on the client to build the
> whole string, and the parsing overhead on the database side to pull it
> apart again.  That's where the performance is going to be improved by
> going that route, not so much in eliminating the planning.

in a recent thread about prepared statements, where it was identified that
since the planning took place at the time of the prepare you sometimes
have worse plans than for non-prepared statements, a proposal was made to
have a 'pre-parsed, but not pre-planned' version of a prepared statement.
This was dismissed as a waste of time (IIRC by Tom L) as the parsing time
was negligable.

was that just because it was a more complex query to plan?

David Lang

От:
Robert Haas
Дата:

On Tue, Apr 21, 2009 at 8:12 PM,  <> wrote:
>> Using prepared queries, at least if you use PQexecPrepared or
>> PQexecParams, also reduces the work required on the client to build the
>> whole string, and the parsing overhead on the database side to pull it
>> apart again.  That's where the performance is going to be improved by
>> going that route, not so much in eliminating the planning.
>
> in a recent thread about prepared statements, where it was identified that
> since the planning took place at the time of the prepare you sometimes have
> worse plans than for non-prepared statements, a proposal was made to have a
> 'pre-parsed, but not pre-planned' version of a prepared statement. This was
> dismissed as a waste of time (IIRC by Tom L) as the parsing time was
> negligable.
>
> was that just because it was a more complex query to plan?

Joins are expensive to plan; a simple insert is not.  I also disagree
that pre-parsed but not pre-planned is a waste of time, whoever said
it.  Sometimes it's what you want, especially in PL/pgsql.

...Robert

От:
James Mansion
Дата:

Stephen Frost wrote:
> apart again.  That's where the performance is going to be improved by
> going that route, not so much in eliminating the planning.
>
Fine.  But like I said, I'd suggest measuring the fractional improvement
for this
when sending multi-row inserts before writing something complex.  I
think the
big will will be doing multi-row inserts at all.  If you are going to
prepare then
you'll need a collection of different prepared statements for different
batch sizes
(say 1,2,3,4,5,10,20,50) and things will get complicated.  A multi-row
insert
with unions and dynamic SQL is actually rather universal.

Personally I'd implement that first (and it should be easy to do across
multiple
dbms types) and then return to it to have a more complex client side with
prepared statements etc if (and only if) necessary AND the performance
improvement were measurably worthwhile, given the indexing and storage
overheads.

There is no point optimising away the CPU of the simple parse if you are
just going to get hit with a lot of latency from round trips, and forming a
generic multi-insert SQL string is much, much easier to get working as a
first
step. Server CPU isn't a bottleneck all that often - and with something as
simple as this you'll hit IO performance bottlenecks rather easily.

James



От:
Stephen Frost
Дата:

David,

*  () wrote:
> in a recent thread about prepared statements, where it was identified
> that since the planning took place at the time of the prepare you
> sometimes have worse plans than for non-prepared statements, a proposal
> was made to have a 'pre-parsed, but not pre-planned' version of a
> prepared statement. This was dismissed as a waste of time (IIRC by Tom L)
> as the parsing time was negligable.
>
> was that just because it was a more complex query to plan?

Yes, as I beleive was mentioned already, planning time for inserts is
really small.  Parsing time for inserts when there's little parsing that
has to happen also isn't all *that* expensive and the same goes for
conversions from textual representations of data to binary.

We're starting to re-hash things, in my view.  The low-hanging fruit is
doing multiple things in a single transaction, either by using COPY,
multi-value INSERTs, or just multiple INSERTs in a single transaction.
That's absolutely step one.

Adding in other things, where they make sense (prepared statements,
binary format for things you have as binary, etc) is a good idea if it
can be done along the way.

    Stephen

От:
Stephen Frost
Дата:

* James Mansion () wrote:
> Fine.  But like I said, I'd suggest measuring the fractional improvement
> for this
> when sending multi-row inserts before writing something complex.  I
> think the
> big will will be doing multi-row inserts at all.

You're re-hashing things I've already said.  The big win is batching the
inserts, however that's done, into fewer transactions.  Sure, multi-row
inserts could be used to do that, but so could dropping begin/commits in
right now which probably takes even less effort.

> If you are going to
> prepare then
> you'll need a collection of different prepared statements for different
> batch sizes
> (say 1,2,3,4,5,10,20,50) and things will get complicated.  A multi-row
> insert
> with unions and dynamic SQL is actually rather universal.

No, as was pointed out previously already, you really just need 2.  A
single-insert, and a batch insert of some size.  It'd be interesting to
see if there's really much of a performance difference between a
50-insert prepared statement, and 50 1-insert prepared statements.  If
they're both done in larger transactions, I don't know that there's
really alot of performance difference.

> Personally I'd implement that first (and it should be easy to do across
> multiple
> dbms types) and then return to it to have a more complex client side with
> prepared statements etc if (and only if) necessary AND the performance
> improvement were measurably worthwhile, given the indexing and storage
> overheads.

storage overhead?  indexing overhead?  We're talking about prepared
statements here, what additional storage requirement do you think those
would impose?  What additional indexing overhead?  I don't believe we
actually do anything differently between prepared statements and
multi-row inserts that would change either of those.

> There is no point optimising away the CPU of the simple parse if you are
> just going to get hit with a lot of latency from round trips, and forming a
> generic multi-insert SQL string is much, much easier to get working as a
> first
> step. Server CPU isn't a bottleneck all that often - and with something as
> simple as this you'll hit IO performance bottlenecks rather easily.

Ah, latency is a reasonable thing to bring up.  Of course, if you want
to talk about latency then you get to consider that multi-insert SQL
will inherently have larger packet sizes which could cause them to be
delayed in some QoS arrangements.

As I said, most of this is a re-hash of things already said.  The
low-hanging fruit here is doing multiple inserts inside of a
transaction, rather than 1 insert per transaction.  Regardless of how
that's done, it's going to give the best bang-for-buck.  It will
complicate the client code some, regardless of how it's implemented, so
that failures are handled gracefully (if that's something you care about
anyway), but as there exists some queueing mechanisms in rsyslog
already, hopefully it won't be too bad.

    Thanks,

        Stephen

От:
Simon Riggs
Дата:

On Mon, 2009-04-20 at 14:53 -0700,  wrote:

> the big win is going to be in changing the core of rsyslog so that it can
> process multiple messages at a time (bundling them into a single
> transaction)

That isn't necessarily true as a single "big win".

The reason there is an overhead per transaction is because of commit
delays, which can be removed by executing

  SET synchronous_commit = off;

after connecting to PostgreSQL 8.3+

You won't need to do much else. This can also be enabled for a
PostgreSQL user without even changing the rsyslog source code, so it
should be easy enough to test.

And this type of application is *exactly* what it was designed for.

Some other speedups should also be possible, but this is easiest.

I would guess that batching inserts will be a bigger win than simply
using prepared statements because it will reduce network roundtrips to a
centralised log server. Preparing statements might show up well on tests
because people will do tests against a local database, most likely.

--
 Simon Riggs           www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


От:
Glenn Maynard
Дата:

On Wed, Apr 22, 2009 at 8:19 AM, Stephen Frost <> wrote:
> Yes, as I beleive was mentioned already, planning time for inserts is
> really small.  Parsing time for inserts when there's little parsing that
> has to happen also isn't all *that* expensive and the same goes for
> conversions from textual representations of data to binary.
>
> We're starting to re-hash things, in my view.  The low-hanging fruit is
> doing multiple things in a single transaction, either by using COPY,
> multi-value INSERTs, or just multiple INSERTs in a single transaction.
> That's absolutely step one.

This is all well-known, covered information, but perhaps some numbers
will help drive this home.  40000 inserts into a single-column,
unindexed table; with predictable results:

separate inserts, no transaction: 21.21s
separate inserts, same transaction: 1.89s
40 inserts, 100 rows/insert: 0.18s
one 40000-value insert: 0.16s
40 prepared inserts, 100 rows/insert: 0.15s
COPY (text): 0.10s
COPY (binary): 0.10s

Of course, real workloads will change the weights, but this is more or
less the magnitude of difference I always see--batch your inserts into
single statements, and if that's not enough, skip to COPY.

--
Glenn Maynard

От:
david@lang.hm
Дата:

On Wed, 22 Apr 2009, Glenn Maynard wrote:

> On Wed, Apr 22, 2009 at 8:19 AM, Stephen Frost <> wrote:
>> Yes, as I beleive was mentioned already, planning time for inserts is
>> really small.  Parsing time for inserts when there's little parsing that
>> has to happen also isn't all *that* expensive and the same goes for
>> conversions from textual representations of data to binary.
>>
>> We're starting to re-hash things, in my view.  The low-hanging fruit is
>> doing multiple things in a single transaction, either by using COPY,
>> multi-value INSERTs, or just multiple INSERTs in a single transaction.
>> That's absolutely step one.
>
> This is all well-known, covered information, but perhaps some numbers
> will help drive this home.  40000 inserts into a single-column,
> unindexed table; with predictable results:
>
> separate inserts, no transaction: 21.21s
> separate inserts, same transaction: 1.89s

are these done as seperate round trips?

i.e.
begin <send>
insert <send>
insert <send>
..
end <send>

or as one round trip?

i.e.
begin;insert;insert..;end

> 40 inserts, 100 rows/insert: 0.18s
> one 40000-value insert: 0.16s
> 40 prepared inserts, 100 rows/insert: 0.15s

are one of these missing a 0?

> COPY (text): 0.10s
> COPY (binary): 0.10s
>
> Of course, real workloads will change the weights, but this is more or
> less the magnitude of difference I always see--batch your inserts into
> single statements, and if that's not enough, skip to COPY.

thanks for this information, this is exactly what I was looking for.

can this get stored somewhere for reference?

David Lang

От:
Stephen Frost
Дата:

Glenn,

* Glenn Maynard () wrote:
> This is all well-known, covered information, but perhaps some numbers
> will help drive this home.  40000 inserts into a single-column,
> unindexed table; with predictable results:

Thanks for doing the work.  I had been intending to but hadn't gotten to
it yet.

> separate inserts, no transaction: 21.21s
> separate inserts, same transaction: 1.89s
> 40 inserts, 100 rows/insert: 0.18s
> one 40000-value insert: 0.16s
> 40 prepared inserts, 100 rows/insert: 0.15s
> COPY (text): 0.10s
> COPY (binary): 0.10s

What about 40000 individual prepared inserts?  Just curious about it.

Also, kind of suprised about COPY text vs. binary.  What was the data
type in the table..?  If text, that makes sense, if it was an integer or
something else, I'm kind of suprised.

    Thanks,

        Stephen

От:
Tom Lane
Дата:

Stephen Frost <> writes:
> * Glenn Maynard () wrote:
>> separate inserts, no transaction: 21.21s
>> separate inserts, same transaction: 1.89s
>> 40 inserts, 100 rows/insert: 0.18s
>> one 40000-value insert: 0.16s
>> 40 prepared inserts, 100 rows/insert: 0.15s
>> COPY (text): 0.10s
>> COPY (binary): 0.10s

> What about 40000 individual prepared inserts?  Just curious about it.

Also, just to be clear: were the "40 insert" cases 40 separate
transactions or one transaction?  I think the latter was meant but
it's not 100% clear.

            regards, tom lane

От:
James Mansion
Дата:

Stephen Frost wrote:
> You're re-hashing things I've already said.  The big win is batching the
> inserts, however that's done, into fewer transactions.  Sure, multi-row
> inserts could be used to do that, but so could dropping begin/commits in
> right now which probably takes even less effort.
>
Well, I think you are seriously underestimating the cost of the
round-trip compared
to all the other effects (possibly bar the commits).  When I tested the
union insert
technique on SQLServer and Sybase I got measurable improvements going from
100 row statements to 200 row statements, though I suspect in that case the
per-statement overheads are quite high.  I expected improvements from 10
to 20
row batches, but it carried on getting better for a long time after
that.  The
Sybase parser runs out of workspace first.


> No, as was pointed out previously already, you really just need 2.  A
>
And I'm disagreeing with that.  Single row is a given, but I think
you'll find it pays to have one
round trip if at all possible and invoking multiple prepared statements
can work against this.

> see if there's really much of a performance difference between a
> 50-insert prepared statement, and 50 1-insert prepared statements.  If
> they're both done in larger transactions, I don't know that there's
> really alot of performance difference.
>
I think you'll be surprised, but the only way is to test it.  And also
the simple 50 row single
insert as text.  See if you can measure the difference between that and
the prepared
statement.
> storage overhead?  indexing overhead?  We're talking about prepared
> statements here, what additional storage requirement do you think those
> would impose?  What additional indexing overhead?  I don't believe we
> actually do anything differently between prepared statements and
> multi-row inserts that would change either of those.
>
That's my point.  You will brickwall on the actual database operations
for execution
early enough that the efficiency difference between parse-and-execute
and prepared
statements will be hard to measure - at least if you have multi-row
statements.

But this really needs testing and timing.

> Ah, latency is a reasonable thing to bring up.  Of course, if you want
> to talk about latency then you get to consider that multi-insert SQL
> will inherently have larger packet sizes which could cause them to be
> delayed in some QoS arrangements.
>
No, I mean latency from round trips from the client to the server
process.  I don't know why
you think I'd mean that.
> As I said, most of this is a re-hash of things already said.  The
> low-hanging fruit here is doing multiple inserts inside of a
> transaction, rather than 1 insert per transaction.  Regardless of how
> that's done, it's going to give the best bang-for-buck.  It will
> complicate the client code some, regardless of how it's implemented, so
> that failures are handled gracefully (if that's something you care about
> anyway), but as there exists some queueing mechanisms in rsyslog
> already, hopefully it won't be too bad.
>
I think you have largely missed the point. There are two things here:
 1) how many rows per commit
 2) how many rows per logical RPC (ie round trip) between the client
   and server processes

We are agreed that the first is a Very Big Deal, but you seem resistant to
the idea that the second of these is a big deal once you've dealt with
the former.

My experience has been that its much more important than any benefits of
preparing statements etc, particularly if the use of a prepared
statement can
make it harder to do multi-row RPCs because the protocol doesn't
allow pipelining (at least without things getting very hairy).

Clearly 'copy' is your friend for this too, at least potentially (even
if it means
streaming to a staging table).


James


От:
Glenn Maynard
Дата:

On Wed, Apr 22, 2009 at 4:07 PM,  <> wrote:
> are these done as seperate round trips?
>
> i.e.
> begin <send>
> insert <send>
> insert <send>
> ..
> end <send>
>
> or as one round trip?

All tests were done by constructing a file and using "time psql -f ...".

>> 40 inserts, 100 rows/insert: 0.18s
>> one 40000-value insert: 0.16s
>> 40 prepared inserts, 100 rows/insert: 0.15s
>
> are one of these missing a 0?

Sorry, 400 * 100.  All cases inserted 40000 rows, and I deleted all
rows between tests (but did not recreate the table).

--
Glenn Maynard

От:
"Joshua D. Drake"
Дата:

On Wed, 2009-04-22 at 21:53 +0100, James Mansion wrote:
> Stephen Frost wrote:
> > You're re-hashing things I've already said.  The big win is batching the
> > inserts, however that's done, into fewer transactions.  Sure, multi-row
> > inserts could be used to do that, but so could dropping begin/commits in
> > right now which probably takes even less effort.
> >
> Well, I think you are seriously underestimating the cost of the
> round-trip compared

The breakdown is this:

1. Eliminate single inserts
2. Eliminate round trips

Yes round trips are hugely expensive.

>
> > No, as was pointed out previously already, you really just need 2.  A
> >
> And I'm disagreeing with that.  Single row is a given, but I think
> you'll find it pays to have one

My experience shows that you are correct. Even if you do a single BEGIN;
with 1000 inserts you are still getting a round trip for every insert
until you commit. Based on 20ms round trip time, you are talking
20seconds additional overhead.

Joshua D. Drake


--
PostgreSQL - XMPP: 
   Consulting, Development, Support, Training
   503-667-4564 - http://www.commandprompt.com/
   The PostgreSQL Company, serving since 1997


От:
Glenn Maynard
Дата:

On Wed, Apr 22, 2009 at 4:37 PM, Stephen Frost <> wrote:
> Thanks for doing the work.  I had been intending to but hadn't gotten to
> it yet.

I'd done similar tests recently, for some batch import code, so it was
just a matter of recreating it.

>> separate inserts, no transaction: 21.21s
>> separate inserts, same transaction: 1.89s
>> 40 inserts, 100 rows/insert: 0.18s
>> one 40000-value insert: 0.16s
>> 40 prepared inserts, 100 rows/insert: 0.15s
>> COPY (text): 0.10s
>> COPY (binary): 0.10s
>
> What about 40000 individual prepared inserts?  Just curious about it.

40000 inserts, one prepared statement each (constructing the prepared
statement only once), in a single transaction: 1.68s

I'm surprised that there's any win here at all.

> Also, kind of suprised about COPY text vs. binary.  What was the data
> type in the table..?  If text, that makes sense, if it was an integer or
> something else, I'm kind of suprised.

Each row had one integer column.  I expect strings to be harder to
parse, since it's allocating buffers and parsing escapes, which is
usually more expensive than parsing an integer out of a string.  I'd
expect the difference to be negligible either way, though, and I'd be
interested in hearing about use cases where binary copying is enough
of a win to be worth the development cost of maintaining the feature.

On Wed, Apr 22, 2009 at 4:49 PM, Tom Lane <> wrote:
> Also, just to be clear: were the "40 insert" cases 40 separate
> transactions or one transaction?  I think the latter was meant but
> it's not 100% clear.

One transaction--the only case where I ran more than one transaction
was the first.

--
Glenn Maynard

От:
Glenn Maynard
Дата:

On Wed, Apr 22, 2009 at 4:53 PM, James Mansion
<> wrote:
> And I'm disagreeing with that.  Single row is a given, but I think you'll
> find it pays to have one
> round trip if at all possible and invoking multiple prepared statements can
> work against this.

You're talking about round-trips to a *local* server, on the same
system, not a dedicated server with network round-trips, right?

Blocking round trips to another process on the same server should be
fairly cheap--that is, writing to a socket (or pipe, or localhost TCP
connection) where the other side is listening for it; and then
blocking in return for the response.  The act of writing to an FD that
another process is waiting for will make the kernel mark the process
as "ready to wake up" immediately, and the act of blocking for the
response will kick the scheduler to some waiting process, so as long
as there isn't something else to compete for CPU for, each write/read
will wake up the other process instantly.  There's a task switching
cost, but that's too small to be relevant here.

Doing 1000000 local round trips, over a pipe: 5.25s (5 *microseconds*
each), code attached.  The cost *should* be essentially identical for
any local transport (pipes, named pipes, local TCP connections), since
the underlying scheduler mechanisms are the same.

That's not to say that batching inserts doesn't make a difference--it
clearly does, and it would probably be a much larger difference with
actual network round-trips--but round-trips to a local server aren't
inherently slow.  I'd be (casually) interested in knowing what the
actual costs are behind an SQL command round-trip (where the command
isn't blocking on I/O, eg. an INSERT inside a transaction, with no I/O
for things like constraint checks that need to be done before the
command can return success).

--
Glenn Maynard

От:
Stephen Frost
Дата:

* Glenn Maynard () wrote:
> >> separate inserts, no transaction: 21.21s
> >> separate inserts, same transaction: 1.89s
> >> 40 inserts, 100 rows/insert: 0.18s
> >> one 40000-value insert: 0.16s
> >> 40 prepared inserts, 100 rows/insert: 0.15s
> >> COPY (text): 0.10s
> >> COPY (binary): 0.10s
> >
> > What about 40000 individual prepared inserts?  Just curious about it.
>
> 40000 inserts, one prepared statement each (constructing the prepared
> statement only once), in a single transaction: 1.68s
>
> I'm surprised that there's any win here at all.

For a single column table, I wouldn't expect much either.  With more
columns I think it would be a larger improvement.

> Each row had one integer column.  I expect strings to be harder to
> parse, since it's allocating buffers and parsing escapes, which is
> usually more expensive than parsing an integer out of a string.  I'd
> expect the difference to be negligible either way, though, and I'd be
> interested in hearing about use cases where binary copying is enough
> of a win to be worth the development cost of maintaining the feature.

I've seen it help, but I was sending everything as binary (I figured,
once I'm doing it, might as well do it all), which included dates,
timestamps, IP addresses, integers, and some text.  It may have more of
an impact on dates and timestamps than on simple integers.

    Thanks!

        Stephen

От:
Glenn Maynard
Дата:

On Wed, Apr 22, 2009 at 5:51 PM, Stephen Frost <> wrote:
> For a single column table, I wouldn't expect much either.  With more
> columns I think it would be a larger improvement.

Maybe.  I'm not sure why parsing "(1,2,3,4,5)" in an EXECUTE parameter
should be faster than parsing the exact same thing in an INSERT,
though.

> I've seen it help, but I was sending everything as binary (I figured,
> once I'm doing it, might as well do it all), which included dates,
> timestamps, IP addresses, integers, and some text.  It may have more of
> an impact on dates and timestamps than on simple integers.

Of course, you still need to get it in that format.  Be careful to
include any parsing you're doing to create the binary date in the
benchmarks.  Inevitably, at least part of the difference will be costs
simply moving from the psql process to your own.

--
Glenn Maynard

От:
david@lang.hm
Дата:

On Wed, 22 Apr 2009, Glenn Maynard wrote:

> On Wed, Apr 22, 2009 at 4:53 PM, James Mansion
> <> wrote:
>> And I'm disagreeing with that.  Single row is a given, but I think you'll
>> find it pays to have one
>> round trip if at all possible and invoking multiple prepared statements can
>> work against this.
>
> You're talking about round-trips to a *local* server, on the same
> system, not a dedicated server with network round-trips, right?

the use-case for a production setup for logging servers would probably
include a network hop.

David Lang

> Blocking round trips to another process on the same server should be
> fairly cheap--that is, writing to a socket (or pipe, or localhost TCP
> connection) where the other side is listening for it; and then
> blocking in return for the response.  The act of writing to an FD that
> another process is waiting for will make the kernel mark the process
> as "ready to wake up" immediately, and the act of blocking for the
> response will kick the scheduler to some waiting process, so as long
> as there isn't something else to compete for CPU for, each write/read
> will wake up the other process instantly.  There's a task switching
> cost, but that's too small to be relevant here.
>
> Doing 1000000 local round trips, over a pipe: 5.25s (5 *microseconds*
> each), code attached.  The cost *should* be essentially identical for
> any local transport (pipes, named pipes, local TCP connections), since
> the underlying scheduler mechanisms are the same.
>
> That's not to say that batching inserts doesn't make a difference--it
> clearly does, and it would probably be a much larger difference with
> actual network round-trips--but round-trips to a local server aren't
> inherently slow.  I'd be (casually) interested in knowing what the
> actual costs are behind an SQL command round-trip (where the command
> isn't blocking on I/O, eg. an INSERT inside a transaction, with no I/O
> for things like constraint checks that need to be done before the
> command can return success).
>
>

От:
Stephen Frost
Дата:

* Glenn Maynard () wrote:
> On Wed, Apr 22, 2009 at 5:51 PM, Stephen Frost <> wrote:
> > For a single column table, I wouldn't expect much either.  With more
> > columns I think it would be a larger improvement.
>
> Maybe.  I'm not sure why parsing "(1,2,3,4,5)" in an EXECUTE parameter
> should be faster than parsing the exact same thing in an INSERT,
> though.

Erm..  Prepared queries is about using PQexecPrepared(), not about
sending a text string as an SQL EXECUTE().  PQexecPrepared takes an
array of arguments.  That gets translated into a Bind command in the
protocol with a defined number of parameters and a length for each
parameter being passed.  That removes any need for scanning/parsing the
string sent to the backend.  That's the savings I'm referring to.

If you weren't using PQexecPrepared() (and using psql, you wouldn't
be..), then the difference you saw was more likely planning cost.

> Of course, you still need to get it in that format.  Be careful to
> include any parsing you're doing to create the binary date in the
> benchmarks.  Inevitably, at least part of the difference will be costs
> simply moving from the psql process to your own.

Sure.  What I recall from when I was working on this is that it wasn't
terribly hard to go from unix timestamps (epoch from 1970) to a PG
timestamp format (and there was nice example code in the backend) in
terms of CPU time.

    Thanks,

        Stephen

От:
Stephen Frost
Дата:

*  () wrote:
> On Wed, 22 Apr 2009, Glenn Maynard wrote:
>> You're talking about round-trips to a *local* server, on the same
>> system, not a dedicated server with network round-trips, right?
>
> the use-case for a production setup for logging servers would probably
> include a network hop.

Sure, but there's a big difference between a rtt of 0.041ms (my dinky
home network) and 20ms (from my home network in Virginia to Boston).  I
wasn't intending to discount latency, it can be a concen, but I doubt
it'll be 20ms for most people..

    Thanks,

        Stephen

От:
Glenn Maynard
Дата:

On Wed, Apr 22, 2009 at 9:48 PM, Stephen Frost <> wrote:
> Erm..  Prepared queries is about using PQexecPrepared(), not about
> sending a text string as an SQL EXECUTE().  PQexecPrepared takes an
> array of arguments.  That gets translated into a Bind command in the
> protocol with a defined number of parameters and a length for each
> parameter being passed.  That removes any need for scanning/parsing the
> string sent to the backend.  That's the savings I'm referring to.

I'd suggest this be mentioned in the sql-prepare documentation, then,
because that documentation only discusses using prepared statements to
eliminate redundant planning costs.  (I'm sure it's mentioned in the
API docs and elsewhere, but if it's a major intended use of PREPARE,
the PREPARE documentation should make note of it.)

--
Glenn Maynard

От:
david@lang.hm
Дата:

On Wed, 22 Apr 2009, Stephen Frost wrote:

> * Glenn Maynard () wrote:
>> On Wed, Apr 22, 2009 at 5:51 PM, Stephen Frost <> wrote:
>>> For a single column table, I wouldn't expect much either.  With more
>>> columns I think it would be a larger improvement.
>>
>> Maybe.  I'm not sure why parsing "(1,2,3,4,5)" in an EXECUTE parameter
>> should be faster than parsing the exact same thing in an INSERT,
>> though.
>
> Erm..  Prepared queries is about using PQexecPrepared(), not about
> sending a text string as an SQL EXECUTE().  PQexecPrepared takes an
> array of arguments.  That gets translated into a Bind command in the
> protocol with a defined number of parameters and a length for each
> parameter being passed.  That removes any need for scanning/parsing the
> string sent to the backend.  That's the savings I'm referring to.

are you sure? I thought that what goes out over the wire is always text.

David Lang

> If you weren't using PQexecPrepared() (and using psql, you wouldn't
> be..), then the difference you saw was more likely planning cost.
>
>> Of course, you still need to get it in that format.  Be careful to
>> include any parsing you're doing to create the binary date in the
>> benchmarks.  Inevitably, at least part of the difference will be costs
>> simply moving from the psql process to your own.
>
> Sure.  What I recall from when I was working on this is that it wasn't
> terribly hard to go from unix timestamps (epoch from 1970) to a PG
> timestamp format (and there was nice example code in the backend) in
> terms of CPU time.
>
>     Thanks,
>
>         Stephen
>

От:
Stephen Frost
Дата:

*  () wrote:
> On Wed, 22 Apr 2009, Stephen Frost wrote:
>> Erm..  Prepared queries is about using PQexecPrepared(), not about
>> sending a text string as an SQL EXECUTE().  PQexecPrepared takes an
>> array of arguments.  That gets translated into a Bind command in the
>> protocol with a defined number of parameters and a length for each
>> parameter being passed.  That removes any need for scanning/parsing the
>> string sent to the backend.  That's the savings I'm referring to.
>
> are you sure? I thought that what goes out over the wire is always text.

Wow, why is there so much confusion and misunderstanding about this?

*psql* sends everything to the backend as text (except perhaps COPY
BINARY..  but that's because the user handles it), but if you're using
libpq, PQexecPrepared, and protocol 3.0 (any recent PG version), it's
going to use the Parse/Bind protocol-level commands.  To make it perhaps
more clear, here's a snippet from the libpq code for PQsendQueryGuts(),
which is the work-horse called by PQexecPrepared:

    /*
     * We will send Parse (if needed), Bind, Describe Portal, Execute, Sync,
     * using specified statement name and the unnamed portal.
     */
[...]

    /* Construct the Bind message */
    if (pqPutMsgStart('B', false, conn) < 0 ||
        pqPuts("", conn) < 0 ||
        pqPuts(stmtName, conn) < 0)
        goto sendFailed;

    /* Send parameter formats */
[...]
-- No param formats included, let the backend know
        if (pqPutInt(0, 2, conn) < 0)
            goto sendFailed;

-- Tell the backend the number of parameters to expect
    if (pqPutInt(nParams, 2, conn) < 0)
        goto sendFailed;

    /* Send parameters */
    for (i = 0; i < nParams; i++)
[...]
-- Pull the length from the caller-provided for each param
                    nbytes = paramLengths[i];
[...]
-- Send the length, then the param, over the wire
            if (pqPutInt(nbytes, 4, conn) < 0 ||
                pqPutnchar(paramValues[i], nbytes, conn) < 0)
                goto sendFailed;
[...]
-- All done, send finish indicator
    if (pqPutInt(1, 2, conn) < 0 ||
        pqPutInt(resultFormat, 2, conn))
        goto sendFailed;
    if (pqPutMsgEnd(conn) < 0)
        goto sendFailed;

    /* construct the Describe Portal message */
    if (pqPutMsgStart('D', false, conn) < 0 ||
        pqPutc('P', conn) < 0 ||
        pqPuts("", conn) < 0 ||
        pqPutMsgEnd(conn) < 0)
        goto sendFailed;

    /* construct the Execute message */
    if (pqPutMsgStart('E', false, conn) < 0 ||
        pqPuts("", conn) < 0 ||
        pqPutInt(0, 4, conn) < 0 ||
        pqPutMsgEnd(conn) < 0)
        goto sendFailed;

[...]
-- clear everything out
    if (pqFlush(conn) < 0)
        goto sendFailed;

Any other questions?

    Thanks,

        Stephen

От:
Stephen Frost
Дата:

* Glenn Maynard () wrote:
> I'd suggest this be mentioned in the sql-prepare documentation, then,
> because that documentation only discusses using prepared statements to
> eliminate redundant planning costs.  (I'm sure it's mentioned in the
> API docs and elsewhere, but if it's a major intended use of PREPARE,
> the PREPARE documentation should make note of it.)

Argh.  Perhaps the problem is that it's somewhat 'overloaded'.  PG
supports *both* SQL-level PREPARE/EXECUTE commands and the more
traditional (well, in my view anyway...) API/protocol of PQprepare() and
PQexecPrepared().  When using the API/protocol, you don't actually
explicitly call the SQL 'PREPARE blah AS INSERT INTO', you just call
PQprepare() with 'INSERT INTO blah VALUES ($1, $2, $3);' and then call
PQexecPrepared() later.

That's the reason it's not documented in the SQL-level PREPARE docs,
anyway.  I'm not against adding some kind of reference there, but it's
not quite the way you think it is..

    Thanks,

        Stephen

От:
Thomas Kellerer
Дата:

Stephen Frost wrote on 22.04.2009 23:51:
>>> What about 40000 individual prepared inserts?  Just curious about it.

>> 40000 inserts, one prepared statement each (constructing the prepared
>> statement only once), in a single transaction: 1.68s
>>
>> I'm surprised that there's any win here at all.
>
> For a single column table, I wouldn't expect much either.  With more
> columns I think it would be a larger improvement.

Out of curiosity I did some tests through JDBC.

Using a single-column (integer) table, re-using a prepared statement took about
7 seconds to insert 100000 rows with JDBC's batch interface and a batch size of 1000

Using a prepared statement that had a 1000 (?) after the insert (insert into foo
  values (?), (?), ...) the insert took about 0.8 seconds. Quite an improvement
I'd say.

Then I switched to a three column table (int, varchar(500), varchar(500)).

Insert using a preparedstatement with batch (first scenario) now was ~8.5
seconds, whereas the multi-value insert now took ~3 seconds. So the difference
got smaller, but still was quite substantial. This was inserting relatively
small strings (~20 characters) into the table

When increasing the size of the inserted strings, things began to change. When I
bumped the length of the strings to 70 and 200 characters, the multi-value
insert slowed down considerably. Both solutions now took around 9 seconds.

The multi-value solution ranged between 7 and 9 seconds, whereas the "regular"
insert syntax was pretty constant at roughly 9 seconds (I ran it about 15 times).

So it seems, that as the size of the row increases the multi-value insert loses
its head-start compared to the "regular" insert.

I also played around with batch size. Going beyond 200 didn't make a big
difference.

For the JDBC batch, the batch size was the number of rows after which I called
executeBatch() for the multi-value insert, this was the number of tuples I sent
in a single INSERT statement.

The multi-value statement seems to perform better with lower "batch" sizes
(~10-50) whereas the JDBC batching seems to be fastest with about 200 statements
per batch.


Thomas

От:
Kris Jurka
Дата:


On Thu, 23 Apr 2009, Thomas Kellerer wrote:

> Out of curiosity I did some tests through JDBC.
>
> Using a single-column (integer) table, re-using a prepared statement
> took about 7 seconds to insert 100000 rows with JDBC's batch interface
> and a batch size of 1000
>

As a note for non-JDBC users, the JDBC driver's batch interface allows
executing multiple statements in a single network roundtrip.  This is
something you can't get in libpq, so beware of this for comparison's sake.

> I also played around with batch size. Going beyond 200 didn't make a big
> difference.
>

Despite the size of the batch passed to the JDBC driver, the driver breaks
it up into internal sub-batch sizes of 256 to send to the server.  It does
this to avoid network deadlocks from sending too much data to the server
without reading any in return.  If the driver was written differently it
could handle this better and send the full batch size, but at the moment
that's not possible and we're hoping the gains beyond this size aren't too
large.

Kris Jurka

От:
Thomas
Дата:

Kris Jurka wrote on 26.04.2009 19:07:
> Despite the size of the batch passed to the JDBC driver, the driver
> breaks it up into internal sub-batch sizes of 256 to send to the
> server.  It does this to avoid network deadlocks from sending too much
> data to the server without reading any in return.  If the driver was
> written differently it could handle this better and send the full batch
> size, but at the moment that's not possible and we're hoping the gains
> beyond this size aren't too large.

Ah, thanks for the info.
I have seen this behaviour with other DBMS as well.

Going beyond ~200-500 doesn't improve the performance on Oracle or SQL Server as
well.

So I guess PG wouldn't really benefit that much from a different implementation
in the driver :)

Thomas




От:
Scott Marlowe
Дата:

On Sun, Apr 26, 2009 at 11:07 AM, Kris Jurka <> wrote:
>
>
> On Thu, 23 Apr 2009, Thomas Kellerer wrote:
>
>> Out of curiosity I did some tests through JDBC.
>>
>> Using a single-column (integer) table, re-using a prepared statement took
>> about 7 seconds to insert 100000 rows with JDBC's batch interface and a
>> batch size of 1000
>>
>
> As a note for non-JDBC users, the JDBC driver's batch interface allows
> executing multiple statements in a single network roundtrip.  This is
> something you can't get in libpq, so beware of this for comparison's sake.

Really?  I thought that executing statements like so:

select * from a;insert ...;delete;

in psql / libpq would execute them all in one trip.

От:
Kris Jurka
Дата:

Scott Marlowe wrote:
> On Sun, Apr 26, 2009 at 11:07 AM, Kris Jurka <> wrote:
>>
>> As a note for non-JDBC users, the JDBC driver's batch interface allows
>> executing multiple statements in a single network roundtrip.  This is
>> something you can't get in libpq, so beware of this for comparison's sake..
>
> Really?  I thought that executing statements like so:
>
> select * from a;insert ...;delete;
>
> in psql / libpq would execute them all in one trip.

Right, but those aren't prepared.  I suppose it's possible to issue a
prepare and then issue a batch of comma separated "execute" statements,
but that's not exactly a natural interface.

Kris Jurka

От:
Scott Marlowe
Дата:

On Mon, Apr 27, 2009 at 12:45 AM, Kris Jurka <> wrote:
> Scott Marlowe wrote:
>>
>> On Sun, Apr 26, 2009 at 11:07 AM, Kris Jurka <> wrote:
>>>
>>> As a note for non-JDBC users, the JDBC driver's batch interface allows
>>> executing multiple statements in a single network roundtrip.  This is
>>> something you can't get in libpq, so beware of this for comparison's
>>> sake..
>>
>> Really?  I thought that executing statements like so:
>>
>> select * from a;insert ...;delete;
>>
>> in psql / libpq would execute them all in one trip.
>
> Right, but those aren't prepared.  I suppose it's possible to issue a
> prepare and then issue a batch of comma separated "execute" statements, but
> that's not exactly a natural interface.

Oh right,.  Sorry, didn't realize you were talking about prepared statements.

От:
david@lang.hm
Дата:

On Sun, 26 Apr 2009, Kris Jurka wrote:

> Scott Marlowe wrote:
>> On Sun, Apr 26, 2009 at 11:07 AM, Kris Jurka <> wrote:
>>>
>>> As a note for non-JDBC users, the JDBC driver's batch interface allows
>>> executing multiple statements in a single network roundtrip.  This is
>>> something you can't get in libpq, so beware of this for comparison's
>>> sake..
>>
>> Really?  I thought that executing statements like so:
>>
>> select * from a;insert ...;delete;
>>
>> in psql / libpq would execute them all in one trip.
>
> Right, but those aren't prepared.  I suppose it's possible to issue a prepare
> and then issue a batch of comma separated "execute" statements, but that's
> not exactly a natural interface.

for the task we are discussing here (log inserting) why wouldn't it be
reasonable to have a prepared insert and then do begin;execute...;end to
do a batch of them at once.

David Lang

От:
PFC
Дата:

> Blocking round trips to another process on the same server should be
> fairly cheap--that is, writing to a socket (or pipe, or localhost TCP
> connection) where the other side is listening for it; and then
> blocking in return for the response.  The act of writing to an FD that
> another process is waiting for will make the kernel mark the process
> as "ready to wake up" immediately, and the act of blocking for the
> response will kick the scheduler to some waiting process, so as long
> as there isn't something else to compete for CPU for, each write/read
> will wake up the other process instantly.  There's a task switching
> cost, but that's too small to be relevant here.
>
> Doing 1000000 local round trips, over a pipe: 5.25s (5 *microseconds*
> each), code attached.  The cost *should* be essentially identical for
> any local transport (pipes, named pipes, local TCP connections), since
> the underlying scheduler mechanisms are the same.

    Roundtrips can be quite fast but they have a hidden problem, which is
that everything gets serialized.
    This means if you have a process that generates data to insert, and a
postgres process, and 2 cores on your CPU, you will never use more than 1
core, because both are waiting on each other.
    Pipelining is a way to solve this...
    In the ideal case, if postgres is as fast as the data-generating process,
each would use 1 core, yielding 2x speedup.
    Of course if one of the processes is like 10x faster than the other, it
doesn't matter.


От:
david@lang.hm
Дата:

On Sat, 2 May 2009, PFC wrote:

>> Blocking round trips to another process on the same server should be
>> fairly cheap--that is, writing to a socket (or pipe, or localhost TCP
>> connection) where the other side is listening for it; and then
>> blocking in return for the response.  The act of writing to an FD that
>> another process is waiting for will make the kernel mark the process
>> as "ready to wake up" immediately, and the act of blocking for the
>> response will kick the scheduler to some waiting process, so as long
>> as there isn't something else to compete for CPU for, each write/read
>> will wake up the other process instantly.  There's a task switching
>> cost, but that's too small to be relevant here.
>>
>> Doing 1000000 local round trips, over a pipe: 5.25s (5 *microseconds*
>> each), code attached.  The cost *should* be essentially identical for
>> any local transport (pipes, named pipes, local TCP connections), since
>> the underlying scheduler mechanisms are the same.
>
>     Roundtrips can be quite fast but they have a hidden problem, which is
> that everything gets serialized.
>     This means if you have a process that generates data to insert, and a
> postgres process, and 2 cores on your CPU, you will never use more than 1
> core, because both are waiting on each other.
>     Pipelining is a way to solve this...
>     In the ideal case, if postgres is as fast as the data-generating
> process, each would use 1 core, yielding 2x speedup.
>     Of course if one of the processes is like 10x faster than the other,
> it doesn't matter.

in the case of rsyslog there are config options to allow multiple
threads to be working on doing the inserts, so it doesn't need to be
serialized as badly as you are fearing (there is locking involved, so it
doesn't scale perfectly)

David Lang

От:
Glenn Maynard
Дата:

On Fri, May 1, 2009 at 8:29 PM, PFC <> wrote:
>        Roundtrips can be quite fast but they have a hidden problem, which is
> that everything gets serialized.

The client and server will serialize, but what usually matters most is
avoiding serializing against disk I/O--and that's why write-back
caching exists.  There's still a benefit to pipelining (not everything
the db might need to read to complete the write will always be in
cache), but if everything was being serialized it'd be an order of
magnitude worse.  That's why running each insert in a separate
transaction is so much slower; in that case, it *will* serialize
against the disk (by default).

--
Glenn Maynard