Re: performance for high-volume log insertion

From david@lang.hm
Subject Re: performance for high-volume log insertion
Date
Msg-id alpine.DEB.1.10.0904211111040.12662@asgard.lang.hm
In response to Re: performance for high-volume log insertion  (Stephen Frost <sfrost@snowman.net>)
Responses Re: performance for high-volume log insertion  (david@lang.hm)
Re: performance for high-volume log insertion  (Stephen Frost <sfrost@snowman.net>)
List pgsql-performance
On Tue, 21 Apr 2009, Stephen Frost wrote:

> * david@lang.hm (david@lang.hm) wrote:
>> I think the key thing is that rsyslog today doesn't know anything about
>> SQL variables, it just creates a string that the user and the database
>> say looks like a SQL statement.
>
> err, what SQL variables?  You mean the $NUM stuff?  They're just
> placeholders..  You don't really need to *do* anything with them..  Or
> are you worried that users would provide something that would break as a
> prepared query?  If so, you just need to figure out how to handle that
> cleanly..
>
>> an added headache is that the rsyslog config does not have the concept of
>> arrays (the closest that it has is one special-case hack to let you
>> specify one variable multiple times)
>
> Argh.  The array I'm talking about is a C array, and has nothing to do
> with the actual config syntax..  I swear, I think you're making this
> more difficult by half.

not intentionally, but you may be right.

> Alright, looking at the documentation on rsyslog.com, I see something
> like:
>
> $template MySQLInsert,"insert iut, message, receivedat values
> ('%iut%', '%msg:::UPPERCASE%', '%timegenerated:::date-mysql%')
> into systemevents\r\n", SQL
>
> Ignoring the fact that this is horrible, horrible non-SQL,

that example is for MySQL, nuff said ;-) or are you referring to the
modifiers that rsyslog has to manipulate the strings before inserting
them? (as opposed to using sql to manipulate the strings)

> I see that
> you use %blah% to define variables inside your string.  That's fine.
> There's no reason why you can't use this exact syntax to build a
> prepared query.  No user-impact changes are necessary.  Here's what you
> do:

<snip pseudocode to replace %blah% with $num>
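
A minimal sketch of the kind of substitution being discussed, written in Python for brevity (rsyslog itself is C). It rewrites each quoted '%property%' reference into a numbered $N placeholder and records which property feeds each parameter. The regular expression covers only the simple %name% and %name:::option% forms, the function name to_prepared is hypothetical, and the table and column names are borrowed from the quoted example; none of this is rsyslog's actual parser.

```python
import re

# Matches a quoted rsyslog-style reference: '%property%' or
# '%property:::option%'. Assumption: only these two simple forms occur;
# real templates also allow substring ranges and other modifiers.
PLACEHOLDER = re.compile(r"'%([A-Za-z0-9_-]+)(?::::([A-Za-z0-9_-]+))?%'")


def to_prepared(template):
    """Return (sql, params). sql has $1, $2, ... where each quoted
    reference stood; params lists (property, option) pairs in order."""
    params = []

    def substitute(match):
        params.append((match.group(1), match.group(2)))
        return "$%d" % len(params)

    return PLACEHOLDER.sub(substitute, template), params


template = ("insert into systemevents (iut, message, receivedat) "
            "values ('%iut%', '%msg:::UPPERCASE%', "
            "'%timegenerated:::date-rfc3339%')")
sql, params = to_prepared(template)
print(sql)
print(params)
```

The resulting string is what would be handed to PQprepare once, and the params list tells the output module which values to compute and bind for every message. The modifiers (UPPERCASE and so on) would still have to be applied by rsyslog before binding.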

for some reason I was stuck on the idea of the config specifying the
statement and variables separately, so I wasn't thinking this way. however,
there are headaches.

doing this will require changes to the structure of rsyslog. today the
string manipulation is done before calling the output (database) module,
so all the database module currently gets is a string. in a (IMHO
misguided) attempt at security in a multi-threaded program, the output
modules are not given access to the full data, only to the distilled
result.

also, this approach won't work if the user wants to combine fixed text
with the variable into a column. an example of doing that would be to have
a filter to match specific lines, and then use a slightly different
template for those lines. I guess that could be done in SQL instead of in
the rsyslog string manipulation (i.e. instead of 'blah-%host%' do
'blah-'||'%host%')
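
To illustrate the 'blah-'||'%host%' idea, here is a rough Python sketch that turns a quoted literal mixing fixed text and variables into an SQL concatenation of literals and $N placeholders. The function name literal_to_concat and the table and column names are hypothetical, and the sketch assumes literals contain no escaped quotes and no %name:::option% modifiers.

```python
import re

VAR = re.compile(r"%([A-Za-z0-9_-]+)%")   # simple %name% references only
LITERAL = re.compile(r"'([^']*)'")        # assumes no '' escapes inside


def literal_to_concat(template):
    """Rewrite quoted literals containing %name% into pieces joined by
    the SQL || operator; return (sql, property names in order)."""
    params = []

    def rewrite(match):
        body = match.group(1)
        pieces = []
        pos = 0
        for var in VAR.finditer(body):
            if var.start() > pos:
                # Fixed text before the variable stays a quoted literal.
                pieces.append("'%s'" % body[pos:var.start()])
            params.append(var.group(1))
            pieces.append("$%d" % len(params))
            pos = var.end()
        if not pieces:
            return match.group(0)  # no variables: leave literal untouched
        if pos < len(body):
            pieces.append("'%s'" % body[pos:])
        return " || ".join(pieces)

    return LITERAL.sub(rewrite, template), params


sql, params = literal_to_concat(
    "insert into logs (tag, msg) values ('blah-%host%', '%msg%')")
print(sql)
print(params)
```

This keeps the user-visible template syntax unchanged while moving the concatenation into the statement itself, so the column value is still a bound parameter.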

> As I mentioned before, the only obvious issue I
> see with doing this implicitly is that the user might want to put
> variables in places that you can't have variables in prepared queries.

this problem space would be anywhere except the column contents, right?

> You could deal with that by having the user indicate per template, using
> another template option, if the query can be prepared or not.  Another
> option is adding to your syntax something like '%*blah%' which would
> tell the system to pre-populate that variable before issuing PQprepare
> on the resultant string.  Of course, you might just use PQexecParams
> there, unless you want to be gung-ho and actually keep a hash around of
> prepared queries on the assumption that the variable the user gave you
> doesn't change very often (eg, '%*month%') and it's cheap to keep a
> small list of them around to use when they do match up.
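
The "keep a hash around of prepared queries" idea could look roughly like the following Python sketch: a small least-recently-used map from the statement text (after the '%*blah%' variables have been filled in) to a statement name. The class name PreparedCache, the two callbacks, and the table names are hypothetical; in a real output module the callbacks would presumably wrap PQprepare and an SQL DEALLOCATE.

```python
from collections import OrderedDict


class PreparedCache:
    """Small LRU map from SQL text to a prepared-statement name."""

    def __init__(self, prepare, deallocate, capacity=8):
        self._prepare = prepare        # hypothetical: prepare(name, sql)
        self._deallocate = deallocate  # hypothetical: deallocate(name)
        self._capacity = capacity
        self._names = OrderedDict()    # sql text -> statement name
        self._counter = 0

    def name_for(self, sql):
        if sql in self._names:
            self._names.move_to_end(sql)   # mark as most recently used
            return self._names[sql]
        if len(self._names) >= self._capacity:
            # Evict the least recently used statement to bound memory.
            _, old = self._names.popitem(last=False)
            self._deallocate(old)
        self._counter += 1
        name = "stmt_%d" % self._counter
        self._prepare(name, sql)
        self._names[sql] = name
        return name


prepared, dropped = [], []
cache = PreparedCache(lambda name, sql: prepared.append((name, sql)),
                      dropped.append, capacity=2)
april = "insert into log_2009_04 (msg) values ($1)"
may = "insert into log_2009_05 (msg) values ($1)"
first = cache.name_for(april)
cache.name_for(may)
again = cache.name_for(april)   # cache hit, nothing new is prepared
```

With a variable like '%*month%' the text changes rarely, so nearly every message hits the cache and only the first message of each month pays for a new prepare.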

rsyslog supports something similar for writing to disk where you can use
variables as part of the filename/path (referred to as 'dynafiles' in the
documentation). that's a little easier to deal with as the filename is
specified separately from the format of the data to write. If we end up
doing prepared statements I suspect they initially won't support variables
outside of the columns.

David Lang
