Re: performance for high-volume log insertion

От: Kenneth Marshall
Тема: Re: performance for high-volume log insertion
Дата: ,
Msg-id: 20090421154458.GD18845@it.is.rice.edu
(см: обсуждение, исходный текст)
Ответ на: Re: performance for high-volume log insertion  ()
Список: pgsql-performance

Скрыть дерево обсуждения

performance for high-volume log insertion  (, )
 Re: performance for high-volume log insertion  (Stephen Frost, )
  Re: performance for high-volume log insertion  (, )
   Re: performance for high-volume log insertion  (Stephen Frost, )
    Re: performance for high-volume log insertion  (, )
     Re: performance for high-volume log insertion  (Stephen Frost, )
      Re: performance for high-volume log insertion  (, )
       Re: performance for high-volume log insertion  (Stephen Frost, )
        Re: performance for high-volume log insertion  (, )
       Re: performance for high-volume log insertion  (Ben Chobot, )
        Re: performance for high-volume log insertion  (Stephen Frost, )
         Re: performance for high-volume log insertion  (, )
          Re: performance for high-volume log insertion  (Stephen Frost, )
           Re: performance for high-volume log insertion  (, )
            Re: performance for high-volume log insertion  (, )
             Re: performance for high-volume log insertion  (Stephen Frost, )
            Re: performance for high-volume log insertion  (Stephen Frost, )
   Re: performance for high-volume log insertion  (Greg Smith, )
    Re: performance for high-volume log insertion  (Stephen Frost, )
     Re: performance for high-volume log insertion  (, )
      Re: performance for high-volume log insertion  (Stephen Frost, )
    Re: performance for high-volume log insertion  (, )
     Re: performance for high-volume log insertion  (Stephen Frost, )
      Re: performance for high-volume log insertion  (, )
       Re: performance for high-volume log insertion  (Richard Huxton, )
        Re: performance for high-volume log insertion  (Kenneth Marshall, )
         Re: performance for high-volume log insertion  (, )
          Re: performance for high-volume log insertion  (Kenneth Marshall, )
     Re: performance for high-volume log insertion  (Greg Smith, )
      Re: performance for high-volume log insertion  (, )
       Re: performance for high-volume log insertion  (Kenneth Marshall, )
       Re: performance for high-volume log insertion  (Greg Smith, )
   Re: performance for high-volume log insertion  (James Mansion, )
    Re: performance for high-volume log insertion  (Stephen Frost, )
     Re: performance for high-volume log insertion  (, )
      Re: performance for high-volume log insertion  (Robert Haas, )
      Re: performance for high-volume log insertion  (Stephen Frost, )
       Re: performance for high-volume log insertion  (Glenn Maynard, )
        Re: performance for high-volume log insertion  (, )
         Re: performance for high-volume log insertion  (Glenn Maynard, )
        Re: performance for high-volume log insertion  (Stephen Frost, )
         Re: performance for high-volume log insertion  (Tom Lane, )
         Re: performance for high-volume log insertion  (Glenn Maynard, )
          Re: performance for high-volume log insertion  (Stephen Frost, )
           Re: performance for high-volume log insertion  (Glenn Maynard, )
            Re: performance for high-volume log insertion  (Stephen Frost, )
             Re: performance for high-volume log insertion  (Glenn Maynard, )
              Re: performance for high-volume log insertion  (Stephen Frost, )
             Re: performance for high-volume log insertion  (, )
              Re: performance for high-volume log insertion  (Stephen Frost, )
           Re: performance for high-volume log insertion  (Thomas Kellerer, )
            Re: performance for high-volume log insertion  (Kris Jurka, )
             Re: performance for high-volume log insertion  (Thomas, )
             Re: performance for high-volume log insertion  (Scott Marlowe, )
              Re: performance for high-volume log insertion  (Kris Jurka, )
               Re: performance for high-volume log insertion  (Scott Marlowe, )
               Re: performance for high-volume log insertion  (, )
     Re: performance for high-volume log insertion  (James Mansion, )
      Re: performance for high-volume log insertion  (Stephen Frost, )
       Re: performance for high-volume log insertion  (James Mansion, )
        Re: performance for high-volume log insertion  ("Joshua D. Drake", )
        Re: performance for high-volume log insertion  (Glenn Maynard, )
         Re: performance for high-volume log insertion  (, )
          Re: performance for high-volume log insertion  (Stephen Frost, )
         Re: performance for high-volume log insertion  (PFC, )
          Re: performance for high-volume log insertion  (, )
          Re: performance for high-volume log insertion  (Glenn Maynard, )
 Re: performance for high-volume log insertion  (Simon Riggs, )

On Tue, Apr 21, 2009 at 08:37:54AM -0700,  wrote:
> Kenneth,
>   could you join the discussion on the rsyslog mailing list?
> rsyslog-users <>
>
> I'm surprised to hear you say that rsyslog can already do batch inserts and
> am interested in how you did that.
>
> what sort of insert rate did you mange to get?
>
> David Lang
>
David,

I would be happy to join the discussion. I did not mean to say
that rsyslog currently supported batch inserts, just that the
pieces that provide "stand-by queuing" could be used to manage
batching inserts.

Cheers,
Ken

> On Tue, 21 Apr 2009, Kenneth Marshall wrote:
>
>> Date: Tue, 21 Apr 2009 08:33:30 -0500
>> From: Kenneth Marshall <>
>> To: Richard Huxton <>
>> Cc: , Stephen Frost <>,
>>     Greg Smith <>, 
>> Subject: Re: [PERFORM] performance for high-volume log insertion
>> Hi,
>>
>> I just finished reading this thread. We are currently working on
>> setting up a central log system using rsyslog and PostgreSQL. It
>> works well once we patched the memory leak. We also looked at what
>> could be done to improve the efficiency of the DB interface. On the
>> rsyslog side, moving to prepared queries allows you to remove the
>> escaping that needs to be done currently before attempting to
>> insert the data into the SQL backend as well as removing the parsing
>> and planning time from the insert. This is a big win for high insert
>> rates, which is what we are talking about. The escaping process is
>> also a big CPU user in rsyslog which then hands the escaped string
>> to the backend which then has to undo everything that had been done
>> and parse/plan the resulting query. This can use a surprising amount
>> of additional CPU. Even if you cannot support a general prepared
>> query interface, by specifying what the query should look like you
>> can handle much of the low-hanging fruit query-wise.
>>
>> We are currently using a date based trigger to use a new partition
>> each day and keep 2 months of logs currently. This can be usefully
>> managed on the backend database, but if rsyslog supported changing
>> the insert to the new table on a time basis, the CPU used by the
>> trigger to support this on the backend could be reclaimed. This
>> would be a win for any DB backend. As you move to the new partition,
>> issuing a truncate to clear the table would simplify the DB interfaces.
>>
>> Another performance enhancement already mentioned, would be to
>> allow certain extra fields in the DB to be automatically populated
>> as a function of the log messages. For example, logging the mail queue
>> id for messages from mail systems would make it much easier to locate
>> particular mail transactions in large amounts of data.
>>
>> To sum up, eliminating the escaping in rsyslog through the use of
>> prepared queries would reduce the CPU load on the DB backend. Batching
>> the inserts will also net you a big performance increase. Some DB-based
>> applications allow for the specification of several types of queries,
>> one for single inserts and then a second to support multiple inserts
>> (copy). Rsyslog already supports the queuing pieces to allow you to
>> batch inserts. Just some ideas.
>>
>> Regards,
>> Ken
>>
>>
>> On Tue, Apr 21, 2009 at 09:56:23AM +0100, Richard Huxton wrote:
>>>  wrote:
>>>> On Tue, 21 Apr 2009, Stephen Frost wrote:
>>>>> *  () wrote:
>>>>>> while I fully understand the 'benchmark your situation' need, this
>>>>>> isn't
>>>>>> that simple.
>>>>>
>>>>> It really is.  You know your application, you know it's primary use
>>>>> cases, and probably have some data to play with.  You're certainly in a
>>>>> much better situation to at least *try* and benchmark it than we are.
>>>> rsyslog is a syslog server. it replaces (or for debian and fedora, has
>>>> replaced) your standard syslog daemon. it recieves log messages from
>>>> every
>>>> app on your system (and possibly others), filters, maniulates them, and
>>>> then stores them somewhere. among the places that it can store the logs
>>>> are database servers (native support for MySQL, PostgreSQL, and Oracle.
>>>> plus libdbi for others)
>>>
>>> Well, from a performance standpoint the obvious things to do are:
>>> 1. Keep a connection open, do NOT reconnect for each log-statement
>>> 2. Batch log statements together where possible
>>> 3. Use prepared statements
>>> 4. Partition the tables by day/week/month/year (configurable I suppose)
>>>
>>> The first two are vital, the third takes you a step further. The fourth
>>> is
>>> a long-term admin thing.
>>>
>>> And possibly
>>> 5. Have two connections, one for fatal/error etc and one for info/debug
>>> level log statements (configurable split?). Then you can use the
>>> synchronous_commit setting on the less important ones. Might buy you some
>>> performance on a busy system.
>>>
>>> http://www.postgresql.org/docs/8.3/interactive/runtime-config-wal.html#RUNTIME-CONFIG-WAL-SETTINGS
>>>
>>>> other apps then search and report on the data after it is stored. what
>>>> apps?, I don't know either. pick your favorite reporting tool and you'll
>>>> be a step ahead of me (I don't know a really good reporting tool)
>>>> as for sample data, you have syslog messages, just like I do. so you
>>>> have
>>>> the same access to data that I have.
>>>> how would you want to query them? how would people far less experianced
>>>> that you want to query them?
>>>> I can speculate that some people would do two columns (time, everything
>>>> else), others will do three (time, server, everything else), and others
>>>> will go further (I know some who would like to extract IP addresses
>>>> embedded in a message into their own column). some people will index on
>>>> the time and host, others will want to do full-text searches of
>>>> everything.
>>>
>>> Well, assuming it looks much like traditional syslog, I would do
>>> something
>>> like: (timestamp, host, facility, priority, message). It's easy enough to
>>> stitch back together if people want that.
>>>
>>> PostgreSQL's full-text indexing is quite well suited to logfiles I'd have
>>> thought, since it knows about filenames, urls etc already.
>>>
>>> If you want to get fancy, add a msg_type column and one subsidiary table
>>> for each msg_type. So - you might have smtp_connect_from (hostname,
>>> ip_addr). A set of perl regexps can match and extract the fields for
>>> these
>>> extra tables, or you could do it with triggers inside the database. I
>>> think
>>> it makes sense to do it in the application. Easier for users to
>>> contribute
>>> new patterns/extractions. Meanwhile, the core table is untouched so you
>>> don't *need* to know about these extra tables.
>>>
>>> If you have subsidiary tables, you'll want to partition those too and
>>> perhaps stick them in their own schema (logs200901, logs200902 etc).
>>>
>>> --
>>>   Richard Huxton
>>>   Archonet Ltd
>>>
>>> --
>>> Sent via pgsql-performance mailing list
>>> ()
>>> To make changes to your subscription:
>>> http://www.postgresql.org/mailpref/pgsql-performance
>>>
>>
>
> --
> Sent via pgsql-performance mailing list ()
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance
>


В списке pgsql-performance по дате сообщения:

От: Tom Lane
Дата:
Сообщение: Re: WHERE condition not being pushed down to union parts
От: Greg Smith
Дата:
Сообщение: Re: performance for high-volume log insertion