Re: performance for high-volume log insertion

От: david@lang.hm
Тема: Re: performance for high-volume log insertion
Дата: ,
Msg-id: alpine.DEB.1.10.0904211056200.12662@asgard.lang.hm
(см: обсуждение, исходный текст)
Ответ на: Re: performance for high-volume log insertion  (Greg Smith)
Ответы: Re: performance for high-volume log insertion  (Kenneth Marshall)
Re: performance for high-volume log insertion  (Greg Smith)
Список: pgsql-performance

Скрыть дерево обсуждения

performance for high-volume log insertion  (, )
 Re: performance for high-volume log insertion  (Stephen Frost, )
  Re: performance for high-volume log insertion  (, )
   Re: performance for high-volume log insertion  (Stephen Frost, )
    Re: performance for high-volume log insertion  (, )
     Re: performance for high-volume log insertion  (Stephen Frost, )
      Re: performance for high-volume log insertion  (, )
       Re: performance for high-volume log insertion  (Stephen Frost, )
        Re: performance for high-volume log insertion  (, )
       Re: performance for high-volume log insertion  (Ben Chobot, )
        Re: performance for high-volume log insertion  (Stephen Frost, )
         Re: performance for high-volume log insertion  (, )
          Re: performance for high-volume log insertion  (Stephen Frost, )
           Re: performance for high-volume log insertion  (, )
            Re: performance for high-volume log insertion  (, )
             Re: performance for high-volume log insertion  (Stephen Frost, )
            Re: performance for high-volume log insertion  (Stephen Frost, )
   Re: performance for high-volume log insertion  (Greg Smith, )
    Re: performance for high-volume log insertion  (Stephen Frost, )
     Re: performance for high-volume log insertion  (, )
      Re: performance for high-volume log insertion  (Stephen Frost, )
    Re: performance for high-volume log insertion  (, )
     Re: performance for high-volume log insertion  (Stephen Frost, )
      Re: performance for high-volume log insertion  (, )
       Re: performance for high-volume log insertion  (Richard Huxton, )
        Re: performance for high-volume log insertion  (Kenneth Marshall, )
         Re: performance for high-volume log insertion  (, )
          Re: performance for high-volume log insertion  (Kenneth Marshall, )
     Re: performance for high-volume log insertion  (Greg Smith, )
      Re: performance for high-volume log insertion  (, )
       Re: performance for high-volume log insertion  (Kenneth Marshall, )
       Re: performance for high-volume log insertion  (Greg Smith, )
   Re: performance for high-volume log insertion  (James Mansion, )
    Re: performance for high-volume log insertion  (Stephen Frost, )
     Re: performance for high-volume log insertion  (, )
      Re: performance for high-volume log insertion  (Robert Haas, )
      Re: performance for high-volume log insertion  (Stephen Frost, )
       Re: performance for high-volume log insertion  (Glenn Maynard, )
        Re: performance for high-volume log insertion  (, )
         Re: performance for high-volume log insertion  (Glenn Maynard, )
        Re: performance for high-volume log insertion  (Stephen Frost, )
         Re: performance for high-volume log insertion  (Tom Lane, )
         Re: performance for high-volume log insertion  (Glenn Maynard, )
          Re: performance for high-volume log insertion  (Stephen Frost, )
           Re: performance for high-volume log insertion  (Glenn Maynard, )
            Re: performance for high-volume log insertion  (Stephen Frost, )
             Re: performance for high-volume log insertion  (Glenn Maynard, )
              Re: performance for high-volume log insertion  (Stephen Frost, )
             Re: performance for high-volume log insertion  (, )
              Re: performance for high-volume log insertion  (Stephen Frost, )
           Re: performance for high-volume log insertion  (Thomas Kellerer, )
            Re: performance for high-volume log insertion  (Kris Jurka, )
             Re: performance for high-volume log insertion  (Thomas, )
             Re: performance for high-volume log insertion  (Scott Marlowe, )
              Re: performance for high-volume log insertion  (Kris Jurka, )
               Re: performance for high-volume log insertion  (Scott Marlowe, )
               Re: performance for high-volume log insertion  (, )
     Re: performance for high-volume log insertion  (James Mansion, )
      Re: performance for high-volume log insertion  (Stephen Frost, )
       Re: performance for high-volume log insertion  (James Mansion, )
        Re: performance for high-volume log insertion  ("Joshua D. Drake", )
        Re: performance for high-volume log insertion  (Glenn Maynard, )
         Re: performance for high-volume log insertion  (, )
          Re: performance for high-volume log insertion  (Stephen Frost, )
         Re: performance for high-volume log insertion  (PFC, )
          Re: performance for high-volume log insertion  (, )
          Re: performance for high-volume log insertion  (Glenn Maynard, )
 Re: performance for high-volume log insertion  (Simon Riggs, )

On Tue, 21 Apr 2009, Greg Smith wrote:

> On Mon, 20 Apr 2009,  wrote:
>
>> while I fully understand the 'benchmark your situation' need, this isn't
>> that simple.  in this case we are trying to decide what API/interface to
>> use in a infrastructure tool that will be distributed in common distros
>> (it's now the default syslog package of debian and fedora), there are so
>> many variables in hardware, software, and load that trying to benchmark it
>> becomes effectivly impossible.
>
> From your later comments, you're wandering a bit outside of what you were
> asking about here.  Benchmarking the *query* side of things can be extremely
> complicated.  You have to worry about memory allocation, cold vs. warm cache,
> scale of database relative to RAM, etc.
>
> You were asking specifically about *insert* performance, which isn't nearly
> as complicated.  There are basically three setups:
>
> 1) Disk/controller has a proper write cache.  Writes and fsync will be fast.
> You can insert a few thousand individual transactions per second.
>
> 2) Disk/controller has a "lying" write cache.  Writes and fsync will be fast,
> but it's not safe for database use.  But since (1) is expensive and this one
> you can get for free jut by using a regular SATA drive with its write cache
> enabled, you can use this case as a proxy for approximately how (1) would
> act.  You'll still get a few thousand transactions per second, sustained
> writes may slow down relative to (1) if you insert enough that you hit a
> checkpoint (triggering lots of random I/O).
>
> 3) All write caches have been disabled because they were not battery-backed.
> This is the case if you have a regular SATA drive and you disable its write
> cache because you care about write durability.  You'll get a bit less than
> RPM/60 writes/second, so <120 inserts/second with a typical 7200RPM drive.
> Here batching multiple INSERTs together is critical to get any sort of
> reasonable performance.

in case #1 would you expect to get significant gains from batching?
doesn't it suffer from problems similar to #2 when checkpoints hit?

> In (3), I'd expect that trivia like INSERT vs. COPY and COPY BINARY vs. COPY
> TEXT would be overwhelmed by the overhead of the commit itself. Therefore you
> probably want to test with case (2) instead, as it doesn't require any
> additional hardware but has similar performance to a production-worthy (1).
> All of the other things you're worried about really don't matter here; you
> can get an approximate measure of what the performance of the various
> INSERT/COPY schemes are that is somewhat platform dependant, but the results
> should be good enough to give you some rule of thumb suggestions for whether
> optimizations are significant enough to justify the coding effort to
> implement them or not.

I'll see about setting up a test in the next day or so. should I be able
to script this through psql? or do I need to write a C program to test
this?

> I'm not sure whether you're familiar with all the fsync trivia here.  In
> normal syslog use, there's an fsync call after every write.  You can disable
> that by placing a "-" before the file name in /etc/syslog.conf The thing that
> is going to make database-based writes very different is that syslog's
> fsync'd writes are unlikely to leave you in a bad state if the drive lies
> about them, while database writes can.  So someone using syslog on a standard
> SATA drive isn't getting the write guarantee they think they are, but the
> downside on a crash is minimal.  If you've got a high-volume syslog
> environment (>100 lines/second), you can't support those as individual
> database writes unless you've got a battery-backed write controller.  A
> regular disk just can't process genuine fsync calls any faster than that.  A
> serious syslog deployment that turns fsync on and expects it to really do its
> thing is already exposed to this issue though. I think it may be a the case
> that a lot of people think they have durable writes in their configuration
> but really don't.

rsyslog is a little different, instead of just input -> disk it does input
-> queue -> output (where output can be many things, including disk or
database)

it's default is to use memory-based queues (and no fsync), but has config
options to do disk based queues with a fsync after each update

David Lang


В списке pgsql-performance по дате сообщения:

От: Greg Smith
Дата:
Сообщение: Re: performance for high-volume log insertion
От: david@lang.hm
Дата:
Сообщение: Re: performance for high-volume log insertion