Re: Help with bulk read performance

От: Andy Colson
Тема: Re: Help with bulk read performance
Дата: ,
Msg-id: 4D078CD7.7030308@squeakycode.net
(см: обсуждение, исходный текст)
Ответ на: Help with bulk read performance  (Dan Schaffer)
Ответы: Re: Help with bulk read performance  (Jim Nasby)
Список: pgsql-performance

Скрыть дерево обсуждения

Help with bulk read performance  (Dan Schaffer, )
 Re: Help with bulk read performance  (Jim Nasby, )
 Re: Help with bulk read performance  (Andy Colson, )
  Re: Help with bulk read performance  (Jim Nasby, )
   Re: Help with bulk read performance  (Andy Colson, )
    Re: Help with bulk read performance  (Nick Matheson, )
    Re: Help with bulk read performance  (Dan Schaffer, )
     Re: Help with bulk read performance  ("Pierre C", )
      Re: Help with bulk read performance  (Nick Matheson, )
 Re: Help with bulk read performance  (Jim Nasby, )
 Re: Help with bulk read performance  (Krzysztof Nienartowicz, )

On 11/1/2010 9:15 AM, Dan Schaffer wrote:
> Hello
>
> We have an application that needs to do bulk reads of ENTIRE Postgres
> tables very quickly (i.e. select * from table). We have observed that
> such sequential scans run two orders of magnitude slower than observed
> raw disk reads (5 MB/s versus 100 MB/s). Part of this is due to the
> storage overhead we have observed in Postgres. In the example below, it
> takes 1 GB to store 350 MB of nominal data. However that suggests we
> would expect to get 35 MB/s bulk read rates.
>
> Observations using iostat and top during these bulk reads suggest that
> the queries are CPU bound, not I/O bound. In fact, repeating the queries
> yields similar response times. Presumably if it were an I/O issue the
> response times would be much shorter the second time through with the
> benefit of caching.
>
> We have tried these simple queries using psql, JDBC, pl/java stored
> procedures, and libpq. In all cases the client code ran on the same box
> as the server.
> We have experimented with Postgres 8.1, 8.3 and 9.0.
>
> We also tried playing around with some of the server tuning parameters
> such as shared_buffers to no avail.
>
> Here is uname -a for a machine we have tested on:
>
> Linux nevs-bdb1.fsl.noaa.gov 2.6.18-194.17.1.el5 #1 SMP Mon Sep 20
> 07:12:06 EDT 2010 x86_64 x86_64 x86_64 GNU/Linux
>
> A sample dataset that reproduces these results looks like the following
> (there are no indexes):
>
> Table "bulk_performance.counts"
> Column | Type | Modifiers
> --------+---------+-----------
> i1 | integer |
> i2 | integer |
> i3 | integer |
> i4 | integer |
>
> There are 22 million rows in this case.
>
> We HAVE observed that summation queries run considerably faster. In this
> case,
>
> select sum(i1), sum(i2), sum(i3), sum(i4) from bulk_performance.counts
>
> runs at 35 MB/s.
>
>
> Our business logic does operations on the resulting data such that the
> output is several orders of magnitude smaller than the input. So we had
> hoped that by putting our business logic into stored procedures (and
> thus drastically reducing the amount of data flowing to the client) our
> throughput would go way up. This did not happen.
>
> So our questions are as follows:
>
> Is there any way using stored procedures (maybe C code that calls SPI
> directly) or some other approach to get close to the expected 35 MB/s
> doing these bulk reads? Or is this the price we have to pay for using
> SQL instead of some NoSQL solution. (We actually tried Tokyo Cabinet and
> found it to perform quite well. However it does not measure up to
> Postgres in terms of replication, data interrogation, community support,
> acceptance, etc).
>
> Thanks
>
> Dan Schaffer
> Paul Hamer
> Nick Matheson
>
>
>
>

Whoa... Deja Vu

Is this the same thing Nick is working on?  How'd he get along?

http://archives.postgresql.org/message-id/


-Andy


В списке pgsql-performance по дате сообщения:

От: Dave Crooke
Дата:
Сообщение: Re: Index Bloat - how to tell?
От: Mark Kirkwood
Дата:
Сообщение: Re: Index Bloat - how to tell?