Re: Help with bulk read performance

От: Dan Schaffer
Тема: Re: Help with bulk read performance
Дата: ,
Msg-id: 4D0921D2.2030806@noaa.gov
(см: обсуждение, исходный текст)
Ответ на: Re: Help with bulk read performance  (Andy Colson)
Ответы: Re: Help with bulk read performance  ("Pierre C")
Список: pgsql-performance

Скрыть дерево обсуждения

Help with bulk read performance  (Dan Schaffer, )
 Re: Help with bulk read performance  (Jim Nasby, )
 Re: Help with bulk read performance  (Andy Colson, )
  Re: Help with bulk read performance  (Jim Nasby, )
   Re: Help with bulk read performance  (Andy Colson, )
    Re: Help with bulk read performance  (Nick Matheson, )
    Re: Help with bulk read performance  (Dan Schaffer, )
     Re: Help with bulk read performance  ("Pierre C", )
      Re: Help with bulk read performance  (Nick Matheson, )
 Re: Help with bulk read performance  (Jim Nasby, )
 Re: Help with bulk read performance  (Krzysztof Nienartowicz, )

Hi,
My name is Dan and I'm a co-worker of Nick Matheson who initially submitted this question (because the mail group had
meblacklisted  
for awhile for some reason).


Thank you for all of the suggestions.  We were able to improve out bulk read performance from 3 MB/s to 60 MB/s
(assumingthe data  
are NOT in cache in both cases) by doing the following:

1. Storing the data in a "bytea" column instead of an "array" column.
2. Retrieving the data via the Postgres 9 CopyManager#copyOut(String sql, OutputStream stream)  method

The key to the dramatic improvement appears to be the reduction in packing and unpacking time on the server and client,

respectively.  The server packing occurs when the retrieved data are packed into a bytestream for sending across the
network. 
Storing the data as a simple byte array reduces this time substantially.  The client-side unpacking time is spent
generatinga  
ResultSet object. By unpacking the bytestream into the desired arrays of floats by hand instead, this time became close
tonegligible. 

The only downside of storing the data in byte arrays is the loss of transparency.  That is, a simple "select *" of a
fewrows shows  
bytes instead of floats.  We hope to mitigate this by writing a simple stored procedures that unpacks the bytes into
floats.

A couple of other results:

If the data are stored as a byte array but retrieve into a ResultSet, the unpacking time goes up by an order of
magnitudeand the  
observed total throughput is 25 MB/s.  If the data are stored in a Postgres float array and unpacked into a byte
stream,the  
observed throughput is 20 MB/s.

Dan (and Nick)

Andy Colson wrote:
> On 12/14/2010 9:41 AM, Jim Nasby wrote:
>> On Dec 14, 2010, at 9:27 AM, Andy Colson wrote:
>>> Is this the same thing Nick is working on?  How'd he get along?
>>>
>>> http://archives.postgresql.org/message-id/
>>
>> So it is. The one I replied to stood out because no one had replied to
>> it; I didn't see the earlier email.
>> --
>> Jim C. Nasby, Database Architect                   
>> 512.569.9461 (cell)                         http://jim.nasby.net
>>
>>
>>
>
> Oh.. I didn't even notice the date... I thought it was a new post.
>
> But still... (and I'll cc Nick on this)  I'd love to hear an update on
> how this worked out.
>
> Did you get it to go fast?  What'd you use?  Did the project go over
> budget and did you all get fired?  COME ON MAN!  We need to know! :-)
>
> -Andy
>

Вложения

В списке pgsql-performance по дате сообщения:

От: Eric Comeau
Дата:
Сообщение: Re: How to get FK to use new index without restarting the database
От: Krzysztof Nienartowicz
Дата:
Сообщение: Re: Help with bulk read performance