Re: Help with bulk read performance

From: Krzysztof Nienartowicz
Subject: Re: Help with bulk read performance
Msg-id: 8B1B00CC-D1A3-46B6-90B3-C862900F8DAB@gmail.com
In reply to: Help with bulk read performance (Dan Schaffer <Daniel.S.Schaffer@noaa.gov>)
List: pgsql-performance
Hello Daniel,
We have the same scenario for native Java arrays, so we are storing bytea and doing the conversion on the client side, but for server-side SQL, plJava comes in very handy:

Not sure how you want to create stored procedures to convert internally, but this is how we do it:

One has to define conversion routines in Java and then deploy them to plJava. Scanning through this field would still be CPU bound, around 2x slower than with native arrays and 6x slower than with blobs, but at least one has this ability. It's even possible to pass them to plR to do some statistical processing directly, so depending on the operations you do it may still be cheaper than streaming out over the wire to a regular JDBC client.

1. Deploy a class like this within plJava (null handling left out for brevity):

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;

public class myArrayConversion
{

    public myArrayConversion() {}

    /** Serialize a double array to a bytea */
    public static byte[] convertDoubleArrayToBytea(double[] obj) throws IOException {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        ObjectOutputStream oos = new ObjectOutputStream(baos);
        oos.writeObject(obj);
        oos.flush();
        return baos.toByteArray();
    }

    /** Serialize an int array to a bytea */
    public static byte[] convertIntToBytea(int[] obj) throws IOException {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        ObjectOutputStream oos = new ObjectOutputStream(baos);
        oos.writeObject(obj);
        oos.flush();
        return baos.toByteArray();
    }

    /** Deserialize a bytea to a double array */
    public static double[] convertToDoubleArray(byte[] obj) throws IOException,
                                                                   ClassNotFoundException {
        // Deserialize from a byte array
        ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(obj));
        return (double[]) ois.readObject();
    }

    /** Deserialize a bytea to an int array */
    public static int[] convertIntToArray(byte[] obj) throws IOException,
                                                             ClassNotFoundException {
        // Deserialize from a byte array
        ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(obj));
        return (int[]) ois.readObject();
    }

    // other array types' streaming...
    // ...
}
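
Before the mapping functions in step 2 can resolve the class, the compiled jar has to be installed into the backend with PL/Java's sqlj functions. A minimal sketch over JDBC; the jar path, jar name, and connection settings here are assumptions:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class DeployJar {
    public static void main(String[] args) throws Exception {
        // Hypothetical connection settings and jar location.
        Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://localhost/mydb", "dbowner", "secret");
        Statement st = conn.createStatement();
        // Load the jar containing myArrayConversion into the backend...
        st.execute("SELECT sqlj.install_jar('file:///tmp/mappingPkg.jar', 'mappingPkg', true)");
        // ...and put it on the class path of the schema holding the functions.
        st.execute("SELECT sqlj.set_classpath('public', 'mappingPkg')");
        st.close();
        conn.close();
    }
}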

2. Then create mapping functions as the DB owner:

<sql>
CREATE OR REPLACE FUNCTION public.convertDoubleArrayToBytea(double precision[])
  RETURNS bytea AS
    'mappingPkg.convertDoubleArrayToBytea(double[])'
  LANGUAGE 'javau' IMMUTABLE
  COST 50;

GRANT EXECUTE ON FUNCTION public.convertDoubleArrayToBytea(double precision[]) TO public;



CREATE OR REPLACE FUNCTION public.convertToDoubleArray(bytea)
  RETURNS double precision[] AS
    'mappingPkg.convertToDoubleArray(byte[])'
  LANGUAGE 'javau' IMMUTABLE
  COST 50;

GRANT EXECUTE ON FUNCTION public.convertToDoubleArray(bytea) TO public;
</sql>


Then you can have the conversion either way:

select convertToDoubleArray(convertDoubleArrayToBytea(array[i::float8,1.1,100.1,i*0.1]::float8[]))
from generate_series(1,100) i;

So you'd also be able to create bytea objects from native SQL arrays within SQL.
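
On the client side, the same bytea can be decoded with plain JDBC plus ObjectInputStream, mirroring convertToDoubleArray. A minimal sketch; the connection settings and the samples(data) table are assumptions:

import java.io.ByteArrayInputStream;
import java.io.ObjectInputStream;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class ClientSideDecode {
    public static void main(String[] args) throws Exception {
        // Hypothetical connection settings, table and column names.
        Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://localhost/mydb", "user", "secret");
        Statement st = conn.createStatement();
        ResultSet rs = st.executeQuery("SELECT data FROM samples");
        while (rs.next()) {
            // The bytea column holds a Java-serialized double[], so the client
            // decodes it exactly the way convertToDoubleArray does on the server.
            ObjectInputStream ois = new ObjectInputStream(
                    new ByteArrayInputStream(rs.getBytes(1)));
            double[] vals = (double[]) ois.readObject();
            System.out.println("row has " + vals.length + " doubles");
        }
        rs.close();
        st.close();
        conn.close();
    }
}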

PLJava seems to be enjoying a revival these days thanks to Johann 'Myrkraverk' Oskarsson, who fixed several long-standing bugs. Check out the plJava list for details.


    Krzysztof



On Dec 16, 2010, at 10:22 AM, pgsql-performance-owner@postgresql.org wrote:

> From: Dan Schaffer <Daniel.S.Schaffer@noaa.gov>
> Date: December 15, 2010 9:15:14 PM GMT+01:00
> To: Andy Colson <andy@squeakycode.net>
> Cc: Jim Nasby <jim@nasby.net>, pgsql-performance@postgresql.org, Nick Matheson <Nick.D.Matheson@noaa.gov>
> Subject: Re: Help with bulk read performance
> Reply-To: Daniel.S.Schaffer@noaa.gov
>
>
> Hi,
> My name is Dan and I'm a co-worker of Nick Matheson, who initially submitted this question (because the mail group had me blacklisted for a while for some reason).
>
>
> Thank you for all of the suggestions.  We were able to improve our bulk read performance from 3 MB/s to 60 MB/s (assuming the data are NOT in cache in both cases) by doing the following:
>
> 1. Storing the data in a "bytea" column instead of an "array" column.
> 2. Retrieving the data via the Postgres 9 CopyManager#copyOut(String sql, OutputStream stream)  method
>
> The key to the dramatic improvement appears to be the reduction in packing and unpacking time on the server and client, respectively.  The server packing occurs when the retrieved data are packed into a bytestream for sending across the network.  Storing the data as a simple byte array reduces this time substantially.  The client-side unpacking time is spent generating a ResultSet object.  Unpacking the bytestream into the desired arrays of floats by hand instead made this time close to negligible.
>
> The only downside of storing the data in byte arrays is the loss of transparency.  That is, a simple "select *" of a few rows shows bytes instead of floats.  We hope to mitigate this by writing a simple stored procedure that unpacks the bytes into floats.
>
> A couple of other results:
>
> If the data are stored as a byte array but retrieved into a ResultSet, the unpacking time goes up by an order of magnitude and the observed total throughput is 25 MB/s.  If the data are stored in a Postgres float array and unpacked into a byte stream, the observed throughput is 20 MB/s.
>
> Dan (and Nick)
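
For reference, a minimal sketch of the retrieval path Dan describes: streaming the bytea column through CopyManager#copyOut and unpacking the floats by hand. The connection settings and table/column names are hypothetical, as is the assumption that each bytea holds raw big-endian floats (NULL handling is left out for brevity):

import org.postgresql.PGConnection;
import org.postgresql.copy.CopyManager;

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.nio.ByteBuffer;
import java.sql.Connection;
import java.sql.DriverManager;
import java.util.ArrayList;
import java.util.List;

public class BulkReadSketch {
    public static void main(String[] args) throws Exception {
        // Hypothetical connection settings, table and column names.
        Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://localhost/mydb", "user", "secret");
        CopyManager copy = ((PGConnection) conn).getCopyAPI();

        // Stream the bytea column to the client in binary COPY format,
        // bypassing ResultSet construction entirely.
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        copy.copyOut("COPY (SELECT data FROM samples) TO STDOUT (FORMAT BINARY)", buf);
        conn.close();

        List<float[]> rows = new ArrayList<float[]>();
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(buf.toByteArray()));
        in.skipBytes(11 + 4);          // signature "PGCOPY\n\377\r\n\0" plus flags word
        in.skipBytes(in.readInt());    // header extension area
        while (in.readShort() != -1) { // per-tuple field count; -1 marks end of data
            int len = in.readInt();    // field length in bytes (-1 would mean NULL)
            byte[] field = new byte[len];
            in.readFully(field);
            // Hand-unpack: interpret the raw bytes as big-endian floats.
            ByteBuffer bb = ByteBuffer.wrap(field);
            float[] vals = new float[len / 4];
            for (int i = 0; i < vals.length; i++) {
                vals[i] = bb.getFloat();
            }
            rows.add(vals);
        }
    }
}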

