Обсуждение: bytea performance issue

Поиск
Список
Период
Сортировка

bytea performance issue

От
brian stone
Дата:
I have to store binary data in a table, ranging from 512K - 1M.  I am getting very poor performance when inserting this data.

create table my_stuff (data bytea);

I then try to insert 10 1M blobs into this table using PQexecParams from C.  It takes ~10 seconds to insert the 10 records.

The test is being performed locally so this is not a network issue.  If I change the data I send from the client to 1M worth of text, and change the 'my_stuff.data' to TEXT, the same test takes ~2 seconds.  Has anyone else seen this performance issue with bytea?

I looked at large objects but that is limited to an Oid and I will need more than 4 billion unique identifiers.  Maybe not 4 billion objects at once, but I will burn through them quickly.

I looked at the byteain and byteaout functions in the postgresql source and there is quite a bit of processing going on - escape stuff.  Although, this is for text-to-internal correct?  If I use PGexecParams setting the format to binary, the backend should use the recv/send functions ... right?

I don't need this escaping; this information never needs to be in text form.  Is it possible to create a user-defined type that's only binary?  It looks like I am forced to defined an input and output function.  The input function takes a CString.

Any other solutions for getting binary data into postgresql?  Suggestions?

skye


Finding fabulous fares is fun.
Let Yahoo! FareChase search your favorite travel sites to find flight and hotel bargains.

Re: bytea performance issue

От
Tom Lane
Дата:
brian stone <skye0507@yahoo.com> writes:
> I have to store binary data in a table, ranging from 512K - 1M.  I am getting very poor performance when inserting
thisdata. 
> create table my_stuff (data bytea);
> I then try to insert 10 1M blobs into this table using PQexecParams from C.  It takes ~10 seconds to insert the 10
records. 
> The test is being performed locally so this is not a network issue.  If I change the data I send from the client to
1Mworth of text, and change the 'my_stuff.data' to TEXT, the same test takes ~2 seconds.  Has anyone else seen this
performanceissue with bytea? 

How are you transmitting the data exactly?  Have you tried using
oprofile or some such to identify the culprit?

It does sound like escaping could be the issue, except that if you're
sending binary parameters as your message suggests (but doesn't actually
say) then there shouldn't be any escape processing going on.

            regards, tom lane

Re: bytea performance issue

От
brian stone
Дата:
I have not tried profiling yet; I am no pro at that.

output of "SELECT version()"
PostgreSQL 8.2rc1 on i686-pc-linux-gnu, compiled by GCC gcc (GCC) 3.4.6 20060404 (Red Hat 3.4.6-3)

This is the test program.  I run it on the same machine as the postmaster.  I am not sure, but I would assume that uses unix sockets rather than tcp.

// CREATE TABLE my_stuff (data bytea);

int main(void)
{
  int i;
  PGconn *conn;
  unsigned char *data[1];
  int datal = 1024*1024;
  int data_format = 1;
  PGresult *res;

  conn = PQsetdb(NULL, NULL, NULL, NULL, "testdb");
  if(!conn)
  {
    printf("failed to connect to 'testdb'\n");
    return 1;
  }

  data[0] = (unsigned char *)malloc(datal);
  for(i=0; i < 10; i++)
  {
    res = PQexecParams(
      conn,
      "INSERT INTO my_stuff (data) VALUES ($1)",
      1,
      NULL,
      (const char * const *)data,
      (const int *)&datal,
      (const int *)&data_format,
      1);

    if(res)
    {
      printf("%s\n", PQresultErrorMessage(res));
      PQclear(res);
    }
  }

  PQfinish(conn);
  return 0;
}

gcc -I/usr/local/pgsql/include -o bytea_test bytea_test.c -lpq -lcrypt

Output of - `time ./bytea_test`
Error:
Error:
Error:
Error:
Error:
Error:
Error:
Error:
Error:
Error:

real    0m9.300s
user    0m0.013s
sys     0m0.010s

Thanks,
skye


Tom Lane <tgl@sss.pgh.pa.us> wrote:
brian stone writes:
> I have to store binary data in a table, ranging from 512K - 1M. I am getting very poor performance when inserting this data.
> create table my_stuff (data bytea);
> I then try to insert 10 1M blobs into this table using PQexecParams from C. It takes ~10 seconds to insert the 10 records.
> The test is being performed locally so this is not a network issue. If I change the data I send from the client to 1M worth of text, and change the 'my_stuff.data' to TEXT, the same test takes ~2 seconds. Has anyone else seen this performance issue with bytea?

How are you transmitting the data exactly? Have you tried using
oprofile or some such to identify the culprit?

It does sound like escaping could be the issue, except that if you're
sending binary parameters as your message suggests (but doesn't actually
say) then there shouldn't be any escape processing going on.

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 2: Don't 'kill -9' the postmaster


8:00? 8:25? 8:40? Find a flick in no time
with theYahoo! Search movie showtime shortcut.

Re: bytea performance issue

От
brian stone
Дата:
So there is no confusion as to why my output has 10 lines that say "Error:", the pg error printf line should read:

printf("Error: %s\n", PQresultErrorMessage(res));

skye


brian stone <skye0507@yahoo.com> wrote:
I have not tried profiling yet; I am no pro at that.

output of "SELECT version()"
PostgreSQL 8.2rc1 on i686-pc-linux-gnu, compiled by GCC gcc (GCC) 3.4.6 20060404 (Red Hat 3.4.6-3)

This is the test program.  I run it on the same machine as the postmaster.  I am not sure, but I would assume that uses unix sockets rather than tcp.

// CREATE TABLE my_stuff (data bytea);

int main(void)
{
  int i;
  PGconn *conn;
  unsigned char *data[1];
  int datal = 1024*1024;
  int data_format = 1;
  PGresult *res;

  conn = PQsetdb(NULL, NULL, NULL, NULL, "testdb");
  if(!conn)
  {
    printf("failed to connect to 'testdb'\n");
    return 1;
  }

  data[0] = (unsigned char *)malloc(datal);
  for(i=0; i < 10; i++)
  {
    res = PQexecParams(
      conn,
      "INSERT INTO my_stuff (data) VALUES ($1)",
      1,
      NULL,
      (const char * const *)data,
      (const int *)&datal,
      (const int *)&data_format,
      1);

    if(res)
    {
      printf("%s\n", PQresultErrorMessage(res));
      PQclear(res);
    }
  }

  PQfinish(conn);
  return 0;
}

gcc -I/usr/local/pgsql/include -o bytea_test bytea_test.c -lpq -lcrypt

Output of - `time ./bytea_test`
Error:
Error:
Error:
Error:
Error:
Error:
Error:
Error:
Error:
Error:

real    0m9.300s
user    0m0.013s
sys     0m0.010s

Thanks,
skye


Tom Lane <tgl@sss.pgh.pa.us> wrote:
brian stone writes:
> I have to store binary data in a table, ranging from 512K - 1M. I am getting very poor performance when inserting this data.
> create table my_stuff (data bytea);
> I then try to insert 10 1M blobs into this table using PQexecParams from C. It takes ~10 seconds to insert the 10 records.
> The test is being performed locally so this is not a network issue. If I change the data I send from the client to 1M worth of text, and change the 'my_stuff.data' to TEXT, the same test takes ~2 seconds. Has anyone else seen this performance issue with bytea?

How are you transmitting the data exactly? Have you tried using
oprofile or some such to identify the culprit?

It does sound like escaping could be the issue, except that if you're
sending binary parameters as your message suggests (but doesn't actually
say) then there shouldn't be any escape processing going on.

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 2: Don't 'kill -9' the postmaster


8:00? 8:25? 8:40? Find a flick in no time
with theYahoo! Search movie showtime shortcut.


Bored stiff? Loosen up...
Download and play hundreds of games for free on Yahoo! Games.