Inefficiencies in COPY command

Поиск
Список
Период
Сортировка
От Wayne Piekarski
Тема Inefficiencies in COPY command
Дата
Msg-id 199908070759.RAA21484@helpdesk.senet.com.au
обсуждение исходный текст
Ответы Re: [HACKERS] Inefficiencies in COPY command  (Tom Lane <tgl@sss.pgh.pa.us>)
Список pgsql-hackers
Hi,

With my system I get it to do a full pg_dump once per hour so that if
anything goes wrong I can go back to the previous hours information if
needed. One thing I did notice is that the whole process of the COPY
command is highly CPU bound, the postgres will use up all available CPU
and the disks are relatively idle.

I ran some profiling on it with gprof, and I was shocked to find millions
of calls to functions like memcpy, pq_putbytes, CopySendChar. I had a look
at the code and most of these were just wrappers to various other
functions, and did no work of their own. Having all these functions
running a couple of million times and making heaps of their own calls
meant that it was spending most of its time pushing and pulling things off
the stack instead of doing real work :)

So I looked into the problem, and CopyTo was getting the data, 
and calling CopyAttributeOut to convert it to a string and send it to the
client. The CopyTo function was getting called rarely so this was a good
place to start.

Turns out that the CopyAttributeOut function gets the supplied string, and
escapes it out, and as it does it, it calls CopySendChar for each
character to put it in the buffer for sending. This function does a lot of
other function calls, including a memcpy or two.

So I made some changes to CopyAttributeOut so that it escapes the string
initially into a temporary buffer (allocated onto the stack) and then
feeds the whole string to the CopySendData which is a lot more efficient
because it can blast the whole string in one go, saving about 1/3 to 1/4
the number of memcpy and so on.

This modification caused a copy (with profiling and everything else) to go
from 1m14sec to 45 sec, which I was very happy with considering how easy
it was to fix :)


I kept hacking at the code though, and found another slow point in the
code which was int4out. It was a one line front end to ltoa, then this
called sprintf with just "%d", and then numerous calls to other internal
libc functions. This was very expensive and wasted lots of CPU time.

So, I rewrote int4out (throwing away all the excess code) to do this
conversion directly without calling any libc code which is more generic
and slower, and this too gained a speed improvement. Combined with my
previous patch, with profiling and all that turned off, and -O3, I managed
to get another COPY (on a different table - sorry) down from 30 seconds to
15 seconds, so I've managed to double its speed without any tradeoffs
whatsoever!

I have included a patch below for the changes I made, although I would
only use this to look at, it is shocking code which was a real hack job
using #defines and a few other tricks to make it work so I could test this
idea out quickly. The int4out example is especially bad, and has lots of
pointers and stuff flying around and may have some bugs.

But, what I think could be interesting to improve our performance is to
make simple mods to CopyAttributeOut (along the lines of my changes) - and
to also make changes to remove all use of sprintf when numbers and floats
are being converted into strings. I would imagine this would generally 
speed up SELECT and COPY commands, so the performance increase can be
gained in other places too.

If someone wants, I can do cleaner versions of this patch for inclusion in
the code, or one of the developers may want to do this instead in case
theres something I'm missing. But I wanted to share this with everyone to
help increase Postgres' speed as it is quite significant!

[Patch included below for reference - note that this patch probably has
bugs and use it at your own risk! I am not using this patch except for
testing]

Regards,
Wayne

------------------------------------------------------------------------------
Wayne Piekarski                               Tel:     (08) 8221 5221
Research & Development Manager                Fax:     (08) 8221 5220
SE Network Access Pty Ltd                     Mob:     0407 395 889
222 Grote Street                              Email:   wayne@senet.com.au
Adelaide SA 5000                              WWW:     http://www.senet.com.au



diff -u -r NORMAL/postgresql-6.5/src/backend/commands/copy.c PROFILING/postgresql-6.5/src/backend/commands/copy.c
--- NORMAL/postgresql-6.5/src/backend/commands/copy.c    Mon Jun 14 10:33:40 1999
+++ PROFILING/postgresql-6.5/src/backend/commands/copy.c    Sat Aug  7 16:02:50 1999
@@ -1278,12 +1278,23 @@#endif}
+
+
+#warning Wayne put hacks here
+
+#define XCopySendChar(ch, fp) __buffer[__buffer_ofs++] = ch
+
+
+static voidCopyAttributeOut(FILE *fp, char *server_string, char *delim, int is_array){    char       *string;    char
     c;
 
+    char __buffer [(strlen (server_string) * 2) + 4]; /* Use +4 for safety */
+    int __buffer_ofs = 0;
+#ifdef MULTIBYTE    int            mblen;    int            encoding;
@@ -1307,31 +1318,34 @@    {        if (c == delim[0] || c == '\n' ||            (c == '\\' && !is_array))
-            CopySendChar('\\', fp);
+            XCopySendChar('\\', fp);        else if (c == '\\' && is_array)        {            if (*(string + 1) ==
'\\')           {                /* translate \\ to \\\\ */
 
-                CopySendChar('\\', fp);
-                CopySendChar('\\', fp);
-                CopySendChar('\\', fp);
+                XCopySendChar('\\', fp);
+                XCopySendChar('\\', fp);
+                XCopySendChar('\\', fp);                string++;            }            else if (*(string + 1) ==
'"')           {                /* translate \" to \\\" */
 
-                CopySendChar('\\', fp);
-                CopySendChar('\\', fp);
+                XCopySendChar('\\', fp);
+                XCopySendChar('\\', fp);            }        }#ifdef MULTIBYTE        for (i = 0; i < mblen; i++)
-            CopySendChar(*(string + i), fp);
+            XCopySendChar(*(string + i), fp);#else
-        CopySendChar(*string, fp);
+        XCopySendChar(*string, fp);#endif    }
+    
+    /* Now send the whole output string in one shot */
+    CopySendData (__buffer, __buffer_ofs, fp);}/*
diff -u -r NORMAL/postgresql-6.5/src/backend/utils/adt/int.c PROFILING/postgresql-6.5/src/backend/utils/adt/int.c
--- NORMAL/postgresql-6.5/src/backend/utils/adt/int.c    Sun Feb 14 09:49:20 1999
+++ PROFILING/postgresql-6.5/src/backend/utils/adt/int.c    Sat Aug  7 15:53:36 1999
@@ -210,9 +210,60 @@int4out(int32 l){    char       *result;
+    char       *sptr, *tptr;
+    int32 value = l;
+    char       temp[12]; /* For storing string in reverse */
+
+#warning Wayne put hacks here    result = (char *) palloc(12);        /* assumes sign, 10 digits, '\0' */
-    ltoa(l, result);
+
+
+    /* ltoa(l, result);
+       maps to sprintf(result, "%d", l) later on in the code */
+
+    /* Remove minus sign firstly */
+    if (l < 0)
+      value = -l;
+    
+    /* Point to our output string at the end */
+    tptr = temp;
+    
+    /* Now loop until we have no value left */
+    while (value > 0)
+      {
+        int current = value % 10;
+        value = value / 10;
+
+        *tptr = current + '0';
+        tptr++;
+      }
+    if (tptr == temp)
+      {
+        *tptr = '0';
+        tptr++;
+      }
+    *tptr = '\0';
+    tptr--;
+
+    /* Now, we need to prepare the result which is the reversal */
+    sptr = result;
+    if (l < 0)
+      {
+        *sptr = '-';
+        sptr++;
+      }
+    
+    while (tptr >= temp)
+      {
+        *sptr = *tptr;
+        sptr++;
+        tptr--;
+      }
+
+    /* Ok, we have copied everything - terminate now */
+    *sptr = '\0';
+        return result;}



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Bruce Momjian
Дата:
Сообщение: Re: [HACKERS] plpgsql requires bison
Следующее
От: Wayne Piekarski
Дата:
Сообщение: Re: [HACKERS] SIGSEGV on CREATE FUNCTION with plpgsql