Inefficiencies in COPY command
От | Wayne Piekarski |
---|---|
Тема | Inefficiencies in COPY command |
Дата | |
Msg-id | 199908070759.RAA21484@helpdesk.senet.com.au обсуждение исходный текст |
Ответы |
Re: [HACKERS] Inefficiencies in COPY command
(Tom Lane <tgl@sss.pgh.pa.us>)
|
Список | pgsql-hackers |
Hi, With my system I get it to do a full pg_dump once per hour so that if anything goes wrong I can go back to the previous hours information if needed. One thing I did notice is that the whole process of the COPY command is highly CPU bound, the postgres will use up all available CPU and the disks are relatively idle. I ran some profiling on it with gprof, and I was shocked to find millions of calls to functions like memcpy, pq_putbytes, CopySendChar. I had a look at the code and most of these were just wrappers to various other functions, and did no work of their own. Having all these functions running a couple of million times and making heaps of their own calls meant that it was spending most of its time pushing and pulling things off the stack instead of doing real work :) So I looked into the problem, and CopyTo was getting the data, and calling CopyAttributeOut to convert it to a string and send it to the client. The CopyTo function was getting called rarely so this was a good place to start. Turns out that the CopyAttributeOut function gets the supplied string, and escapes it out, and as it does it, it calls CopySendChar for each character to put it in the buffer for sending. This function does a lot of other function calls, including a memcpy or two. So I made some changes to CopyAttributeOut so that it escapes the string initially into a temporary buffer (allocated onto the stack) and then feeds the whole string to the CopySendData which is a lot more efficient because it can blast the whole string in one go, saving about 1/3 to 1/4 the number of memcpy and so on. This modification caused a copy (with profiling and everything else) to go from 1m14sec to 45 sec, which I was very happy with considering how easy it was to fix :) I kept hacking at the code though, and found another slow point in the code which was int4out. It was a one line front end to ltoa, then this called sprintf with just "%d", and then numerous calls to other internal libc functions. This was very expensive and wasted lots of CPU time. So, I rewrote int4out (throwing away all the excess code) to do this conversion directly without calling any libc code which is more generic and slower, and this too gained a speed improvement. Combined with my previous patch, with profiling and all that turned off, and -O3, I managed to get another COPY (on a different table - sorry) down from 30 seconds to 15 seconds, so I've managed to double its speed without any tradeoffs whatsoever! I have included a patch below for the changes I made, although I would only use this to look at, it is shocking code which was a real hack job using #defines and a few other tricks to make it work so I could test this idea out quickly. The int4out example is especially bad, and has lots of pointers and stuff flying around and may have some bugs. But, what I think could be interesting to improve our performance is to make simple mods to CopyAttributeOut (along the lines of my changes) - and to also make changes to remove all use of sprintf when numbers and floats are being converted into strings. I would imagine this would generally speed up SELECT and COPY commands, so the performance increase can be gained in other places too. If someone wants, I can do cleaner versions of this patch for inclusion in the code, or one of the developers may want to do this instead in case theres something I'm missing. But I wanted to share this with everyone to help increase Postgres' speed as it is quite significant! [Patch included below for reference - note that this patch probably has bugs and use it at your own risk! I am not using this patch except for testing] Regards, Wayne ------------------------------------------------------------------------------ Wayne Piekarski Tel: (08) 8221 5221 Research & Development Manager Fax: (08) 8221 5220 SE Network Access Pty Ltd Mob: 0407 395 889 222 Grote Street Email: wayne@senet.com.au Adelaide SA 5000 WWW: http://www.senet.com.au diff -u -r NORMAL/postgresql-6.5/src/backend/commands/copy.c PROFILING/postgresql-6.5/src/backend/commands/copy.c --- NORMAL/postgresql-6.5/src/backend/commands/copy.c Mon Jun 14 10:33:40 1999 +++ PROFILING/postgresql-6.5/src/backend/commands/copy.c Sat Aug 7 16:02:50 1999 @@ -1278,12 +1278,23 @@#endif} + + +#warning Wayne put hacks here + +#define XCopySendChar(ch, fp) __buffer[__buffer_ofs++] = ch + + +static voidCopyAttributeOut(FILE *fp, char *server_string, char *delim, int is_array){ char *string; char c; + char __buffer [(strlen (server_string) * 2) + 4]; /* Use +4 for safety */ + int __buffer_ofs = 0; +#ifdef MULTIBYTE int mblen; int encoding; @@ -1307,31 +1318,34 @@ { if (c == delim[0] || c == '\n' || (c == '\\' && !is_array)) - CopySendChar('\\', fp); + XCopySendChar('\\', fp); else if (c == '\\' && is_array) { if (*(string + 1) == '\\') { /* translate \\ to \\\\ */ - CopySendChar('\\', fp); - CopySendChar('\\', fp); - CopySendChar('\\', fp); + XCopySendChar('\\', fp); + XCopySendChar('\\', fp); + XCopySendChar('\\', fp); string++; } else if (*(string + 1) == '"') { /* translate \" to \\\" */ - CopySendChar('\\', fp); - CopySendChar('\\', fp); + XCopySendChar('\\', fp); + XCopySendChar('\\', fp); } }#ifdef MULTIBYTE for (i = 0; i < mblen; i++) - CopySendChar(*(string + i), fp); + XCopySendChar(*(string + i), fp);#else - CopySendChar(*string, fp); + XCopySendChar(*string, fp);#endif } + + /* Now send the whole output string in one shot */ + CopySendData (__buffer, __buffer_ofs, fp);}/* diff -u -r NORMAL/postgresql-6.5/src/backend/utils/adt/int.c PROFILING/postgresql-6.5/src/backend/utils/adt/int.c --- NORMAL/postgresql-6.5/src/backend/utils/adt/int.c Sun Feb 14 09:49:20 1999 +++ PROFILING/postgresql-6.5/src/backend/utils/adt/int.c Sat Aug 7 15:53:36 1999 @@ -210,9 +210,60 @@int4out(int32 l){ char *result; + char *sptr, *tptr; + int32 value = l; + char temp[12]; /* For storing string in reverse */ + +#warning Wayne put hacks here result = (char *) palloc(12); /* assumes sign, 10 digits, '\0' */ - ltoa(l, result); + + + /* ltoa(l, result); + maps to sprintf(result, "%d", l) later on in the code */ + + /* Remove minus sign firstly */ + if (l < 0) + value = -l; + + /* Point to our output string at the end */ + tptr = temp; + + /* Now loop until we have no value left */ + while (value > 0) + { + int current = value % 10; + value = value / 10; + + *tptr = current + '0'; + tptr++; + } + if (tptr == temp) + { + *tptr = '0'; + tptr++; + } + *tptr = '\0'; + tptr--; + + /* Now, we need to prepare the result which is the reversal */ + sptr = result; + if (l < 0) + { + *sptr = '-'; + sptr++; + } + + while (tptr >= temp) + { + *sptr = *tptr; + sptr++; + tptr--; + } + + /* Ok, we have copied everything - terminate now */ + *sptr = '\0'; + return result;}
В списке pgsql-hackers по дате отправления:
Следующее
От: Wayne PiekarskiДата:
Сообщение: Re: [HACKERS] SIGSEGV on CREATE FUNCTION with plpgsql