Re: [HACKERS] A Better External Sort?
От | Jeffrey W. Baker |
---|---|
Тема | Re: [HACKERS] A Better External Sort? |
Дата | |
Msg-id | 1128015863.11474.9.camel@noodles обсуждение исходный текст |
Ответ на | Re: [HACKERS] A Better External Sort? ("Luke Lonergan" <llonergan@greenplum.com>) |
Ответы |
Re: [HACKERS] A Better External Sort?
Re: [HACKERS] A Better External Sort? Re: [HACKERS] A Better External Sort? |
Список | pgsql-performance |
On Thu, 2005-09-29 at 10:06 -0700, Luke Lonergan wrote: > Josh, > > On 9/29/05 9:54 AM, "Josh Berkus" <josh@agliodbs.com> wrote: > > > Following an index creation, we see that 95% of the time required is the > > external sort, which averages 2mb/s. This is with seperate drives for > > the WAL, the pg_tmp, the table and the index. I've confirmed that > > increasing work_mem beyond a small minimum (around 128mb) had no benefit > > on the overall index creation speed. > > Yuuuup! That about sums it up - regardless of taking 1 or 2 passes through > the heap being sorted, 1.5 - 2 MB/s is the wrong number. Yeah this is really bad ... approximately the speed of GNU sort. Josh, do you happen to know how many passes are needed in the multiphase merge on your 60GB table? Looking through tuplesort.c, I have a couple of initial ideas. Are we allowed to fork here? That would open up the possibility of using the CPU and the I/O in parallel. I see that tuplesort.c also suffers from the kind of postgresql-wide disease of calling all the way up and down a big stack of software for each tuple individually. Perhaps it could be changed to work on vectors. I think the largest speedup will be to dump the multiphase merge and merge all tapes in one pass, no matter how large M. Currently M is capped at 6, so a sort of 60GB with 1GB sort memory needs 13 passes over the tape. It could be done in a single pass heap merge with N*log(M) comparisons, and, more importantly, far less input and output. I would also recommend using an external processes to asynchronously feed the tuples into the heap during the merge. What's the timeframe for 8.2? -jwb
В списке pgsql-performance по дате отправления: