Re: Re: Which qsort is used
От | Dann Corbit |
---|---|
Тема | Re: Re: Which qsort is used |
Дата | |
Msg-id | D425483C2C5C9F49B5B7A41F8944154757D388@postal.corporate.connx.com обсуждение исходный текст |
Ответ на | Which qsort is used (Qingqing Zhou <zhouqq@cs.toronto.edu>) |
Ответы |
Re: Re: Which qsort is used
Re: Re: Which qsort is used |
Список | pgsql-hackers |
> -----Original Message----- > From: Tom Lane [mailto:tgl@sss.pgh.pa.us] > Sent: Friday, December 16, 2005 9:03 PM > To: Dann Corbit > Cc: Qingqing Zhou; Bruce Momjian; Luke Lonergan; Neil Conway; pgsql- > hackers@postgresql.org > Subject: Re: [HACKERS] Re: Which qsort is used > > "Dann Corbit" <DCorbit@connx.com> writes: > > Both of them are modified versions of Bentley's sort algorithm. > > Jon Bentley of Bell Labs? Small world ... he was my thesis adviser > for awhile when he was at CMU. He's a good algorithms man, for sure. > > > I just added the in-order check to both of them, and the reversed > > partition check for the second method (in the case of the subfiles > > because insertion sort is allergic to reversed partitions). > > I've still got a problem with these checks; I think they are a net > waste of cycles on average. They would only be a win if you expected a > nontrivial percentage of perfectly-in-order or perfectly-reverse-order > inputs. What I think is much more probable in the Postgres environment > is almost-but-not-quite-ordered inputs --- eg, a table that was > perfectly ordered by key when filled, but some of the tuples have since > been moved by UPDATEs. This is the worst possible case for the in-order > checks, because they can then grovel over large percentages of the file > before failing ... and when they fail, those cycles are entirely wasted; > you have not advanced the state of the sort at all. > > For the "usual" case of randomly ordered input, of course it doesn't > matter much at all because the in-order checks will fail after examining > not too many items. But to argue that the checks are worth making, you > have to assume that perfectly-ordered inputs are more common than > almost-ordered inputs, and I think that is exactly the wrong assumption > for the Postgres environment. I sure haven't seen any evidence that > it's a good assumption. The benchmarks say that they (order checks) are a good idea on average for ordered data, random data, and partly ordered data. It does not require perfectly ordered data for the checks to be useful. On mostly ordered data, it is likely that some partitions are perfectly ordered. If you trace the algorithms in a debugger you will be surprised at how often the partitions are ordered, even with random sets as input.
В списке pgsql-hackers по дате отправления: