Обсуждение: putting CHECK_FOR_INTERRUPTS in qsort_comparetup()
Hi, We came up with this patch in response to a problem reported to us by a client. They had a query which took an unacceptably long time to respond to a cancel request (SIGINT). The client uses 8.1.4, so the patch is against that. Their work_mem setting was rather large (1000000). We determined that when it received SIGINT, the backend was always inside qsort(), so it wouldn't call ProcessInterrupts() again until it finished this large in-memory sort. Upon entering tuplesort_performsort(), state->memtupcount was 29247. The patch puts a CHECK_FOR_INTERRUPTS in qsort_comparetup. This solves their problem, at the cost of checking InterruptPending a lot. The "unlikely()" gcc directive might help there a bit. I'm not sure if this patch has general applicability - but it seems to solve the problem for our client. Does anyone think it might introduce any new problems? Thanks, -- Charles Duffy
Вложения
""Charles Duffy"" <charles.duffy@gmail.com> wrote > > We came up with this patch in response to a problem reported to us by a > client. They had a query which took an unacceptably long time to respond > to a cancel request (SIGINT). The client uses 8.1.4, so the patch is > against that. > How long is that "unacceptably long time"? > Their work_mem setting was rather large (1000000). We determined that when it > received SIGINT, the backend was always inside qsort(), so it wouldn't > call ProcessInterrupts() again until it finished this large in-memory > sort. Upon entering tuplesort_performsort(), state->memtupcount was > 29247. > I agree that we may need to consider to let qsort() check interrupts, but the problem here is that 29247 doesn't look like a big number so I can't see why your patch solved the problem, unless the qsort_comparetup() function of the data type eats too many circles or the cpu is too slow. I just did a test to invoke a qsort on an "integer" field of a table with 5 million rows, and sent a SIGINT, the delay is 7 or 8 seconds. I suspect there are some other places doesn't check interrupts -- what's your query plan? Regards, Qingqing
Qingqing, On 7/12/06, Qingqing Zhou <zhouqq@cs.toronto.edu> wrote: > > > How long is that "unacceptably long time"? > 30 seconds. > the problem here is that 29247 doesn't look like a big number so I can't see > why your patch solved the problem, unless the qsort_comparetup() function of > the data type eats too many circles or the cpu is too slow. Well...when exhibiting the problem, execution stays inside qsort() for the entire 'delay period' - I don't think its off in some other recess which is also lacking in interrupt flag checking. As for the CPU - the machine is a 4 way Opteron, and otherwise performs well. It does seem odd that the in-memory sort takes as long as it does - the plan may suggest a reason. And - the patched version doesn't exhibit the problem. > I just did a > test to invoke a qsort on an "integer" field of a table with 5 million rows, > and sent a SIGINT, the delay is 7 or 8 seconds. I suspect there are some > other places doesn't check interrupts -- what's your query plan? Plan attached. Thanks, Charles Duffy
Вложения
"Charles Duffy" <charles.duffy@gmail.com> writes: > On 7/12/06, Qingqing Zhou <zhouqq@cs.toronto.edu> wrote: >> the problem here is that 29247 doesn't look like a big number so I can't see >> why your patch solved the problem, unless the qsort_comparetup() function of >> the data type eats too many circles or the cpu is too slow. > Well...when exhibiting the problem, execution stays inside qsort() for > the entire 'delay period' - I don't think its off in some other recess > which is also lacking in interrupt flag checking. > As for the CPU - the machine is a 4 way Opteron, and otherwise > performs well. It does seem odd that the in-memory sort takes as long > as it does - the plan may suggest a reason. Well, there's something awfully fishy here. Compare the two lower-level sorts in your EXPLAIN output: -> Sort (cost=2909.44..2944.94 rows=14201 width=320) (actual time=78196.698..78239.799 rows=29247 loops=1) Sort Key: record, commr1, envr1, docin, creat, flati, flato, doc, docst, vlord, vl0, vl1, vl2, vl3, vl4, vl5, vl6,vl7, vl8, vl9 -> Append (cost=0.00..1930.02 rows=14201 width=320) (actual time=0.052..396.577 rows=29247 loops=1) -> Sort (cost=403.42..403.59 rows=71 width=320) (actual time=318.727..339.305 rows=10932 loops=1) Sort Key: record, commr1, envr1, docin, creat, flati, flato, doc, docst, vlord, vl0, vl1, vl2, vl3, vl4, vl5, vl6,vl7, vl8, vl9 -> Append (cost=78.88..401.23 rows=71 width=320) (actual time=5.197..192.868 rows=10932 loops=1) The first one took 77843.222 msec to sort 29247 rows, the second took 146.437 msec to sort 10932 rows with what appears to be the same sort key. One of these things is not like the other ... What are the data types of the sort columns? Is there anything much different in the statistics of the two subqueries --- for example, maybe one produces a lot of output that only differs in the rightmost sort columns, where the other has keys that always differ in a leading column? I forget if you are running 8.1 or not, but if you are, turning on trace_sort might produce useful information. regards, tom lane
"Charles Duffy" <charles.duffy@gmail.com> writes: > [ CHECK_FOR_INTERRUPTS inside qsort comparison routine ] It occurs to me that there's a nonzero risk of problems here, because if the interrupt occurs qsort() will lose control. I'm wondering whether there are any implementations of qsort() that allocate memory and then release it before returning. If so, interrupting the sort would lead to memory leaked permanently (till backend exit anyway). We might have to just tolerate this, but if it occurs on a lot of platforms I'd have second thoughts about applying the patch. Anyone familiar with the internals of glibc's qsort, in particular? regards, tom lane
Tom Lane wrote: > We might have to just tolerate this, but if it occurs on a lot of > platforms I'd have second thoughts about applying the patch. Anyone > familiar with the internals of glibc's qsort, in particular? Doesn't look like it's allocating any nonlocal memory: http://sourceware.org/cgi-bin/cvsweb.cgi/libc/stdlib/qsort.c?rev=1.12&content-type=text/x-cvsweb-markup&cvsroot=glibc -- Peter Eisentraut http://developer.postgresql.org/~petere/
Peter Eisentraut <peter_e@gmx.net> writes: > Tom Lane wrote: >> We might have to just tolerate this, but if it occurs on a lot of >> platforms I'd have second thoughts about applying the patch. Anyone >> familiar with the internals of glibc's qsort, in particular? > Doesn't look like it's allocating any nonlocal memory: > http://sourceware.org/cgi-bin/cvsweb.cgi/libc/stdlib/qsort.c?rev=1.12&content-type=text/x-cvsweb-markup&cvsroot=glibc But this file defines _quicksort() not qsort(). I was under the impression that the latter is actually a mergesort in glibc ... regards, tom lane
Tom Lane wrote: > > Doesn't look like it's allocating any nonlocal memory: > > > > http://sourceware.org/cgi-bin/cvsweb.cgi/libc/stdlib/qsort.c?rev=1. > >12&content-type=text/x-cvsweb-markup&cvsroot=glibc > > But this file defines _quicksort() not qsort(). I was under the > impression that the latter is actually a mergesort in glibc ... The merge sort is here: http://sourceware.org/cgi-bin/cvsweb.cgi/libc/stdlib/msort.c?rev=1.21&content-type=text/x-cvsweb-markup&cvsroot=glibc It uses alloca, so we're good here. -- Peter Eisentraut http://developer.postgresql.org/~petere/
Peter Eisentraut <peter_e@gmx.net> writes: > The merge sort is here: > http://sourceware.org/cgi-bin/cvsweb.cgi/libc/stdlib/msort.c?rev=1.21&content-type=text/x-cvsweb-markup&cvsroot=glibc > It uses alloca, so we're good here. Uh ... but it also uses malloc, and potentially a honkin' big malloc at that (up to a quarter of physical RAM). So I'm worried again. Anyway, Qingqing's question still needs to be answered: how can a sort of under 30k items take so long? regards, tom lane
On 7/15/06, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Anyway, Qingqing's question still needs to be answered: how can a sort > of under 30k items take so long? > It happens because (as previously suggested by Tom) the dataset for the 'short' (~10k rows, .3 sec) sort has no rows whose leftmost fields evaluate to 'equal' when passed to the qsort compare function. The 'long' sort, (~30k rows, 78 sec) has plenty of rows whose first 6 columns all evaluate as 'equal' when the rows are compared. For the 'long' data, the compare moves on rightward until it encounters 'flato', which is a TEXT column with an average length of 7.5k characters (with some rows up to 400k). The first 6 columns are mostly INTEGER, so compares on them are relatively inexpensive. All the expensive compares on 'flato' account for the disproportionate difference in sort times, relative to the number of rows in each set. As for the potential for memory leaks - thinking about it. Thanks, Charles Duffy. > Peter Eisentraut <peter_e@gmx.net> writes: > > The merge sort is here: > > > http://sourceware.org/cgi-bin/cvsweb.cgi/libc/stdlib/msort.c?rev=1.21&content-type=text/x-cvsweb-markup&cvsroot=glibc > > > It uses alloca, so we're good here. > > Uh ... but it also uses malloc, and potentially a honkin' big malloc at > that (up to a quarter of physical RAM). So I'm worried again. > > Anyway, Qingqing's question still needs to be answered: how can a sort > of under 30k items take so long? > > regards, tom lane >
Вложения
"Charles Duffy" <charles.duffy@gmail.com> writes: > ... For the 'long' data, the compare moves on rightward until it > encounters 'flato', which is a TEXT column with an average length of > 7.5k characters (with some rows up to 400k). The first 6 columns are > mostly INTEGER, so compares on them are relatively inexpensive. All > the expensive compares on 'flato' account for the disproportionate > difference in sort times, relative to the number of rows in each set. Yeah, and it's not just that it's text either. At those sizes, all the values will be toasted, which means each compare is paying the price of fetching multiple rows from the toast table. And decompressing them too, no doubt. These costs are most likely swamping the actual strcoll() (not that that's not bad enough compared to int4cmp). We could probably tweak the sorting code to forcibly detoast sort keys before beginning the sort, but I'm not entirely convinced that would be a win: reading and writing enormous sort keys won't be cheap either. Meanwhile, for a cheap solution: do you really need to sort on flato at all? Maybe sorting on substr(flato,1,100) would be good enough? regards, tom lane
Are we done with the sort interrupt issue mentioned in the subject line, and the issue outlined below? --------------------------------------------------------------------------- Tom Lane wrote: > "Charles Duffy" <charles.duffy@gmail.com> writes: > > ... For the 'long' data, the compare moves on rightward until it > > encounters 'flato', which is a TEXT column with an average length of > > 7.5k characters (with some rows up to 400k). The first 6 columns are > > mostly INTEGER, so compares on them are relatively inexpensive. All > > the expensive compares on 'flato' account for the disproportionate > > difference in sort times, relative to the number of rows in each set. > > Yeah, and it's not just that it's text either. At those sizes, all > the values will be toasted, which means each compare is paying the > price of fetching multiple rows from the toast table. And decompressing > them too, no doubt. These costs are most likely swamping the actual > strcoll() (not that that's not bad enough compared to int4cmp). > > We could probably tweak the sorting code to forcibly detoast sort keys > before beginning the sort, but I'm not entirely convinced that would be > a win: reading and writing enormous sort keys won't be cheap either. > > Meanwhile, for a cheap solution: do you really need to sort on flato > at all? Maybe sorting on substr(flato,1,100) would be good enough? > > regards, tom lane > > ---------------------------(end of broadcast)--------------------------- > TIP 3: Have you checked our extensive FAQ? > > http://www.postgresql.org/docs/faq -- Bruce Momjian bruce@momjian.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
Bruce Momjian <bruce@momjian.us> writes: > Are we done with the sort interrupt issue mentioned in the subject line, > and the issue outlined below? I'm inclined not to apply the proposed patch (adding CHECK_FOR_INTERRUPTS) because of the risk of memory leakage inside qsort. OTOH you could argue that there's an unfixable risk of memory leakage there anyway, because it's always possible that the invoked datatype comparison routine exits with elog(ERROR) for some reason, or even contains a CHECK_FOR_INTERRUPTS call itself. Comments? As for the question of whether we should try to detoast sort keys before sorting, I'd suggest adding that to TODO. Investigating whether this would be a good idea will take more time than we have for 8.2, so it's gonna have to wait for a future cycle. regards, tom lane
Tom Lane wrote: > Bruce Momjian <bruce@momjian.us> writes: > > Are we done with the sort interrupt issue mentioned in the subject line, > > and the issue outlined below? > > I'm inclined not to apply the proposed patch (adding > CHECK_FOR_INTERRUPTS) because of the risk of memory leakage inside > qsort. OTOH you could argue that there's an unfixable risk of memory > leakage there anyway, because it's always possible that the invoked > datatype comparison routine exits with elog(ERROR) for some reason, > or even contains a CHECK_FOR_INTERRUPTS call itself. Comments? OK, we do check somewhere during sorting, I assume. I can't imagine a single qsort() call taking all that long because we do them in batches anyway. > As for the question of whether we should try to detoast sort keys before > sorting, I'd suggest adding that to TODO. Investigating whether this > would be a good idea will take more time than we have for 8.2, so it's > gonna have to wait for a future cycle. Added to TODO: * Consider detoasting keys before sorting -- Bruce Momjian bruce@momjian.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
"Charles Duffy" <charles.duffy@gmail.com> writes: > The patch puts a CHECK_FOR_INTERRUPTS in qsort_comparetup. Just to close out this thread: this is now done for 8.2, per discussion here: http://archives.postgresql.org/pgsql-hackers/2006-10/msg00144.php regards, tom lane