Re: Inlining comparators as a performance optimisation
От | Peter Geoghegan |
---|---|
Тема | Re: Inlining comparators as a performance optimisation |
Дата | |
Msg-id | CAEYLb_V=qkEEm+szFMMV6T69C7NPw_RF_Jq9EjiT=esjcghOzA@mail.gmail.com обсуждение исходный текст |
Ответ на | Re: Inlining comparators as a performance optimisation (Peter Geoghegan <peter@2ndquadrant.com>) |
Ответы |
Re: Inlining comparators as a performance optimisation
(Peter Geoghegan <peter@2ndquadrant.com>)
|
Список | pgsql-hackers |
3 items are attached: 1. A spreadsheet, results.ods, which has results for automated multiple runs of the "doubled up" dellstore orderlines queries (where orderlines is 385 MB). Results are sorted, with the median (or the lower middle value, didn't get the mean of the two middle runs) result for each run highlighted as representative. There are 3 builds of Postgres (HEAD, not inlined, optimized), each on its own sheet in the spreadsheet. The cache was warmed for each query, and subsequently the query was run 20 times. 2. A python script I hacked together that should run on anything around Python 2.5+, with a single dependency, psycopg2. Look at --help to see how it works. This is the script that actually generated the figures that went into the spreadsheet, by being run once for each type of build. You can fairly easily play along at home with this, for these and other queries. It will spit out CSV files. It is designed to warm the cache (by default 3 times before each query) to eliminate caching effects. You can add your own query to the python list to have it run by the script, to generate the same set of figures for that query. I'm rather curious as to how much of an impact this optimisation will have on queries with unique nodes, joins and grouping when they rely on a sort node for their input, in the real world. Certainly, a query need not have an ORDER BY clause to see benefits, perhaps substantial benefits. The logic to parse sort node details is rather unsophisticated right now though, due to my focus on the obvious ORDER BY case. 3. The revision of the patch that was actually tested, now with inlining specialisations for the single scanKey case, and a non-inlined specialisation for multiple scanKeys where we can still benefit from eliding the SQL function call machinery for the first scanKey, which is often almost as useful as being able to do so for all scanKeys. It also selects a sorting specialisation when it can in tuplesort_begin_heap, per Robert's suggestion. Reviewers will want to comment out line 731 of tuplesort.c, "#define optimize", to quickly get unoptimized behaviour for comparative purposes. Furthermore, if you'd like to see what this would look like without inlining, you can simply comment out assignments of inline variants of specialisations (i.e. int4inlqsort_arg and int8inlqsort_arg) in tuplesort_begin_heap. It has been suggested that I'm chasing diminishing returns by inlining, as I go further for a smaller benefit. Of course, that's true. However, I believe that I'm not chasing them past the point where that ceases to make sense, and these figures support that contention - chasing diminishing returns in the nature of this kind of work. Here, the additional benefit of inlining accounts for over an entire second shaved off a query that was originally 7056ms, so that's not to be sniffed at. I'll reiterate that qsort_arg has only been modified 4 times after its initial commit in 2006, and these were all trivial changes. While the way that I ape generic programming with the preprocessor is on the ugly side, what I've done can be much more easily understood with reference to qsort_arg itself. Robert said that duplicating the sort function was "iffy". However, that already happened long ago, as we're already maintaining both qsort_arg.c and qsort.c, and comments already urge maintainers to keep the two consistent. That's not been a problem though, because, as I've said, there has never been any call to make substantive changes to either in all those years. We could probably further reduce the code footprint of all of this by having template_qsort_arg.h generate comparators directly. That might be inflexible in a way that turns out to matter though, like if we wanted to use this for non-HAVE_INT64_TIMESTAMP timestamps and had to add NaN crud. My next step is to see how this goes with hundreds of sorts in the hundreds of megabytes on a high-end server. I don't have immediate access to one, but I'm working that out. -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services
Вложения
В списке pgsql-hackers по дате отправления: