Re: Abbreviated keys for Numeric
From:        Tomas Vondra
Subject:     Re: Abbreviated keys for Numeric
Date:
Msg-id:      54E929D6.2000309@2ndquadrant.com
In reply to: Re: Abbreviated keys for Numeric  (Peter Geoghegan <pg@heroku.com>)
List:        pgsql-hackers
On 21.2.2015 23:09, Peter Geoghegan wrote:
> On Sat, Feb 21, 2015 at 12:15 PM, Tomas Vondra
> <tomas.vondra@2ndquadrant.com> wrote:
>>> Although I cannot easily explain the disparity in performance between
>>> 1M and 5M sized sets for this query:
>>>
>>> select count(distinct randtxt) from stuff_text
>>>
>>> You did make sure that the queries didn't spill to disk, right? Or
>>> that they did so consistently, at least.
>>
>> All the queries were running with work_mem=1GB, and I don't think they
>> were spilling to disk. Actually, I don't have the 'merge' patch applied,
>> so that would probably crash because of SIGSEGV.
>
> Actually, that isn't so -- Andrew's datum sort patch incidentally
> fixes tuplesort_begin_datum() in the same way as the patch I posted.
> You shouldn't actually need my patch, which was just an immediate fix.

Oh, I see - it was failing only because I applied the numeric patch
without the datum one.

Anyway, I'm logging with log_temp_files=0 and I see no temp files in the
log file, so these are all in-memory sorts.

> I can recreate the problem you see with text sort regressions.
> Abbreviation is aborted for the case in question, unsurprisingly,
> and fairly far in. With that many tuples, the idea of taking
> abbreviated cardinality as a proxy for full cardinality becomes less
> important, because either way you have to do at least 10 comparisons
> per item on average. Originally, I had as a heuristic that once you
> get to a million items without aborting, stop considering the
> possibility of aborting - that much should probably be added back, at
> a minimum. Andrew is already doing that in his numeric patch (at
> 100,000 tuples), and that may be the only reason why numeric is not
> regressed too.

OK. I'm not going to pretend I fully understand what's going on here.
I haven't paid that close attention to the original strxfrm() patch, but
if I got it right, there are some heuristics that need tuning.

> For a query like this, if I "#define DEBUG_ABBREV_KEYS", so that we
> don't actually abort, and run this query of yours until it stabilizes:
>
> select * from (select * from stuff_text order by randtxt offset
> 100000000000) foo;
>
> It takes about 5.3 seconds. If I don't, however - if it is allowed
> to run its course - it takes a whopping 35 seconds. So the ad-hoc
> cost model of the text opclass definitely needs more work, as I
> suspected.

That tuning should probably happen before this patch gets in, right?
I'm not very comfortable with first committing this and then fixing the
regression afterwards, as that might miss 9.5.

--
Tomas Vondra                          http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
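For context on the heuristic discussed above (stop even considering an
abbreviation abort once enough tuples have been processed without one),
here is a minimal standalone sketch. It is not code from either patch;
the struct, the function name, the 100,000 threshold, and the
"one distinct key per 20 inputs" cutoff are illustrative assumptions only.

/*
 * Sketch of an abbreviation-abort check with an early "commit" threshold:
 * once the sort has seen enough datums without aborting, assume
 * abbreviation is worthwhile and skip the cardinality test entirely.
 */
#include <stdbool.h>
#include <stdint.h>

#define ABBREV_COMMIT_THRESHOLD 100000	/* hypothetical; the numeric patch reportedly uses 100,000 */

typedef struct AbbrevState
{
	uint64_t	input_count;			/* datums seen so far */
	uint64_t	estimated_cardinality;	/* e.g. estimated from the abbreviated keys */
} AbbrevState;

/* Return true if abbreviation should be abandoned for this sort. */
static bool
abbrev_should_abort(const AbbrevState *state)
{
	/*
	 * Past the threshold, never abort: with this many tuples the sort is
	 * expensive with or without abbreviation, so aborting no longer pays.
	 */
	if (state->input_count > ABBREV_COMMIT_THRESHOLD)
		return false;

	/*
	 * Otherwise, abort if the abbreviated keys look too non-distinct to pay
	 * for themselves; the factor of 20 is a made-up cutoff for illustration.
	 */
	return state->estimated_cardinality * 20 < state->input_count;
}

The early return captures Peter's point that abbreviated cardinality as a
proxy for full cardinality matters less for very large inputs, since every
item needs many comparisons either way.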