Обсуждение: sequential scan result order vs performance
Hi, while working on the executor, to process "batches" or "bubbles" of tuples I hit some weird performance issues (as in things didn't improve as much as I had hoped). A fair amount of headscratching later I figured out that the tuple order in sequential scans is a major bottleneck. When iterating over a page we return tuples in itemid order, which makes them returned in *descending* order address-wise, as tuples are stored starting from the end of the page. But when actually accessing the tuples, we access them increasing address order (header, and then column by column). It appears that that, quite understandable confuses prefetch units, leading to drastically increased cache miss ratios. It's quite easy to change iteration so we start with the latest item, and iterate till the first, rather than the other way round. In benchmarks with somewhat wide columns and aggregation, this yields speedups of over 30%, before hitting other bottlenecks. I do wonder however if it's acceptable to change the result order of sequential scans. People don't tend to specify ORDER BY everwhere - as evidenced by large swathes of our regression tests failing spuriously - so they might not be happy to see a somewhat weird order (pages sequentially increasing, but individual tuples inside a page in reverse order). We could change the order only in cases where the user doesn't actually see the result, say below aggregation, sort, and whatnot nodes. On the other hand the benefit is quite significant for heavily filtered sequential scans as well, COPY out also benefits. Comments? Greetings, Andres Freund
Hi, On 2016-10-30 00:36:55 -0700, Andres Freund wrote: > It's quite easy to change iteration so we start with the latest item, > and iterate till the first, rather than the other way round. In > benchmarks with somewhat wide columns and aggregation, this yields > speedups of over 30%, before hitting other bottlenecks. One more point: Over IM Robert commented that it's not guaranteed that itemid order correlates precisely with reverse "tuple data" order. After PageRepairFragmentation() that'll not be the case anymore. That's true, but I suspect in many cases with larger analytics queries the correlation will still be significant, and we also could make it guaranteed with the price of making PageRepairFragmentation() a bit more expensive. Greetings, Andres Freund
Andres Freund <andres@anarazel.de> writes: > It's quite easy to change iteration so we start with the latest item, > and iterate till the first, rather than the other way round. In > benchmarks with somewhat wide columns and aggregation, this yields > speedups of over 30%, before hitting other bottlenecks. > I do wonder however if it's acceptable to change the result order of > sequential scans. I think there will be a lot of howls. People expect that creating a table by inserting a bunch of rows, and then reading back those rows, will not change the order. We already futzed with that guarantee a bit with syncscans, but that only affects quite large tables --- and even there, we were forced to provide a way to turn it off. If you were talking about 3X then maybe it would be worth it, but for 30% (on a subset of queries) I am not excited. I wonder whether we could instead adjust the rules for insertion so that tuples tend to be physically in order by itemid. I'm imagining leaving two "holes" in a page and sometimes (hopefully not often) having to shift data during insert to preserve the ordering. regards, tom lane
On 10/30/16 9:12 AM, Tom Lane wrote: > I think there will be a lot of howls. People expect that creating > a table by inserting a bunch of rows, and then reading back those > rows, will not change the order. We already futzed with that guarantee > a bit with syncscans, but that only affects quite large tables --- and > even there, we were forced to provide a way to turn it off. Leaving a 30% performance improvement on the floor because some people don't grok how sets work seems insane to me. We could have a GUC to disable this. I suspect ORDER BY ctid would be another option. BTW, I've sometimes wished for a mode where queries would silently have result ordering intentionally futzed, to eliminate any possibility of dependence on tuple ordering (as well as having sequences start at some random value). I guess with the hooks that are in place today it wouldn't be hard to stick a ORDER BY random() in if there wasn't already a Sort node at the top level... -- Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX Experts in Analytics, Data Architecture and PostgreSQL Data in Trouble? Get it in Treble! http://BlueTreble.com 855-TREBLE2 (855-873-2532) mobile: 512-569-9461
Jim Nasby wrote: > On 10/30/16 9:12 AM, Tom Lane wrote: >> I think there will be a lot of howls. People expect that creating >> a table by inserting a bunch of rows, and then reading back those >> rows, will not change the order. We already futzed with that guarantee >> a bit with syncscans, but that only affects quite large tables --- and >> even there, we were forced to provide a way to turn it off. > > Leaving a 30% performance improvement on the floor because some people > don't grok how sets work seems insane to me. +1 Yours, Laurenz Albe
Re: sequential scan result order vs performance
От
ilmari@ilmari.org (Dagfinn Ilmari Mannsåker)
Дата:
Jim Nasby <Jim.Nasby@BlueTreble.com> writes: > BTW, I've sometimes wished for a mode where queries would silently have > result ordering intentionally futzed, to eliminate any possibility of > dependence on tuple ordering (as well as having sequences start at some > random value). FWIW, SQLite has this, in the form of 'PRAGMA reverse_unordered_selects'. http://sqlite.org/pragma.html#pragma_reverse_unordered_selects -- "The surreality of the universe tends towards a maximum" -- Skud's Law "Never formulate a law or axiom that you're not prepared to live withthe consequences of." --Skud's Meta-Law
<div dir="ltr"><div class="gmail_extra"><br /><div class="gmail_quote">On Sun, Oct 30, 2016 at 11:37 PM, Jim Nasby <spandir="ltr"><<a href="mailto:Jim.Nasby@bluetreble.com" target="_blank">Jim.Nasby@bluetreble.com</a>></span> wrote:<br/><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">BTW, I'vesometimes wished for a mode where queries would silently have result ordering intentionally futzed, to eliminate anypossibility of dependence on tuple ordering (as well as having sequences start at some random value). I guess with thehooks that are in place today it wouldn't be hard to stick a ORDER BY random() in if there wasn't already a Sort nodeat the top level...</blockquote></div><div class="gmail_extra"><br /></div>+1<br />In Oracle, we sorta had that featureby adding a parallel hint to a query even if it didn't need it. It came in handy.</div></div>