Обсуждение: BRIN indexes for MAX, MIN, ORDER BY?
It seems trivial to accelerate a MAX or MIN query with a BRIN index. You just find the page range with the largest/smallest value, and then only scan that one. Would that be hard to implement? I'm interested in working on it if someone can give me some pointers.
Somewhat harder but still possible would be using BRIN indexes to accelerate ORDER BY. This would require a sorting algorithm that can take advantage of mostly-sorted inputs. You would sort the page ranges by their minimum or maximum value, then feed the sorting algorithm in that order.
Gavin Wahl wrote: > It seems trivial to accelerate a MAX or MIN query with a BRIN index. You > just find the page range with the largest/smallest value, and then only > scan that one. Would that be hard to implement? I'm interested in working > on it if someone can give me some pointers. I think this means you need to represent this operation as a specific Path in some way. See build_minmax_path() and its callers in planagg; you probably need to tweak preprocess_minmax_aggregates() to consider this. This doesn't look like a simple project to me, mind. > Somewhat harder but still possible would be using BRIN indexes to > accelerate ORDER BY. This would require a sorting algorithm that can take > advantage of mostly-sorted inputs. You would sort the page ranges by their > minimum or maximum value, then feed the sorting algorithm in that order. I wouldn't know where to start for this. Maybe once Tom is done with planner rejiggering it would make sense to consider looking at how to do it. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Mon, Sep 28, 2015 at 9:58 AM, Gavin Wahl <gavinwahl@gmail.com> wrote: > It seems trivial to accelerate a MAX or MIN query with a BRIN index. You > just find the page range with the largest/smallest value, and then only scan > that one. You might need to scan more than that if you don't find any rows that are visible. > Would that be hard to implement? I'm interested in working on it > if someone can give me some pointers. > > Somewhat harder but still possible would be using BRIN indexes to accelerate > ORDER BY. This would require a sorting algorithm that can take advantage of > mostly-sorted inputs. You would sort the page ranges by their minimum or > maximum value, then feed the sorting algorithm in that order. Currently you get a Bitmap Index Scan and Bitmap Heap Scan, and then a Sort node (quicksort or external sort). So the sort is already receiving data sorted in BRIN-block order, isn't it? Are you saying that the sort node should switch to something like insertion sort in this case? http://www.sorting-algorithms.com/nearly-sorted-initial-order -- Thomas Munro http://www.enterprisedb.com
Alvaro Herrera <alvherre@2ndquadrant.com> writes: > Gavin Wahl wrote: >> It seems trivial to accelerate a MAX or MIN query with a BRIN index. You >> just find the page range with the largest/smallest value, and then only >> scan that one. Would that be hard to implement? I'm interested in working >> on it if someone can give me some pointers. > I think this means you need to represent this operation as a specific > Path in some way. See build_minmax_path() and its callers in planagg; > you probably need to tweak preprocess_minmax_aggregates() to consider > this. > This doesn't look like a simple project to me, mind. >> Somewhat harder but still possible would be using BRIN indexes to >> accelerate ORDER BY. This would require a sorting algorithm that can take >> advantage of mostly-sorted inputs. You would sort the page ranges by their >> minimum or maximum value, then feed the sorting algorithm in that order. > I wouldn't know where to start for this. Maybe once Tom is done with > planner rejiggering it would make sense to consider looking at how to do > it. Yeah. I would urgently recommend that people *not* try to build new things like planagg.c right now. A large part of the point of upper planner path-ification is to have a less grotty way of dealing with things like specialized aggregate implementations. (And yes, I am working on that. Honest.) regards, tom lane
> Yeah. I would urgently recommend that people *not* try to build new
> things like planagg.c right now. A large part of the point of upper
> planner path-ification is to have a less grotty way of dealing with
> things like specialized aggregate implementations.
Ok. I will wait and ask again later.
Hi Gavin Note that Alexander Korotkov already started work in 2013 on a somewhat similar feature called partial sort: http://www.postgresql.org/message-id/CAPpHfdscOX5an71nHd8WSUH6GNOCf=V7wgDaTXdDd9=goN-gfA@mail.gmail.com In particular, see the 2nd patch for KNN sort -- it uses known bounding boxes from the GiST index for sorting, which in many ways is similar to min/max BRIN ranges. > *partial-knn-1.patch* > > KNN-GiST provides ability to get ordered results from index, but this order > is based only on index information. For instance, GiST index contains > bounding rectangles for polygons, and we can't get exact distance to > polygon from index (similar situation is in PostGIS). In attached patch, > GiST distance method can set recheck flag (similar to consistent method). > This flag means that distance method returned lower bound of distance and > we should recheck it from heap. Unfortunatley this work has stalled. Regards, Marti On Sun, Sep 27, 2015 at 11:58 PM, Gavin Wahl <gavinwahl@gmail.com> wrote: > It seems trivial to accelerate a MAX or MIN query with a BRIN index. You > just find the page range with the largest/smallest value, and then only scan > that one. Would that be hard to implement? I'm interested in working on it > if someone can give me some pointers. > > Somewhat harder but still possible would be using BRIN indexes to accelerate > ORDER BY. This would require a sorting algorithm that can take advantage of > mostly-sorted inputs. You would sort the page ranges by their minimum or > maximum value, then feed the sorting algorithm in that order.
On 09/28/2015 05:28 PM, Marti Raudsepp wrote: > Note that Alexander Korotkov already started work in 2013 on a > somewhat similar feature called partial sort: > http://www.postgresql.org/message-id/CAPpHfdscOX5an71nHd8WSUH6GNOCf=V7wgDaTXdDd9=goN-gfA@mail.gmail.com > > In particular, see the 2nd patch for KNN sort -- it uses known > bounding boxes from the GiST index for sorting, which in many ways is > similar to min/max BRIN ranges. > >> *partial-knn-1.patch* >> >> KNN-GiST provides ability to get ordered results from index, but this order >> is based only on index information. For instance, GiST index contains >> bounding rectangles for polygons, and we can't get exact distance to >> polygon from index (similar situation is in PostGIS). In attached patch, >> GiST distance method can set recheck flag (similar to consistent method). >> This flag means that distance method returned lower bound of distance and >> we should recheck it from heap. > > Unfortunatley this work has stalled. No, that was actually committed for 9.5. It's this item in the release notes: > Allow queries to perform accurate distance filtering of > bounding-box-indexed objects (polygons, circles) using GiST indexes > (Alexander Korotkov, Heikki Linnakangas) > > Previously, a common table expression was required to return a large > number of rows ordered by bounding-box distance, and then filtered > further with a more accurate non-bounding-box distance calculation. - Heikki
On 27/09/15 21:58, Gavin Wahl wrote: > Somewhat harder but still possible would be using BRIN indexes to > accelerate ORDER BY. This would require a sorting algorithm that can take > advantage of mostly-sorted inputs. You would sort the page ranges by their > minimum or maximum value, then feed the sorting algorithm in that order. An internal merge sort does well with partially-sorted input. -- Cheers,Jeremy
On 29 September 2015 at 13:20, Jeremy Harris <jgh@wizmail.org> wrote:
--
On 27/09/15 21:58, Gavin Wahl wrote:
> Somewhat harder but still possible would be using BRIN indexes to
> accelerate ORDER BY. This would require a sorting algorithm that can take
> advantage of mostly-sorted inputs. You would sort the page ranges by their
> minimum or maximum value, then feed the sorting algorithm in that order.
An internal merge sort does well with partially-sorted input.
Yes, the $SUBJECT would be a valid use for BRIN.
And also in general for sorts, leading to an optimization of merge joins using BRIN indexes.
All this was foreseen in the original design; if it really was trivial it would be in 9.5 already...
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services