Re: min()/max() with BRIN indexes

Поиск
Список
Период
Сортировка
От Bruce Momjian
Тема Re: min()/max() with BRIN indexes
Дата
Msg-id 20200325174510.GA11047@momjian.us
обсуждение исходный текст
Ответ на Re: min()/max() with BRIN indexes  (Wayne <lists-pgsql@useunix.net>)
Список pgsql-sql
On Sat, Feb 29, 2020 at 09:40:58PM +0000, Wayne wrote:
> On Sat, Feb 29, 2020 at 02:37:15PM -0500, Tom Lane wrote:
> > Wayne <lists-pgsql@useunix.net> writes:
> > > I have rather large tables that use a time stamp as an index. New entries
> > > are continuously added to the table with the current time. If I convert
> > > from BTREE to BRIN indexes and select records with specific date ranges
> > > the BRIN is used and performance is acceptable. However I often want to
> > > get the latest time stamp using the max() function. I didn't expect that
> > > this would result in a sequential scan of the table and skip the BRIN
> > > index.
> > 
> > > Is this expected behavior?
> > 
> > Yeah.  In principle a BRIN index could be used to accelerate finding min
> > or max, but there's no actual support for that at the moment ... and in
> > any case, it'd still be substantially slower than the equivalent with
> > a btree index, which can locate the extremal values immediately.
> > 
> > For this particular case, you might be able to fake it with something like
> > 
> >     select max(ts) from mytab where ts > 'some cutoff'
> > 
> > if you can estimate some not-too-far-before-current-time cutoff
> > that you are sure you'll find some records after.
> 
> I kind of "discovered" the 'some cutoff' trick prior to my posting but
> neglected to mention it as I couldn't figure out why it worked but
> max(ts) by itself wouldn't.
> 
> Agreed, it would be substantially slower than a btree index but much
> faster than a seq scan of the table. In this use case they are monthly
> tables typically >= 130gig. The btree index is typically >20 gig while
> the corresponding brin is ~ 2meg. For all other use cases on these
> tables the brin index is a great space vs performance compromise.
> 
> For now I can get by with the 'some cutoff' estimate but I hope adding
> min()/max() to brin indexes on the wish list.

I am coming late to this thread, but wanted to comment.  BRIN min/max
values are worst-case, meaning they might be beyond the actual min/max
values.  To use BRIN indexes for max, I think you would need to get the
BRIN range with the highest max, then check the page to find its real
maximum.  You would then need to test all BRIN ranges that have a max
that is greater than the real max to make sure those don't have higher
values.  If that first BRIN max you find is much higher than the real
max, you might need to scan many BRIN ranges, and potentially all of
them.

We have a similar case when doing max on a partitioned table.  Right now
I think we just get a max for each partition and compute the max of
those results, but, for a range-partitioned table, we could do a max in
just the highest range partition, but if that doesn't return a row, we
would need to check the next lower partitions.

If you use ORDER BY and LIMIT, you could do a similar thing, but it
would be even more complex.  We have just never optimized for these
cases. but someday we might.

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EnterpriseDB                             https://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +



В списке pgsql-sql по дате отправления:

Предыдущее
От: Sándor Daku
Дата:
Сообщение: Re: What is the right syntax for retrieving the last_insert_id() inPostgresql ?
Следующее
От: "Heinemann, Manfred (IMS)"
Дата:
Сообщение: FK constraint question