Обсуждение: rows estimate in explain analyze for the BRIN index

Поиск
Список
Период
Сортировка

rows estimate in explain analyze for the BRIN index

От
Oleksii Kliukin
Дата:
<div class="">Hi,</div><div class=""><br class="" /></div><div class="">While experimenting with BRIN on PostgreSQL
9.5RC1I came across the following plan (which is, btw a very good example of how BRIN rocks for the clustered data, the
sizeof this table is around 90GB, the size of the index is around 3MB):</div><div class=""><br class="" /></div><div
class="">explain(analyze, buffers, verbose) select 1 from example where event_time = now() - interval '5
months';</div><divclass="">                                                                      QUERY PLAN</div><div
class="">-------------------------------------------------------------------------------------------------------------------------------------------------------</div><div
class=""> BitmapHeap Scan on example  (cost=744.44..757.64 rows=6 width=0) (actual time=73.895..73.895 rows=0
loops=1)</div><divclass="">   Output: 1</div><div class="">   Recheck Cond: (example.event_time = (now() - '5
mons'::interval))</div><divclass="">   Rows Removed by Index Recheck: 4030</div><div class="">   Heap Blocks:
lossy=128</div><divclass="">   Buffers: shared hit=629</div><div class="">   ->  Bitmap Index Scan on
example_event_time_idx1 (cost=0.00..744.41 rows=6 width=0) (actual time=70.335..70.335 rows=1280 loops=1)</div><div
class="">        Index Cond: (example.event_time = (now() - '5 mons'::interval))</div><div class="">         Buffers:
sharedhit=501</div><div class=""> Planning time: 0.125 ms</div><div class=""> Execution time: 73.943 ms</div><div
class="">(11rows)</div><div class=""><br class="" /></div><div class="">Time: 74.642 ms</div><div class=""><br class=""
/></div><divclass=""><br class="" /></div><div class="">Here EXPLAIN ANALYZE reports 1280 rows from the Bitmap Index
Scan,but on the higher level, 4030 rows were removed by Index Recheck. </div><div class=""><br class="" /></div><div
class="">Thequestions are:</div><div class=""><br class="" /></div><div class="">- how does it get 1280 rows from the
BRINindex scan, given that BRIN only stores pointers to the heap blocks, not individual rows. Does it calculate the
numberof rows in the blocks returned?</div><div class=""><br class="" /></div><div class="">- how comes that the
subsequentrecheck filters out 4030 rows, out of supposedly 1280 returned?</div><div class=""><br class="" /></div><div
class="">Kindregards,</div><div class=""><div class="" style="color: rgb(0, 0, 0); letter-spacing: normal; orphans:
auto;text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px;
-webkit-text-stroke-width:0px; word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break:
after-white-space;">--</div><divclass="" style="color: rgb(0, 0, 0); letter-spacing: normal; orphans: auto; text-align:
start;text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px;
-webkit-text-stroke-width:0px; word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break:
after-white-space;">Oleksii</div></div><brclass="" /> 

Re: rows estimate in explain analyze for the BRIN index

От
Tom Lane
Дата:
Oleksii Kliukin <alexk@hintbits.com> writes:
>  Bitmap Heap Scan on example  (cost=744.44..757.64 rows=6 width=0) (actual time=73.895..73.895 rows=0 loops=1)
>    Output: 1
>    Recheck Cond: (example.event_time = (now() - '5 mons'::interval))
>    Rows Removed by Index Recheck: 4030
>    Heap Blocks: lossy=128
>    Buffers: shared hit=629
>    ->  Bitmap Index Scan on example_event_time_idx1  (cost=0.00..744.41 rows=6 width=0) (actual time=70.335..70.335
rows=1280loops=1)
 
>          Index Cond: (example.event_time = (now() - '5 mons'::interval))
>          Buffers: shared hit=501

> - how does it get 1280 rows from the BRIN index scan, given that BRIN only stores pointers to the heap blocks, not
individualrows. Does it calculate the number of rows in the blocks returned?
 

It evidently returned 128 block IDs to the heapscan logic.  I have not
looked at the code, but a reasonable bet is that it's just guessing that
there are 10 rows per block.

That seems like an awfully low number, though; it equates to assuming
that rows are 800 bytes wide on average.  If we're going to use a fixed
number, 100 rows per block would probably be more nearly the correct
order of magnitude.

Another idea would be to use the heap's row density as calculated
by the last ANALYZE (ie, reltuples/relpages), with a fallback to 100
if relpages=0.  This'd only be convenient if the bitmap scan node has
the parent heap rel open, which it might not.
        regards, tom lane



Re: rows estimate in explain analyze for the BRIN index

От
Oleksii Kliukin
Дата:

On 30 Dec 2015, at 17:02, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Oleksii Kliukin <alexk@hintbits.com> writes:
Bitmap Heap Scan on example  (cost=744.44..757.64 rows=6 width=0) (actual time=73.895..73.895 rows=0 loops=1)
  Output: 1
  Recheck Cond: (example.event_time = (now() - '5 mons'::interval))
  Rows Removed by Index Recheck: 4030
  Heap Blocks: lossy=128
  Buffers: shared hit=629
  ->  Bitmap Index Scan on example_event_time_idx1  (cost=0.00..744.41 rows=6 width=0) (actual time=70.335..70.335 rows=1280 loops=1)
        Index Cond: (example.event_time = (now() - '5 mons'::interval))
        Buffers: shared hit=501

- how does it get 1280 rows from the BRIN index scan, given that BRIN only stores pointers to the heap blocks, not individual rows. Does it calculate the number of rows in the blocks returned?

It evidently returned 128 block IDs to the heapscan logic.  I have not
looked at the code, but a reasonable bet is that it's just guessing that
there are 10 rows per block.

You are right, this is at the end of bringetbitmap in brin.c
       /*
* XXX We have an approximation of the number of *pages* that our scan
* returns, but we don't have a precise idea of the number of heap tuples
* involved.
*/
PG_RETURN_INT64(totalpages * 10);



That seems like an awfully low number, though; it equates to assuming
that rows are 800 bytes wide on average.  If we're going to use a fixed
number, 100 rows per block would probably be more nearly the correct
order of magnitude.

Another idea would be to use the heap's row density as calculated
by the last ANALYZE (ie, reltuples/relpages), with a fallback to 100
if relpages=0.  This'd only be convenient if the bitmap scan node has
the parent heap rel open, which it might not.

+1

Kind regards,
--
Oleksii

Re: rows estimate in explain analyze for the BRIN index

От
Tom Lane
Дата:
Oleksii Kliukin <alexk@hintbits.com> writes:
>> On 30 Dec 2015, at 17:02, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Another idea would be to use the heap's row density as calculated
>> by the last ANALYZE (ie, reltuples/relpages), with a fallback to 100
>> if relpages=0.  This'd only be convenient if the bitmap scan node has
>> the parent heap rel open, which it might not.

> +1

Any objections to the attached?

            regards, tom lane

diff --git a/src/backend/access/brin/brin.c b/src/backend/access/brin/brin.c
index 2622a7e..64bf788 100644
*** a/src/backend/access/brin/brin.c
--- b/src/backend/access/brin/brin.c
*************** bringetbitmap(PG_FUNCTION_ARGS)
*** 279,286 ****
      Relation    heapRel;
      BrinOpaque *opaque;
      BlockNumber nblocks;
      BlockNumber heapBlk;
!     int            totalpages = 0;
      FmgrInfo   *consistentFn;
      MemoryContext oldcxt;
      MemoryContext perRangeCxt;
--- 279,287 ----
      Relation    heapRel;
      BrinOpaque *opaque;
      BlockNumber nblocks;
+     int            heap_tuples_per_page;
      BlockNumber heapBlk;
!     int64        totalpages = 0;
      FmgrInfo   *consistentFn;
      MemoryContext oldcxt;
      MemoryContext perRangeCxt;
*************** bringetbitmap(PG_FUNCTION_ARGS)
*** 291,301 ****

      /*
       * We need to know the size of the table so that we know how long to
!      * iterate on the revmap.
       */
      heapOid = IndexGetRelation(RelationGetRelid(idxRel), false);
      heapRel = heap_open(heapOid, AccessShareLock);
      nblocks = RelationGetNumberOfBlocks(heapRel);
      heap_close(heapRel, AccessShareLock);

      /*
--- 292,309 ----

      /*
       * We need to know the size of the table so that we know how long to
!      * iterate on the revmap.  While we have it open, estimate the number of
!      * tuples per heap page for use later.
       */
      heapOid = IndexGetRelation(RelationGetRelid(idxRel), false);
      heapRel = heap_open(heapOid, AccessShareLock);
      nblocks = RelationGetNumberOfBlocks(heapRel);
+     if (heapRel->rd_rel->relpages != 0 && heapRel->rd_rel->reltuples > 0)
+         heap_tuples_per_page = (int)
+             ((double) heapRel->rd_rel->reltuples /
+              (BlockNumber) heapRel->rd_rel->relpages);
+     else    /* if no info, assume 100-byte tuples */
+         heap_tuples_per_page = BLCKSZ / 100;
      heap_close(heapRel, AccessShareLock);

      /*
*************** bringetbitmap(PG_FUNCTION_ARGS)
*** 447,457 ****
          ReleaseBuffer(buf);

      /*
!      * XXX We have an approximation of the number of *pages* that our scan
!      * returns, but we don't have a precise idea of the number of heap tuples
!      * involved.
       */
!     PG_RETURN_INT64(totalpages * 10);
  }

  /*
--- 455,465 ----
          ReleaseBuffer(buf);

      /*
!      * We have an approximation of the number of pages that our scan returns,
!      * but we don't have a precise idea of the number of heap tuples involved.
!      * We have to estimate based on average tuple density.
       */
!     PG_RETURN_INT64(totalpages * heap_tuples_per_page);
  }

  /*

Re: rows estimate in explain analyze for the BRIN index

От
Oleksii Kliukin
Дата:

On 30 Dec 2015, at 17:44, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Oleksii Kliukin <alexk@hintbits.com> writes:
On 30 Dec 2015, at 17:02, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Another idea would be to use the heap's row density as calculated
by the last ANALYZE (ie, reltuples/relpages), with a fallback to 100
if relpages=0.  This'd only be convenient if the bitmap scan node has
the parent heap rel open, which it might not.

+1

Any objections to the attached?

Looks good to me. On my sample system with 100K rows, the new version gives me:

— CREATE TABLE test AS SELECT id FROM generate_series(1,100000) id;
— CREATE INDEX ON test USING brin(id);

postgres=# explain analyze select 1 from test where id = 500;
                                                       QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on test  (cost=12.01..16.02 rows=1 width=0) (actual time=0.199..4.220 rows=1 loops=1)
   Recheck Cond: (id = 500)
   Rows Removed by Index Recheck: 28927
   Heap Blocks: lossy=128
   ->  Bitmap Index Scan on test_id_idx  (cost=0.00..12.01 rows=1 width=0) (actual time=0.072..0.072 rows=28800 loops=1)
         Index Cond: (id = 500)
 Planning time: 0.433 ms
 Execution time: 4.323 ms
(8 rows)

which is much closer to the actual number of rows removed by the index recheck + the one left.

--
Oleksii

Re: rows estimate in explain analyze for the BRIN index

От
Emre Hasegeli
Дата:
> which is much closer to the actual number of rows removed by the index
> recheck + the one left.

Is it better to be closer?  We are saying those are the "actual"
values not the estimates.  If we cannot provide the actual rows, I
think it is better to provide nothing.  Something closer to the
reality would create more confusion.  Maybe, we just just return the
number of blocks, and put somewhere a note about it.  The actual row
count is already available on the upper part of the plan.

By the way, the estimation is a bigger problem than that.  Please see
my patch [1] about it.

[1] http://www.postgresql.org/message-id/CAE2gYzzJvzPy-1cSgZjJyH69izSa13SEgFC4i4r2z0qBQ2P8Uw@mail.gmail.com



Re: rows estimate in explain analyze for the BRIN index

От
Tom Lane
Дата:
Emre Hasegeli <emre@hasegeli.com> writes:
>> which is much closer to the actual number of rows removed by the index
>> recheck + the one left.

> Is it better to be closer?  We are saying those are the "actual"
> values not the estimates.  If we cannot provide the actual rows, I
> think it is better to provide nothing.

Return zero you mean?  Good point; there's precedent for that elsewhere
in EXPLAIN ANALYZE, IIRC.  If we do what I propose in my patch, it would
possibly just lead to more questions like Oleksii's, because people would
be even more likely to believe that the number is accurate.
        regards, tom lane



Re: rows estimate in explain analyze for the BRIN index

От
Oleksii Kliukin
Дата:
> On 30 Dec 2015, at 18:38, Emre Hasegeli <emre@hasegeli.com> wrote:
>
>> which is much closer to the actual number of rows removed by the index
>> recheck + the one left.
>
> Is it better to be closer?  We are saying those are the "actual"
> values not the estimates.  If we cannot provide the actual rows, I
> think it is better to provide nothing.  Something closer to the
> reality would create more confusion.  Maybe, we just just return the
> number of blocks, and put somewhere a note about it.  The actual row
> count is already available on the upper part of the plan.

I don’t see how to solve this problem without changing explain analyze output to accommodate for “unknown” value. I
don’tthink “0” is a non-confusing representation of “unknown” for most people, and from the practical standpoint, a
“besteffort” estimate is better than 0 (i.e. I will be able to estimate how efficient BRIN index is for my tables in
termsof the number of tuples retrieved/thrown away) 

We might still reflect in the documentation that the BRIN index cannot produce the exact number of rows during the
bitmapscan and point people asking similar questions there. 





Re: rows estimate in explain analyze for the BRIN index

От
Emre Hasegeli
Дата:
> I don’t see how to solve this problem without changing explain analyze output to accommodate for “unknown” value. I
don’tthink “0” is a non-confusing representation of “unknown” for most people, and from the practical standpoint, a
“besteffort” estimate is better than 0 (i.e. I will be able to estimate how efficient BRIN index is for my tables in
termsof the number of tuples retrieved/thrown away) 

The number of retrieved and thrown away rows are already available on
the upper part of the plan.  Bitmap Index Scan should provide the rows
that matched the index.  Another alternative would be just returning
the number of matching pages (by not multiplying with 10).  It might
be better understood.  The users who can understand the EXPLAIN
ANALYZE output shouldn't be expecting BRIN to return rows.



Re: rows estimate in explain analyze for the BRIN index

От
Tom Lane
Дата:
Emre Hasegeli <emre@hasegeli.com> writes:
>> I don’t see how to solve this problem without changing explain analyze output to accommodate for “unknown” value. I
don’tthink “0” is a non-confusing representation of “unknown” for most people, and from the practical standpoint, a
“besteffort” estimate is better than 0 (i.e. I will be able to estimate how efficient BRIN index is for my tables in
termsof the number of tuples retrieved/thrown away)
 

We do already have a nearby precedent for returning zero when we don't
have an accurate answer: that's what BitmapAnd and BitmapOr plan nodes
do.  (This is documented btw, at the bottom of section 14.1.)

> The number of retrieved and thrown away rows are already available on
> the upper part of the plan.  Bitmap Index Scan should provide the rows
> that matched the index.

It doesn't have that information.

> Another alternative would be just returning
> the number of matching pages (by not multiplying with 10).  It might
> be better understood.

No, it would not, at least not unless we found a way to explicitly mark
the output as being blocks not rows (which would doubtless break a lot of
existing client-side code).  Zero is fairly clearly an impossible value,
whereas anything that's not zero is going to be taken at face value by
many users.

On balance I think likely the best thing to do is return zero, and
document that behavior.
        regards, tom lane



Re: rows estimate in explain analyze for the BRIN index

От
Oleksii Kliukin
Дата:

On 30 Dec 2015, at 21:12, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Emre Hasegeli <emre@hasegeli.com> writes:
I don’t see how to solve this problem without changing explain analyze output to accommodate for “unknown” value. I don’t think “0” is a non-confusing representation of “unknown” for most people, and from the practical standpoint, a “best effort” estimate is better than 0 (i.e. I will be able to estimate how efficient BRIN index is for my tables in terms of the number of tuples retrieved/thrown away)

We do already have a nearby precedent for returning zero when we don't
have an accurate answer: that's what BitmapAnd and BitmapOr plan nodes
do.  (This is documented btw, at the bottom of section 14.1.)

+1 for following a precedent.


The number of retrieved and thrown away rows are already available on
the upper part of the plan.  Bitmap Index Scan should provide the rows
that matched the index.

It doesn't have that information.

Another alternative would be just returning
the number of matching pages (by not multiplying with 10).  It might
be better understood.

No, it would not, at least not unless we found a way to explicitly mark
the output as being blocks not rows (which would doubtless break a lot of
existing client-side code).  Zero is fairly clearly an impossible value,
whereas anything that's not zero is going to be taken at face value by
many users.

But is it? Is it impossible for the BRIN bitmap index scan to return 0 rows (say, if the value being matched is outside the min/max boundary for every block range?) Granted, if we document that it always returns 0 and should be ignored, then confusing the actual 0 with the 0 as a representation of “unknown” would be less a problem. 

--
Oleksii

Re: rows estimate in explain analyze for the BRIN index

От
Alvaro Herrera
Дата:
Tom Lane wrote:
> Emre Hasegeli <emre@hasegeli.com> writes:
> >> I don’t see how to solve this problem without changing explain analyze output to accommodate for “unknown” value.
Idon’t think “0” is a non-confusing representation of “unknown” for most people, and from the practical standpoint, a
“besteffort” estimate is better than 0 (i.e. I will be able to estimate how efficient BRIN index is for my tables in
termsof the number of tuples retrieved/thrown away)
 
> 
> We do already have a nearby precedent for returning zero when we don't
> have an accurate answer: that's what BitmapAnd and BitmapOr plan nodes
> do.  (This is documented btw, at the bottom of section 14.1.)

Hmm, but amgetbitmap is documented thusly:
The number of tuples fetched is returned (this might be just anapproximate count, for instance some AMs do not
detectduplicates).http://www.postgresql.org/docs/9.5/static/index-functions.html

so I'm not sure we're actually violating an expectation that the number
of rows is exact.  I mean, if we zero out this one, shouldn't we set it
to zero for these other documented cases too?

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: rows estimate in explain analyze for the BRIN index

От
Tom Lane
Дата:
Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> Tom Lane wrote:
>> We do already have a nearby precedent for returning zero when we don't
>> have an accurate answer: that's what BitmapAnd and BitmapOr plan nodes
>> do.  (This is documented btw, at the bottom of section 14.1.)

> Hmm, but amgetbitmap is documented thusly:

>     The number of tuples fetched is returned (this might be just an
>     approximate count, for instance some AMs do not detect
>     duplicates).
>     http://www.postgresql.org/docs/9.5/static/index-functions.html

> so I'm not sure we're actually violating an expectation that the number
> of rows is exact.  I mean, if we zero out this one, shouldn't we set it
> to zero for these other documented cases too?

Well, the code comments may say that, but the user-facing docs don't ...

More generally, people are accustomed to the idea that the planner's
estimated rowcounts are just estimates, but they're not accustomed to
the idea that the runtime "actual" rowcounts might be just estimates.
That seems like it's moving EXPLAIN ANALYZE's contract quite a bit,
even if there are some documents for internal APIs that say something
else.
        regards, tom lane



Re: rows estimate in explain analyze for the BRIN index

От
Emre Hasegeli
Дата:
> But is it? Is it impossible for the BRIN bitmap index scan to return 0 rows
> (say, if the value being matched is outside the min/max boundary for every
> block range?) Granted, if we document that it always returns 0 and should be
> ignored, then confusing the actual 0 with the 0 as a representation of
> “unknown” would be less a problem.

How about -1 ?  It is an impossible value for sure.  Maybe we should
change BitmapAnd and BitmapOr nodes, too.  It is better to make it
obvious that it is not the correct value.  I don't think many people
would notice the note on the documentation.

On the other hand, returning -1 broke parser of explain.depesz.com [1].

[1] http://explain.depesz.com/s/tAkd