Обсуждение: Improve output of BitmapAnd EXPLAIN ANALYZE
A customer just pinged me wondering how it was that a BitmapAnd node was reporting 0 tuples when the Bitmap Heap Scan above it showed it had in fact generated tuples. While this is mentioned in the docs, I think it would be very helpful to have ANALYZE spit out "N/A" instead of 0 for these nodes. AFAICT that would just require adding a special case to the "if (es->costs)" block at line ~1204 in explain.c? BTW, it looks like it would actually be possible to return a real row-count if none of the TIDBitmap pages are chunks, but I'm not sure if it's worth the extra effort. -- Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX Experts in Analytics, Data Architecture and PostgreSQL Data in Trouble? Get it in Treble! http://BlueTreble.com 855-TREBLE2 (855-873-2532) mobile: 512-569-9461
Jim Nasby <Jim.Nasby@BlueTreble.com> writes:
> A customer just pinged me wondering how it was that a BitmapAnd node was 
> reporting 0 tuples when the Bitmap Heap Scan above it showed it had in 
> fact generated tuples.
> While this is mentioned in the docs, I think it would be very helpful to 
> have ANALYZE spit out "N/A" instead of 0 for these nodes.
That would break code that tries to parse that stuff, eg depesz.com.
        regards, tom lane
			
		Tom, * Tom Lane (tgl@sss.pgh.pa.us) wrote: > Jim Nasby <Jim.Nasby@BlueTreble.com> writes: > > A customer just pinged me wondering how it was that a BitmapAnd node was > > reporting 0 tuples when the Bitmap Heap Scan above it showed it had in > > fact generated tuples. > > > While this is mentioned in the docs, I think it would be very helpful to > > have ANALYZE spit out "N/A" instead of 0 for these nodes. > > That would break code that tries to parse that stuff, eg depesz.com. I don't believe Jim was suggesting that we back-patch such a change. Changing it in a new major release seems entirely reasonable. Thanks! Stephen
Stephen Frost <sfrost@snowman.net> writes:
> * Tom Lane (tgl@sss.pgh.pa.us) wrote:
>> That would break code that tries to parse that stuff, eg depesz.com.
> I don't believe Jim was suggesting that we back-patch such a change.
I don't either.
> Changing it in a new major release seems entirely reasonable.
It's still a crock though.  I wonder whether it wouldn't be better to
change the nodeBitmap code so that when EXPLAIN ANALYZE is active,
it expends extra effort to try to produce a rowcount number.
We could certainly run through the result bitmap and count the number
of exact-TID bits.  I don't see a practical way of doing something
with lossy page bits, but maybe those occur infrequently enough
that we could ignore them?  Or we could arbitrarily decide that
a lossy page should be counted as MaxHeapTuplesPerPage, or a bit
less arbitrarily, count it as the relation's average number
of tuples per page.
        regards, tom lane
			
		* Tom Lane (tgl@sss.pgh.pa.us) wrote: > Stephen Frost <sfrost@snowman.net> writes: > > Changing it in a new major release seems entirely reasonable. > > It's still a crock though. I wonder whether it wouldn't be better to > change the nodeBitmap code so that when EXPLAIN ANALYZE is active, > it expends extra effort to try to produce a rowcount number. I'm certainly all for doing something better, just didn't think that we should be worried about making a change to the EXPLAIN ANALYZE output in a major release because Depesz might have to update the explain site. > We could certainly run through the result bitmap and count the number > of exact-TID bits. I don't see a practical way of doing something > with lossy page bits, but maybe those occur infrequently enough > that we could ignore them? Or we could arbitrarily decide that > a lossy page should be counted as MaxHeapTuplesPerPage, or a bit > less arbitrarily, count it as the relation's average number > of tuples per page. Counting each page as the relation's average number of tuples per page seems entirely reasonable to me, for what that is trying to report. That said, I'm a big fan of how we have more detail for things like a HashJoin (buckets, batches, memory usage) and it might be nice to have more information like that for a BitmapAnd (and friends). In particular, I'm thinking of memory usage, exact vs. lossy pages, etc. Knowing that the bitmap has gotten to the point of being lossy might indicate that a user could up work_mem, for example, and possibly avoid recheck costs. Thanks! Stephen
On 10/21/16 8:21 AM, Stephen Frost wrote: > Counting each page as the relation's average number of tuples per page > seems entirely reasonable to me, for what that is trying to report. My concern is that still leaves a lot of room for confusion when interpreting EXPLAIN ANALYZE. Every other node will tell you exactly what happened and it's pretty easy to reason about whether rows should have gone up or down based on the type of node. You can't do that for Bitmap(And|Or) unless you know the details of how TIDBitmaps work. Reporting N/A makes it crystal clear that these nodes operate very differently than all the others. (On a related note, it would also be nice if we reported fractional rows when the row count low and loops is high.) > That said, I'm a big fan of how we have more detail for things like a > HashJoin (buckets, batches, memory usage) and it might be nice to have > more information like that for a BitmapAnd (and friends). In > particular, I'm thinking of memory usage, exact vs. lossy pages, etc. > Knowing that the bitmap has gotten to the point of being lossy might > indicate that a user could up work_mem, for example, and possibly avoid > recheck costs. I think that's the best way to handle this: report N/A in the header and then provide details on exact vs lossy. That provides a clear indication to users that these kinds of nodes are special, as well as a reminder as to why they're special. Certainly the node could report an exact rowcount in the header if there were no lossy pages too. -- Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX Experts in Analytics, Data Architecture and PostgreSQL Data in Trouble? Get it in Treble! http://BlueTreble.com 855-TREBLE2 (855-873-2532) mobile: 512-569-9461
* Jim Nasby (Jim.Nasby@BlueTreble.com) wrote: > On 10/21/16 8:21 AM, Stephen Frost wrote: > >Counting each page as the relation's average number of tuples per page > >seems entirely reasonable to me, for what that is trying to report. > > My concern is that still leaves a lot of room for confusion when > interpreting EXPLAIN ANALYZE. Every other node will tell you exactly > what happened and it's pretty easy to reason about whether rows > should have gone up or down based on the type of node. You can't do > that for Bitmap(And|Or) unless you know the details of how > TIDBitmaps work. Reporting N/A makes it crystal clear that these > nodes operate very differently than all the others. I don't see why you think the numbers reported by BitmapAnd based on this approach wouldn't go up and down in a similar manner to what you would expect to get, based on that node type. Reporting N/A is entirely punting on it when we have perfectly useful information that can be reported. > (On a related note, it would also be nice if we reported fractional > rows when the row count low and loops is high.) I can certainly understand that, though I think I'd rather have an actual 'total' value or similar instead, but that's really a different discussion. > >That said, I'm a big fan of how we have more detail for things like a > >HashJoin (buckets, batches, memory usage) and it might be nice to have > >more information like that for a BitmapAnd (and friends). In > >particular, I'm thinking of memory usage, exact vs. lossy pages, etc. > >Knowing that the bitmap has gotten to the point of being lossy might > >indicate that a user could up work_mem, for example, and possibly avoid > >recheck costs. > > I think that's the best way to handle this: report N/A in the header > and then provide details on exact vs lossy. That provides a clear > indication to users that these kinds of nodes are special, as well > as a reminder as to why they're special. Certainly the node could > report an exact rowcount in the header if there were no lossy pages > too. I don't see why we would want to stick 'N/A' in for the header, even if we are reporting the details, when we can provide a pretty reasonable number. In particular, I certainly don't think we would want to report N/A sometimes (lossy case) and then an actual number other times (all exact case). That strikes me as much more likely to be confusing. Thanks! Stephen
On 10/21/16 12:30 PM, Stephen Frost wrote: > I don't see why we would want to stick 'N/A' in for the header, even if > we are reporting the details, when we can provide a pretty reasonable > number. Because then it's absolutely clear that we don't have a valid rowcount, only a guess (and a guess that's potentially off by a lot). No one is used to seeing "N/A" in explain, so when they do see it they'll immediately realize they don't know what's going on and hit google or the docs up. Otherwise they'll just think it's an accurate rowcount like for any other node... > In particular, I certainly don't think we would want to report > N/A sometimes (lossy case) and then an actual number other times (all > exact case). That strikes me as much more likely to be confusing. Fair enough. I'd certainly rather have a constant N/A then a guess at the rowcount. -- Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX Experts in Analytics, Data Architecture and PostgreSQL Data in Trouble? Get it in Treble! http://BlueTreble.com 855-TREBLE2 (855-873-2532) mobile: 512-569-9461
The BRIN Bitmap Index Scan has the same problem. I have seen people confused by this. I think N/A would clearly improve the situation.
On Mon, Oct 31, 2016 at 6:56 AM, Emre Hasegeli <emre@hasegeli.com> wrote: > The BRIN Bitmap Index Scan has the same problem. I have seen people > confused by this. I think N/A would clearly improve the situation. I agree. Or perhaps better still, leave rows=%.0f out altogether when we don't have a meaningful value to report. If it were OK to use some unimportant-looking value as a proxy for "undefined", the SQL standard wouldn't include nulls. I don't like Tom's proposal of trying to fake up a value here when EXPLAIN ANALYZE is in use. Reporting "exact" and "lossy" values for BitmapAnd would be a fine enhancement, but artificially trying to flatten that back into a row count is going to be confusing, not helpful. (Just last week I saw a case where the fact that many pages were being lossified caused a performance problem ... so treating lossy pages as if they don't exist would have led to a lot of head-scratching, because under Tom's proposal the row count would have been way off.) -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes:
> I don't like Tom's proposal of trying to fake up a value here when
> EXPLAIN ANALYZE is in use.  Reporting "exact" and "lossy" values for
> BitmapAnd would be a fine enhancement, but artificially trying to
> flatten that back into a row count is going to be confusing, not
> helpful.  (Just last week I saw a case where the fact that many pages
> were being lossified caused a performance problem ... so treating
> lossy pages as if they don't exist would have led to a lot of
> head-scratching, because under Tom's proposal the row count would have
> been way off.)
It would very often be the case that the value I suggested would be exact,
so this complaint seems off-base to me.
If we were willing to add an additional output line, we could also report
the number of lossy pages in the result bitmap, and people would then
know not to trust the reported rowcount as gospel.  But it's still useful
to have it.  I'm envisioning output like
  ->  BitmapOr  (cost=... rows=2000 width=0) (actual time=... rows=1942 loops=1)
in the no-lossy-pages case, otherwise
  ->  BitmapOr  (cost=... rows=4000 width=0) (actual time=... rows=3945 loops=1)        Lossy Bitmap: exact
entries=2469,lossy pages=123 
There's nothing misleading about that, IMO.  (Exercise for the reader:
what rows/page estimate did I assume?)
        regards, tom lane
			
		On Tue, Nov 1, 2016 at 9:46 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> I don't like Tom's proposal of trying to fake up a value here when >> EXPLAIN ANALYZE is in use. Reporting "exact" and "lossy" values for >> BitmapAnd would be a fine enhancement, but artificially trying to >> flatten that back into a row count is going to be confusing, not >> helpful. (Just last week I saw a case where the fact that many pages >> were being lossified caused a performance problem ... so treating >> lossy pages as if they don't exist would have led to a lot of >> head-scratching, because under Tom's proposal the row count would have >> been way off.) > > It would very often be the case that the value I suggested would be exact, > so this complaint seems off-base to me. From my point of view, something that very often gives the right answers isn't acceptable. We certainly wouldn't accept a query optimization that very often gives the right answers. It's gotta always give the right answer. > If we were willing to add an additional output line, we could also report > the number of lossy pages in the result bitmap, and people would then > know not to trust the reported rowcount as gospel. But it's still useful > to have it. I'm envisioning output like > > -> BitmapOr (cost=... rows=2000 width=0) (actual time=... rows=1942 loops=1) > > in the no-lossy-pages case, otherwise > > -> BitmapOr (cost=... rows=4000 width=0) (actual time=... rows=3945 loops=1) > Lossy Bitmap: exact entries=2469, lossy pages=123 > > There's nothing misleading about that, IMO. (Exercise for the reader: > what rows/page estimate did I assume?) (4000-2469)/123 = 12.44715 ? I think it's inherently misleading to report values that were concocted specifically for EXPLAIN ANALYZE. Things that we report there should have some underlying reality or relevance. People - including me - tend to assume they do, and you don't want to spend time chasing down something that's PURELY an EXPLAIN ANALYZE artifact with no actual relevance to the runtime behavior. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company