Discussion: number of rows estimation for bit-AND operation

From:
Slava Moudry
Date:

Hi,

I am using an int8 field to pack a number of error flags. This is a very common technique for large tables: packing multiple flags into one integer field.

For most records the mt_flags field is 0. Here are the statistics (taken from the pgAdmin Statistics tab for the mt_flags column):

Most common Values: {0,128,2,4,8}

Most common Frequencies: {0.96797,0.023,0.0076,0.0005,0.00029}

What I notice is that when a bit-AND expression is used, Postgres significantly underestimates the number of rows:

explain analyze select count(*) from mt__20090801 where  mt_flags&8=0;

                                                         QUERY PLAN                                                         

-----------------------------------------------------------------------------------------------------------------------------

 Aggregate  (cost=83054.43..83054.44 rows=1 width=0) (actual time=2883.154..2883.154 rows=1 loops=1)

   ->  Seq Scan on mt__20090801  (cost=0.00..83023.93 rows=12200 width=0) (actual time=0.008..2100.390 rows=2439435 loops=1)

         Filter: ((mt_flags & 8) = 0)

 Total runtime: 2883.191 ms

(4 rows)

This is not an issue for the particular query above, but I noticed that due to that miscalculation Postgres chooses a plan with Nested Loops for many other queries. I can fix it by setting enable_nestloop to off, but it's not something I should set for all queries.

Is there any way to help Postgres make a better estimate of the number of rows returned by a bit expression?
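For what it's worth, the MCV statistics above already determine the right answer; a short sketch (Python, purely to illustrate the arithmetic; the planner does not currently consult MCVs for expressions like mt_flags & 8):

```python
# Sketch: what an MCV-aware selectivity estimate for "mt_flags & 8 = 0"
# would look like, using the pgAdmin statistics quoted above.
mcv = {0: 0.96797, 128: 0.023, 2: 0.0076, 4: 0.0005, 8: 0.00029}

def estimate_selectivity(pred):
    # Sum the frequencies of the most-common values that satisfy the
    # predicate; the small non-MCV remainder is ignored for simplicity.
    return sum(freq for value, freq in mcv.items() if pred(value))

sel = estimate_selectivity(lambda v: v & 8 == 0)
print(sel)  # roughly 0.999 -- nearly all rows, unlike the 12200-row guess
```

On a 2.44M-row table that selectivity predicts about 2.44M matching rows, which is what EXPLAIN ANALYZE actually reports.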

Thanks,

-Slava Moudry, Senior DW Engineer. 4Info Inc.

P.S. table definition:

\d mt__20090801

                      Table "dw.mt__20090801"

          Column          |            Type             | Modifiers

--------------------------+-----------------------------+-----------

 mt_id                    | bigint                      | not null

 mt_ts                    | timestamp without time zone |

 ad_cost                  | numeric(10,5)               |

 short_code               | integer                     |

 message_id               | bigint                      | not null

 mp_code                  | character(1)                | not null

 al_id                    | integer                     | not null

 cust_id                  | integer                     |

 device_id                | integer                     | not null

 broker_id                | smallint                    |

 partner_id               | integer                     |

 ad_id                    | integer                     |

 keyword_id               | integer                     |

 sc_id                    | integer                     |

 cp_id                    | integer                     |

 src_alertlog_id          | bigint                      |

 src_query_id             | bigint                      |

 src_response_message_num | smallint                    |

 src_gateway_message_id   | bigint                      |

 mt_flags                 | integer                     |

 message_length           | integer                     | not null

 created_etl              | timestamp without time zone |

Indexes:

    "mt_device_id__20090801" btree (device_id) WITH (fillfactor=100), tablespace "index2"

    "mt_ts__20090801" btree (mt_ts) WITH (fillfactor=100) CLUSTER, tablespace "index2"

Check constraints:

    "mt__20090801_mt_ts_check" CHECK (mt_ts >= '2009-08-01 00:00:00'::timestamp without time zone AND mt_ts < '2009-08-02 00:00:00'::timestamp without time

zone)

Inherits: mt

Tablespace: "dw_tables3"

From:
Scott Marlowe
Date:

On Mon, Aug 17, 2009 at 2:07 PM, Slava Moudry<> wrote:
> Hi,
>
> I am using int8 field to pack a number of error flags. This is very common
> technique for large tables to pack multiple flags in one integer field.
>
> For most records – the mt_flags field is 0. Here is the statistics (taken
> from pgAdmin Statistics tab for mt_flags column):
>
> Most common Values: {0,128,2,4,8}
>
> Most common Frequencies: {0.96797,0.023,0.0076,0.0005,0.00029}
>
> What I notice is that when a bit-AND expression is used, Postgres significantly
> underestimates the number of rows:
>
> explain analyze select count(*) from mt__20090801 where  mt_flags&8=0;
>
>                               QUERY PLAN
>
-----------------------------------------------------------------------------------------------------------------------------
>
>  Aggregate  (cost=83054.43..83054.44 rows=1 width=0) (actual
> time=2883.154..2883.154 rows=1 loops=1)
>
>    ->  Seq Scan on mt__20090801  (cost=0.00..83023.93 rows=12200 width=0)
> (actual time=0.008..2100.390 rows=2439435 loops=1)
>
>          Filter: ((mt_flags & 8) = 0)
>
>  Total runtime: 2883.191 ms
>
> (4 rows)
>
> This is not an issue for the particular query above, but I noticed that due
> to that miscalculation in many cases Postgres chooses plan with Nested Loops
> for other queries. I can fix it by setting enable_nestloop to off, but
> it's not something I should set for all queries.
>
> Is there any way to help Postgres make a better estimation for number of
> rows returned by bit function?

You can index on the function.  For instance:

create table t (mt_flags int);
create index t_mtflags_bit on t ((mt_flags&8));
insert into t select case when random() > 0.95 then case when random()
>0.5 then 8 else 12 end else 0 end from generate_series(1,10000);
analyze t;
explain select * from t where mt_flags&8=8;
                                QUERY PLAN
--------------------------------------------------------------------------
 Index Scan using t_mtflags_bit on t  (cost=0.00..52.17 rows=467 width=4)
   Index Cond: ((mt_flags & 8) = 8)
(2 rows)

Hope that helps a little.
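The functional index helps because ANALYZE then gathers statistics on the expression (mt_flags & 8) itself, giving the planner a real frequency distribution for it. A quick sanity check of the numbers in the example above (a sketch; the exact count depends on random()):

```python
# Sketch: in the generated test data, a row gets 8 or 12 (both of which have
# bit value 8 set) with probability 0.05, and 0 otherwise.
p_flagged = 0.05          # random() > 0.95
flagged_values = [8, 12]
assert all(v & 8 == 8 for v in flagged_values)

expected_rows = round(p_flagged * 10000)
print(expected_rows)  # 500 -- the planner's estimate of 467 is in that range
```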

From:
Scott Marlowe
Date:

2009/8/18 Slava Moudry <>:
> Hi Scott,
> Thank you for reply.
> I am using Postgres 8.4.0 (btw - great release --very happy about it) and I got a different plan after following your
advice:

Yeah, you're returning most of the rows, so a seq scan makes sense.
Try indexing / matching on something more uncommon and you should get
an index scan.



> The seq scan is OK, since I don't expect Postgres to use index scan for such low-selective condition.
> It would be tough for me to support indexes for each bit flag value and their combinations. E.g. in the query below
> it is again 200x off on the number of rows.

increase default stats target, analyze, try again.


> explain analyze select count(*) from staging.tmp_t where  mt_flags&134=0;
>                                                      QUERY PLAN
>
----------------------------------------------------------------------------------------------------------------------
>  Aggregate  (cost=83054.43..83054.44 rows=1 width=0) (actual time=2964.960..2964.960 rows=1 loops=1)
>   ->  Seq Scan on tmp_t  (cost=0.00..83023.93 rows=12200 width=0) (actual time=0.014..2152.031 rows=2362257 loops=1)
>         Filter: ((mt_flags & 134) = 0)
>  Total runtime: 2965.009 ms
> (4 rows)
>
> I still wonder if it's something I could/should report as a bug? I've been struggling with this issue in 8.2, 8.3.x
> (now using 8.4.0).
> We can more or less work around this by disabling nestloop in our analytics queries but I have problems enforcing
> this in reporting applications.

Looks more like a low stats target.  Try increasing that first.

From:
Scott Marlowe
Date:

2009/8/18 Slava Moudry <>:
>> increase default stats target, analyze, try again.
> This field has only 5 values. I had put values/frequencies in my first post.

Sorry, kinda missed that.  Anyway, there's no way for pg to know which
operation is gonna match.  Without an index on it.  So my guess is
that it just guesses some fixed value.  With an index it might be able
to get it right, but you'll need an index for each type of match
you're looking for.  I think.  Maybe someone else on the list has a
better idea.

From:
Slava Moudry
Date:

Hi Scott,
Thank you for reply.
I am using Postgres 8.4.0 (btw - great release --very happy about it) and I got a different plan after following your
advice:
create index t_mtflags_bit on staging.tmp_t ((mt_flags&8));
analyze staging.tmp_t;
explain analyze select count(*) from staging.tmp_t where  mt_flags&8=0;
                                                       QUERY PLAN


------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=89122.78..89122.79 rows=1 width=0) (actual time=2994.970..2994.971 rows=1 loops=1)
   ->  Seq Scan on tmp_t  (cost=0.00..83023.93 rows=2439541 width=0) (actual time=0.012..2161.886 rows=2439435 loops=1)
         Filter: ((mt_flags & 8) = 0)
 Total runtime: 2995.017 ms
(4 rows)

The seq scan is OK, since I don't expect Postgres to use index scan for such low-selective condition.
It would be tough for me to support indexes for each bit flag value and their combinations. E.g. in the query below it
is again 200x off on the number of rows.
explain analyze select count(*) from staging.tmp_t where  mt_flags&134=0;
                                                      QUERY PLAN
----------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=83054.43..83054.44 rows=1 width=0) (actual time=2964.960..2964.960 rows=1 loops=1)
   ->  Seq Scan on tmp_t  (cost=0.00..83023.93 rows=12200 width=0) (actual time=0.014..2152.031 rows=2362257 loops=1)
         Filter: ((mt_flags & 134) = 0)
 Total runtime: 2965.009 ms
(4 rows)

I still wonder if it's something I could/should report as a bug? I've been struggling with this issue in 8.2, 8.3.x
(now using 8.4.0).
We can more or less work around this by disabling nestloop in our analytics queries but I have problems enforcing this
in reporting applications.
Thanks,
-Slava Moudry.


-----Original Message-----
From: Scott Marlowe [mailto:]
Sent: Tuesday, August 18, 2009 12:09 AM
To: Slava Moudry
Cc: 
Subject: Re: [PERFORM] number of rows estimation for bit-AND operation

On Mon, Aug 17, 2009 at 2:07 PM, Slava Moudry<> wrote:
> Hi,
>
> I am using int8 field to pack a number of error flags. This is very common
> technique for large tables to pack multiple flags in one integer field.
>
> For most records - the mt_flags field is 0. Here is the statistics (taken
> from pgAdmin Statistics tab for mt_flags column):
>
> Most common Values: {0,128,2,4,8}
>
> Most common Frequencies: {0.96797,0.023,0.0076,0.0005,0.00029}
>
> What I notice is that when a bit-AND expression is used, Postgres significantly
> underestimates the number of rows:
>
> explain analyze select count(*) from mt__20090801 where  mt_flags&8=0;
>
>                               QUERY PLAN
>
-----------------------------------------------------------------------------------------------------------------------------
>
>  Aggregate  (cost=83054.43..83054.44 rows=1 width=0) (actual
> time=2883.154..2883.154 rows=1 loops=1)
>
>    ->  Seq Scan on mt__20090801  (cost=0.00..83023.93 rows=12200 width=0)
> (actual time=0.008..2100.390 rows=2439435 loops=1)
>
>          Filter: ((mt_flags & 8) = 0)
>
>  Total runtime: 2883.191 ms
>
> (4 rows)
>
> This is not an issue for the particular query above, but I noticed that due
> to that miscalculation in many cases Postgres chooses plan with Nested Loops
> for other queries. I can fix it by setting enable_nestloop to off, but
> it's not something I should set for all queries.
>
> Is there any way to help Postgres make a better estimation for number of
> rows returned by bit function?

You can index on the function.  For instance:

create table t (mt_flags int);
create index t_mtflags_bit on t ((mt_flags&8));
insert into t select case when random() > 0.95 then case when random()
>0.5 then 8 else 12 end else 0 end from generate_series(1,10000);
analyze t;
explain select * from t where mt_flags&8=8;
                                QUERY PLAN
--------------------------------------------------------------------------
 Index Scan using t_mtflags_bit on t  (cost=0.00..52.17 rows=467 width=4)
   Index Cond: ((mt_flags & 8) = 8)
(2 rows)

Hope that helps a little.

From:
Slava Moudry
Date:

> increase default stats target, analyze, try again.
This field has only 5 values. I had put values/frequencies in my first post.
Based on the values (see below), there is no reason for the planner to think that mt_flags&134=0 should return 12200 rows.
select mt_flags, count(*) from staging.tmp_t group by 1;
 mt_flags |  count
----------+---------
      128 |   57362
        4 |    1371
        8 |     627
        2 |   19072
        0 | 2361630
(5 rows)
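From those counts the correct answer for mt_flags&134=0 can be computed by hand, which makes the 12200-row estimate look even more arbitrary; a sketch of the arithmetic:

```python
# Sketch: which of the five observed values satisfy "mt_flags & 134 = 0"?
# 134 has bits 2, 4 and 128 set, so only the values 0 and 8 pass the filter.
counts = {128: 57362, 4: 1371, 8: 627, 2: 19072, 0: 2361630}

matching = sum(n for value, n in counts.items() if value & 134 == 0)
print(matching)  # 2362257 -- exactly the "actual rows" figure in the plan
```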

In fact, if I rewrite the query using value matching, the estimates are right on:
explain analyze select count(*) from staging.tmp_t where mt_flags not in (128,2,4);
                                                       QUERY PLAN


------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=85878.63..85878.64 rows=1 width=0) (actual time=2904.005..2904.005 rows=1 loops=1)
   ->  Seq Scan on tmp_t  (cost=0.00..79973.85 rows=**2361910** width=0) (actual time=0.008..2263.983 rows=2362257
loops=1)
         Filter: (mt_flags <> ALL ('{128,2,4}'::integer[]))
 Total runtime: 2904.038 ms
(4 rows)


Anyway, I've been using a statistics target of 100 in 8.3, and in 8.4 100 is the default. I am currently using
default_statistics_target=1000.

Do you think that the bit-AND function might be skewing the statistics for the execution plan somehow?
Thanks,
-Slava.

-----Original Message-----
From: Scott Marlowe [mailto:]
Sent: Tuesday, August 18, 2009 2:58 PM
To: Slava Moudry
Cc: 
Subject: Re: [PERFORM] number of rows estimation for bit-AND operation

2009/8/18 Slava Moudry <>:
> Hi Scott,
> Thank you for reply.
> I am using Postgres 8.4.0 (btw - great release --very happy about it) and I got a different plan after following your
advice:

Yeah, you're returning most of the rows, so a seq scan makes sense.
Try indexing / matching on something more uncommon and you should get
an index scan.



> The seq scan is OK, since I don't expect Postgres to use index scan for such low-selective condition.
> It would be tough for me to support indexes for each bit flag value and their combinations. E.g. in the query below
> it is again 200x off on the number of rows.

increase default stats target, analyze, try again.


> explain analyze select count(*) from staging.tmp_t where  mt_flags&134=0;
>                                                      QUERY PLAN
>
----------------------------------------------------------------------------------------------------------------------
>  Aggregate  (cost=83054.43..83054.44 rows=1 width=0) (actual time=2964.960..2964.960 rows=1 loops=1)
>   ->  Seq Scan on tmp_t  (cost=0.00..83023.93 rows=12200 width=0) (actual time=0.014..2152.031 rows=2362257 loops=1)
>         Filter: ((mt_flags & 134) = 0)
>  Total runtime: 2965.009 ms
> (4 rows)
>
> I still wonder if it's something I could/should report as a bug? I've been struggling with this issue in 8.2, 8.3.x
> (now using 8.4.0).
> We can more or less work around this by disabling nestloop in our analytics queries but I have problems enforcing
> this in reporting applications.

Looks more like a low stats target.  Try increasing that first.

From:
Robert Haas
Date:

On Tue, Aug 18, 2009 at 6:34 PM, Scott Marlowe<> wrote:
> 2009/8/18 Slava Moudry <>:
>>> increase default stats target, analyze, try again.
>> This field has only 5 values. I had put values/frequencies in my first post.
>
> Sorry, kinda missed that.  Anyway, there's no way for pg to know which
> operation is gonna match.  Without an index on it.  So my guess is
> that it just guesses some fixed value.  With an index it might be able
> to get it right, but you'll need an index for each type of match
> you're looking for.  I think.  Maybe someone else on the list has a
> better idea.

The best way to handle this is probably to not cram multiple values
into a single field.  Just use one boolean for each flag.  It won't
even cost you any space, because right now you are using 8 bytes to
store 5 booleans, and 5 booleans will (I believe) only require 5
bytes.  Even if you were using enough of the bits for the space usage
to be higher with individual booleans, the overall performance is
likely to be better that way.

This is sort of stating the obvious, but it doesn't make it any less
true.  Unfortunately, PG's selectivity estimator can't handle cases
like this.  Tom Lane recently made some noises about trying to improve
it, but it's not clear whether that will go anywhere, and in any event
it won't happen before 8.5.0 comes out next spring/summer.

...Robert

From:
Scott Marlowe
Date:

2009/8/20 Slava Moudry <>:
> Hi,
> Yes, I thought about putting the bit-flags in separate fields.
Unfortunately, I expect to have quite a lot of these, and space is an issue when you are dealing with billions of
records in a fact table, so I prefer to pack them into one int8.

For giggles I created two test tables, one with a single int, one with
8 bools, and put 100M entries in each.  The table with 8 bools took up
approx. 3560616 bytes, while the one with a single int took up approx.
3544212.

I.e. they're about the same.  You should really test to see if having a
lot of bools costs more than mangling ints around.  I'm guessing I
could fit a lot more bools in the test table due to alignment issues
than just 8.

From:
Scott Marlowe
Date:

On Thu, Aug 20, 2009 at 7:32 PM, Scott Marlowe<> wrote:
> 2009/8/20 Slava Moudry <>:
>> Hi,
>> Yes, I thought about putting the bit-flags in separate fields.
>> Unfortunately - I expect to have quite a lot of these and space is an issue when you are dealing with billions of
>> records in a fact table, so I prefer to pack them into one int8.
>
> For giggles I created two test tables, one with a single int, one with
> 8 bools, and put 100M entries in each.  The table with 8 bools took up
> approx. 3560616 bytes, while the one with a single int took up approx.
> 3544212
>
> I.e they're about the same.  You should really test to see if having a
> lot of bools costs more than mangling ints around.  I'm guessing I
> could fit a lot more bools in the test table due to alignment issues
> than just 8.

So, I made a table with 26 bool fields, and added 100M rows to it, and
that table took up about 5906028 bytes.  So yea, the storage is
greater for boolean fields, but only if they aren't null.  Making them
null would save a lot of space, so if null bits fit your model, then
it might be worth looking into.  Certainly they're not so much bigger
as to be unmanageable.

From:
Slava Moudry
Date:

Hi,
Yes, I thought about putting the bit-flags in separate fields.
Unfortunately, I expect to have quite a lot of these, and space is an issue when you are dealing with billions of
records in a fact table, so I prefer to pack them into one int8.
For users it's also much easier to write "where mt_flags&134=0" instead of "where f_2=false and f_4=false and f_128=false".
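The mask in that predicate is simply the sum of the individual flag bits; a small sketch of the correspondence (the f_2/f_4/f_128 column names above are illustrative):

```python
# Sketch: "mt_flags & 134 = 0" means "none of flags 2, 4, 128 are set",
# because 134 decomposes into exactly those bits.
mask = 134
bits = [1 << i for i in range(64) if mask >> i & 1]
print(bits)              # [2, 4, 128]
assert sum(bits) == mask
```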
In Teradata that worked just fine, but it costs millions vs. zero cost for Postgres, so I am not really complaining
out loud :)

Hopefully Tom or other bright folks at PG could take a look at this for the next patch/release.
Btw, can you send me the link to the "PG's selectivity estimator" discussion - I'd like to provide feedback if I can.
Thanks,
-Slava.


-----Original Message-----
From: Robert Haas [mailto:]
Sent: Thursday, August 20, 2009 10:55 AM
To: Scott Marlowe
Cc: Slava Moudry; 
Subject: Re: [PERFORM] number of rows estimation for bit-AND operation

On Tue, Aug 18, 2009 at 6:34 PM, Scott Marlowe<> wrote:
> 2009/8/18 Slava Moudry <>:
>>> increase default stats target, analyze, try again.
>> This field has only 5 values. I had put values/frequencies in my first post.
>
> Sorry, kinda missed that.  Anyway, there's no way for pg to know which
> operation is gonna match.  Without an index on it.  So my guess is
> that it just guesses some fixed value.  With an index it might be able
> to get it right, but you'll need an index for each type of match
> you're looking for.  I think.  Maybe someone else on the list has a
> better idea.

The best way to handle this is probably to not cram multiple values
into a single field.  Just use one boolean for each flag.  It won't
even cost you any space, because right now you are using 8 bytes to
store 5 booleans, and 5 booleans will (I believe) only require 5
bytes.  Even if you were using enough of the bits for the space usage
to be higher with individual booleans, the overall performance is
likely to be better that way.

This is sort of stating the obvious, but it doesn't make it any less
true.  Unfortunately, PG's selectivity estimator can't handle cases
like this.  Tom Lane recently made some noises about trying to improve
it, but it's not clear whether that will go anywhere, and in any event
it won't happen before 8.5.0 comes out next spring/summer.

...Robert

From:
Robert Haas
Date:

On Thu, Aug 20, 2009 at 9:58 PM, Scott Marlowe<> wrote:
> On Thu, Aug 20, 2009 at 7:32 PM, Scott Marlowe<> wrote:
>> 2009/8/20 Slava Moudry <>:
>>> Hi,
>>> Yes, I thought about putting the bit-flags in separate fields.
>>> Unfortunately - I expect to have quite a lot of these and space is an issue when you are dealing with billions of
>>> records in a fact table, so I prefer to pack them into one int8.
>>
>> For giggles I created two test tables, one with a single int, one with
>> 8 bools, and put 100M entries in each.  The table with 8 bools took up
>> approx. 3560616 bytes, while the one with a single int took up approx.
>> 3544212
>>
>> I.e they're about the same.  You should really test to see if having a
>> lot of bools costs more than mangling ints around.  I'm guessing I
>> could fit a lot more bools in the test table due to alignment issues
>> than just 8.
>
> So, I made a table with 26 bool fields, and added 100M rows to it, and
> that table took up about 5906028 bytes.  So yea, the storage is
> greater for boolean fields, but only if they aren't null.  making them
> null would save a lot of space, so if null bits fit your model, then
> it might be worth looking into.  Certainly they're not so much bigger
> as to be unmanageable.

This is a clever idea.  Tuples with any null columns have a null
bitmap with 1 bit per field, followed by the actual values of the
non-null fields.  So if the OP arranges to use true and null as the
values instead of true and false, and uses null for the flag value
that is most often wanted, it will pack down pretty tight.
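The packing can be sketched numerically; the constants below (23-byte tuple header, 8-byte alignment, 1-bit-per-attribute null bitmap, no data bytes for NULLs) are assumptions about the 8.x heap layout rather than anything measured in this thread:

```python
# Sketch: approximate heap tuple sizes for 26 boolean flag columns, with and
# without the true/NULL trick.
import math

HEADER = 23    # assumed fixed tuple header size
MAXALIGN = 8   # assumed alignment on 64-bit platforms

def align(n, a=MAXALIGN):
    return (n + a - 1) // a * a

def heap_tuple_size(n_attrs, n_null):
    # The null bitmap (1 bit per attribute) is stored only when at least one
    # attribute is NULL; NULL attributes themselves take no data bytes.
    bitmap = math.ceil(n_attrs / 8) if n_null else 0
    data = n_attrs - n_null          # non-null booleans: 1 byte each
    return align(align(HEADER + bitmap) + data)

print(heap_tuple_size(26, 0))    # 56 bytes: all 26 flags stored as false
print(heap_tuple_size(26, 26))   # 32 bytes: "false" represented as NULL
```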

Scott, did you check whether a toast table got created here and what
the size of it was?

...Robert

From:
Alvaro Herrera
Date:

Robert Haas wrote:

> Scott, did you check whether a toast table got created here and what
> the size of it was?

A table with only bool columns (and, say, one int8 column) would not
have a toast table.  Only varlena columns produce toast tables.

--
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

From:
Slava Moudry
Date:

Hi,
Sorry, I don't understand how the numbers came out so low.
If you assume that 8 boolean fields take 1 byte each, then for a 100M-row table it
will be 800M bytes.
How did your table fit in 3560616 bytes?

Using postgres 8.4.0 on Linux x64:
create table staging.tmp_t1(a1 boolean, a2 boolean, a3 boolean, a4 boolean,
a5 boolean, a6 boolean, a7 boolean, a8 boolean) tablespace stage3;
insert into staging.tmp_t1 select
1::boolean,1::boolean,1::boolean,1::boolean,1::boolean,1::boolean,1::boolean,1::boolean
from generate_series(1,100000000);
select pg_total_relation_size('staging.tmp_t1');
 pg_total_relation_size
------------------------
             3,625,689,088
(1 row)
The table with 16 booleans took just 766MB more, so the growth appears to be
non-linear.
Most likely Postgres does some compression. I can't tell for sure without
looking into source code.
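No compression is involved; fixed per-tuple overhead accounts for the numbers, and the same arithmetic suggests Scott's earlier figures were kilobytes rather than bytes. A sketch of the byte math (the 24-byte header and 4-byte line pointer are assumptions about the 8.x heap format):

```python
# Sketch: per-row overhead, not compression, explains the observed sizes.
OVERHEAD = 24 + 4   # assumed tuple header + line pointer, per row

def table_bytes(rows, data_bytes):
    padded = (data_bytes + 7) // 8 * 8        # assumed 8-byte alignment
    return rows * (OVERHEAD + padded)         # page headers ignored

rows = 100_000_000
print(table_bytes(rows, 8))    # 3.6e9 for 8 one-byte booleans, in the same
                               # ballpark as the observed 3,625,689,088
print(table_bytes(rows, 16))   # 16 booleans: about 800MB more, close to the
                               # observed 766MB of extra growth
```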

Anyway, given that int8 can accommodate 64 flags, the space saving can be
substantial.
Thanks,
-Slava.



----- Original Message -----
From: "Scott Marlowe" <>
To: "Slava Moudry" <>
Cc: "Robert Haas" <>;
<>
Sent: Thursday, August 20, 2009 6:58 PM
Subject: Re: [PERFORM] number of rows estimation for bit-AND operation


On Thu, Aug 20, 2009 at 7:32 PM, Scott Marlowe<>
wrote:
> 2009/8/20 Slava Moudry <>:
>> Hi,
>> Yes, I thought about putting the bit-flags in separate fields.
>> Unfortunately - I expect to have quite a lot of these and space is an
>> issue when you are dealing with billions of records in fact table, so I
>> prefer to pack them into one int8.
>
> For giggles I created two test tables, one with a single int, one with
> 8 bools, and put 100M entries in each. The table with 8 bools took up
> approx. 3560616 bytes, while the one with a single int took up approx.
> 3544212
>
> I.e they're about the same. You should really test to see if having a
> lot of bools costs more than mangling ints around. I'm guessing I
> could fit a lot more bools in the test table due to alignment issues
> than just 8.

So, I made a table with 26 bool fields, and added 100M rows to it, and
that table took up about 5906028 bytes.  So yea, the storage is
greater for boolean fields, but only if they aren't null.  making them
null would save a lot of space, so if null bits fit your model, then
it might be worth looking into.  Certainly they're not so much bigger
as to be unmanageable.