number of rows estimation for bit-AND operation

От: Slava Moudry
Тема: number of rows estimation for bit-AND operation
Дата: ,
Msg-id: 622F69662CFE9F4182958973F99F3F1515102D3DD6@EXVMBX017-12.exch017.msoutlookonline.net
(см: обсуждение, исходный текст)
Ответы: Re: number of rows estimation for bit-AND operation  (Scott Marlowe)
Список: pgsql-performance

Скрыть дерево обсуждения

number of rows estimation for bit-AND operation  (Slava Moudry, )
 Re: number of rows estimation for bit-AND operation  (Scott Marlowe, )
  Re: number of rows estimation for bit-AND operation  (Slava Moudry, )
 Re: number of rows estimation for bit-AND operation  (Scott Marlowe, )
  Re: number of rows estimation for bit-AND operation  (Slava Moudry, )
 Re: number of rows estimation for bit-AND operation  (Scott Marlowe, )
  Re: number of rows estimation for bit-AND operation  (Robert Haas, )
   Re: number of rows estimation for bit-AND operation  (Slava Moudry, )
 Re: number of rows estimation for bit-AND operation  (Scott Marlowe, )
  Re: number of rows estimation for bit-AND operation  (Scott Marlowe, )
   Re: number of rows estimation for bit-AND operation  (Robert Haas, )
    Re: number of rows estimation for bit-AND operation  (Alvaro Herrera, )
   Re: number of rows estimation for bit-AND operation  (Slava Moudry, )

Hi,

I am using int8 field to pack a number of error flags. This is very common technique for large tables to pack multiple flags in one integer field.

 

For most records – the mt_flags field is 0. Here is the statistics (taken from pgAdmin Statistics tab for mt_flags column):

Most common Values: {0,128,2,4,8)

Most common Frequencies: {0.96797,0.023,0.0076,0.0005,0.00029)

 

What I notice that when bit-AND function is used – Postgres significantly underestimates the amount of rows:

 

 

explain analyze select count(*) from mt__20090801 where  mt_flags&8=0;

                                                         QUERY PLAN                                                         

-----------------------------------------------------------------------------------------------------------------------------

 Aggregate  (cost=83054.43..83054.44 rows=1 width=0) (actual time=2883.154..2883.154 rows=1 loops=1)

   ->  Seq Scan on mt__20090801  (cost=0.00..83023.93 rows=12200 width=0) (actual time=0.008..2100.390 rows=2439435 loops=1)

         Filter: ((mt_flags & 8) = 0)

 Total runtime: 2883.191 ms

(4 rows)

 

This is not an issue for the particular query above, but I noticed that due to that miscalculation in many cases Postgres chooses plan with Nested Loops for other queries. I can fix it by setting enable_nest_loops to off, but it's not something I should set for all queries.

Is there any way to help Postgres make a better estimation for number of rows returned by bit function?

Thanks,

-Slava Moudry, Senior DW Engineer. 4Info Inc.

 

P.S. table definition:

 

\d mt__20090801

                      Table "dw.mt__20090801"

          Column          |            Type             | Modifiers

--------------------------+-----------------------------+-----------

 mt_id                    | bigint                      | not null

 mt_ts                    | timestamp without time zone |

 ad_cost                  | numeric(10,5)               |

 short_code               | integer                     |

 message_id               | bigint                      | not null

 mp_code                  | character(1)                | not null

 al_id                    | integer                     | not null

 cust_id                  | integer                     |

 device_id                | integer                     | not null

 broker_id                | smallint                    |

 partner_id               | integer                     |

 ad_id                    | integer                     |

 keyword_id               | integer                     |

 sc_id                    | integer                     |

 cp_id                    | integer                     |

 src_alertlog_id          | bigint                      |

 src_query_id             | bigint                      |

 src_response_message_num | smallint                    |

 src_gateway_message_id   | bigint                      |

 mt_flags                 | integer                     |

 message_length           | integer                     | not null

 created_etl              | timestamp without time zone |

Indexes:

    "mt_device_id__20090801" btree (device_id) WITH (fillfactor=100), tablespace "index2"

    "mt_ts__20090801" btree (mt_ts) WITH (fillfactor=100) CLUSTER, tablespace "index2"

Check constraints:

    "mt__20090801_mt_ts_check" CHECK (mt_ts >= '2009-08-01 00:00:00'::timestamp without time zone AND mt_ts < '2009-08-02 00:00:00'::timestamp without time

zone)

Inherits: mt

Tablespace: "dw_tables3"

 


В списке pgsql-performance по дате сообщения:

От: Scott Carey
Дата:
Сообщение: Re: Memory reporting on CentOS Linux
От: Russell Smith
Дата:
Сообщение: Re: Getting time of a postgresql-request