Re: number of rows estimation for bit-AND operation

От: Slava Moudry
Тема: Re: number of rows estimation for bit-AND operation
Дата: ,
Msg-id: 622F69662CFE9F4182958973F99F3F15151033CBD0@EXVMBX017-12.exch017.msoutlookonline.net
(см: обсуждение, исходный текст)
Ответ на: Re: number of rows estimation for bit-AND operation  (Scott Marlowe)
Список: pgsql-performance

Скрыть дерево обсуждения

number of rows estimation for bit-AND operation  (Slava Moudry, )
 Re: number of rows estimation for bit-AND operation  (Scott Marlowe, )
  Re: number of rows estimation for bit-AND operation  (Slava Moudry, )
 Re: number of rows estimation for bit-AND operation  (Scott Marlowe, )
  Re: number of rows estimation for bit-AND operation  (Slava Moudry, )
 Re: number of rows estimation for bit-AND operation  (Scott Marlowe, )
  Re: number of rows estimation for bit-AND operation  (Robert Haas, )
   Re: number of rows estimation for bit-AND operation  (Slava Moudry, )
 Re: number of rows estimation for bit-AND operation  (Scott Marlowe, )
  Re: number of rows estimation for bit-AND operation  (Scott Marlowe, )
   Re: number of rows estimation for bit-AND operation  (Robert Haas, )
    Re: number of rows estimation for bit-AND operation  (Alvaro Herrera, )
   Re: number of rows estimation for bit-AND operation  (Slava Moudry, )

> increase default stats target, analyze, try again.
This field has only 5 values. I had put values/frequencies in my first post.
Based on the values (see below) - there is no reason for planner to think that mt_flags&134=0 should return 12200 rows.
select mt_flags, count(*) from staging.tmp_t group by 1;
 mt_flags |  count
----------+---------
      128 |   57362
        4 |    1371
        8 |     627
        2 |   19072
        0 | 2361630
(5 rows)

In fact, if I rewrite the query using value matching - the estimations are right on:
explain analyze select count(*) from staging.tmp_t where mt_flags not in (128,2,4);
                                                       QUERY PLAN


------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=85878.63..85878.64 rows=1 width=0) (actual time=2904.005..2904.005 rows=1 loops=1)
   ->  Seq Scan on tmp_t  (cost=0.00..79973.85 rows=**2361910** width=0) (actual time=0.008..2263.983 rows=2362257
loops=1)
         Filter: (mt_flags <> ALL ('{128,2,4}'::integer[]))
 Total runtime: 2904.038 ms
(4 rows)


Anyways, I've been using statistics target of 100 in 8.3 and in 8.4 100 is default. I am currently using
default_statistics_target=1000.

Do you think that bit-and function might be skewing the statistics for execution plan somehow?
Thanks,
-Slava.

-----Original Message-----
From: Scott Marlowe [mailto:]
Sent: Tuesday, August 18, 2009 2:58 PM
To: Slava Moudry
Cc: 
Subject: Re: [PERFORM] number of rows estimation for bit-AND operation

2009/8/18 Slava Moudry <>:
> Hi Scott,
> Thank you for reply.
> I am using Postgres 8.4.0 (btw - great release --very happy about it) and I got a different plan after following your
advice:

Yeah, you're returning most of the rows, so a seq scan makes sense.
Try indexing / matching on something more uncommon and you should get
an index scan.



> The seq scan is OK, since I don't expect Postgres to use index scan for such low-selective condition.
> It would be tough for me to support indexes for each bit flag value and their combinations. E.g. in the query below
itis again 200x off on number of rows. 

increase default stats target, analyze, try again.


> explain analyze select count(*) from staging.tmp_t where  mt_flags&134=0;
>                                                      QUERY PLAN
>
----------------------------------------------------------------------------------------------------------------------
>  Aggregate  (cost=83054.43..83054.44 rows=1 width=0) (actual time=2964.960..2964.960 rows=1 loops=1)
>   ->  Seq Scan on tmp_t  (cost=0.00..83023.93 rows=12200 width=0) (actual time=0.014..2152.031 rows=2362257 loops=1)
>         Filter: ((mt_flags & 134) = 0)
>  Total runtime: 2965.009 ms
> (4 rows)
>
> I still wonder if it's something I could/should report as a bug? I've been struggling with this issue in 8.2, 8.3.x
 (nowusing 8.4.0). 
> We can more or less work around this by disabling nestloop in our analytics queries but I have problems enforcing
thisin reporting applications. 

Looks more like a low stats target.  Try increasing that first.


В списке pgsql-performance по дате сообщения:

От: Karl Denninger
Дата:
Сообщение: Re: [SQL] SQL Query Performance - what gives?
От: Pierre Frédéric Caillaud
Дата:
Сообщение: Re: [SQL] SQL Query Performance - what gives?