Re: number of rows estimation for bit-AND operation

От: Slava Moudry
Тема: Re: number of rows estimation for bit-AND operation
Дата: ,
Msg-id: 622F69662CFE9F4182958973F99F3F151515E74E32@EXVMBX017-12.exch017.msoutlookonline.net
(см: обсуждение, исходный текст)
Ответ на: Re: number of rows estimation for bit-AND operation  (Robert Haas)
Список: pgsql-performance

Скрыть дерево обсуждения

number of rows estimation for bit-AND operation  (Slava Moudry, )
 Re: number of rows estimation for bit-AND operation  (Scott Marlowe, )
  Re: number of rows estimation for bit-AND operation  (Slava Moudry, )
 Re: number of rows estimation for bit-AND operation  (Scott Marlowe, )
  Re: number of rows estimation for bit-AND operation  (Slava Moudry, )
 Re: number of rows estimation for bit-AND operation  (Scott Marlowe, )
  Re: number of rows estimation for bit-AND operation  (Robert Haas, )
   Re: number of rows estimation for bit-AND operation  (Slava Moudry, )
 Re: number of rows estimation for bit-AND operation  (Scott Marlowe, )
  Re: number of rows estimation for bit-AND operation  (Scott Marlowe, )
   Re: number of rows estimation for bit-AND operation  (Robert Haas, )
    Re: number of rows estimation for bit-AND operation  (Alvaro Herrera, )
   Re: number of rows estimation for bit-AND operation  (Slava Moudry, )

Hi,
Yes, I thought about putting the bit-flags in separate fields.
Unfortunately - I expect to have quite a lot of these and space is an issue when you are dealing with billions of
recordsin fact table, so I prefer to pack them into one int8. 
For users it's also much easier to write "where mt_flags&134=0" instead of "where f_2=false and f4=false and
f_128=false".
In Teradata - that worked just fine, but it costs millions vs. zero cost for Postgres, so I am not really complaining
outloud :) 

Hopefully Tom or other bright folks at PG could take a look at this for the next patch/release.
Btw, can you send me the link to " PG's selectivity estimator" discussion - I'd like to provide feedback if I can.
Thanks,
-Slava.


-----Original Message-----
From: Robert Haas [mailto:]
Sent: Thursday, August 20, 2009 10:55 AM
To: Scott Marlowe
Cc: Slava Moudry; 
Subject: Re: [PERFORM] number of rows estimation for bit-AND operation

On Tue, Aug 18, 2009 at 6:34 PM, Scott Marlowe<> wrote:
> 2009/8/18 Slava Moudry <>:
>>> increase default stats target, analyze, try again.
>> This field has only 5 values. I had put values/frequencies in my first post.
>
> Sorry, kinda missed that.  Anyway, there's no way for pg to know which
> operation is gonna match.  Without an index on it.  So my guess is
> that it just guesses some fixed value.  With an index it might be able
> to get it right, but you'll need an index for each type of match
> you're looking for.  I think.  Maybe someone else on the list has a
> better idea.

The best way to handle this is probably to not cram multiple vales
into a single field.  Just use one boolean for each flag.  It won't
even cost you any space, because right now you are using 8 bytes to
store 5 booleans, and 5 booleans will (I believe) only require 5
bytes.  Even if you were using enough of the bits for the space usage
to be higher with individual booleans, the overall performance is
likely to be better that way.

This is sort of stating the obvious, but it doesn't make it any less
true.  Unfortunately, PG's selectivity estimator can't handle cases
like this.  Tom Lane recently made some noises about trying to improve
it, but it's not clear whether that will go anywhere, and in any event
it won't happen before 8.5.0 comes out next spring/summer.

...Robert


В списке pgsql-performance по дате сообщения:

От: Jeff Janes
Дата:
Сообщение: [PERFORMANCE] how to set wal_buffers
От: gael@pilotsystems.net (Gaël Le Mignot)
Дата:
Сообщение: Performance regression between 8.3 and 8.4 on heavy text indexing