Обсуждение: Storing and querying boolean fields

Поиск
Список
Период
Сортировка

Storing and querying boolean fields

От
"Richard Brown"
Дата:

Hi All,

First, some background:

- We are using PostgreSQL 7.3.4, and am locked into this version. I would upgrade if I could, but the decision is not mine.

- The table referred to below is 120+ million rows, and has a width of 27 columns (15 smallints, 5 integers, 4 dates, 1 integer[], 1 single char text field, and 1 double precision). This table already has 3 multi-field indexes, as well as a primary key index.

I need to add many boolean flags to my database that need to be retrieved by primary key, as well as be searched upon with decent speed. By "many", I mean about 30 booleans total. The added bonus is that some of these values need to be searched in an OR manner, while others need to be searched as an AND. For example, I will need to do a search that looks something like:   select * from table where A && B && C && (D || E) && F

Our first thought was to "pack" these booleans into integer fields to save space and help with searching (so we thought). The design calls for a smallint field and an integer field to be added to the table. UPDATE/INSERT would be handled by calculating the values of the two fields outside of the database. Searching would be accomplished by taking the bits that were requested, creating a bitmask corresponding to the 2 fields, and then doing something like:  select * from table where (field & BITMASK) = BITMASK.  We felt that putting this information in this table rather than a separate one would help performance by eliminating a JOIN, and it was assumed (ugh) that if we built an index on those two fields, that we would be able to use those indexes for searching in the aforementioned manner.

Unfortunately we have come up with the following problems with this approach:
1. Postgres will only use 1 index per table, so putting these fields in the same table as several other searchable fields actually hurts us more than helps.

2. We haven't been able to get any index scans in preliminary testing (all seq. scans).

After all of that background, my actual question is:  what is the best way to effectively store and query this type of data? It seems more and more that our initial design is just flat-out wrong, but we are at a loss as to what the other options are.

Thanks very much in advance, and I apologize for the length of the message, but I felt background info would be important.

Regards,
Richard

Re: Storing and querying boolean fields

От
Tom Lane
Дата:
"Richard Brown" <rbrown@openratings.com> writes:
> - We are using PostgreSQL 7.3.4, and am locked into this version. I would
> upgrade if I could, but the decision is not mine.

They won't even let you update to 7.3.something-reasonably-current ?

Resign.  Go find a job with a boss whose IQ is above room temperature.
You want to be out of there before the finger-pointing starts, which
it certainly will when one of the many fixed-since-7.3.4 bugs eats
your database.

> I need to add many boolean flags to my database that need to be retrieved by
> primary key, as well as be searched upon with decent speed. By "many", I
> mean about 30 booleans total. The added bonus is that some of these values
> need to be searched in an OR manner, while others need to be searched as an
> AND. For example, I will need to do a search that looks something like:
> select * from table where A && B && C && (D || E) && F

As for the problem at hand, the contrib/intarray module would probably
fill the bill nicely, though I dunno how far I'd trust either it or the
GIST index code back in 7.3.  You'd represent the flags as an array of
the numbers of the flags that are ON, and search it with the @@
operator:
  int[] @@ query_int  - returns TRUE if array satisfies query (like '1&(2|3)')
This on-disk representation might be a bit larger than you were hoping
for, but it'd not break if you suddenly found you needed 33 flags not 32...

            regards, tom lane

Re: Storing and querying boolean fields

От
Greg Smith
Дата:
On Wed, 9 Jan 2008, Richard Brown wrote:

> - We are using PostgreSQL 7.3.4, and am locked into this version. I would
> upgrade if I could, but the decision is not mine.

Just make sure you CYA so when said server eats itself the decision maker
can't point the finger at you.  Give them a copy of a dated written
statement from you with text like "PostgreSQL 7.3.4 has several major bugs
that are known to cause data corruption" and quote some of the choicer
items in the trail of release notes at
http://www.postgresql.org/docs/7.3/static/release.html Some juicy ones are
in 7.3.7, 7.3.12, 7.3.19.

The bonus to this approach, where you disclaim the whole platform as known
to be unreliable, is that if you ever make a huge mistake you can shrug
off the blame by saying the old database that you told them was buggy, but
weren't allowed to upgrade, ate the data and it was totally not your
fault.

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD

Re: Storing and querying boolean fields

От
Robert Treat
Дата:
On Wednesday 09 January 2008 20:09, Greg Smith wrote:
> On Wed, 9 Jan 2008, Richard Brown wrote:
> > - We are using PostgreSQL 7.3.4, and am locked into this version. I would
> > upgrade if I could, but the decision is not mine.
>
> Just make sure you CYA so when said server eats itself the decision maker
> can't point the finger at you.  Give them a copy of a dated written
> statement from you with text like "PostgreSQL 7.3.4 has several major bugs
> that are known to cause data corruption" and quote some of the choicer
> items in the trail of release notes at
> http://www.postgresql.org/docs/7.3/static/release.html Some juicy ones are
> in 7.3.7, 7.3.12, 7.3.19.
>
> The bonus to this approach, where you disclaim the whole platform as known
> to be unreliable, is that if you ever make a huge mistake you can shrug
> off the blame by saying the old database that you told them was buggy, but
> weren't allowed to upgrade, ate the data and it was totally not your
> fault.
>

It would probably also be nice to reference the last security announcement
(http://www.postgresql.org/about/news.905) which includes this relevant
message:

"Minor release 7.3.21 for PostgreSQL version 7.3 will be the last update to
the 7.3 branch. As version 7.3 is now over five years old, the community will
no longer release patches for it after today's release. Users of version 7.3
are encouraged to upgrade to a more current version as soon as possible, or
to seek support from a commercial support vendor who is willing to continue
backpatching for them."

--
Robert Treat
Build A Brighter LAMP :: Linux Apache {middleware} PostgreSQL