Обсуждение: Could use some advice on search architecture
Hello, I'm contemplating what architecture I should use to make searching as fast as possible given the information available and the search requirements. Let me give some background first; - The database contains products of can potentially have a lot of them (up to about 3 to 5 million) - Each product has about 30 different properties defined about them. Things like what color they are etc. All these properties are enumerated choices, so for instance for color there is a list of available static never changing options of which one can be chosen for that product. This is the same for all those 30 properties. Currently they are stored as enumerated types (CREATE TYPE propertyvalue AS ENUM ('option1', 'option2', etc..) - It should be possible to search for products and provide properties that the product SHOULD have, not must have. For instance, for color, the search could specify that it should return products that are either red, blue or green. - The products that match with the most properties should be in the top of the search results - If different products match with the same amount of properties, the ordering should then be on the product that is most popular. There is information in the database (and if need be also in the same table) about how many times a product is sold. - The results will be paginated per 15 products The requirement is that these searches should be as fast as possible, with a maximum of about 200 ms time taken for a search query. What would be the best approach to this if I were to do this in the database only? Should/can this be done with postgresql only or should I look into other types of technology? (Lucene? Sphinx? others?) Any advice on this would be greatly appreciated. Thx in advance! Ron
First, I would not restrict color to 30 colors, if anything like furniture or clothing, etc. is involved. Colors are very important to consumers, and exact colors are important. I would re-think my color selections.
Make sure you have indexes on all the appropriate columns, of course.Susan
On Fri, Apr 18, 2014 at 6:59 AM, Ron Pasch <postgresql@ronpasch.nl> wrote:
Hello,
I'm contemplating what architecture I should use to make searching as fast as possible given the information available and the search requirements. Let me give some background first;
- The database contains products of can potentially have a lot of them (up to about 3 to 5 million)
- Each product has about 30 different properties defined about them. Things like what color they are etc. All these properties are enumerated choices, so for instance for color there is a list of available static never changing options of which one can be chosen for that product. This is the same for all those 30 properties. Currently they are stored as enumerated types (CREATE TYPE propertyvalue AS ENUM ('option1', 'option2', etc..)
- It should be possible to search for products and provide properties that the product SHOULD have, not must have. For instance, for color, the search could specify that it should return products that are either red, blue or green.
- The products that match with the most properties should be in the top of the search results
- If different products match with the same amount of properties, the ordering should then be on the product that is most popular. There is information in the database (and if need be also in the same table) about how many times a product is sold.
- The results will be paginated per 15 products
The requirement is that these searches should be as fast as possible, with a maximum of about 200 ms time taken for a search query.
What would be the best approach to this if I were to do this in the database only? Should/can this be done with postgresql only or should I look into other types of technology? (Lucene? Sphinx? others?)
Any advice on this would be greatly appreciated.
Thx in advance!
Ron
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
On 4/18/2014 8:59 AM, Ron Pasch wrote: > Hello, > > I'm contemplating what architecture I should use to make searching as > fast as possible given the information available and the search > requirements. Let me give some background first; > > - The database contains products of can potentially have a lot of them > (up to about 3 to 5 million) > - Each product has about 30 different properties defined about them. > Things like what color they are etc. All these properties are enumerated > choices, so for instance for color there is a list of available static > never changing options of which one can be chosen for that product. This > is the same for all those 30 properties. Currently they are stored as > enumerated types (CREATE TYPE propertyvalue AS ENUM ('option1', > 'option2', etc..) > - It should be possible to search for products and provide properties > that the product SHOULD have, not must have. For instance, for color, > the search could specify that it should return products that are either > red, blue or green. > - The products that match with the most properties should be in the top > of the search results > - If different products match with the same amount of properties, the > ordering should then be on the product that is most popular. There is > information in the database (and if need be also in the same table) > about how many times a product is sold. > - The results will be paginated per 15 products > > The requirement is that these searches should be as fast as possible, > with a maximum of about 200 ms time taken for a search query. > > What would be the best approach to this if I were to do this in the > database only? Should/can this be done with postgresql only or should I > look into other types of technology? (Lucene? Sphinx? others?) > > Any advice on this would be greatly appreciated. > > Thx in advance! > > Ron > > > As we are PG users, on a PG list, we are gonna recommend PG, obviously. :-) Actually though, I recommend PG. > - The products that match with the most properties should be in the top > of the search results That kinda query is going to be really difficult, I think, regardless of what you use. To find that you'll have to measure every product (all 5 million) and then sort the results. > enumerated types (CREATE TYPE propertyvalue AS ENUM ('option1', > 'option2', etc..) So, you have something like: CREATE TYPE colortype AS ENUM ('red', 'green', 'blue'); CREATE TYPE sizetype AS ENUM ('small', 'medium', 'large'); create table product ( id serial, name text, color colortype, size sizetype, ... ); I assume the problem is you don't want to index all 30 properties? That makes sense. > - It should be possible to search for products and provide properties > that the product SHOULD have, not must have. I don't understand this. Say you have a sprocket in red and green. Do you want to search for: select * from product where name = 'sprocket' and (color = 'red' or color = 'green') Or do you want something else? Does the user say they'd "prefer" blue, but will take whatever else you have? Do you search for some properties exactly and some "preferred"? Perhaps you could describe a little more how you want to query the database? Or, maybe, what your user's are searching for? -Andy
On 18/04/2014 21:24, Andy Colson wrote: > On 4/18/2014 8:59 AM, Ron Pasch wrote: >> Hello, >> >> I'm contemplating what architecture I should use to make searching as >> fast as possible given the information available and the search >> requirements. Let me give some background first; >> >> - The database contains products of can potentially have a lot of them >> (up to about 3 to 5 million) >> - Each product has about 30 different properties defined about them. >> Things like what color they are etc. All these properties are enumerated >> choices, so for instance for color there is a list of available static >> never changing options of which one can be chosen for that product. This >> is the same for all those 30 properties. Currently they are stored as >> enumerated types (CREATE TYPE propertyvalue AS ENUM ('option1', >> 'option2', etc..) >> - It should be possible to search for products and provide properties >> that the product SHOULD have, not must have. For instance, for color, >> the search could specify that it should return products that are either >> red, blue or green. >> - The products that match with the most properties should be in the top >> of the search results >> - If different products match with the same amount of properties, the >> ordering should then be on the product that is most popular. There is >> information in the database (and if need be also in the same table) >> about how many times a product is sold. >> - The results will be paginated per 15 products >> >> The requirement is that these searches should be as fast as possible, >> with a maximum of about 200 ms time taken for a search query. >> >> What would be the best approach to this if I were to do this in the >> database only? Should/can this be done with postgresql only or should I >> look into other types of technology? (Lucene? Sphinx? others?) >> >> Any advice on this would be greatly appreciated. >> >> Thx in advance! >> >> Ron >> >> >> > > > As we are PG users, on a PG list, we are gonna recommend PG, > obviously. :-) > > Actually though, I recommend PG. > > > > - The products that match with the most properties should be in the top > > of the search results > > > That kinda query is going to be really difficult, I think, regardless > of what you use. To find that you'll have to measure every product > (all 5 million) and then sort the results. > > > > enumerated types (CREATE TYPE propertyvalue AS ENUM ('option1', > > 'option2', etc..) > > So, you have something like: > > CREATE TYPE colortype AS ENUM ('red', 'green', 'blue'); > CREATE TYPE sizetype AS ENUM ('small', 'medium', 'large'); > > create table product > ( > id serial, > name text, > color colortype, > size sizetype, > ... > ); > > > I assume the problem is you don't want to index all 30 properties? > That makes sense. > > > - It should be possible to search for products and provide properties > > that the product SHOULD have, not must have. > > I don't understand this. Say you have a sprocket in red and green. > Do you want to search for: > > select * from product where name = 'sprocket' and (color = 'red' or > color = 'green') > > Or do you want something else? Does the user say they'd "prefer" > blue, but will take whatever else you have? > > Do you search for some properties exactly and some "preferred"? > > Perhaps you could describe a little more how you want to query the > database? Or, maybe, what your user's are searching for? > > -Andy > > Well, given that there are known limited attributes, this is the type of application that really really suits a column oriented database, such as Sybase IQ (now sold by SAP). Its a neat product that scales. Great performance with drag'n'drop analytics. Unless you can charm IQ out of SAP (it has been known to happen), you might have to look at some other techniques So consider some binary data representation Red - 1 (0000 0001) Orange - 2 (0000 0010) Yellow - 4 (0000 0100) Green - 8 (0000 1000) Blue - 16 (0001 0000) Indigo - 32 (0010 0000) Violet - 64 (0100 0000) This way, you can encode several colours in 1 value Red or Green or Indigo = 1 + 8 + 32 = 41 = 0010 1001 Robin
Hi, Seems that this blog post is worth reading in your case http://hlinnaka.iki.fi/2014/03/28/gin-as-a-substitute-for-bitmap-indexes/ regards, Marc Mamin ________________________________________ Von: pgsql-general-owner@postgresql.org [pgsql-general-owner@postgresql.org]" im Auftrag von "Robin [robinstc@live.co.uk] Gesendet: Samstag, 19. April 2014 09:38 An: pgsql-general@postgresql.org Betreff: Re: [GENERAL] Could use some advice on search architecture On 18/04/2014 21:24, Andy Colson wrote: > On 4/18/2014 8:59 AM, Ron Pasch wrote: >> Hello, >> >> I'm contemplating what architecture I should use to make searching as >> fast as possible given the information available and the search >> requirements. Let me give some background first; >> >> - The database contains products of can potentially have a lot of them >> (up to about 3 to 5 million) >> - Each product has about 30 different properties defined about them. >> Things like what color they are etc. All these properties are enumerated >> choices, so for instance for color there is a list of available static >> never changing options of which one can be chosen for that product. This >> is the same for all those 30 properties. Currently they are stored as >> enumerated types (CREATE TYPE propertyvalue AS ENUM ('option1', >> 'option2', etc..) >> - It should be possible to search for products and provide properties >> that the product SHOULD have, not must have. For instance, for color, >> the search could specify that it should return products that are either >> red, blue or green. >> - The products that match with the most properties should be in the top >> of the search results >> - If different products match with the same amount of properties, the >> ordering should then be on the product that is most popular. There is >> information in the database (and if need be also in the same table) >> about how many times a product is sold. >> - The results will be paginated per 15 products >> >> The requirement is that these searches should be as fast as possible, >> with a maximum of about 200 ms time taken for a search query. >> >> What would be the best approach to this if I were to do this in the >> database only? Should/can this be done with postgresql only or should I >> look into other types of technology? (Lucene? Sphinx? others?) >> >> Any advice on this would be greatly appreciated. >> >> Thx in advance! >> >> Ron >> >> >> > > > As we are PG users, on a PG list, we are gonna recommend PG, > obviously. :-) > > Actually though, I recommend PG. > > > > - The products that match with the most properties should be in the top > > of the search results > > > That kinda query is going to be really difficult, I think, regardless > of what you use. To find that you'll have to measure every product > (all 5 million) and then sort the results. > > > > enumerated types (CREATE TYPE propertyvalue AS ENUM ('option1', > > 'option2', etc..) > > So, you have something like: > > CREATE TYPE colortype AS ENUM ('red', 'green', 'blue'); > CREATE TYPE sizetype AS ENUM ('small', 'medium', 'large'); > > create table product > ( > id serial, > name text, > color colortype, > size sizetype, > ... > ); > > > I assume the problem is you don't want to index all 30 properties? > That makes sense. > > > - It should be possible to search for products and provide properties > > that the product SHOULD have, not must have. > > I don't understand this. Say you have a sprocket in red and green. > Do you want to search for: > > select * from product where name = 'sprocket' and (color = 'red' or > color = 'green') > > Or do you want something else? Does the user say they'd "prefer" > blue, but will take whatever else you have? > > Do you search for some properties exactly and some "preferred"? > > Perhaps you could describe a little more how you want to query the > database? Or, maybe, what your user's are searching for? > > -Andy > > Well, given that there are known limited attributes, this is the type of application that really really suits a column oriented database, such as Sybase IQ (now sold by SAP). Its a neat product that scales. Great performance with drag'n'drop analytics. Unless you can charm IQ out of SAP (it has been known to happen), you might have to look at some other techniques So consider some binary data representation Red - 1 (0000 0001) Orange - 2 (0000 0010) Yellow - 4 (0000 0100) Green - 8 (0000 1000) Blue - 16 (0001 0000) Indigo - 32 (0010 0000) Violet - 64 (0100 0000) This way, you can encode several colours in 1 value Red or Green or Indigo = 1 + 8 + 32 = 41 = 0010 1001 Robin -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
bottom post
On 19/04/2014 12:46, R. Pasch wrote:
On 19/04/2014 12:46, R. Pasch wrote:
On 19-4-2014 9:38, Robin wrote:I stopped reading when I heard the word "sold by SAP" ;-) This project is solely build with open-source and freely available software.
Well, given that there are known limited attributes, this is the type of application that really really suits a column oriented database, such as Sybase IQ (now sold by SAP). Its a neat product that scales. Great performance with drag'n'drop analytics.
Unless you can charm IQ out of SAP (it has been known to happen), you might have to look at some other techniques
So consider some binary data representation
Red - 1 (0000 0001)
Orange - 2 (0000 0010)
Yellow - 4 (0000 0100)
Green - 8 (0000 1000)
Blue - 16 (0001 0000)
Indigo - 32 (0010 0000)
Violet - 64 (0100 0000)
This way, you can encode several colours in 1 value
Red or Green or Indigo = 1 + 8 + 32 = 41 = 0010 1001
Robin
I've been thinking about using a binary data representation but didn't come to a solution to this specific problem quite yet. Per property of a product, only one bit would be 1 and the rest would be 0. What would a query look like to match all products that have a bit in the correct position?
Say for instance these are a couple records (and yes, property values can be null as well)
title, property1, property2, property3
================================
product1, 0000 0001, 0000 0010, NULL
product2, 0000 0100, 0100 0000, 0010 0000
product3, 0010 0000, 0010 0000, 0100 0000
Say that I would like to retrieve the products that either have property1 as 0010 0000, 1000 000 or 0000 0001. Combined that would be 0010 1001 and would have to match product1 and product3 as they both have their individual bit matching one of the bits being asked for. What would a where statement look like using this type of binary representation?
If that would be fairly simple to do and fast (most important factor) then I could do an OR construction on all property columns and have something count the amount of properties that actually matched. Is that something you can do with a binary operator of some sort as well? Count the amount of overlapping bits?
Say for instance I have a binary value of 0110 0101 and another binary value of 1100 0100, how could I found out how many bits matched? (in this case the number of matching bits would be 2)
I understand the reluctance to pay SAP-style rates, as a longtime DB user, I have learned some 'charm' techniques.
However, I poked around a bit for alternatives, as I do like the column-oriented approach, and found something called - MonetDB - it apparently has a column-store db kernel, and is open source - I suggest you have a look, if it does what it says on the label, then it looks like a find.
There is a discussion of bitmask-trickiness here also dealing with colours
Robin
Postgresql has 2 column store, 1-in memory(cant remember the name) and http://www.citusdata.com/blog/76-postgresql-columnar-store-for-analytics
On Sat, Apr 19, 2014 at 2:10 PM, Robin <robinstc@live.co.uk> wrote:
bottom post
On 19/04/2014 12:46, R. Pasch wrote:On 19-4-2014 9:38, Robin wrote:I stopped reading when I heard the word "sold by SAP" ;-) This project is solely build with open-source and freely available software.
Well, given that there are known limited attributes, this is the type of application that really really suits a column oriented database, such as Sybase IQ (now sold by SAP). Its a neat product that scales. Great performance with drag'n'drop analytics.
Unless you can charm IQ out of SAP (it has been known to happen), you might have to look at some other techniques
So consider some binary data representation
Red - 1 (0000 0001)
Orange - 2 (0000 0010)
Yellow - 4 (0000 0100)
Green - 8 (0000 1000)
Blue - 16 (0001 0000)
Indigo - 32 (0010 0000)
Violet - 64 (0100 0000)
This way, you can encode several colours in 1 value
Red or Green or Indigo = 1 + 8 + 32 = 41 = 0010 1001
Robin
I've been thinking about using a binary data representation but didn't come to a solution to this specific problem quite yet. Per property of a product, only one bit would be 1 and the rest would be 0. What would a query look like to match all products that have a bit in the correct position?
Say for instance these are a couple records (and yes, property values can be null as well)
title, property1, property2, property3
================================
product1, 0000 0001, 0000 0010, NULL
product2, 0000 0100, 0100 0000, 0010 0000
product3, 0010 0000, 0010 0000, 0100 0000
Say that I would like to retrieve the products that either have property1 as 0010 0000, 1000 000 or 0000 0001. Combined that would be 0010 1001 and would have to match product1 and product3 as they both have their individual bit matching one of the bits being asked for. What would a where statement look like using this type of binary representation?
If that would be fairly simple to do and fast (most important factor) then I could do an OR construction on all property columns and have something count the amount of properties that actually matched. Is that something you can do with a binary operator of some sort as well? Count the amount of overlapping bits?
Say for instance I have a binary value of 0110 0101 and another binary value of 1100 0100, how could I found out how many bits matched? (in this case the number of matching bits would be 2)
I understand the reluctance to pay SAP-style rates, as a longtime DB user, I have learned some 'charm' techniques.
However, I poked around a bit for alternatives, as I do like the column-oriented approach, and found something called - MonetDB - it apparently has a column-store db kernel, and is open source - I suggest you have a look, if it does what it says on the label, then it looks like a find.
There is a discussion of bitmask-trickiness here also dealing with colours
Robin
On 04/19/2014 06:26 AM, Ron Pasch wrote: > > - It should be possible to search for products and provide properties > > that the product SHOULD have, not must have. >> >> I don't understand this. Say you have a sprocket in red and green. Do you want to search for: >> >> select * from product where name = 'sprocket' and (color = 'red' or color = 'green') >> >> Or do you want something else? Does the user say they'd "prefer" blue, but will take whatever else you have? >> >> Do you search for some properties exactly and some "preferred"? >> >> Perhaps you could describe a little more how you want to query the database? Or, maybe, what your user's are searchingfor? >> >> -Andy >> >> > > Yes, the user can prefer certain properties and the products that match most of the properties should be in the top ofthe results, but if a product doesn't match all of them but just some of them, they should still be returned, but lowerin the results. > > I'm seriously wondering if doing this solely with postgres is even possible without having long execution times. I've donesome tests with 5 million records and just doing the "or" construction you mentioned above, which resulted in 600 to900 ms queries and returning only those records of which all properties match at least one selected value. > > I was thinking that perhaps using a search engine like lucene or sphinx would be more appropriate, but then I wonder whatI would exactly be indexing and how I would be querying that, but that's a question for a different mailing list ;-) Please keep the list cc'd, so others can help as well. Yeah, doing a bunch of or's is gonna have to test all 5 million products. I wonder if there is a way we can treat this like a two step process. 1) cut down the number of products 2) sort them by #matches, popularity, etc You've talked about #2, but how about #1. Is there any way to either include or exclude a product? Users don't just askfor red, they ask for "tires (maybe red)". Not all 5 million products are tires, right? -Andy
>On 04/19/2014 06:26 AM, Ron Pasch wrote: >> > - It should be possible to search for products and provide properties >> > that the product SHOULD have, not must have. >>> >>> I don't understand this. Say you have a sprocket in red and green. Do you want to search for: >>> >>> select * from product where name = 'sprocket' and (color = 'red' or color = 'green') >>> >>> Or do you want something else? Does the user say they'd "prefer" blue, but will take whatever else you have? >>> >>> Do you search for some properties exactly and some "preferred"? >>> >>> Perhaps you could describe a little more how you want to query the database? Or, maybe, what your user's are searchingfor? >>> >>> -Andy >>> >> >> Yes, the user can prefer certain properties and the products that match most of the properties should be in the top ofthe results, but if a product doesn't match all of them but just some of them, they should still be returned, but lowerin the results. >> >> I'm seriously wondering if doing this solely with postgres is even possible without having long execution times. I'vedone some tests with 5 million records and just doing the "or" construction you mentioned above, which resulted in 600to 900 ms queries and returning only those records of which all properties match at least one selected value. I don't think that OR clauses are the right way as the aim is to count how many attributes do match the search. basically a standard approach would look like: SELECT pID, sum(match) as matches FROM ( selct pID, 1 as match from Products where color ='pink' UNION ALL selct pID, 1 as match from Products where size ='XXL' ... )foo GROUP BY pID order by matches DESC How many distinct attributes are involved ? ( 15 colors + 9 sizes + ....) Marc >> >> I was thinking that perhaps using a search engine like lucene or sphinx would be more appropriate, but then I wonder whatI would exactly be indexing and how I would be querying that, but that's a question for a different mailing list ;-) > >Please keep the list cc'd, so others can help as well. > >Yeah, doing a bunch of or's is gonna have to test all 5 million products. > >I wonder if there is a way we can treat this like a two step process. > >1) cut down the number of products > >2) sort them by #matches, popularity, etc > >You've talked about #2, but how about #1. Is there any way to either include or exclude a product? Users don't just askfor red, they ask for "tires (maybe red)". Not all 5 million products are tires, right? > >-Andy
Because this is a topic that interests me I have done some digging.
MonetDB is a proper column-store DBMS. It is used on some decent sized projects, several of which are EU-funded.
Digging a bit deeper I have discovered that a PostgreSQL Foreign Data Wrapper for MonetDB has been created - monetdb_fdw.
The fdw is written up here
There is a video that shows what happens when a query is executed in PostgreSQL and MonetDB
The video shows an analytical query being processed
I think its worth investigating further
MonetDB is a proper column-store DBMS. It is used on some decent sized projects, several of which are EU-funded.
Digging a bit deeper I have discovered that a PostgreSQL Foreign Data Wrapper for MonetDB has been created - monetdb_fdw.
The fdw is written up here
There is a video that shows what happens when a query is executed in PostgreSQL and MonetDB
The video shows an analytical query being processed
- Using PostgreSQL - 177 seconds
- Using MonetDB - 8 seconds
- Using a remote MonetDB server through MonetDB FDW - 1 second
I think its worth investigating further
Robin St.Clair
the other is named imcs.
imcs is mainly a memory database,it is very fast for olap,because of multi threads parallel query plan and column storage.for billions of data,most group by like querys return within 2s.
we use it for several weeks and happy for the performance.
jov
在 2014-4-19 下午8:27,"Dorian Hoxha" <dorian.hoxha@gmail.com>写道:
Postgresql has 2 column store, 1-in memory(cant remember the name) and http://www.citusdata.com/blog/76-postgresql-columnar-store-for-analyticsOn Sat, Apr 19, 2014 at 2:10 PM, Robin <robinstc@live.co.uk> wrote:bottom post
On 19/04/2014 12:46, R. Pasch wrote:On 19-4-2014 9:38, Robin wrote:I stopped reading when I heard the word "sold by SAP" ;-) This project is solely build with open-source and freely available software.
Well, given that there are known limited attributes, this is the type of application that really really suits a column oriented database, such as Sybase IQ (now sold by SAP). Its a neat product that scales. Great performance with drag'n'drop analytics.
Unless you can charm IQ out of SAP (it has been known to happen), you might have to look at some other techniques
So consider some binary data representation
Red - 1 (0000 0001)
Orange - 2 (0000 0010)
Yellow - 4 (0000 0100)
Green - 8 (0000 1000)
Blue - 16 (0001 0000)
Indigo - 32 (0010 0000)
Violet - 64 (0100 0000)
This way, you can encode several colours in 1 value
Red or Green or Indigo = 1 + 8 + 32 = 41 = 0010 1001
Robin
I've been thinking about using a binary data representation but didn't come to a solution to this specific problem quite yet. Per property of a product, only one bit would be 1 and the rest would be 0. What would a query look like to match all products that have a bit in the correct position?
Say for instance these are a couple records (and yes, property values can be null as well)
title, property1, property2, property3
================================
product1, 0000 0001, 0000 0010, NULL
product2, 0000 0100, 0100 0000, 0010 0000
product3, 0010 0000, 0010 0000, 0100 0000
Say that I would like to retrieve the products that either have property1 as 0010 0000, 1000 000 or 0000 0001. Combined that would be 0010 1001 and would have to match product1 and product3 as they both have their individual bit matching one of the bits being asked for. What would a where statement look like using this type of binary representation?
If that would be fairly simple to do and fast (most important factor) then I could do an OR construction on all property columns and have something count the amount of properties that actually matched. Is that something you can do with a binary operator of some sort as well? Count the amount of overlapping bits?
Say for instance I have a binary value of 0110 0101 and another binary value of 1100 0100, how could I found out how many bits matched? (in this case the number of matching bits would be 2)
I understand the reluctance to pay SAP-style rates, as a longtime DB user, I have learned some 'charm' techniques.
However, I poked around a bit for alternatives, as I do like the column-oriented approach, and found something called - MonetDB - it apparently has a column-store db kernel, and is open source - I suggest you have a look, if it does what it says on the label, then it looks like a find.
There is a discussion of bitmask-trickiness here also dealing with colours
Robin
I ended up running some tests using 5 million rows of products. I used about 5 properties that a product should always be matched to, and then I used the following in the select; (CASE property1 in (option1, option2, option3, etc) WHEN TRUE THEN 1 ELSE 0 END) + (CASE property2 in (option1, option2, option3, etc) WHEN TRUE THEN 1 ELSE 0 END) + (CASE property3 in (option1, option2, option3, etc) WHEN TRUE THEN 1 ELSE 0 END) ... AS numberOfMatchingProperties That way I can use the number of matching properties in the order by clause and have the properties that must always match filter out the bulk of the 5 million records. The tests that I've done return around 100.000 records in about 100 to 150 milliseconds using this technique, and using OFFSET and LIMIT to paginate those by about 15 records each time is very very fast. This I can live with :) Thx for letting me pick your brains on this a little. Cheers, Ron