Re: Indexing a Boolean or Null column?
От | Christopher Browne |
---|---|
Тема | Re: Indexing a Boolean or Null column? |
Дата | |
Msg-id | m3brpjjsva.fsf@wolfe.cbbrowne.com обсуждение исходный текст |
Ответ на | Indexing a Boolean or Null column? ("D. Dante Lorenso" <dante@lorenso.com>) |
Список | pgsql-performance |
After a long battle with technology, dante@lorenso.com ("D. Dante Lorenso"), an earthling, wrote: > I've been debating with a collegue who argues that indexing a > boolean column is a BAD idea and that is will actually slow > down queries. No, it would be expected to slow down inserts, but not likely queries. > Will an index on the 'data_is_active' column be used or work > as I expect? I'm assuming that I may have a million entries > sharing the same 'data_lookup_key' and I'll be using that to > search for the active version of the row. > SELECT * > FROM table > WHERE data_lookup_key = 'pear' > AND data_is_active IS TRUE; > > Does it make sense to have an index on data_is_active? Not really. > Now, I've read that in some databases the index on a column that has > relatively even distribution of values over a small set of values > will not be efficient. The problem is (and this is likely to be true for just about any database system that is 'page-based,' which is just about any of them, these days) that what happens, with the elements being so pervasive, throughout the table, queries will be quite likely to hit nearly every page of the table. If you're hitting practically every page, then it is more efficient to just walk thru the pages (Seq Scan) rather than to bother reading the index. The only improvement that could (in theory) be made is to cluster all the "true" values onto one set of pages, and all the "false" ones onto another set of pages, and have a special sort of index that knows which pages are "true" and "false". I _think_ that Oracle's notion of "cluster tables" function rather like this; it is rather debatable whether it would be worthwhile to do similar with PostgreSQL. A way of 'clustering' with PostgreSQL might be to have two tables table_active and table_inactive where a view, representing the 'join' of them, would throw in the 'data_is_active' value. By clever use of some rules/triggers, you could insert into the view, and have values get shuffled into the appropriate table. When doing a select on the view, if you asked for "data_is_active is TRUE", the select would only draw data from table_inactive, or vice-versa. Unfortunately, sometimes the query optimizer may not be clever enough when working with the resulting joins, though that may just be a Simple Matter Of Programming to make it more clever in future versions. :-) -- output = reverse("gro.gultn" "@" "enworbbc") http://www3.sympatico.ca/cbbrowne/spreadsheets.html Rules of the Evil Overlord #136. "If I build a bomb, I will simply remember which wire to cut if it has to be deactivated and make every wire red." <http://www.eviloverlord.com/>
В списке pgsql-performance по дате отправления: