Обсуждение: Why does query planner choose slower BitmapAnd ?

Поиск
Список
Период
Сортировка

Why does query planner choose slower BitmapAnd ?

От
Seamus Abshere
Дата:
hi,

I don't understand why the query planner is choosing a BitmapAnd when an
Index Scan followed by a filter is obviously better.

(Note that "new_york_houses" is a view of table "houses" with one
condition on city - and there is an index idx_houses_city. That is the
Index Scan that I think it should use exclusively.)

Here's a fast query that uses the Index Scan followed by a filter:

> => explain analyze SELECT COUNT(DISTINCT id) FROM new_york_houses WHERE roof_area >= 0 AND roof_area < 278.7091;
>                                                                                QUERY PLAN
> ---
>  Aggregate  (cost=167298.10..167298.11 rows=1 width=16) (actual time=141.137..141.137 rows=1 loops=1)
>    ->  Index Scan using idx_houses_city on households  (cost=0.57..167178.87 rows=47694 width=16) (actual
time=0.045..105.953rows=53971 loops=1) 
>          Index Cond: (city = 'New York'::text)
>          Filter: ((roof_area >= 0) AND ((roof_area)::numeric < 278.7091))
>          Rows Removed by Filter: 101719
>  Planning time: 0.688 ms
>  Execution time: 141.250 ms
> (7 rows)

When I add another condition, "phoneable", however, it chooses an
obviously wrong plan:

> => explain analyze SELECT COUNT(DISTINCT id) FROM new_york_houses WHERE roof_area >= 0 AND roof_area < 278.7091 AND
phoneable= true; 
>                                                                                      QUERY PLAN
> ---
>  Aggregate  (cost=128163.05..128163.06 rows=1 width=16) (actual time=4564.677..4564.677 rows=1 loops=1)
>    ->  Bitmap Heap Scan on households  (cost=105894.80..128147.78 rows=6106 width=16) (actual time=4456.690..4561.416
rows=5183loops=1) 
>          Recheck Cond: (city = 'New York'::text)
>          Filter: (phoneable AND (roof_area >= 0) AND ((roof_area)::numeric < 278.7091))
>          Rows Removed by Filter: 40103
>          Heap Blocks: exact=14563
>          ->  BitmapAnd  (cost=105894.80..105894.80 rows=21002 width=0) (actual time=4453.510..4453.510 rows=0
loops=1)
>                ->  Bitmap Index Scan on idx_houses_city  (cost=0.00..1666.90 rows=164044 width=0) (actual
time=16.505..16.505rows=155690 loops=1) 
>                      Index Cond: (city = 'New York'::text)
>                ->  Bitmap Index Scan on idx_houses_phoneable  (cost=0.00..104224.60 rows=10271471 width=0) (actual
time=4384.461..4384.461rows=10647041 loops=1) 
>                      Index Cond: (phoneable = true)
>  Planning time: 0.709 ms
>  Execution time: 4565.067 ms
> (13 rows)

On Postgres 9.4.4 with 244gb memory and SSDs

maintenance_work_mem 1000000
work_mem 500000
random_page_cost 1
seq_page_cost 2

The "houses" table has been analyzed recently and has statistics set to
the max.

Thanks,
Seamus


--
Seamus Abshere, SCEA
https://github.com/seamusabshere
https://www.linkedin.com/in/seamusabshere


Re: Why does query planner choose slower BitmapAnd ?

От
Tom Lane
Дата:
Seamus Abshere <seamus@abshere.net> writes:
> I don't understand why the query planner is choosing a BitmapAnd when an
> Index Scan followed by a filter is obviously better.

> On Postgres 9.4.4 with 244gb memory and SSDs

> maintenance_work_mem 1000000
> work_mem 500000
> random_page_cost 1
> seq_page_cost 2

[ squint... ]  There's no physically explainable situation where
random_page_cost should be less than seq_page_cost.  You may be
hitting a "garbage in, garbage out" situation with those numbers.

Given the large amount of RAM and the SSD underlying storage,
I'd set random_page_cost = seq_page_cost = 1.  You might also
find it advantageous to increase the CPU cost parameters a touch.
I've heard it reported that setting cpu_tuple_cost to something like
0.03 to 0.05 provides a better fit to modern hardware than the
default setting does.  In this particular case, though, it seems
like what you need to do is bump up cpu_index_tuple_cost a little
so as to make the indexscan on idx_houses_phoneable look more expensive.

(BTW, is that index really on just a boolean column?  It seems
unlikely that "phoneable" would be a sufficiently selective
condition to justify having an index on it.  I'd seriously consider
dropping that index as another solution approach.)

            regards, tom lane


Re: Why does query planner choose slower BitmapAnd ?

От
Stephen Frost
Дата:
Tom, all,

* Tom Lane (tgl@sss.pgh.pa.us) wrote:
> Seamus Abshere <seamus@abshere.net> writes:
> > I don't understand why the query planner is choosing a BitmapAnd when an
> > Index Scan followed by a filter is obviously better.
>
> > On Postgres 9.4.4 with 244gb memory and SSDs
>
> > maintenance_work_mem 1000000
> > work_mem 500000
> > random_page_cost 1
> > seq_page_cost 2
>
> [ squint... ]  There's no physically explainable situation where
> random_page_cost should be less than seq_page_cost.  You may be
> hitting a "garbage in, garbage out" situation with those numbers.

Certainly agree with Tom on the above point.

> (BTW, is that index really on just a boolean column?  It seems
> unlikely that "phoneable" would be a sufficiently selective
> condition to justify having an index on it.  I'd seriously consider
> dropping that index as another solution approach.)

Also agreed here, but I've seen field evidence (with reasonable
configurations) that definitely shows that we're a bit too happy to go
with a BitmapAnd scan across two indexes where one returns an order of
magnitude (or less) pages to consider than the other and most of the
time we spend on the overall query is in going through the index to find
a bunch of pages we're just going to throw away when we do the AND.

In one specific case which I can recall offhand (having seen it quite
recently), there was a btree index and a gist index (PostGIS geometry)
where the btree index pulled back perhaps 100k rows but the gist index
returned nearly everything (the bounding box included in the query
covering almost the entire table).  Dropping the gist index greatly
improved *that* query, but, of course, destroyed the performance of more
selective queries bounding box queries which didn't include a constraint
on the column with the btree index (forcing a sequential scan of the
table).

I've not looked into the specific costing here to see why the BitmapAnd
ended up being chosen over just doing an index scan with the btree and
then filtering, but I do believe it to be a problem area that would be
good to try and improve.  The first question is probably- are we
properly accounting for the cost of scanning the index vs the cost of
scanning one index and then applying the filter?

Thanks!

Stephen

Вложения

Re: Why does query planner choose slower BitmapAnd ?

От
Tom Lane
Дата:
Stephen Frost <sfrost@snowman.net> writes:
> I've not looked into the specific costing here to see why the BitmapAnd
> ended up being chosen over just doing an index scan with the btree and
> then filtering, but I do believe it to be a problem area that would be
> good to try and improve.  The first question is probably- are we
> properly accounting for the cost of scanning the index vs the cost of
> scanning one index and then applying the filter?

We are costing it out in what seems a sane way to me.  In the given
example the "bad" plan is estimated at just slightly cheaper than what
(I assume) the "good" plan is.  I'm inclined to think this represents a
failure to choose good cost parameters for the installation.

Given how remarkably quick the single-index scan is, I also wonder if
that index is fully cached while we had to read some of the other index
from kernel or SSD.  Relative cache states of different indexes is a
problem the planner doesn't currently try to deal with; it's possible
that that could bias it towards trying to AND a large-but-not-fully-cached
index with a smaller-and-fully-cached-one, when not bothering with the
larger index would in fact be better.  You might be able to counter that
to some extent by reducing effective_cache_size, though possibly that
cure is worse than the disease.

            regards, tom lane


Re: Why does query planner choose slower BitmapAnd ?

От
Seamus Abshere
Дата:
On Mon, Feb 22, 2016, at 01:48 PM, Tom Lane wrote:
> Given how remarkably quick the single-index scan is, I also wonder if that index is fully cached while we had to read
someof the other index from kernel or SSD. 

This makes sense, except that the speed of the query is the same if I
run it many times in a row. Shouldn't the partially-cached index get
loaded fully by the second query?

On Mon, Feb 22, 2016, at 01:20 PM, Stephen Frost wrote:
> The first question is probably- are we properly accounting for the cost of scanning the index vs the cost of scanning
oneindex and then applying the filter? 

I can affect the query planner's cost estimates with random_page_cost
(only), but I still can't get it to avoid the BitmapAnd - probably
because I am affecting other cost estimates in the same proportion.

No change with original settings OR cpu_tuple_cost=10 OR
seq_page_cost=10 OR (cpu_tuple_cost=0.05, seq_page_cost=1,
random_page_cost=1)

> ->  BitmapAnd  (cost=105894.80..105894.80 rows=21002 width=0) (actual time=4859.397..4859.397 rows=0 loops=1)
>   ->  Bitmap Index Scan on idx_houses_city  (cost=0.00..1666.90 rows=164044 width=0) (actual time=16.098..16.098
rows=155690loops=1) 
>         Index Cond: (city = 'New York'::text)
>   ->  Bitmap Index Scan on idx_houses_phoneable  (cost=0.00..104224.60 rows=10271471 width=0) (actual
time=4771.520..4771.520rows=10647041 loops=1) 
>         Index Cond: (phoneable = true)

However with random_page_cost=10 (hint: cost estimates go up by 4x or
so)

> ->  BitmapAnd  (cost=354510.80..354510.80 rows=21002 width=0) (actual time=4603.575..4603.575 rows=0 loops=1)
>   ->  Bitmap Index Scan on idx_houses_city  (cost=0.00..5590.90 rows=164044 width=0) (actual time=16.529..16.529
rows=155690loops=1) 
>         Index Cond: (city = 'New York'::text)
>   ->  Bitmap Index Scan on idx_houses_phoneable  (cost=0.00..348916.60 rows=10271471 width=0) (actual
time=4530.424..4530.424rows=10647041 loops=1) 
>         Index Cond: (phoneable = true)

I think this is why we originally set random_page_cost so "low"... it
was our way of "forcing" more index usage (we have a big, wide table).

Is there any other way to differentiate the 2 index scans? FWIW, 10% of
houses are phoneable, 0.2% are in the city. (Maybe I'm just supposed to
drop the index like Tom said.)

Best, thanks,
Seamus

--
Seamus Abshere, SCEA
+598 99 54 99 54
https://github.com/seamusabshere


Re: Why does query planner choose slower BitmapAnd ?

От
Stephen Frost
Дата:
* Tom Lane (tgl@sss.pgh.pa.us) wrote:
> Stephen Frost <sfrost@snowman.net> writes:
> > I've not looked into the specific costing here to see why the BitmapAnd
> > ended up being chosen over just doing an index scan with the btree and
> > then filtering, but I do believe it to be a problem area that would be
> > good to try and improve.  The first question is probably- are we
> > properly accounting for the cost of scanning the index vs the cost of
> > scanning one index and then applying the filter?
>
> We are costing it out in what seems a sane way to me.  In the given
> example the "bad" plan is estimated at just slightly cheaper than what
> (I assume) the "good" plan is.  I'm inclined to think this represents a
> failure to choose good cost parameters for the installation.

I'll try and get an opportunity to review the numbers for the case which
I ran across previously.

> Given how remarkably quick the single-index scan is, I also wonder if
> that index is fully cached while we had to read some of the other index
> from kernel or SSD.  Relative cache states of different indexes is a
> problem the planner doesn't currently try to deal with; it's possible
> that that could bias it towards trying to AND a large-but-not-fully-cached
> index with a smaller-and-fully-cached-one, when not bothering with the
> larger index would in fact be better.  You might be able to counter that
> to some extent by reducing effective_cache_size, though possibly that
> cure is worse than the disease.

Unfortunately, this doesn't actually hold water for the case which I ran
into as this was across multiple repeated invocations, where both indexes
were fully cached.  It was simply much more expensive to scan the entire
GIST index (which wasn't small) than to fetch and filter the records
returned from the btree index.

I could see possibly adjusting the costing of the bounding box operator
to be lower, to lower the overall cost of the plan to use the btree
index and then filter as compared to the BitmapAnd scan, but the actual
run-times weren't even close, as I recall (something like 45-60s vs.
less than a second), which makes me wonder about there being a missing
costing somewhere.  The number of records returned from the indexes was
similar to the case presented here- 100k records from one, 10m+ from the
other.

I'm just wondering how we manage to not realize that scanning through
gigabytes of index pages is going to be more expensive than running an
operator comparison across 100k records.

Thanks!

Stephen

Вложения

Re: Why does query planner choose slower BitmapAnd ?

От
Tom Lane
Дата:
Stephen Frost <sfrost@snowman.net> writes:
> * Tom Lane (tgl@sss.pgh.pa.us) wrote:
>> Given how remarkably quick the single-index scan is, I also wonder if
>> that index is fully cached while we had to read some of the other index
>> from kernel or SSD.

> Unfortunately, this doesn't actually hold water for the case which I ran
> into as this was across multiple repeated invocations, where both indexes
> were fully cached.  It was simply much more expensive to scan the entire
> GIST index (which wasn't small) than to fetch and filter the records
> returned from the btree index.

Well, I think the main problem in the case you are describing is a bad
estimate of how much of the GIST index needs to be examined, which is
something that needs to be fixed in gistcostestimate or operator-specific
selectivity estimates, not in choose_bitmap_and.  In Seamus' example it
seems that none of the rowcount estimates are unduly far off, so I don't
think he had an estimation failure of the same kind.

> I'm just wondering how we manage to not realize that scanning through
> gigabytes of index pages is going to be more expensive than running an
> operator comparison across 100k records.

IOW, almost certainly we *don't* realize that the query will involve
scanning through gigabytes of index pages.  But btree indexes are much
simpler and easier to make that estimate for than GIST indexes are.

            regards, tom lane


Re: Why does query planner choose slower BitmapAnd ?

От
Seamus Abshere
Дата:
On Mon, Feb 22, 2016, at 02:14 PM, Tom Lane wrote:
> IOW, almost certainly we *don't* realize that the query will involve scanning through gigabytes of index pages.  But
btreeindexes are much simpler and easier to make that estimate for... 

Isn't this the crux of my issue, at least?


--
Seamus Abshere, SCEA
+598 99 54 99 54
https://github.com/seamusabshere


Re: Why does query planner choose slower BitmapAnd ?

От
Tom Lane
Дата:
Seamus Abshere <seamus@abshere.net> writes:
> Is there any other way to differentiate the 2 index scans? FWIW, 10% of
> houses are phoneable, 0.2% are in the city. (Maybe I'm just supposed to
> drop the index like Tom said.)

Hm.  10% is above the threshold where I'd usually think that an indexscan
could beat a seqscan, so dropping the "phoneable" index is definitely
something you should consider, especially if updates on this table are
frequent (because you're paying to update the index too).

However, I'd still counsel fooling with the cpu cost parameters first.
Alternatively, you could leave those at defaults and set random_page_cost
and seq_page_cost to 0.1 to 0.5 or so, which would net out to the same
effect (charging more for CPU relative to I/O) though with different
absolute cost numbers.

            regards, tom lane


Re: Why does query planner choose slower BitmapAnd ?

От
Tom Lane
Дата:
Seamus Abshere <seamus@abshere.net> writes:
> On Mon, Feb 22, 2016, at 02:14 PM, Tom Lane wrote:
>> IOW, almost certainly we *don't* realize that the query will involve scanning through gigabytes of index pages.  But
btreeindexes are much simpler and easier to make that estimate for... 

> Isn't this the crux of my issue, at least?

Not unless you are using a GIST index for booleans, which I sure hope you
ain't.

            regards, tom lane


Re: Why does query planner choose slower BitmapAnd ?

От
Jeff Janes
Дата:
On Mon, Feb 22, 2016 at 8:20 AM, Stephen Frost <sfrost@snowman.net> wrote:
>
> Also agreed here, but I've seen field evidence (with reasonable
> configurations) that definitely shows that we're a bit too happy to go
> with a BitmapAnd scan across two indexes where one returns an order of
> magnitude (or less) pages to consider than the other and most of the
> time we spend on the overall query is in going through the index to find
> a bunch of pages we're just going to throw away when we do the AND.
>
> In one specific case which I can recall offhand (having seen it quite
> recently), there was a btree index and a gist index (PostGIS geometry)
> where the btree index pulled back perhaps 100k rows but the gist index
> returned nearly everything (the bounding box included in the query
> covering almost the entire table).  Dropping the gist index greatly
> improved *that* query, but, of course, destroyed the performance of more
> selective queries bounding box queries which didn't include a constraint
> on the column with the btree index (forcing a sequential scan of the
> table).
>
> I've not looked into the specific costing here to see why the BitmapAnd
> ended up being chosen over just doing an index scan with the btree and
> then filtering, but I do believe it to be a problem area that would be
> good to try and improve.  The first question is probably- are we
> properly accounting for the cost of scanning the index vs the cost of
> scanning one index and then applying the filter?

I looked into this before as well, and I think it is vastly
underestimating the cost of adding a bit into the bitmap, near this
comment:

        /*
         * Charge a small amount per retrieved tuple to reflect the costs of
         * manipulating the bitmap.  This is mostly to make sure that a bitmap
         * scan doesn't look to be the same cost as an indexscan to retrieve a
         * single tuple.
         */

It charges 0.1 CPU_operator_cost, while reality seemed to be more like
6 CPU_operator_cost.

The assumption seems to be that this estimate doesn't need to be
accurate, because the cost estimate when the bitmap is *used* will
make up for it.  But when ANDing together bitmaps of very unequal
size, that doesn't happen.

I've been wanting to revisit this in the 9.6 code base, but can't find
my code and notes, so this all from memory.

Cheers,

Jeff


Re: Why does query planner choose slower BitmapAnd ?

От
Seamus Abshere
Дата:
On Mon, Feb 22, 2016, at 02:30 PM, Jeff Janes wrote:
> It charges 0.1 CPU_operator_cost, while reality seemed to be more like 6 CPU_operator_cost.

fdy=> select name, setting, boot_val from pg_settings where name ~
'cpu';
         name         | setting | boot_val
----------------------+---------+----------
 cpu_index_tuple_cost | 0.005   | 0.005
 cpu_operator_cost    | 0.0025  | 0.0025
 cpu_tuple_cost       | 0.01    | 0.01
(3 rows)

Inspired, I changed cpu_index_tuple_cost to 0.1 (default: 0.005). It
"fixed" my problem by preventing the BitmapAnd.

Is this dangerous?


Re: Why does query planner choose slower BitmapAnd ?

От
Tom Lane
Дата:
Jeff Janes <jeff.janes@gmail.com> writes:
> I looked into this before as well, and I think it is vastly
> underestimating the cost of adding a bit into the bitmap, near this
> comment:

>         /*
>          * Charge a small amount per retrieved tuple to reflect the costs of
>          * manipulating the bitmap.  This is mostly to make sure that a bitmap
>          * scan doesn't look to be the same cost as an indexscan to retrieve a
>          * single tuple.
>          */

> It charges 0.1 CPU_operator_cost, while reality seemed to be more like
> 6 CPU_operator_cost.

That sounds *awfully* high.  I don't have any problem with the idea that
that number is off, but I'd want to see some evidence before bumping it
by a factor of 60.

The general assumption here is that most of the per-tuple costs ought to
be reflected in cpu_tuple_cost and cpu_index_tuple_cost.  This addition is
meant to reflect the extra cost of going through a bitmap rather than
just fetching the tuple directly.  That extra cost is certainly not zero,
but it seems to me that it ought to be fairly small relative to the other
processing costs of index and heap fetch.  With the default cpu_xxx_costs,
what you suggest here would mean charging twice as much CPU per-tuple for
a bitmap scan as for a plain index scan, and that doesn't sound right.
(Or if it is right, maybe we have a performance bug in tidbitmap.c.)

[ thinks for a bit... ]  Another thought to look into is that I don't
think the planner worries about the bitmap becoming "lossy", which would
result in many more heap tuple checks than it's predicting.  It might be
that we need to model that effect.  I don't think it's at play in Seamus'
example, given the large work_mem he's using, but maybe it explains your
results?

            regards, tom lane


Re: Why does query planner choose slower BitmapAnd ?

От
Tom Lane
Дата:
Seamus Abshere <seamus@abshere.net> writes:
> Inspired, I changed cpu_index_tuple_cost to 0.1 (default: 0.005). It
> "fixed" my problem by preventing the BitmapAnd.

> Is this dangerous?

Use a gentle tap, man, don't swing the hammer with quite so much abandon.
I'd have tried doubling the setting to start with.  Raising it 20X might
cause other queries to change behavior undesirably.

            regards, tom lane


Re: Why does query planner choose slower BitmapAnd ?

От
Seamus Abshere
Дата:
On Mon, Feb 22, 2016, at 02:49 PM, Tom Lane wrote:
> Seamus Abshere <seamus@abshere.net> writes:
> > Inspired, I changed cpu_index_tuple_cost to 0.1 (default: 0.005). It "fixed" my problem by preventing the
BitmapAnd.
> > Is this dangerous?
>
> Use a gentle tap, man, don't swing the hammer with quite so much abandon.
> I'd have tried doubling the setting to start with.  Raising it 20X might
> cause other queries to change behavior undesirably.

Doubling it was enough :)


Re: Why does query planner choose slower BitmapAnd ?

От
Seamus Abshere
Дата:
On Mon, Feb 22, 2016, at 02:53 PM, Seamus Abshere wrote:
> On Mon, Feb 22, 2016, at 02:49 PM, Tom Lane wrote:
> > Seamus Abshere <seamus@abshere.net> writes:
> > > Inspired, I changed cpu_index_tuple_cost to 0.1 (default: 0.005). It "fixed" my problem by preventing the
BitmapAnd.
> > > Is this dangerous?
> >
> > Use a gentle tap, man, don't swing the hammer with quite so much abandon.
> > I'd have tried doubling the setting to start with.  Raising it 20X might
> > cause other queries to change behavior undesirably.
>
> Doubling it was enough :)

             name             | setting | boot_val
------------------------------+---------+----------
 cpu_index_tuple_cost         | 0.09    | 0.005   <- 18x boot val, 9x
 cpu_tuple_cost
 cpu_operator_cost            | 0.0025  | 0.0025
 cpu_tuple_cost               | 0.01    | 0.01

In the end I'm back to the big hammer.

I found that larger cities (e.g., more results from the city index)
required a larger cpu_index_tuple_cost to prevent the BitmapAnd.

Now cpu_index_tuple_cost is set to 0.09, which is 18x its boot_val and
9x cpu_tuple_cost... which seems strange.

Logically, should I be changing cpu_operator_cost instead?


Re: Why does query planner choose slower BitmapAnd ?

От
Stephen Frost
Дата:
* Seamus Abshere (seamus@abshere.net) wrote:
> Is there any other way to differentiate the 2 index scans? FWIW, 10% of
> houses are phoneable, 0.2% are in the city. (Maybe I'm just supposed to
> drop the index like Tom said.)

Have to admit that I continue to be interested in this as it might
relate to the somewhat similar (or, at least, seems to be, to me) case
that I ran into before.

What might be interesting would be to see a run with:

explain (analyze, verbose, buffers) select ...

for both of the original queries.  It'd also be nice to know what the
size is of each of the indexes involved.

Last, but perhaps not least, have you considered using a partial index
and only indexing where phoneable is actually true?  That would remove
the need to update the index for the case where phoneable is false and
would make the index smaller.  I don't know that it'd really change
what's going on here, though if it did, that would be interesting too.

Thanks!

Stephen

Вложения

Re: Why does query planner choose slower BitmapAnd ?

От
Alban Hertroys
Дата:
> On 22 Feb 2016, at 16:58, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> (BTW, is that index really on just a boolean column?  It seems
> unlikely that "phoneable" would be a sufficiently selective
> condition to justify having an index on it.  I'd seriously consider
> dropping that index as another solution approach.)

On that train of thought, I would think that a person or company would only be phoneable if they have a phone number
registeredsomewhere. That somewhere probably being in another table that's too far away from the current table to check
itstraight away - so this is an optimisation, right? 

Where I see that going is as follows: A "contact" either has a phone number - in which case you'd probably rather get
thatphone number - or they don't, in which case a null value is often sufficient[1]. 
While a phone number certainly takes up more storage than a boolean, it wouldn't require an index (because it's
availableright there) nor the extra joins to look up the actual phone number. And if you'd still want to put an index
onit, the null values won't be indexed, which takes a bit off the burden of the larger field size. 

You _could_ also take a shortcut and use a variation of your current approach by storing null instead of false for
phoneable,but then your index would contain nothing but true values which rather defeats the point of having an index. 

Query-wise, I suspect that the number of "contacts" that have a phone number far outweighs the number that doesn't, in
whichcase it's more efficient to query for those that don't have one (fewer index hits) and eliminate those from the
resultsthan the other way around. In my experience, both the NOT EXISTS and the LEFT JOIN + WHERE phoneable IS NULL
tendto perform better. 

A final variation on the above would be to have a conditional index on your PK for those "contacts" that are NOT
phoneable.That's probably the shortest and quickest list to query. I'd still prefer that field to contain something a
bitmore meaningful though... 

Well, enough of my rambling!

Ad 1. It is possible that you cater for the possibility that you don't know whether a "contact" has a phone number or
not,in which case null would probably be the wrong choice for "no phone number" because then you wouldn't be able to
distinguishbetween "no phone number" and "I don't know". 

Alban Hertroys
--
If you can't see the forest for the trees,
cut the trees and you'll find there is no forest.