Обсуждение: surprising query optimisation
Hi All, We have an app that deals with a lot of queries, and we've been slowly seeing performance issues emerge. We take a lot of free form queries from users and stumbled upon a very surprising optimisation. So, we have a 'state' column which is a 3 character string column with an index on it. Despite being a string, this column is only used to store one of three values: 'NEW', 'ACK', or 'RSV'. One of our most common queries clauses is "state!='RSV'" and we've found that by substituting this clause with "state='ACK' or state='NEW'" wherever it was used, we've dropped the postgres server's load average from 20 down to 4 and the CPU usage from 60% in user space down to <5%. This seems counter-intuitive to me, so thought I'd ask here. Why would this be likely to make such a difference? We're currently on 9.4, is this something that's likely to be different (better? worse?) if we got all the way up to 10 or 11? cheers, Chris
On 11/28/18 2:26 PM, Chris Withers wrote: > Hi All, > > We have an app that deals with a lot of queries, and we've been slowly > seeing performance issues emerge. We take a lot of free form queries > from users and stumbled upon a very surprising optimisation. > > So, we have a 'state' column which is a 3 character string column with > an index on it. Despite being a string, this column is only used to > store one of three values: 'NEW', 'ACK', or 'RSV'. > > One of our most common queries clauses is "state!='RSV'" and we've found > that by substituting this clause with "state='ACK' or state='NEW'" > wherever it was used, we've dropped the postgres server's load average > from 20 down to 4 and the CPU usage from 60% in user space down to <5%. > > This seems counter-intuitive to me, so thought I'd ask here. Why would The way I see it is state = "something" is a confined question. state != 'something' is potentially unbounded. Does EXPLAIN ANALYZE shed any light? > this be likely to make such a difference? We're currently on 9.4, is > this something that's likely to be different (better? worse?) if we got > all the way up to 10 or 11? > > cheers, > > Chris > > -- Adrian Klaver adrian.klaver@aklaver.com
On 29/11/2018 11:26, Chris Withers wrote: > Hi All, > > We have an app that deals with a lot of queries, and we've been slowly > seeing performance issues emerge. We take a lot of free form queries > from users and stumbled upon a very surprising optimisation. > > So, we have a 'state' column which is a 3 character string column with > an index on it. Despite being a string, this column is only used to > store one of three values: 'NEW', 'ACK', or 'RSV'. > > One of our most common queries clauses is "state!='RSV'" and we've > found that by substituting this clause with "state='ACK' or > state='NEW'" wherever it was used, we've dropped the postgres server's > load average from 20 down to 4 and the CPU usage from 60% in user > space down to <5%. > > This seems counter-intuitive to me, so thought I'd ask here. Why would > this be likely to make such a difference? We're currently on 9.4, is > this something that's likely to be different (better? worse?) if we > got all the way up to 10 or 11? > > cheers, > > Chris > > At a guess... "state!='RSV'" ==> pg only has to check one value and "state='ACK' or state='NEW'" ==> pg has to check two values so I would expect the '!=' to be faster. Cheers, Gavin
Greetings, * Chris Withers (chris@withers.org) wrote: > We have an app that deals with a lot of queries, and we've been slowly > seeing performance issues emerge. We take a lot of free form queries from > users and stumbled upon a very surprising optimisation. > > So, we have a 'state' column which is a 3 character string column with an > index on it. Despite being a string, this column is only used to store one > of three values: 'NEW', 'ACK', or 'RSV'. Sounds like a horrible field to have an index on. > One of our most common queries clauses is "state!='RSV'" and we've found > that by substituting this clause with "state='ACK' or state='NEW'" wherever > it was used, we've dropped the postgres server's load average from 20 down > to 4 and the CPU usage from 60% in user space down to <5%. You've changed the question you're asking the database. PG doesn't *know* that there's only those three values, but it probably has a pretty good guess about how many records are ACK and how many are NEW thanks to those being in the MCV list. > This seems counter-intuitive to me, so thought I'd ask here. Why would this > be likely to make such a difference? We're currently on 9.4, is this > something that's likely to be different (better? worse?) if we got all the > way up to 10 or 11? When you change what you're asking, PG is going to change how it gives you the answer and sometimes that'll be faster and other times it won't be. Really though, if you want something more than wild speculation, posting the 'explain analyze' of each query along with the actual table definitions and sizes and such would be the best way to get it. I'd suggest you check out the wiki article written about this kind of question: https://wiki.postgresql.org/wiki/Slow_Query_Questions Thanks! Stephen
Вложения
On 28/11/2018 22:49, Stephen Frost wrote: > * Chris Withers (chris@withers.org) wrote: >> We have an app that deals with a lot of queries, and we've been slowly >> seeing performance issues emerge. We take a lot of free form queries from >> users and stumbled upon a very surprising optimisation. >> >> So, we have a 'state' column which is a 3 character string column with an >> index on it. Despite being a string, this column is only used to store one >> of three values: 'NEW', 'ACK', or 'RSV'. > > Sounds like a horrible field to have an index on. That's counter-intuitive for me. What leads you to say this and what would you do/recommend instead? > Really though, if you want something more than wild speculation, posting > the 'explain analyze' of each query along with the actual table > definitions and sizes and such would be the best way to get it. table definition: # \d alerts_alert Table "public.alerts_alert" Column | Type | Modifiers -----------------+--------------------------+----------- tags | jsonb | not null id | character varying(86) | not null earliest_seen | timestamp with time zone | not null latest_seen | timestamp with time zone | not null created | timestamp with time zone | not null modified | timestamp with time zone | not null type | character varying(300) | not null state | character varying(3) | not null until | timestamp with time zone | latest_note | text | not null created_by_id | integer | not null modified_by_id | integer | not null owner_id | integer | owning_group_id | integer | not null latest_new | timestamp with time zone | not null Indexes: "alerts_alert_pkey" PRIMARY KEY, btree (id) "alert_tags_index" gin (tags) "alerts_alert_1efacf1d" btree (latest_seen) "alerts_alert_3103a7d8" btree (until) "alerts_alert_599dcce2" btree (type) "alerts_alert_5e7b1936" btree (owner_id) "alerts_alert_9ae73c65" btree (modified) "alerts_alert_9ed39e2e" btree (state) "alerts_alert_b3da0983" btree (modified_by_id) "alerts_alert_c5151f5a" btree (earliest_seen) "alerts_alert_e2fa5388" btree (created) "alerts_alert_e93cb7eb" btree (created_by_id) "alerts_alert_efea2d76" btree (owning_group_id) "alerts_alert_id_13155e16_like" btree (id varchar_pattern_ops) "alerts_alert_latest_new_e8d1fbde_uniq" btree (latest_new) "alerts_alert_state_90ab480b_like" btree (state varchar_pattern_ops) "alerts_alert_type_3021f46f_like" btree (type varchar_pattern_ops) Foreign-key constraints: "alerts_alert_created_by_id_520608c0_fk_alerts_user_id" FOREIGN KEY (created_by_id) REFERENCES alerts_user(id) DEFERRABLE INITIALLY DEFERRED "alerts_alert_modified_by_id_6db4b04b_fk_alerts_user_id" FOREIGN KEY (modified_by_id) REFERENCES alerts_user(id) DEFERRABLE INITIALLY DEFERRED "alerts_alert_owner_id_0c00548a_fk_alerts_user_id" FOREIGN KEY (owner_id) REFERENCES alerts_user(id) DEFERRABLE INITIALLY DEFERRED "alerts_alert_owning_group_id_a4869b66_fk_auth_group_id" FOREIGN KEY (owning_group_id) REFERENCES auth_group(id) DEFERRABLE INITIALLY DEFERRED Referenced by: TABLE "alerts_alertevent" CONSTRAINT "alerts_alertevent_alert_id_edd734b8_fk_alerts_alert_id" FOREIGN KEY (alert_id) REFERENCES alerts_alert(id) DEFERRABLE INITIALLY DEFERRED Row counts by state: # select state, count(*) from alerts_alert group by 1 order by 1; state | count -------+--------- ACK | 1053 NEW | 1958 RSV | 1528623 (3 rows) here's an example of the "bad" query plan: https://explain.depesz.com/s/cDkp here's an example with all the "state!='RSV'" clauses rewritten as I described: https://explain.depesz.com/s/B9Xi > I'd suggest you check out the wiki article written about this kind of > question: > > https://wiki.postgresql.org/wiki/Slow_Query_Questions Thanks! Chris
Greetings,
On Fri, Nov 30, 2018 at 07:52 Chris Withers <chris@withers.org> wrote:
On 28/11/2018 22:49, Stephen Frost wrote:
> * Chris Withers (chris@withers.org) wrote:
>> We have an app that deals with a lot of queries, and we've been slowly
>> seeing performance issues emerge. We take a lot of free form queries from
>> users and stumbled upon a very surprising optimisation.
>>
>> So, we have a 'state' column which is a 3 character string column with an
>> index on it. Despite being a string, this column is only used to store one
>> of three values: 'NEW', 'ACK', or 'RSV'.
>
> Sounds like a horrible field to have an index on.
That's counter-intuitive for me. What leads you to say this and what
would you do/recommend instead?
> Really though, if you want something more than wild speculation, posting
> the 'explain analyze' of each query along with the actual table
> definitions and sizes and such would be the best way to get it.
table definition:
# \d alerts_alert
Table "public.alerts_alert"
Column | Type | Modifiers
-----------------+--------------------------+-----------
tags | jsonb | not null
id | character varying(86) | not null
earliest_seen | timestamp with time zone | not null
latest_seen | timestamp with time zone | not null
created | timestamp with time zone | not null
modified | timestamp with time zone | not null
type | character varying(300) | not null
state | character varying(3) | not null
until | timestamp with time zone |
latest_note | text | not null
created_by_id | integer | not null
modified_by_id | integer | not null
owner_id | integer |
owning_group_id | integer | not null
latest_new | timestamp with time zone | not null
Indexes:
"alerts_alert_pkey" PRIMARY KEY, btree (id)
"alert_tags_index" gin (tags)
"alerts_alert_1efacf1d" btree (latest_seen)
"alerts_alert_3103a7d8" btree (until)
"alerts_alert_599dcce2" btree (type)
"alerts_alert_5e7b1936" btree (owner_id)
"alerts_alert_9ae73c65" btree (modified)
"alerts_alert_9ed39e2e" btree (state)
"alerts_alert_b3da0983" btree (modified_by_id)
"alerts_alert_c5151f5a" btree (earliest_seen)
"alerts_alert_e2fa5388" btree (created)
"alerts_alert_e93cb7eb" btree (created_by_id)
"alerts_alert_efea2d76" btree (owning_group_id)
"alerts_alert_id_13155e16_like" btree (id varchar_pattern_ops)
"alerts_alert_latest_new_e8d1fbde_uniq" btree (latest_new)
"alerts_alert_state_90ab480b_like" btree (state varchar_pattern_ops)
"alerts_alert_type_3021f46f_like" btree (type varchar_pattern_ops)
Foreign-key constraints:
"alerts_alert_created_by_id_520608c0_fk_alerts_user_id" FOREIGN KEY
(created_by_id) REFERENCES alerts_user(id) DEFERRABLE INITIALLY DEFERRED
"alerts_alert_modified_by_id_6db4b04b_fk_alerts_user_id" FOREIGN
KEY (modified_by_id) REFERENCES alerts_user(id) DEFERRABLE INITIALLY
DEFERRED
"alerts_alert_owner_id_0c00548a_fk_alerts_user_id" FOREIGN KEY
(owner_id) REFERENCES alerts_user(id) DEFERRABLE INITIALLY DEFERRED
"alerts_alert_owning_group_id_a4869b66_fk_auth_group_id" FOREIGN
KEY (owning_group_id) REFERENCES auth_group(id) DEFERRABLE INITIALLY
DEFERRED
Referenced by:
TABLE "alerts_alertevent" CONSTRAINT
"alerts_alertevent_alert_id_edd734b8_fk_alerts_alert_id" FOREIGN KEY
(alert_id) REFERENCES alerts_alert(id) DEFERRABLE INITIALLY DEFERRED
Row counts by state:
# select state, count(*) from alerts_alert group by 1 order by 1;
state | count
-------+---------
ACK | 1053
NEW | 1958
RSV | 1528623
(3 rows)
here's an example of the "bad" query plan:
https://explain.depesz.com/s/cDkp
here's an example with all the "state!='RSV'" clauses rewritten as I
described:
https://explain.depesz.com/s/B9Xi
> I'd suggest you check out the wiki article written about this kind of
> question:
>
> https://wiki.postgresql.org/wiki/Slow_Query_Questions
Have you tried a partial index on state!=‘RSV’?
Thanks,
Stephen
On 30/11/2018 12:55, Stephen Frost wrote: > > I'd suggest you check out the wiki article written about this kind of > > question: > > > > https://wiki.postgresql.org/wiki/Slow_Query_Questions > > > Have you tried a partial index on state!=‘RSV’? The solution I originally posted, that we do easily enough at our query generation layer, is working perfectly, but this is good to know for next time. My post here is mainly to try and understand what's going on so I can improve my general feel for how to use postgres at it's best. So, why was the query ending up being a big scan rather than some quick lookups based on the index? cheers, Chris
Greetings,
On Fri, Nov 30, 2018 at 08:00 Chris Withers <chris@withers.org> wrote:
On 30/11/2018 12:55, Stephen Frost wrote:
> > I'd suggest you check out the wiki article written about this kind of
> > question:
> >
> > https://wiki.postgresql.org/wiki/Slow_Query_Questions
>
>
> Have you tried a partial index on state!=‘RSV’?
The solution I originally posted, that we do easily enough at our query
generation layer, is working perfectly, but this is good to know for
next time.
My post here is mainly to try and understand what's going on so I can
improve my general feel for how to use postgres at it's best.
So, why was the query ending up being a big scan rather than some quick
lookups based on the index?
Thought that was mentioned already but at least part of the issue is that PG can’t just search for the other values when it’s a != in the index because it wouldn’t know what values to search for... PG doesn’t know, with complete certainty, that there’s only 3 values.
The partial index is something you should want anyway- that index won’t ever be used to search for RSV because it’s cheaper to just scan the table if we are looking for those, and the index will be much, much smaller without that common value being included.
Thanks!
Stephen
Greetings, * Chris Withers (chris@withers.org) wrote: > On 28/11/2018 22:49, Stephen Frost wrote: > >* Chris Withers (chris@withers.org) wrote: > >>We have an app that deals with a lot of queries, and we've been slowly > >>seeing performance issues emerge. We take a lot of free form queries from > >>users and stumbled upon a very surprising optimisation. > >> > >>So, we have a 'state' column which is a 3 character string column with an > >>index on it. Despite being a string, this column is only used to store one > >>of three values: 'NEW', 'ACK', or 'RSV'. > > > >Sounds like a horrible field to have an index on. > > That's counter-intuitive for me. What leads you to say this and what would > you do/recommend instead? For this, specifically, it's because you end up with exactly what you have: a large index with tons of duplicate values. Indexes are particularly good when you have high-cardinality fields. Now, if you have a skewed index, where there's one popular value and a few much less popular ones, then that's where you really want a partial index (as I suggest earlier) so that queries against the non-popular value(s) is able to use the index and the index is much smaller. Of course, for this to work you need to set up the partial index correctly and make sure that your queries end up using it. Thanks! Stephen
Вложения
On 01/12/2018 04:33, Stephen Frost wrote: > Greetings, > > * Chris Withers (chris@withers.org) wrote: >> On 28/11/2018 22:49, Stephen Frost wrote: >>> * Chris Withers (chris@withers.org) wrote: >>>> We have an app that deals with a lot of queries, and we've been slowly >>>> seeing performance issues emerge. We take a lot of free form queries from >>>> users and stumbled upon a very surprising optimisation. >>>> >>>> So, we have a 'state' column which is a 3 character string column with an >>>> index on it. Despite being a string, this column is only used to store one >>>> of three values: 'NEW', 'ACK', or 'RSV'. >>> Sounds like a horrible field to have an index on. >> That's counter-intuitive for me. What leads you to say this and what would >> you do/recommend instead? > For this, specifically, it's because you end up with exactly what you > have: a large index with tons of duplicate values. Indexes are > particularly good when you have high-cardinality fields. Now, if you > have a skewed index, where there's one popular value and a few much less > popular ones, then that's where you really want a partial index (as I > suggest earlier) so that queries against the non-popular value(s) is > able to use the index and the index is much smaller. > > Of course, for this to work you need to set up the partial index > correctly and make sure that your queries end up using it. > > Thanks! > > Stephen An index simply tells pg which block to look at (assuming that the index itself is not sufficient to satisfy the query), so if using an index would still require that pg look at most blocks, then it cheaper to just scan the whole table - rather than load the index and still look at all blocks that contain the table data. I've oversimplified slightly. An index is best used when using it results in fewer blocks being read from disk. Also the use of RAM is better when the size of the index is kept small. For example having an index on sex for nurses is a waste of time because most nurses are female. However, a partial index (as suggested by Stephen, for your query) that includes only males could be useful if you have queries looking for male nurses (assumes male nurses are a very small fraction of nurses such that most data blocks don't have rows for males nurses, and the planner knows this). I once optimised a very complex set queries that made extensive use of indexes. However, with the knowledge I have today, I would have most likely had fewer and smaller indexes. As I now realize, that some of my indexes were probably counter productive, especially as I'd given no thought to how much RAM would be required to host the data plus indexes! Fortunately, while the objective was to run all those queries within 24 hours, they actually only took a couple of hours. Chris, I would strongly suggest, you read up on the excellent documentation pg has about indexes, but don't expect to understand it all at one sitting... Cheers, Gavin
On 30/11/2018 15:33, Stephen Frost wrote: > Greetings, > > * Chris Withers (chris@withers.org) wrote: >> On 28/11/2018 22:49, Stephen Frost wrote: >>> * Chris Withers (chris@withers.org) wrote: >>>> We have an app that deals with a lot of queries, and we've been slowly >>>> seeing performance issues emerge. We take a lot of free form queries from >>>> users and stumbled upon a very surprising optimisation. >>>> >>>> So, we have a 'state' column which is a 3 character string column with an >>>> index on it. Despite being a string, this column is only used to store one >>>> of three values: 'NEW', 'ACK', or 'RSV'. >>> >>> Sounds like a horrible field to have an index on. >> >> That's counter-intuitive for me. What leads you to say this and what would >> you do/recommend instead? > > For this, specifically, it's because you end up with exactly what you > have: a large index with tons of duplicate values. Indexes are > particularly good when you have high-cardinality fields. Now, if you > have a skewed index, where there's one popular value and a few much less > popular ones, then that's where you really want a partial index (as I > suggest earlier) so that queries against the non-popular value(s) is > able to use the index and the index is much smaller. Interesting! In my head, for some reason, I'd always assumed a btree index would break down a char field based on the characters within it. Does that never happen? If I changed this to be an enum field, would != still perform poorly or can the query optimiser spot that it's an enum and just look for the other options? cheers, Chris
On 30/11/2018 22:10, Gavin Flower wrote: > > I once optimised a very complex set queries that made extensive use of > indexes. However, with the knowledge I have today, I would have most > likely had fewer and smaller indexes. As I now realize, that some of my > indexes were probably counter productive, especially as I'd given no > thought to how much RAM would be required to host the data plus > indexes! Fortunately, while the objective was to run all those queries > within 24 hours, they actually only took a couple of hours. So, interestingly, this box has 250GB memory in it, and even though I've set effective_cache_size to 200GB, I only see 9G of memory being used. How can I persuade postgres to keep more in memory? cheers, Chris
Stephen Frost schrieb am 30.11.2018 um 14:05: > PG doesn’t know, with complete certainty, that there’s only 3 > values. Would the optimizer consult a check constraint ensuring that?
Chris Withers schrieb am 05.12.2018 um 12:42: > So, interestingly, this box has 250GB memory in it, and even though > I've set effective_cache_size to 200GB, I only see 9G of memory being > used. How can I persuade postgres to keep more in memory? effective_cache_size is a hint to the optimizer on how much data is to be expected to be cached (by the filesystem). Only shared_buffers (and work_mem) will actually allocate memory from the operating system. Thomas
Greetings, * Thomas Kellerer (spam_eater@gmx.net) wrote: > Stephen Frost schrieb am 30.11.2018 um 14:05: > > PG doesn’t know, with complete certainty, that there’s only 3 > > values. > > Would the optimizer consult a check constraint ensuring that? Not today, I don't believe (haven't looked at the code though, to be fair), and seems like it'd be an awful lot of work for a rare use-case that would be better with just a partial index.. What we will do today is if you ask for a specific value and there's a CHECK constraint which lists out specific values and that value isn't among the set, then we'll just skip over the table (One-time filter: false), thanks to constraint exclusion. Thanks! Stephen
Вложения
Greetings, * Chris Withers (chris@withers.org) wrote: > On 30/11/2018 15:33, Stephen Frost wrote: > >* Chris Withers (chris@withers.org) wrote: > >>On 28/11/2018 22:49, Stephen Frost wrote: > >For this, specifically, it's because you end up with exactly what you > >have: a large index with tons of duplicate values. Indexes are > >particularly good when you have high-cardinality fields. Now, if you > >have a skewed index, where there's one popular value and a few much less > >popular ones, then that's where you really want a partial index (as I > >suggest earlier) so that queries against the non-popular value(s) is > >able to use the index and the index is much smaller. > > Interesting! In my head, for some reason, I'd always assumed a btree index > would break down a char field based on the characters within it. Does that > never happen? Not sure what you mean by 'break down a char field'. > If I changed this to be an enum field, would != still perform poorly or can > the query optimiser spot that it's an enum and just look for the other > options? I don't believe we've got any kind of optimization like that today for enums. Thanks! Stephen
Вложения
On 05/12/2018 14:38, Stephen Frost wrote: > Greetings, > > * Chris Withers (chris@withers.org) wrote: >> On 30/11/2018 15:33, Stephen Frost wrote: >>> * Chris Withers (chris@withers.org) wrote: >>>> On 28/11/2018 22:49, Stephen Frost wrote: >>> For this, specifically, it's because you end up with exactly what you >>> have: a large index with tons of duplicate values. Indexes are >>> particularly good when you have high-cardinality fields. Now, if you >>> have a skewed index, where there's one popular value and a few much less >>> popular ones, then that's where you really want a partial index (as I >>> suggest earlier) so that queries against the non-popular value(s) is >>> able to use the index and the index is much smaller. >> >> Interesting! In my head, for some reason, I'd always assumed a btree index >> would break down a char field based on the characters within it. Does that >> never happen? > > Not sure what you mean by 'break down a char field'. Rather than breaking into three buckets ('NEW', 'ACK', 'RSV'), a more complicated hierarchy ('N', 'NE', 'A', 'AC', etc). >> If I changed this to be an enum field, would != still perform poorly or can >> the query optimiser spot that it's an enum and just look for the other >> options? > > I don't believe we've got any kind of optimization like that today for > enums. Good to know, I see query optimisers as magic, and postgres often seems to achieve magic results ;-) Chris
On 12/05/2018 08:42 AM, Chris Withers wrote: > On 05/12/2018 14:38, Stephen Frost wrote: >> Greetings, >> >> * Chris Withers (chris@withers.org) wrote: >>> On 30/11/2018 15:33, Stephen Frost wrote: >>>> * Chris Withers (chris@withers.org) wrote: >>>>> On 28/11/2018 22:49, Stephen Frost wrote: >>>> For this, specifically, it's because you end up with exactly what you >>>> have: a large index with tons of duplicate values. Indexes are >>>> particularly good when you have high-cardinality fields. Now, if you >>>> have a skewed index, where there's one popular value and a few much less >>>> popular ones, then that's where you really want a partial index (as I >>>> suggest earlier) so that queries against the non-popular value(s) is >>>> able to use the index and the index is much smaller. >>> >>> Interesting! In my head, for some reason, I'd always assumed a btree index >>> would break down a char field based on the characters within it. Does that >>> never happen? >> >> Not sure what you mean by 'break down a char field'. > > Rather than breaking into three buckets ('NEW', 'ACK', 'RSV'), a more > complicated hierarchy ('N', 'NE', 'A', 'AC', etc). The b-tree indexes on legacy RDBMS which I still occasionally fiddle with performs key prefix compression in a manner similar to what you refer to, but otherwise that's not how b-trees work. -- Angular momentum makes the world go 'round.
Greetings, * Ron (ronljohnsonjr@gmail.com) wrote: > On 12/05/2018 08:42 AM, Chris Withers wrote: > >On 05/12/2018 14:38, Stephen Frost wrote: > >>>>* Chris Withers (chris@withers.org) wrote: > >>>Interesting! In my head, for some reason, I'd always assumed a btree index > >>>would break down a char field based on the characters within it. Does that > >>>never happen? > >> > >>Not sure what you mean by 'break down a char field'. > > > >Rather than breaking into three buckets ('NEW', 'ACK', 'RSV'), a more > >complicated hierarchy ('N', 'NE', 'A', 'AC', etc). > > The b-tree indexes on legacy RDBMS which I still occasionally fiddle with > performs key prefix compression in a manner similar to what you refer to, > but otherwise that's not how b-trees work. There's been some discussion of prefix compression in PostgreSQL. Even with that, though, it hardly seems sensible to have an index which has tons of duplicates comprising most of the index, and a != would still have to search the index to make sure there aren't any entries which need to be returned.. Now, maybe once we get skipping scans where we would be able to skip over a large chunk of the index because it's just tons of duplicates without having to visit everything along the way, then maybe having this inefficient index would "just" take up disk space, but why waste that space? Thanks! Stephen