explain analyze select distinct 2 from tbl where (fld = 230) limit 1;
 Limit  (cost=0.56..28349.28 rows=1 width=4) (actual time=0.039..0.040 rows=1 loops=1)
   ->  Unique  (cost=0.56..28349.28 rows=1 width=4) (actual time=0.039..0.039 rows=1 loops=1)
         ->  Index Only Scan using idx on tbl  (cost=0.56..28349.28 rows=995241 width=4) (actual time=0.038..0.038 rows=1 loops=1)
               Index Cond: (fld = 230)
               Heap Fetches: 0
 Planning Time: 0.066 ms
 Execution Time: 0.047 ms
With the distinct and the limit, the planner somehow knows to push either the distinct or the limit down into the index only scan, so the Unique node for the distinct saw only 1 row and the outer Limit had only 1 row to consume. Without the limit, the distinct still uses the index only scan, but the Unique has to run over roughly a million rows and execution time goes to about 100 ms.
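For comparison, when the question is just "does any row with this value exist", an EXISTS probe expresses the intent directly and likewise stops at the first matching index entry (a sketch against the same tbl/fld names used above):

```sql
-- Sketch: semantically equivalent to SELECT DISTINCT ... LIMIT 1 when
-- you only care whether a match exists; stops at the first index hit.
SELECT EXISTS (SELECT 1 FROM tbl WHERE fld = 230);
```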
fld is mostly ordered: it's a serial primary key in another table. The counts of the 131 distinct values follow roughly an exponential distribution. Of the 20m rows, the fld values ordered by count have about 8m, 5m, 2m, 1m, 1m, ... down to about 10k rows each. The index is a btree with a statistics target of 1000. The table is analyzed and vacuum frozen, and there is a CREATE STATISTICS on this table for the n:1 relationship between another field and this one.
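To see what the planner actually knows about this skew, you can compare the true per-value counts against the collected statistics (a sketch; pg_stats columns are standard, the table/column names are from this post):

```sql
-- True distribution (full scan, slow on 20m rows):
SELECT fld, count(*) AS n FROM tbl GROUP BY fld ORDER BY n DESC;

-- What ANALYZE captured with the stats target of 1000:
SELECT most_common_vals, most_common_freqs
FROM pg_stats
WHERE tablename = 'tbl' AND attname = 'fld';
```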
Without the distinct, choosing values with progressively fewer rows flipped the plan to an index only scan with limit once the value's share of the table fell somewhere between 3.7% and 4.7%. With a BRIN index on a similar-size, similarly distributed table that is in fld order, that crossover dropped to somewhere between 0.6% and 0.7%.
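The BRIN comparison makes sense because the table is in fld order, so each block range covers a narrow span of fld values. A minimal sketch of such an index (the index name here is made up for illustration):

```sql
-- Sketch: BRIN is tiny and effective when physical order tracks fld.
-- pages_per_range is the default (128); tune it to the table's layout.
CREATE INDEX tbl_fld_brin ON tbl USING brin (fld);
```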