Обсуждение: Planing edge case for sorts with limit on non null column
Hi,
I am not very familiar with mailing lists so forgive me if I am committing some sort of cardinal sin.
I found a weird edge case within this simple query:
```
-- setup table with an non null column and index on it
create table t(i serial primary key);
-- query by the reverse order of the index
explain select * from t order by i desc limit 1;
-- works as expected with the following plan:
-- "Limit (cost=0.15..0.19 rows=1 width=4)"
-- " -> Index Only Scan Backward using t_pkey on t (cost=0.15..82.41 rows=2550 width=4)"
-- same deal query by the reverse order of the index, but also specify the wrong null order
-- from my understanding this should not matter because we don't have any nulls on the table
-- due to the constraint.
explain select * from t order by i desc nulls last limit 1;
-- here is the issue, when I ran the following query I get this plan:
-- "Limit (cost=48.25..48.25 rows=1 width=4)"
-- " -> Sort (cost=48.25..54.63 rows=2550 width=4)"
-- " Sort Key: i DESC NULLS LAST"
-- " -> Seq Scan on t (cost=0.00..35.50 rows=2550 width=4)"
```
It seems that the planner ignores the fact that the column does not contain nulls, and looks for a match between order of the index nulls and the order of the nulls specified in the query, even though the nulls order is irrelevant in this case.
I think that patching `build_index_pathkeys` would lead to the smallest amount of changes, my concern with the patch is the fact that `list_member_ptr` iterates over all of the pathkeys in the planner info.
I am not very familiar with mailing lists so forgive me if I am committing some sort of cardinal sin.
I found a weird edge case within this simple query:
```
-- setup table with an non null column and index on it
create table t(i serial primary key);
-- query by the reverse order of the index
explain select * from t order by i desc limit 1;
-- works as expected with the following plan:
-- "Limit (cost=0.15..0.19 rows=1 width=4)"
-- " -> Index Only Scan Backward using t_pkey on t (cost=0.15..82.41 rows=2550 width=4)"
-- same deal query by the reverse order of the index, but also specify the wrong null order
-- from my understanding this should not matter because we don't have any nulls on the table
-- due to the constraint.
explain select * from t order by i desc nulls last limit 1;
-- here is the issue, when I ran the following query I get this plan:
-- "Limit (cost=48.25..48.25 rows=1 width=4)"
-- " -> Sort (cost=48.25..54.63 rows=2550 width=4)"
-- " Sort Key: i DESC NULLS LAST"
-- " -> Seq Scan on t (cost=0.00..35.50 rows=2550 width=4)"
```
It seems that the planner ignores the fact that the column does not contain nulls, and looks for a match between order of the index nulls and the order of the nulls specified in the query, even though the nulls order is irrelevant in this case.
I think that patching `build_index_pathkeys` would lead to the smallest amount of changes, my concern with the patch is the fact that `list_member_ptr` iterates over all of the pathkeys in the planner info.
I did this weird step creating an alternative pathkey and testing if its relevant because the call sites to that function does some sort of deduplication of useless pathkey and when I tried adding both directions of `nulls_first` to the `retval` the last one was deduplicated (or at least that is what I think that is happening).
I am not very familiar with the Postgres codebase but I hacked a simple patch that from my testing, fixes the issue. But I don't know if it's the correct place to apply that sort of logic, and I haven't written any tests yet. The patch is very much work in progress, it's basically a toy example.
I am not very familiar with the Postgres codebase but I hacked a simple patch that from my testing, fixes the issue. But I don't know if it's the correct place to apply that sort of logic, and I haven't written any tests yet. The patch is very much work in progress, it's basically a toy example.
I would like to contribute if possible but I wanted to hear your opinion before digging further into it.
Thanks,
Mayrom Rabinovich
Вложения
Mayrom Rabinovich <mayromrabinovich@gmail.com> writes:
> -- same deal query by the reverse order of the index, but also specify the
> wrong null order
> -- from my understanding this should not matter because we don't have any
> nulls on the table
> -- due to the constraint.
No, that is not taken into account. The planner's notion of a
concrete sort order always includes a nulls first/last flag, and this
index doesn't match what the query asks for. If you want this query
to use an index you'll need to make an index that puts nulls at the
other end (either ASC NULLS FIRST or DESC NULLS LAST will do).
I'm not really excited about poking holes in the PathKey concept to
make this work the way you want. I think the odds of introducing bugs
would be high. Also, the question could be turned around: if you know
that the table contains no nulls, why are you going out of your way to
specify the "wrong" null order?
regards, tom lane
Thanks for the quick response, On Thu, Feb 5, 2026 at 5:46 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > Also, the question could be turned around: if you know > that the table contains no nulls, why are you going out of your way to > specify the "wrong" null order? That query was generated by an ORM, and I didn't want to create a new index on my table just for that query because of the overhead associated with it. So I ended up patching the ORM library I used in order to drop the null ordering if the column is non null. But still, that caught me off guard. I was expecting Postgres to build a better plan for the query. Here is a simple example that shows how I stumbled into that edge case: ``` -- Create a table to query using created_at as a pagination cursor CREATE TABLE d (i INT PRIMARY KEY, created_at TIMESTAMP NOT NULL DEFAULT NOW()); CREATE INDEX a ON d (created_at); -- Get the next 10 records using the "a" index, this select is called repeatedly with decreasing created_at value based on the smallest value returned by the previous query. -- this query is generated by my ORM and the ORM was programmed to always return nulls last when working with pagination, so it builds a query similar to this one: SELECT * FROM d WHERE created_at < $0 ORDER BY created_at DESC NULLS LAST LIMIT 10; ``` > I'm not really excited about poking holes in the PathKey concept to > make this work the way you want. I think the odds of introducing bugs > would be high. Do you have anything in mind that would be acceptable or safe? Unless you feel like the risk outweighs the benefit here, I do think that this edge case could catch other people off guard, especially users that interact with the database using some sort of ORM. Thanks again, Mayrom Rabinovich