Discussion: BUG #18477: A specific SQL query with "ORDER BY ... NULLS FIRST" is performing poorly if an ordering column is not nullable

The following bug has been logged on the website:

Bug reference:      18477
Logged by:          Alexander777
Email address:      alexander.berezin3000@gmail.com
PostgreSQL version: 14.6
Operating system:   CentOS 7.6.1810
Description:

Summary:
A specific SQL query with "ORDER BY ... NULLS FIRST" is performing poorly if
an ordering column is not nullable.

Description:
Recently, I encountered an intriguing performance issue with a SQL query
over a table with 30+ million rows. The query takes significantly
longer to execute than I anticipated. After a thorough investigation, I
extracted a minimal example that clearly illustrates the problem,
specifically when the ordering column is not nullable.

Steps to Reproduce:
1. Database Setup:
- PostgreSQL version: 14.6
- Client: psql 14.6
- Operating system: CentOS 7
- Architecture: x86-64


2. Create a simple table and index:

CREATE TABLE T1 (
    pk BIGSERIAL NOT NULL PRIMARY KEY,
    updated_at BIGINT NOT NULL DEFAULT 0
);

CREATE INDEX idx_t1_updated_at_pk ON T1 (updated_at, pk);

3. Add a large number of rows; in my environment there are ~35M rows:
SELECT COUNT(*) FROM T1;
  count
----------
 35493666
(1 row)


4. Problematic Query:

SELECT updated_at FROM T1
ORDER BY updated_at NULLS FIRST LIMIT 130000


5. Actual Result:
The query takes tens of seconds to execute, which is significantly slower
than expected.

5.1 EXPLAIN (ANALYZE, BUFFERS) output from the problematic query:


                                                                    QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=1774902.21..1790069.93 rows=130000 width=8) (actual time=12107.287..12225.239 rows=130000 loops=1)
   Buffers: shared hit=894055 read=162956 dirtied=3807, temp read=153949 written=231607
   I/O Timings: read=6287.278
   ->  Gather Merge  (cost=1774902.21..5093645.38 rows=28444384 width=8) (actual time=12107.285..12216.140 rows=130000 loops=1)
         Workers Planned: 2
         Workers Launched: 2
         Buffers: shared hit=894055 read=162956 dirtied=3807, temp read=153949 written=231607
         I/O Timings: read=6287.278
         ->  Sort  (cost=1773902.18..1809457.66 rows=14222192 width=8) (actual time=12043.738..12048.354 rows=44698 loops=3)
               Sort Key: updated_at NULLS FIRST
               Sort Method: external merge  Disk: 204184kB
               Buffers: shared hit=894055 read=162956 dirtied=3807, temp read=153949 written=231607
               I/O Timings: read=6287.278
               Worker 0:  Sort Method: external merge  Disk: 210344kB
               Worker 1:  Sort Method: external merge  Disk: 210576kB
               ->  Parallel Index Only Scan using idx_t1_updated_at_pk on t1  (cost=0.56..494747.42 rows=14222192 width=8) (actual time=0.088..7815.929 rows=11827789 loops=3)
                     Heap Fetches: 858732
                     Buffers: shared hit=893979 read=162956 dirtied=3807
                     I/O Timings: read=6287.278
 Planning Time: 0.099 ms
 Execution Time: 12278.073 ms
(21 rows)


5.2 Same query but w/o NULLS FIRST

SELECT updated_at FROM T1
ORDER BY updated_at LIMIT 130000

5.2.1 EXPLAIN (ANALYZE,BUFFERS) output:


                                                                    QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.56..2643.19 rows=130000 width=8) (actual time=0.063..30.874 rows=130000 loops=1)
   Buffers: shared hit=12 read=501
   I/O Timings: read=9.383
   ->  Index Only Scan using idx_t1_updated_at_pk on t1  (cost=0.56..693858.10 rows=34133260 width=8) (actual time=0.062..21.435 rows=130000 loops=1)
         Heap Fetches: 0
         Buffers: shared hit=12 read=501
         I/O Timings: read=9.383
 Planning Time: 0.083 ms
 Execution Time: 35.562 ms
(9 rows)

5.3 Same query but with NULLS LAST

SELECT updated_at FROM T1
ORDER BY updated_at NULLS LAST LIMIT 130000


5.3.1 EXPLAIN (ANALYZE,BUFFERS) output:


                                                                    QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.56..2643.19 rows=130000 width=8) (actual time=1.768..35.431 rows=130000 loops=1)
   Buffers: shared hit=11 read=502
   I/O Timings: read=13.143
   ->  Index Only Scan using idx_t1_updated_at_pk on t1  (cost=0.56..693858.10 rows=34133260 width=8) (actual time=1.767..26.107 rows=130000 loops=1)
         Heap Fetches: 0
         Buffers: shared hit=11 read=502
         I/O Timings: read=13.143
 Planning Time: 0.135 ms
 Execution Time: 40.427 ms
(9 rows)


6. Expected Result:
The query in 5. should have performance like in 5.2 or 5.3 for non-nullable
columns.


7. Workaround:
The workaround is to not specify NULLS FIRST explicitly for non-nullable
columns.
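
Alternatively, the index itself can be declared to match the query's ordering, as a sketch (the index name below is illustrative):

```sql
-- Hypothetical index whose sort order matches the query's ORDER BY,
-- so NULLS FIRST no longer forces a full sort of the table.
CREATE INDEX idx_t1_updated_at_nf_pk ON T1 (updated_at NULLS FIRST, pk);

-- With this index in place, the planner can satisfy
--   ORDER BY updated_at NULLS FIRST LIMIT 130000
-- with an index (only) scan, as in 5.2/5.3.
```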

8. Severity:
Medium - This issue causes significant performance degradation when
querying big tables, and the query can even fail outright (e.g. ERROR:
temporary file size exceeds temp_file_limit (5242880kB)).


PG Bug reporting form <noreply@postgresql.org> writes:
> A specific SQL query with "ORDER BY ... NULLS FIRST" is performing poorly if
> an ordering column is not nullable.

The reason it's performing poorly is that
    ORDER BY updated_at NULLS FIRST
is not compatible with the sort order of your index (which is,
by default, NULLS LAST).  So the query has to be done with an
explicit sort, which requires reading the whole table.

I know you are going to say that it shouldn't matter as long as the
column is marked NOT NULL, but too bad: it does.  This is not a bug,
and it's not something we're likely to expend a great deal of sweat
on improving.  If you know the column is null-free, why are you
writing NULLS FIRST?  If you have a good reason to write NULLS FIRST,
why not declare the index to match?

            regards, tom lane



I fixed it by removing NULLS FIRST from the query (it was introduced by our query builder). However, it seems the query optimizer could account for such scenarios.
Additionally, as you mentioned, the default index is created with NULLS LAST, but in this case, the column is non-nullable, making NULLS LAST unnecessary as well.

Thank you,
Alexander.


On Thu, 23 May 2024 at 20:26, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> PG Bug reporting form <noreply@postgresql.org> writes:
> > A specific SQL query with "ORDER BY ... NULLS FIRST" is performing poorly if
> > an ordering column is not nullable.
>
> The reason it's performing poorly is that
>         ORDER BY updated_at NULLS FIRST
> is not compatible with the sort order of your index (which is,
> by default, NULLS LAST).  So the query has to be done with an
> explicit sort, which requires reading the whole table.
>
> I know you are going to say that it shouldn't matter as long as the
> column is marked NOT NULL, but too bad: it does.  This is not a bug,
> and it's not something we're likely to expend a great deal of sweat
> on improving.  If you know the column is null-free, why are you
> writing NULLS FIRST?  If you have a good reason to write NULLS FIRST,
> why not declare the index to match?
>
>                         regards, tom lane


On 2024-May-24, Alexander Alexander wrote:

> Additionally, as you mentioned, the default index is created with NULLS
> LAST, but in this case, the column is non-nullable, making NULLS LAST
> unnecessary as well.

But the NOT NULL constraint could be dropped at any minute, so the
system needs to know where NULLs would go if that were to happen.

-- 
Álvaro Herrera         PostgreSQL Developer  —  https://www.EnterpriseDB.com/
"The gods do not protect fools. Fools are protected by more capable
fools" (Louis Wu, Ringworld)



On Fri, 24 May 2024 at 22:03, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
>
> On 2024-May-24, Alexander Alexander wrote:
>
> > Additionally, as you mentioned, the default index is created with NULLS
> > LAST, but in this case, the column is non-nullable, making NULLS LAST
> > unnecessary as well.
>
> But the NOT NULL constraint could be dropped at any minute, so the
> system needs to know where NULLs would go if that were to happen.

In my understanding, the planner will hold a lock that will prevent a
concurrent session from doing ALTER TABLE ... DROP NOT NULL, so if the
planner were to do an optimisation such as this, I think it should be
safe. Can you explain where the hazard is?

In the EXECUTE of a prepared statement case, DROP NOT NULL should
cause a relcache invalidation that should be noticed during
AcquireExecutorLocks() which should result in a re-plan.  In the
re-plan, the optimisation will not be enabled.
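
The prepared-statement path described above could be exercised roughly like this (a sketch; the session is hypothetical and the not-null-aware optimisation does not exist today):

```sql
PREPARE q AS
  SELECT updated_at FROM t1 ORDER BY updated_at NULLS FIRST LIMIT 10;

EXECUTE q;  -- planned while updated_at is still NOT NULL

-- Dropping the constraint sends a relcache invalidation; the next
-- EXECUTE notices it during AcquireExecutorLocks() and re-plans,
-- so the hypothetical optimisation would simply not be applied.
ALTER TABLE t1 ALTER COLUMN updated_at DROP NOT NULL;

EXECUTE q;  -- re-planned without assuming non-nullness
```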

Isn't the argument you're making here just the same as in [1] which
Tom explained was safe in [2]?

I think this concern may have come from our inability to allow
functional dependency detection for columns that are dependent on
a UNIQUE + NOT NULL constraint. e.g.

CREATE TABLE t (a INT NOT NULL UNIQUE, b INT NOT NULL);
CREATE VIEW v_t AS SELECT a,b FROM t GROUP BY a;

The same works ok if you swap "UNIQUE" for "PRIMARY KEY" as PKs make
the columns non-nullable.
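
For example, a minimal sketch of the PRIMARY KEY variant (hypothetical table and view names):

```sql
-- Accepted, because the PK makes column "a" non-nullable by definition,
-- so the functional dependency a -> b can be recorded safely.
CREATE TABLE t_pk (a INT PRIMARY KEY, b INT NOT NULL);
CREATE VIEW v_t_pk AS SELECT a, b FROM t_pk GROUP BY a;
```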

For UNIQUE constraints, we cannot allow the view to be created because
we have no way to add a dependency to block the NOT NULL from being
dropped which would invalidate the view.

(Thanks for your work on getting us closer to allowing that. I hope
you get more time to work on that for v18)

David

[1] https://postgr.es/m/202401231915.uwk6zrqbdvsu@alvherre.pgsql
[2] https://postgr.es/m/4071562.1706038734@sss.pgh.pa.us



David Rowley <dgrowleyml@gmail.com> writes:
> On Fri, 24 May 2024 at 22:03, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
>> But the NOT NULL constraint could be dropped at any minute, so the
>> system needs to know where NULLs would go if that were to happen.

> In my understanding, the planner will hold a lock that will prevent a
> concurrent session from doing ALTER TABLE ... DROP NOT NULL, so if the
> planner were to do an optimisation such as this, I think it should be
> safe. Can you explain where the hazard is?

This has nothing to do with the planner.  The question Alvaro
responded to is whether the DDL definition of the index establishes
which end nulls should go at.  Which it does, independently of whether
the table actually contains any nulls.

Yes, we could possibly set things up so that an index with nulls last
is considered to match a query that specifies NULLS FIRST if we know
that the column is not-nullable.  But I refuse to believe that this
would be a good use of either development effort or planner cycles.
AFAICS the problem is purely self-inflicted damage: why is the user
specifying NULLS FIRST if he knows the column is not-null?

            regards, tom lane



On Thu, 6 Jun 2024 at 15:48, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Yes, we could possibly set things up so that an index with nulls last
> is considered to match a query that specifies NULLS FIRST if we know
> that the column is not-nullable.  But I refuse to believe that this
> would be a good use of either development effort or planner cycles.
> AFAICS the problem is purely self-inflicted damage: why is the user
> specifying NULLS FIRST if he knows the column is not-null?

I do agree with you when I think about this for the limited scope that
you've described here. However, I think you've considered the
usefulness of such an optimisation far more narrowly than it could be
applied.

While I've not given this much thought, it did occur to me that you
could have two nullable columns in the same EquivalenceClass, which
was generated by a strict equality clause.  If these two columns are
not in the same table, what's the reason, assuming the join condition
is strict, that we couldn't perform a Merge Join using presorted
results from a NULLS FIRST index on one side of the join and a NULLS
LAST index on the other side?

If the user has some genuine reason for creating a NULLS FIRST index
for some other query, then it might be nice if we were able to use
that index for Merge Joins instead of them having to create another
index to speed up the join query.
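
As a sketch of that scenario (table and index names here are hypothetical):

```sql
-- Two nullable columns joined by a strict equality clause.
CREATE TABLE s (x BIGINT);
CREATE TABLE r (y BIGINT);
CREATE INDEX ON s (x NULLS FIRST);  -- exists for some other query
CREATE INDEX ON r (y);              -- ASC NULLS LAST by default

-- "s.x = r.y" is strict, so no NULL can appear in the join result;
-- in principle a Merge Join could consume both indexes' presorted
-- non-NULL runs even though their declared NULL positions differ.
SELECT * FROM s JOIN r ON s.x = r.y;
```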

I've no plans to go and do anything to improve this situation, I just
didn't want this thread to be left in a state that would put off
anyone from making improvements in this area.  Having the ability to
better reuse existing indexes is a worthy goal, at least in my book.

If an EquivalenceClass had some ability to track non-nullness of its
members based on strict OpExprs, then we probably could optimise these
sorts of queries better. I don't know how that would work with
canonical PathKeys, but the EquivalenceClass must at least exist
before a PathKey can, so maybe PathKey.pk_nulls_first can be left
false when the EquivalenceClass states that no nulls can exist after
evaluation of the OpExprs that allowed the EquivalenceClass to exist.
There may be some hazards there I've not thought about, but it seems
like considering those would be part of the work of anyone working on
a patch to improve this area.

David



On Thu, 2024-06-06 at 17:04 +1200, David Rowley wrote:
> On Thu, 6 Jun 2024 at 15:48, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > AFAICS the problem is purely self-inflicted damage: why is the user
> > specifying NULLS FIRST if he knows the column is not-null?
>
> If the user has some genuine reason for creating a NULLS FIRST index
> for some other query, then it might be nice if we were able to use
> that index for Merge Joins instead of them having to create another
> index to speed up the join query.

I cannot imagine a genuine reason for creating a NULLS FIRST index
on a NOT NULL column, but perhaps I am too narrow-minded.

Yours,
Laurenz Albe



On Thu, 6 Jun 2024 at 20:10, Laurenz Albe <laurenz.albe@cybertec.at> wrote:
>
> On Thu, 2024-06-06 at 17:04 +1200, David Rowley wrote:
> > If the user has some genuine reason for creating a NULLS FIRST index
> > for some other query, then it might be nice if we were able to use
> > that index for Merge Joins instead of them having to create another
> > index to speed up the join query.
>
> I cannot imagine a genuine reason for creating a NULLS FIRST index
> on a NOT NULL column, but perhaps I am too narrow-minded.

You didn't quote the part where I said "if you had two nullable
columns". It might be worth rereading what I wrote and taking the time
to understand it.

David