Thread: BUG #11500: PRIMARY KEY index not being used


BUG #11500: PRIMARY KEY index not being used

From
marko@joh.to
Date:
The following bug has been logged on the website:

Bug reference:      11500
Logged by:          Marko Tiikkaja
Email address:      marko@joh.to
PostgreSQL version: 9.1.12
Operating system:   Linux
Description:

Hi,

We've been observing a performance problem where a PRIMARY KEY index is not
being used.  The problem looks like this:

pg2=#* explain analyze select * from events where eventid = 132685185 and
processed = 0;
                                                            QUERY PLAN


-----------------------------------------------------------------------------------------------------------------------------------
 Index Scan using index_events_processed on events  (cost=0.00..7.73 rows=1
width=106) (actual time=31.808..31.808 rows=0 loops=1)
   Index Cond: (processed = 0)
   Filter: (eventid = 132685185)
 Total runtime: 31.852 ms
(4 rows)

pg2=#* explain analyze select * from events where eventid = 132685185 and
processed+0 = 0;
                                                      QUERY PLAN

-----------------------------------------------------------------------------------------------------------------------
 Index Scan using events_pkey on events  (cost=0.00..12.38 rows=1 width=106)
(actual time=0.071..0.071 rows=0 loops=1)
   Index Cond: (eventid = 132685185)
   Filter: ((processed + 0) = 0)
 Total runtime: 0.109 ms
(4 rows)


I'm guessing that's happening because the index on processed is smaller
(7GB, relpages=900880 vs 3.7GB, relpages=478225).  The statistics say that
there are no rows where processed=0 (and it's not far from the truth), but
it's still a risky plan compared to the PK lookup.  The index
index_events_processed is an index on events(processed), which should
probably be a partial index WHERE processed = 0, but I thought I'd
report this plan anyway.
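
For reference, the partial variant I mean would be something like this
(index name made up):

-- Sketch: index only the processed = 0 rows, which the stats say are
-- nearly nonexistent, keeping the index tiny.
CREATE INDEX index_events_unprocessed ON events (processed)
    WHERE processed = 0;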

Any thoughts?

Re: BUG #11500: PRIMARY KEY index not being used

From
Marti Raudsepp
Date:
On Fri, Sep 26, 2014 at 11:02 AM,  <marko@joh.to> wrote:
> The statistics say that
> there are no rows where processed=0 (and it's not far from the truth), but
> it's still a risky plan compared to the PK lookup.

> Any thoughts?

PostgreSQL 9.0 introduced this optimization for greater/less operators:

> When looking up statistics for greater/less-than comparisons, if the
> comparison value is in the first or last histogram bucket, use an index
> (if available) to fetch the current actual column minimum or maximum.
> This greatly improves the accuracy of estimates for comparison values
> near the ends of the data range, particularly if the range is constantly
> changing due to addition of new data.

Not sure whether it's a good idea or a bad idea, but perhaps a solution
is to expand this to equality lookups too?

Does using "WHERE processed <= 0" work around the problem? (Assuming
you don't have any negative numbers in this column).

> The index
> index_events_processed is an index on events(processed), which should
> probably be a partial index WHERE processed = 0, but I thought I'd
> report this plan anyway.

I guess you would still have this problem, unless your new index
contains the eventid column.
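
Something along these lines (index name made up) would let the partial
index satisfy both conditions:

-- Hypothetical sketch: a partial index keyed on eventid, so the
-- lookup can use both the eventid and the processed = 0 conditions.
CREATE INDEX index_events_unprocessed ON events (eventid)
    WHERE processed = 0;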

Regards,
Marti

Re: BUG #11500: PRIMARY KEY index not being used

From
Marko Tiikkaja
Date:
On 9/26/14 2:02 PM, Marti Raudsepp wrote:
> On Fri, Sep 26, 2014 at 11:02 AM,  <marko@joh.to> wrote:
>> The statistics say that
>> there are no rows where processed=0 (and it's not far from the truth), but
>> it's still a risky plan compared to the PK lookup.
>
>> Any thoughts?
>
> PostgreSQL 9.0 introduced this optimization for greater/less operators:
>
>> When looking up statistics for greater/less-than comparisons, if the
>> comparison value is in the first or last histogram bucket, use an index
>> (if available) to fetch the current actual column minimum or maximum.
>> This greatly improves the accuracy of estimates for comparison values
>> near the ends of the data range, particularly if the range is constantly
>> changing due to addition of new data.
>
> Not sure whether it's a good idea or a bad idea, but perhaps a solution
> is to expand this to equality lookups too?

I'm not sure that's the right idea to be honest.  The problem is that
the planner is taking a risk by using an index which could contain
(theoretically) any number of matching rows, instead of using the
primary key which is guaranteed to only contain 0 or 1 rows.  Sure,
peeking into the index to see that there are indeed some processed=0
rows would probably discourage the planner from using it, but why bother?

> Does using "WHERE processed <= 0" work around the problem? (Assuming
> you don't have any negative numbers in this column).

I unfortunately already dropped the problematic index, so I can't answer
that.


.marko

Re: BUG #11500: PRIMARY KEY index not being used

From
Tom Lane
Date:
marko@joh.to writes:
> PostgreSQL version: 9.1.12

> We've been observing a performance problem where a PRIMARY KEY index is not
> being used.  The problem looks like this:

In 9.2 and again in 9.3, we significantly changed the planner's modeling
of index access costs, with a view towards making better choices when
there are multiple plausible indexes to use.  If you can reproduce this
misbehavior in 9.3 or later it would be worth looking into; but we're not
going to change 9.1's estimation rules at this late date.  People running
on back branches tend to want plan stability in my experience.

            regards, tom lane

Re: [BUGS] BUG #11500: PRIMARY KEY index not being used

From
Marko Tiikkaja
Date:
Hi,

Resurrecting since this has been causing us issues again, though this
time on a different index.  Given a schema similar to this:

CREATE TYPE order_state AS ENUM ('INITIAL', 'WAIT_EVENT', 'DONE');
CREATE TABLE orders(
    order_id bigserial PRIMARY KEY,
    state order_state NOT NULL DEFAULT 'INITIAL'
);
-- there are other indexed columns here, but nothing really reads
-- through the entire index
CREATE INDEX orders_wait_event_idx ON orders ((1)) WHERE state = 'WAIT_EVENT';

where 80% of rows go through WAIT_EVENT before DONE.  Then we have a
frequent query like this:

SELECT ..
FROM orders
WHERE
    order_id = $1 AND
    state = 'WAIT_EVENT';

which almost always uses the primary key.  But sometimes, perhaps
after an autovacuum or something, something changes and postgres
decides to start serving that query through the orders_wait_event_idx
index.  Now those queries need to first suffer through this:

 Index Scan using orders_wait_event_idx on orders
(cost=0.54..94812.85 rows=85043 width=1223) (actual
time=0.166..7199.020 rows=84535 loops=1)
   Buffers: shared hit=15676 read=91962 dirtied=1988
 Planning:
   Buffers: shared hit=807 read=11 dirtied=1
 Planning Time: 4.634 ms
 Execution Time: 7204.117 ms

and subsequent executions also take ~100ms, instead of the ~0.05ms
through the PRIMARY KEY.

Just testing locally, if I add some data:

INSERT INTO orders (state) SELECT 'DONE' FROM generate_series(1, 65536 * 12);
ANALYZE orders;

I get:

EXPLAIN (ANALYZE, BUFFERS FALSE) SELECT 1 FROM orders WHERE order_id =
1 AND state = 'WAIT_EVENT';
                                                           QUERY PLAN

---------------------------------------------------------------------------------------------------------------------------------
 Index Scan using orders_wait_event_idx on orders  (cost=0.12..8.14
rows=1 width=4) (actual time=0.003..0.003 rows=0.00 loops=1)
   Filter: (order_id = 1)
   Index Searches: 1
 Planning Time: 0.278 ms
 Execution Time: 0.035 ms
(5 rows)

which just seems like an insane gamble to take compared to reading the
primary key index.  If you're right, you save fractions of a
millisecond, but if you're wrong, it could be the ten seconds like
we've been seeing in production.

We've been seeing this on 16.9 in prod, and with the code here I see
the planner hitting the casino on latest master as well.

Let me know what you think.


.m



Re: [BUGS] BUG #11500: PRIMARY KEY index not being used

From
Laurenz Albe
Date:
On Fri, 2025-10-03 at 09:23 +0300, Marko Tiikkaja wrote:
> CREATE TYPE order_state AS ENUM ('INITIAL', 'WAIT_EVENT', 'DONE');
> CREATE TABLE orders(
>     order_id bigserial PRIMARY KEY,
>     state order_state NOT NULL DEFAULT 'INITIAL'
> );
> -- there are other indexed columns here, but nothing really reads
> -- through the entire index
> CREATE INDEX orders_wait_event_idx ON orders ((1)) WHERE state = 'WAIT_EVENT';
>
> where 80% of rows go through WAIT_EVENT before DONE.  Then we have a
> frequent query like this:
>
> SELECT ..
> FROM orders
> WHERE
>     order_id = $1 AND
>     state = 'WAIT_EVENT';
>
> which almost always uses the primary key.  But sometimes, perhaps
> after an autovacuum or something, something changes and postgres
> decides to start serving that query through the orders_wait_event_idx
> index.  Now those queries need to first suffer through this:
>
> Index Scan using orders_wait_event_idx on orders (cost=0.54..94812.85 rows=85043 width=1223) (actual
> time=0.166..7199.020 rows=84535 loops=1)
>    Buffers: shared hit=15676 read=91962 dirtied=1988
>  Planning:
>    Buffers: shared hit=807 read=11 dirtied=1
>  Planning Time: 4.634 ms
>  Execution Time: 7204.117 ms
>
> which just seems like an insane gamble to take compared to reading the
> primary key index.  If you're right, you save fractions of a
> millisecond, but if you're wrong, it could be the ten seconds like
> we've been seeing in production.
>
> We've been seeing this on 16.9 in prod, and with the code here I see
> the planner hitting the casino on latest master as well.
>
> Let me know what you think.

Did you check if the optimizer statistics are up to date?

The index you created is only useful if only a very small percentage of
the rows in the table match the WHERE condition.  It may be that the
optimizer chooses the index by mistake: deduplication of identical index
keys will render the index rather small, and PostgreSQL prefers the smaller
index if it thinks that both indexes will do the task equally well.

But I think that the root of the problem is that you created an index that
you don't want to get used.  I understand that you want it for some other
statement, but is there really nothing better to index than the constant 1?

Yours,
Laurenz Albe



Re: [BUGS] BUG #11500: PRIMARY KEY index not being used

From
Marko Tiikkaja
Date:
On Fri, Oct 3, 2025 at 9:53 AM Laurenz Albe <laurenz.albe@cybertec.at> wrote:
> The index you created is only useful if only a very small percentage of
> the rows in the table match the WHERE condition.  It may be that the
> optimizer chooses the index by mistake: deduplication of identical index
> keys will render the index rather small, and PostgreSQL prefers the smaller
> index if it thinks that both indexes will do the task equally well.

I didn't want to include too much in the small repro that I had, but
as I alluded to in the comment, the index has real data in it.
Deduplication plays no real part here.  But yes, the index is
still physically smaller because it indexes a small subset of the
table.

> Did you check if the optimizer statistics are up to date?

What I'm trying to say is that I don't think there is any data you
could put in the stats tables to justify gambling on this index.  But
feel free to try out my example yourself.  This is a bit more like
what the production data looks like:

INSERT INTO orders (state) SELECT CASE WHEN random() <= 0.8 THEN
order_state 'WAIT_EVENT' ELSE order_state 'DONE' END FROM
generate_series(1, 65536 * 12);
UPDATE orders SET state = 'DONE' WHERE state = 'WAIT_EVENT';
ANALYZE orders;

=# EXPLAIN ANALYZE SELECT 1 FROM orders WHERE order_id = 1 AND state =
'WAIT_EVENT';
                                                            QUERY PLAN

-----------------------------------------------------------------------------------------------------------------------------------
 Index Scan using orders_wait_event_idx on orders  (cost=0.38..8.39
rows=1 width=4) (actual time=54.650..54.651 rows=0.00 loops=1)
   Filter: (order_id = 1)
   Index Searches: 1
   Buffers: shared hit=5239
 Planning:
   Buffers: shared hit=30
 Planning Time: 1.221 ms
 Execution Time: 54.682 ms
(8 rows)

=# EXPLAIN ANALYZE SELECT 1 FROM orders WHERE order_id = 1 AND state =
'WAIT_EVENT';
                                                           QUERY PLAN

---------------------------------------------------------------------------------------------------------------------------------
 Index Scan using orders_wait_event_idx on orders  (cost=0.38..8.39
rows=1 width=4) (actual time=0.459..0.459 rows=0.00 loops=1)
   Filter: (order_id = 1)
   Index Searches: 1
   Buffers: shared hit=495
 Planning Time: 0.091 ms
 Execution Time: 0.476 ms
(6 rows)

=# EXPLAIN ANALYZE SELECT 1 FROM orders WHERE order_id = 1 AND
state::text = 'WAIT_EVENT';
                                                      QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------
 Index Scan using orders_pkey on orders  (cost=0.42..8.45 rows=1
width=4) (actual time=0.039..0.039 rows=0.00 loops=1)
   Index Cond: (order_id = 1)
   Filter: ((state)::text = 'WAIT_EVENT'::text)
   Rows Removed by Filter: 1
   Index Searches: 1
   Buffers: shared hit=10
 Planning:
   Buffers: shared hit=99
 Planning Time: 2.864 ms
 Execution Time: 0.077 ms
(10 rows)


.m



Re: [BUGS] BUG #11500: PRIMARY KEY index not being used

From
Tom Lane
Date:
Marko Tiikkaja <marko@joh.to> writes:
> ... we have a frequent query like this:

> SELECT ..
> FROM orders
> WHERE
>     order_id = $1 AND
>     state = 'WAIT_EVENT';

> which almost always uses the primary key.  But sometimes, perhaps
> after an autovacuum or something, something changes and postgres
> decides to start serving that query through the orders_wait_event_idx
> index.

You haven't given us a lot to go on: no reproducible test case,
no clear description of what triggers the issue, not even the complete
EXPLAIN output for the problem query.  But it's hard to believe that
the planner would estimate a probe on a unique index as costing more
than 94812.85 units, which is what this fragment seems to suggest.
If you look at eqsel() you will observe that the presence of a
unique index overrides any information from statistics, so there
is no "casino" behavior here IMO: it should realize that
"order_id = $1" selects a single row no matter what.

I'm wondering if it didn't use that index because it *couldn't* use
that index, for who-knows-what reason.  There are operations that
will leave an index marked as invalid and hence unsafe to query on.
Anyway, -EINSUFFICIENTINFORMATION.

            regards, tom lane



Re: [BUGS] BUG #11500: PRIMARY KEY index not being used

From
Marko Tiikkaja
Date:


On Fri, Oct 3, 2025 at 16:31 Tom Lane <tgl@sss.pgh.pa.us> wrote:
Marko Tiikkaja <marko@joh.to> writes:
> ... we have a frequent query like this:

> SELECT ..
> FROM orders
> WHERE
>     order_id = $1 AND
>     state = 'WAIT_EVENT';

> which almost always uses the primary key.  But sometimes, perhaps
> after an autovacuum or something, something changes and postgres
> decides to start serving that query through the orders_wait_event_idx
> index.

You haven't given us a lot to go on: no reproducible test case,

I've provided two.  Both make the planner look bad.

no clear description of what triggers the issue,

..

not even the complete
EXPLAIN output for the problem query.  But it's hard to believe that
the planner would estimate a probe on a unique index as costing more
than 94812.85 units, which is what this fragment seems to suggest.

That was an after-the-fact demonstration of how expensive gambling on the index can be.


If you look at eqsel() you will observe that the presence of a
unique index overrides any information from statistics, so there
is no "casino" behavior here IMO: it should realize that
"order_id = $1" selects a single row no matter what

Then explain what's going on in the test case.  I'll be at the slot machine with the planner.


.m

Re: [BUGS] BUG #11500: PRIMARY KEY index not being used

From
David Rowley
Date:
On Sat, 4 Oct 2025 at 15:40, Marko Tiikkaja <marko@joh.to> wrote:
>
> On Fri, Oct 3, 2025 at 16:31 Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> You haven't given us a lot to go on: no reproducible test case,
>
> I've provided two.  Both make the planner look bad.

I expect what might be happening here is that ANALYZE runs when there
are no state = 'WAIT_EVENT' rows and thinks using the index on that
column is a better idea than the primary key index. Both selectivities
will be clamped at 1 row, so the index with fewer pages is likely to
win.  Then the problem happens after lots of state = 'WAIT_EVENT' rows
have been added, but not enough to trigger an auto-analyze. In that
case, the planner would still think there's 1 row, but there could be
any number of rows.

create table t (a bigint primary key, b int not null);
create index on t (b);
insert into t select x, x%10+1 from generate_Series(1,1000000)x;
analyze t;
explain select * from t where a=1 and b=0;
                            QUERY PLAN
------------------------------------------------------------------
 Index Scan using t_b_idx on t  (cost=0.42..8.40 rows=1 width=12)
   Index Cond: (b = 0)
   Filter: (a = 1)
(3 rows)

Just to see the cost comparison between the two, drop the index on "b":

drop index t_b_idx;
explain select * from t where a=1 and b=0;
                           QUERY PLAN
-----------------------------------------------------------------
 Index Scan using t_pkey on t  (cost=0.42..8.45 rows=1 width=12)  <- a bit more costly
   Index Cond: (a = 1)
   Filter: (b = 0)
(3 rows)

Basically, the planner isn't very good at taking risk into account
when it comes to this sort of thing.  Our cost model doesn't have a
risk factor. We have added a few things into add_path() over the years
to try to do something better in these scenarios. They often help.
However, there's none for this scenario.  The attached patch adds one.
I'm not proposing we do this, but if you try my example above with it
applied, you'll see the t_pkey index is used. I feel the hack is too
fragile to be a proper fix.

You might be able to work around the issue by adjusting the query.
Ordinarily, you could do something like "WHERE indexed_col + 0 =
<value>" to bypass using an index on that column (assuming the column
is an integer type). I can't off the top of my head think of a way to
do that with an enum column.

Another idea on how you could fix is to adjust the
orders_wait_event_idx index somehow. Maybe make it non-partial and
index the enum column and INCLUDE another column to make the index
wider and more costly to scan. Or maybe move it to another tablespace
with higher page costs.
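
For illustration, those two workarounds might look like this (the
index name for the first sketch, the INCLUDE column, and the
tablespace name are all made up):

-- Sketch 1: a non-partial index on the enum column, widened with an
-- INCLUDE column so it's more costly to scan and less attractive.
CREATE INDEX orders_state_idx ON orders (state) INCLUDE (order_id);

-- Sketch 2: move the partial index to a tablespace with higher page
-- costs (assumes a tablespace named slow_ts already exists).
ALTER INDEX orders_wait_event_idx SET TABLESPACE slow_ts;
ALTER TABLESPACE slow_ts SET (random_page_cost = 10);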

David

Attachments

Re: [BUGS] BUG #11500: PRIMARY KEY index not being used

From
Wyatt Alt
Date:
On Fri, Oct 3, 2025 at 8:19 PM David Rowley <dgrowleyml@gmail.com> wrote:
I can't off the top of my head think of a way to
do that with an enum column.

Casting the enum to text seems to work:

EXPLAIN SELECT 1 FROM orders WHERE order_id = 1 AND state = 'WAIT_EVENT';
                                     QUERY PLAN                                    
------------------------------------------------------------------------------------
 Index Scan using orders_wait_event_idx on orders  (cost=0.12..2.34 rows=1 width=4)
   Filter: (order_id = 1)
(2 rows)

EXPLAIN SELECT 1 FROM orders WHERE order_id = 1 AND state::text = 'WAIT_EVENT';
                                QUERY PLAN                                
--------------------------------------------------------------------------
 Index Scan using orders_pkey on orders  (cost=0.42..2.65 rows=1 width=4)
   Index Cond: (order_id = 1)
   Filter: ((state)::text = 'WAIT_EVENT'::text)
(3 rows)

Re: [BUGS] BUG #11500: PRIMARY KEY index not being used

From
Tom Lane
Date:
David Rowley <dgrowleyml@gmail.com> writes:
> On Sat, 4 Oct 2025 at 15:40, Marko Tiikkaja <marko@joh.to> wrote:
>> On Fri, Oct 3, 2025 at 16:31 Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> You haven't given us a lot to go on: no reproducible test case,

>> I've provided two.  Both make the planner look bad.

You've provided *no* reproducible test case that makes the planner
look bad.  The filled-in test case has two possibilities that are
both pretty cheap and the planner knows they are pretty cheap, so
it hardly matters which one it takes.  You showed us a fragment
of a case where it chose a very expensive scan that it shouldn't
have, but no useful information about how to reproduce that.

> I expect what might be happening here is that ANALYZE runs when there
> are no state = 'WAIT_EVENT' rows and thinks using the index on that
> column is a better idea than the primary key index.

Marko's fragment at [1] shows

Index Scan using orders_wait_event_idx on orders
(cost=0.54..94812.85 rows=85043 width=1223) (actual
time=0.166..7199.020 rows=84535 loops=1)
   Buffers: shared hit=15676 read=91962 dirtied=1988
 Planning:
   Buffers: shared hit=807 read=11 dirtied=1
 Planning Time: 4.634 ms
 Execution Time: 7204.117 ms

That's a pretty darn accurate rowcount estimate, so "out of date
stats" doesn't seem to be the problem.  I don't think it could
possibly have believed that scanning the pkey index would fetch that
many rows, or cost that much.  So why didn't it pick that index?
No way to tell from this amount of info, but I'm suspecting a
moving part that we don't know about.

(Hmmm ... why is there not a "Filter: (order_id = $1)" line here?)

            regards, tom lane

[1] https://www.postgresql.org/message-id/CAL9smLB1szUHLMfpN19FKiCHRCs4WvfjqXbxKaCUjmDzEtT%3Dng%40mail.gmail.com



Re: [BUGS] BUG #11500: PRIMARY KEY index not being used

From
David Rowley
Date:
On Sun, 5 Oct 2025 at 06:17, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Index Scan using orders_wait_event_idx on orders
> (cost=0.54..94812.85 rows=85043 width=1223) (actual
> time=0.166..7199.020 rows=84535 loops=1)

> That's a pretty darn accurate rowcount estimate, so "out of date
> stats" doesn't seem to be the problem.  I don't think it could
> possibly have believed that scanning the pkey index would fetch that
> many rows, or cost that much.  So why didn't it pick that index?
> No way to tell from this amount of info, but I'm suspecting a
> moving part that we don't know about.
>
> (Hmmm ... why is there not a "Filter: (order_id = $1)" line here?)

Marko did mention:

On Sat, 4 Oct 2025 at 15:40, Marko Tiikkaja <marko@joh.to> wrote:
> That was an after-the-fact demonstration of how expensive gambling on the index can be.

I assumed since the EXPLAIN didn't match the query that the EXPLAIN
output was fabricated afterwards from the server without the
"order_id" qual to try to illustrate the index that was used and the
row numbers that index had to visit. It would be good to get
confirmation of that from Marko. Otherwise, indeed, something very
strange is going on if the planner produced the above plan for the
given query.

David



Re: [BUGS] BUG #11500: PRIMARY KEY index not being used

From
Marko Tiikkaja
Date:
On Sun, Oct 5, 2025 at 3:34 AM David Rowley <dgrowleyml@gmail.com> wrote:
> Marko did mention:
>
> On Sat, 4 Oct 2025 at 15:40, Marko Tiikkaja <marko@joh.to> wrote:
> > That was an after-the-fact demonstration of how expensive gambling on the index can be.
>
> I assumed since the EXPLAIN didn't match the query that the EXPLAIN
> output was fabricated afterwards from the server without the
> "order_id" qual to try to illustrate the index that was used and the
> row numbers that index had to visit. It would be good to get
> confirmation of that from Marko.

That's exactly it.

I can't believe I actually have to do this, but run:

CREATE TYPE order_state AS ENUM ('INITIAL', 'WAIT_EVENT', 'DONE');
CREATE TABLE orders(
    order_id bigserial PRIMARY KEY,
    state order_state NOT NULL DEFAULT 'INITIAL'
);
CREATE INDEX orders_wait_event_idx ON orders ((1)) WHERE state = 'WAIT_EVENT';
INSERT INTO orders (state) SELECT CASE WHEN random() <= 0.8 THEN
order_state 'WAIT_EVENT' ELSE order_state 'DONE' END FROM
generate_series(1, 65536 * 12);
UPDATE orders SET state = 'DONE' WHERE state = 'WAIT_EVENT';
ANALYZE orders;
EXPLAIN (ANALYZE, BUFFERS) SELECT 1 FROM orders WHERE order_id = 1 AND
state = 'WAIT_EVENT';

and I get:

                                                           QUERY PLAN

--------------------------------------------------------------------------------------------------------------------------------
 Index Scan using orders_wait_event_idx on orders  (cost=0.38..1.99
rows=1 width=4) (actual time=22.482..22.482 rows=0 loops=1)
   Filter: (order_id = 1)
   Buffers: shared hit=4746
 Planning:
   Buffers: shared hit=18
 Planning Time: 0.085 ms
 Execution Time: 22.488 ms
(7 rows)

.m



Re: [BUGS] BUG #11500: PRIMARY KEY index not being used

From
Marko Tiikkaja
Date:
On Sat, Oct 4, 2025 at 8:17 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> David Rowley <dgrowleyml@gmail.com> writes:
> > On Sat, 4 Oct 2025 at 15:40, Marko Tiikkaja <marko@joh.to> wrote:
> >> On Fri, Oct 3, 2025 at 16:31 Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >>> You haven't given us a lot to go on: no reproducible test case,
>
> >> I've provided two.  Both make the planner look bad.
>
> You've provided *no* reproducible test case that makes the planner
> look bad.  The filled-in test case has two possibilities that are
> both pretty cheap and the planner knows they are pretty cheap, so
> it hardly matters which one it takes.  You showed us a fragment
> of a case where it chose a very expensive scan that it shouldn't
> have, but no useful information about how to reproduce that.

Hardly matters?  I'd say 54.682 ms vs. 0.077 ms is a big deal.
Especially because the planner doesn't seem to have any idea what the
upper bound on the first one could be.


.m



Re: [BUGS] BUG #11500: PRIMARY KEY index not being used

From
Tom Lane
Date:
Marko Tiikkaja <marko@joh.to> writes:
> I can't believe I actually have to do this, but run:

That's not what was asked for.  Your original report showed a
case where the planner chose orders_wait_event_idx despite
estimating that it would have to scan 80K-plus rows.
That's the case where I don't understand how it could think
that the primary key index would be even more expensive.

I'm not particularly concerned about the case you show here,
because if you run the EXPLAIN a second time it gets a lot
cheaper.  I believe the reason is that the first time is
visiting a bunch of just-deleted rows and so it has to stop
and update their hint bits, both in the heap and the index.
That is not a cost that the planner can reasonably predict
in advance, and even if we could it's not really fair to
blame it on the choice of index.  The pkey alternative only
visits one row and therefore only updates one hint bit, but
we'd have to update the rest sooner or later.  So IMO the
speed difference is largely illusory because it arises from
deferring maintenance.  You can demonstrate this by changing
the "ANALYZE orders" in the test case to "VACUUM ANALYZE
orders": the EXPLAIN still chooses orders_wait_event_idx,
but now it's fast because all that work got done by VACUUM.

            regards, tom lane



Re: [BUGS] BUG #11500: PRIMARY KEY index not being used

From
Marko Tiikkaja
Date:
On Sun, Oct 5, 2025 at 9:09 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I'm not particularly concerned about the case you show here,
> because if you run the EXPLAIN a second time it gets a lot
> cheaper.

Sure, it gets cheaper, but it's still a bad plan, and significantly
slower than the obvious one.

> I believe the reason is that the first time is
> visiting a bunch of just-deleted rows and so it has to stop
> and update their hint bits, both in the heap and the index.
> That is not a cost that the planner can reasonably predict
> in advance, and even if we could it's not really fair to
> blame it on the choice of index.  The pkey alternative only
> visits one row and therefore only updates one hint bit, but
> we'd have to update the rest sooner or later.  So IMO the
> speed difference is largely illusory because it arises from
> deferring maintenance.

On the production server the query still consistently took ~100ms even
after all that maintenance had been done.  Compared to <0.05ms when
going through the primary key.

But it sounds like you're saying the planner is working as expected
here, so I'll just drop the issue.  I've already implemented a
workaround in production via a no-op function.
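
In case it's useful to anyone else, the workaround is roughly this
(function name made up); plpgsql rather than SQL so the planner can't
inline it and match the partial index's predicate again:

-- Sketch: opaque wrapper; the qual no longer matches the partial
-- index's WHERE state = 'WAIT_EVENT' predicate.
CREATE FUNCTION noop_state(s order_state) RETURNS order_state
LANGUAGE plpgsql STABLE AS $$ BEGIN RETURN s; END $$;

-- SELECT .. FROM orders
-- WHERE order_id = $1 AND noop_state(state) = 'WAIT_EVENT';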


.m



Re: [BUGS] BUG #11500: PRIMARY KEY index not being used

From
David Rowley
Date:
On Mon, 6 Oct 2025 at 23:39, Marko Tiikkaja <marko@joh.to> wrote:
> On the production server the query still consistently took ~100ms even
> after all that maintenance had been done.  Compared to <0.05ms when
> going through the primary key.

This part surprises me. Is this on a standby server or primary? Are
you certain that no rows are matching the state = 'WAIT_EVENT' qual?

David



Re: [BUGS] BUG #11500: PRIMARY KEY index not being used

From
Marko Tiikkaja
Date:
On Mon, Oct 6, 2025 at 2:35 PM David Rowley <dgrowleyml@gmail.com> wrote:
> On Mon, 6 Oct 2025 at 23:39, Marko Tiikkaja <marko@joh.to> wrote:
> > On the production server the query still consistently took ~100ms even
> > after all that maintenance had been done.  Compared to <0.05ms when
> > going through the primary key.
>
> This part surprises me. Is this on a standby server or primary?

It's a primary.

> Are
> you certain that no rows are matching the state = 'WAIT_EVENT' qual?

There are, around 80k rows, but it fluctuates quite a bit.

I spent some time working on my test program to gather more data, so
I'll give this one more go.  All of these are from production, so the
schema is slightly different.

Here's what the plan looks like when things are good:

Index Scan using orders_pkey on orders  (cost=0.58..2.20 rows=1
width=4) (actual time=0.172..0.172 rows=0 loops=1)
  Index Cond: (orderid = '7000000000'::bigint)
  Filter: (orderstatusid = 3)
  Buffers: shared hit=3 read=1
Planning:
  Buffers: shared hit=1
Planning Time: 0.140 ms
Execution Time: 0.178 ms

Then sometimes auto-analyze just doesn't sample enough pages which are
WAIT_EVENT (orderstatusid = 3), so the plan changes into this:

Index Scan using index_orders_wait_event on orders  (cost=0.50..2.09
rows=1 width=4) (actual time=9512.921..9512.921 rows=0 loops=1)
  Filter: (orderid = '7000000000'::bigint)
  Rows Removed by Filter: 88520
  Buffers: shared hit=161591 read=6066 dirtied=139 written=125
Planning:
  Buffers: shared hit=1
Planning Time: 0.139 ms
Execution Time: 9512.934 ms

It's only this slow for the first execution, then the following
executions take between 50ms and 100ms.  My program isn't intelligent
enough to catch it, but if I reproduce an equivalent manually, it
looks like:

 Index Scan using index_orders_wait_event on orders (actual
time=89.004..89.004 rows=0 loops=1)
   Filter: ((orderid + 1) = '7000000000'::bigint)
   Rows Removed by Filter: 83280
   Buffers: shared hit=91856
 Planning:
   Buffers: shared hit=1
 Planning Time: 0.198 ms
 Execution Time: 89.014 ms
(8 rows)

So still multiple orders of magnitude slower than the PRIMARY KEY plan.

Some time later another auto-analyze happens to see enough of these
(orderstatusid = 3) rows, and the plan changes back to the PRIMARY KEY
one.

The table is around 600 GB in size, so having a VACUUM running through
the table 24/7 to combat this isn't feasible.  But it doesn't look
like it would help too much, as the partial index is constantly around
300 MB in size, even after an autovacuum.  If there was a VACUUM
INDEX, that probably could be running 24/7, but as far as I know there
isn't a way to do just one index like that.  REINDEXing now and then
trying to get more aggressive micro-vacuuming on the index by running
SELECTs that scan through it could also work; not sure.

In short, if:

1) there was no index bloat
2) the statistics were perfectly accurate for the pre-existing data
3) the distribution of data couldn't change

one could save up to 0.178 ms for the first execution or around 0.03
ms for subsequent ones.  But all three of those are a risk, and
sometimes we end up losing 9512.934 ms for the first execution or
around 89 ms for subsequent ones.

If there's more info I can provide without too much effort I'll be happy to.


.m



Re: [BUGS] BUG #11500: PRIMARY KEY index not being used

From
David Rowley
Date:
On Sat, 11 Oct 2025 at 03:53, Marko Tiikkaja <marko@joh.to> wrote:
>
> On Mon, Oct 6, 2025 at 2:35 PM David Rowley <dgrowleyml@gmail.com> wrote:
> > Are
> > you certain that no rows are matching the state = 'WAIT_EVENT' qual?
>
> There are, around 80k rows, but it fluctuates quite a bit.

Thanks for confirming that. That is subtly different from what you
indicated in the last report, and it does seem like my suspicions in
[1] were correct after all.

Unfortunately, you've hit a limitation in our cost model and we can't
really do much about that as a bug fix as it comes down to design.

The no-op function you've added (which I assume is a stable plpgsql
function to return the input order_state enum) is an adequate
workaround to stop the wait_event index being used for the problem
query.

David

[1] https://postgr.es/m/CAApHDvp=+EN2o=8fD47nyKKOSiQbLdYGLfUb=2obLWwQxUjyfg@mail.gmail.com