Обсуждение: Question on execution plan and suitable index

Поиск
Список
Период
Сортировка

Question on execution plan and suitable index

От
yudhi s
Дата:
Hi,
It's postgres version 17. We are having a critical UI query which runs for ~7 seconds+. The requirement is to bring down the response time within ~1 sec. Now in this plan , If i read this correctly, the below section is consuming a significant amount of resources and should be addressed. i.e. "Full scan of table "orders" and Nested loop with event_audit_log table".

Below is the query and its complete plan:- 

I am a bit new to the indexing strategy in postgres. My question is, what suitable index should we create to cater these above?

1)For table event_audit_log:- Should we create composite Index on column (request_id,created_at,event_comment_text) or should we create the covering index i.e. just on two column (request_id,created_at) with "include" clause for "event_comment_text". How and when the covering index indexes should be used here in postgres. Want to understand from experts? 
2)Similarly for table orders:- Should we create a covering index on column (entity_id,due_date,order_type) with include clause (firm_dspt_case_id). Or just a composite index (entity_id,due_date,order_type).
3)Whether the column used as range operator (here created_at or due_date) should be used as leading column in the composite index or is it fine to keep it as non leading?

->  Nested Loop  (cost=50.06..2791551.71 rows=3148 width=19) (actual time=280.735..7065.313 rows=57943 loops=3)
 Buffers: shared hit=10014901
 ->  Hash Join  (cost=49.49..1033247.35 rows=36729 width=8) (actual time=196.407..3805.755 rows=278131 loops=3)
Hash Cond: ((ord.entity_id)::numeric = e.entity_id)
Buffers: shared hit=755352
->  Parallel Seq Scan on orders ord  (cost=0.00..1022872.54 rows=3672860 width=16) (actual time=139.883..3152.627 rows=2944671 loops=3)
 Filter: ((due_date >= '2024-01-01'::date) AND (due_date <= '2024-04-01'::date) AND (order_type = ANY ('{TYPE_A,TYPE_B}'::text[])))
 Rows Removed by Filter: 6572678
 Buffers: shared hit=755208


Regards
Yudhi

Re: Question on execution plan and suitable index

От
Adrian Klaver
Дата:
On 2/15/26 11:04, yudhi s wrote:
> Hi,
> It's postgres version 17. We are having a critical UI query which runs 
> for ~7 seconds+. The requirement is to bring down the response time 
> within ~1 sec. Now in this plan , If i read this correctly, the below 
> section is consuming a significant amount of resources and should be 
> addressed. i.e. "Full scan of table "orders" and Nested loop with 
> event_audit_log table".

For a start:

1) Supply the complete schema for the tables involved.

2) Also what is the minor version you are using e.g the x in 17.x?

I also recommend reading:

https://wiki.postgresql.org/wiki/Slow_Query_Questions

> 
> *Below is the query and its complete plan:- *
> https://gist.github.com/databasetech0073/ 
> f564ac23ee35d1f0413980fe4d00efa9 <https://gist.github.com/ 
> databasetech0073/f564ac23ee35d1f0413980fe4d00efa9>
> 
> I am a bit new to the indexing strategy in postgres. My question is, 
> what suitable index should we create to cater these above?
> 
> 1)For table event_audit_log:- Should we create composite Index on column 
> (request_id,created_at,event_comment_text) or should we create the 
> covering index i.e. just on two column (request_id,created_at) with 
> "include" clause for "event_comment_text". How and when the covering 
> index indexes should be used here in postgres. Want to understand from 
> experts?
> 2)Similarly for table orders:- Should we create a covering index on 
> column (entity_id,due_date,order_type) with include clause 
> (firm_dspt_case_id). Or just a composite index 
> (entity_id,due_date,order_type).
> 3)Whether the column used as range operator (here created_at or 
> due_date) should be used as leading column in the composite index or is 
> it fine to keep it as non leading?
> 
> ->  Nested Loop  (cost=50.06..2791551.71 rows=3148 width=19) (actual 
> time=280.735..7065.313 rows=57943 loops=3)
>   Buffers: shared hit=10014901
>   ->  Hash Join  (cost=49.49..1033247.35 rows=36729 width=8) (actual 
> time=196.407..3805.755 rows=278131 loops=3)
> Hash Cond: ((ord.entity_id)::numeric = e.entity_id)
> Buffers: shared hit=755352
> ->  Parallel Seq Scan on orders ord  (cost=0.00..1022872.54 rows=3672860 
> width=16) (actual time=139.883..3152.627 rows=2944671 loops=3)
>   Filter: ((due_date >= '2024-01-01'::date) AND (due_date <= 
> '2024-04-01'::date) AND (order_type = ANY ('{TYPE_A,TYPE_B}'::text[])))
>   Rows Removed by Filter: 6572678
>   Buffers: shared hit=755208
> 
> 
> Regards
> Yudhi


-- 
Adrian Klaver
adrian.klaver@aklaver.com



Re: Question on execution plan and suitable index

От
Nisarg Patel
Дата:
I would definitely recommend to focus on this section:
Filter: ((due_date >= '2024-01-01'::date) AND (due_date <= '2024-04-01'::date) AND (order_type = ANY ('{TYPE_A,TYPE_B}'::text[])))
Rows Removed by Filter: 6572678

You can certainly try composite index or partial index for order_type.

Thank you,
Nisarg


On Sun, Feb 15, 2026, 3:51 PM Adrian Klaver <adrian.klaver@aklaver.com> wrote:
On 2/15/26 11:04, yudhi s wrote:
> Hi,
> It's postgres version 17. We are having a critical UI query which runs
> for ~7 seconds+. The requirement is to bring down the response time
> within ~1 sec. Now in this plan , If i read this correctly, the below
> section is consuming a significant amount of resources and should be
> addressed. i.e. "Full scan of table "orders" and Nested loop with
> event_audit_log table".

For a start:

1) Supply the complete schema for the tables involved.

2) Also what is the minor version you are using e.g the x in 17.x?

I also recommend reading:

https://wiki.postgresql.org/wiki/Slow_Query_Questions

>
> *Below is the query and its complete plan:- *
> https://gist.github.com/databasetech0073/
> f564ac23ee35d1f0413980fe4d00efa9 <https://gist.github.com/
> databasetech0073/f564ac23ee35d1f0413980fe4d00efa9>
>
> I am a bit new to the indexing strategy in postgres. My question is,
> what suitable index should we create to cater these above?
>
> 1)For table event_audit_log:- Should we create composite Index on column
> (request_id,created_at,event_comment_text) or should we create the
> covering index i.e. just on two column (request_id,created_at) with
> "include" clause for "event_comment_text". How and when the covering
> index indexes should be used here in postgres. Want to understand from
> experts?
> 2)Similarly for table orders:- Should we create a covering index on
> column (entity_id,due_date,order_type) with include clause
> (firm_dspt_case_id). Or just a composite index
> (entity_id,due_date,order_type).
> 3)Whether the column used as range operator (here created_at or
> due_date) should be used as leading column in the composite index or is
> it fine to keep it as non leading?
>
> ->  Nested Loop  (cost=50.06..2791551.71 rows=3148 width=19) (actual
> time=280.735..7065.313 rows=57943 loops=3)
>   Buffers: shared hit=10014901
>   ->  Hash Join  (cost=49.49..1033247.35 rows=36729 width=8) (actual
> time=196.407..3805.755 rows=278131 loops=3)
> Hash Cond: ((ord.entity_id)::numeric = e.entity_id)
> Buffers: shared hit=755352
> ->  Parallel Seq Scan on orders ord  (cost=0.00..1022872.54 rows=3672860
> width=16) (actual time=139.883..3152.627 rows=2944671 loops=3)
>   Filter: ((due_date >= '2024-01-01'::date) AND (due_date <=
> '2024-04-01'::date) AND (order_type = ANY ('{TYPE_A,TYPE_B}'::text[])))
>   Rows Removed by Filter: 6572678
>   Buffers: shared hit=755208
>
>
> Regards
> Yudhi


--
Adrian Klaver
adrian.klaver@aklaver.com


Re: Question on execution plan and suitable index

От
Laurenz Albe
Дата:
On Mon, 2026-02-16 at 00:34 +0530, yudhi s wrote:
> It's postgres version 17. We are having a critical UI query which runs for ~7 seconds+. The requirement is to bring
downthe response time within ~1 sec. Now in this plan , If i read this correctly, the below section is consuming a
significantamount of resources and should be addressed. i.e. "Full scan of table "orders" and Nested loop with
event_audit_logtable". 
>
> Below is the query and its complete plan:- 
> https://gist.github.com/databasetech0073/f564ac23ee35d1f0413980fe4d00efa9
>
> I am a bit new to the indexing strategy in postgres. My question is, what suitable index should we create to cater
theseabove? 
>
> 1)For table event_audit_log:- Should we create composite Index on column (request_id,created_at,event_comment_text)
orshould we create the covering index i.e. just on two column (request_id,created_at) with "include" clause for
"event_comment_text".How and when the covering index indexes should be used here in postgres. Want to understand from
experts? 
> 2)Similarly for table orders:- Should we create a covering index on column (entity_id,due_date,order_type) with
includeclause (firm_dspt_case_id). Or just a composite index (entity_id,due_date,order_type). 
> 3)Whether the column used as range operator (here created_at or due_date) should be used as leading column in the
compositeindex or is it fine to keep it as non leading? 
>
> ->  Nested Loop  (cost=50.06..2791551.71 rows=3148 width=19) (actual time=280.735..7065.313 rows=57943 loops=3)
>   Buffers: shared hit=10014901
>   ->  Hash Join  (cost=49.49..1033247.35 rows=36729 width=8) (actual time=196.407..3805.755 rows=278131 loops=3)
>  Hash Cond: ((ord.entity_id)::numeric = e.entity_id)
>  Buffers: shared hit=755352
>  ->  Parallel Seq Scan on orders ord  (cost=0.00..1022872.54 rows=3672860 width=16) (actual time=139.883..3152.627
rows=2944671loops=3) 
>   Filter: ((due_date >= '2024-01-01'::date) AND (due_date <= '2024-04-01'::date) AND (order_type = ANY
('{TYPE_A,TYPE_B}'::text[])))
>   Rows Removed by Filter: 6572678
>   Buffers: shared hit=755208

You are selecting a lot of rows, so the query will never be really cheap.
But I agree that an index scan should be a win.

If the condition on "order_type" is always the same, a partial index is ideal:

   CREATE INDEX ON orders (due_date) WHERE order_type IN ('TYPE_A', 'TYPE_B');

Otherwise, I'd create two indexes: one on "order_type" and one on "due_date".

Yours,
Laurenz Albe



Re: Question on execution plan and suitable index

От
yudhi s
Дата:


On Mon, Feb 16, 2026 at 2:29 PM Laurenz Albe <laurenz.albe@cybertec.at> wrote:
On Mon, 2026-02-16 at 00:34 +0530, yudhi s wrote:
> It's postgres version 17. We are having a critical UI query which runs for ~7 seconds+. The requirement is to bring down the response time within ~1 sec. Now in this plan , If i read this correctly, the below section is consuming a significant amount of resources and should be addressed. i.e. "Full scan of table "orders" and Nested loop with event_audit_log table".
>
> Below is the query and its complete plan:- 
> https://gist.github.com/databasetech0073/f564ac23ee35d1f0413980fe4d00efa9
>
> I am a bit new to the indexing strategy in postgres. My question is, what suitable index should we create to cater these above?
>
> 1)For table event_audit_log:- Should we create composite Index on column (request_id,created_at,event_comment_text) or should we create the covering index i.e. just on two column (request_id,created_at) with "include" clause for "event_comment_text". How and when the covering index indexes should be used here in postgres. Want to understand from experts? 
> 2)Similarly for table orders:- Should we create a covering index on column (entity_id,due_date,order_type) with include clause (firm_dspt_case_id). Or just a composite index (entity_id,due_date,order_type).
> 3)Whether the column used as range operator (here created_at or due_date) should be used as leading column in the composite index or is it fine to keep it as non leading?
>
> ->  Nested Loop  (cost=50.06..2791551.71 rows=3148 width=19) (actual time=280.735..7065.313 rows=57943 loops=3)
>   Buffers: shared hit=10014901

>   ->  Hash Join  (cost=49.49..1033247.35 rows=36729 width=8) (actual time=196.407..3805.755 rows=278131 loops=3)
>  Hash Cond: ((ord.entity_id)::numeric = e.entity_id)
>  Buffers: shared hit=755352
>  ->  Parallel Seq Scan on orders ord  (cost=0.00..1022872.54 rows=3672860 width=16) (actual time=139.883..3152.627 rows=2944671 loops=3)
>   Filter: ((due_date >= '2024-01-01'::date) AND (due_date <= '2024-04-01'::date) AND (order_type = ANY ('{TYPE_A,TYPE_B}'::text[])))
>   Rows Removed by Filter: 6572678
>   Buffers: shared hit=755208

You are selecting a lot of rows, so the query will never be really cheap.
But I agree that an index scan should be a win.

If the condition on "order_type" is always the same, a partial index is ideal:

   CREATE INDEX ON orders (due_date) WHERE order_type IN ('TYPE_A', 'TYPE_B');

Otherwise, I'd create two indexes: one on "order_type" and one on "due_date".



Version is 17.7. Below is the table definitions as i pulled from Dbeaver tool:-


The Order_type will be TYPE_A and TYPE_B in most of the cases. And below is the distribution. So , it looks like the index on this column will not help much. Correct me if I'm wrong. I am wondering why the already existing index on column "due_date" of table "order" is not getting used by the optimizer? Should we also add the column "entity_id" to the index too?

TYPE_A  25 Million
TYPE_B  2 Million
TYPE_C  700K
TYPE_D  200K
TYPE_E  6k

And, Yes there are differences in data types of the "entity_id" for columns of table "order" and "entity". We need to fix that after analyzing the data.

Also the highlighted Nested loop above shows ~10M shared hits (which will be ~70GB+ if we consider one hit as an 8K block). So does that mean , apart from the Full scan on the "order" table , the main resource consuming factor here is the scanning of "event_audit_log". And what is the best way to improve this? Currently this table is getting scanned through an unique index on column "request_id".

Regards
Yudhi

 

Re: Question on execution plan and suitable index

От
Laurenz Albe
Дата:
On Mon, 2026-02-16 at 14:43 +0530, yudhi s wrote:
> On Mon, Feb 16, 2026 at 2:29 PM Laurenz Albe <laurenz.albe@cybertec.at> wrote:
> > On Mon, 2026-02-16 at 00:34 +0530, yudhi s wrote:
> > > It's postgres version 17. We are having a critical UI query which runs for ~7 seconds+. The requirement is to
bringdown the response time within ~1 sec. Now in this plan , If i read this correctly, the below section is consuming
asignificant amount of resources and should be addressed. i.e. "Full scan of table "orders" and Nested loop with
event_audit_logtable". 
> > >
> > > Below is the query and its complete plan:- 
> > > https://gist.github.com/databasetech0073/f564ac23ee35d1f0413980fe4d00efa9
> > >
> > > I am a bit new to the indexing strategy in postgres. My question is, what suitable index should we create to
caterthese above? 
> > >
> > > 1)For table event_audit_log:- Should we create composite Index on column
(request_id,created_at,event_comment_text)or should we create the covering index i.e. just on two column
(request_id,created_at)with "include" clause for "event_comment_text". How and when the covering index indexes should
beused here in postgres. Want to understand from experts?  
> > > 2)Similarly for table orders:- Should we create a covering index on column (entity_id,due_date,order_type) with
includeclause (firm_dspt_case_id). Or just a composite index (entity_id,due_date,order_type). 
> > > 3)Whether the column used as range operator (here created_at or due_date) should be used as leading column in the
compositeindex or is it fine to keep it as non leading? 
> > >
> > > ->  Nested Loop  (cost=50.06..2791551.71 rows=3148 width=19) (actual time=280.735..7065.313 rows=57943 loops=3)
> > >   Buffers: shared hit=10014901
> > >   ->  Hash Join  (cost=49.49..1033247.35 rows=36729 width=8) (actual time=196.407..3805.755 rows=278131 loops=3)
> > >  Hash Cond: ((ord.entity_id)::numeric = e.entity_id)
> > >  Buffers: shared hit=755352
> > >  ->  Parallel Seq Scan on orders ord  (cost=0.00..1022872.54 rows=3672860 width=16) (actual
time=139.883..3152.627rows=2944671 loops=3) 
> > >   Filter: ((due_date >= '2024-01-01'::date) AND (due_date <= '2024-04-01'::date) AND (order_type = ANY
('{TYPE_A,TYPE_B}'::text[])))
> > >   Rows Removed by Filter: 6572678
> > >   Buffers: shared hit=755208
> >
> > You are selecting a lot of rows, so the query will never be really cheap.
> > But I agree that an index scan should be a win.
> >
> > If the condition on "order_type" is always the same, a partial index is ideal:
> >
> >    CREATE INDEX ON orders (due_date) WHERE order_type IN ('TYPE_A', 'TYPE_B');
> >
> > Otherwise, I'd create two indexes: one on "order_type" and one on "due_date".
>
> Version is 17.7. Below is the table definitions as i pulled from Dbeaver tool:-
>
> https://gist.github.com/databasetech0073/f22d95de18dc3f1fa54af13e7fd2ce9e 
>
> The Order_type will be TYPE_A and TYPE_B in most of the cases. And below is the distribution.
> So , it looks like the index on this column will not help much. Correct me if I'm wrong.
>
> TYPE_A  25 Million
> TYPE_B  2 Million
> TYPE_C  700K
> TYPE_D  200K
> TYPE_E  6k

No, you are right about that.

> I am wondering why the already existing index on column "due_date" of table "order" is not
> getting used by the optimizer? Should we also add the column "entity_id" to the index too?

Seeing that your execution plan is incomplete, it is hard to say anything about that.
The scans of "entities" are missing, as is the UNION.

> And, Yes there are differences in data types of the "entity_id" for columns of table "order"
> and "entity". We need to fix that after analyzing the data.
>
> Also the highlighted Nested loop above shows ~10M shared hits (which will be ~70GB+ if we
> consider one hit as an 8K block). So does that mean , apart from the Full scan on the "order"
> table , the main resource consuming factor here is the scanning of "event_audit_log".

Correct.

Yours,
Laurenz Albe



Re: Question on execution plan and suitable index

От
yudhi s
Дата:

On Mon, Feb 16, 2026 at 3:24 PM Laurenz Albe <laurenz.albe@cybertec.at> wrote:
On Mon, 2026-02-16 at 14:43 +0530, yudhi s wrote:
> On Mon, Feb 16, 2026 at 2:29 PM Laurenz Albe <laurenz.albe@cybertec.at> wrote:
> > On Mon, 2026-02-16 at 00:34 +0530, yudhi s wrote:
> > > It's postgres version 17. We are having a critical UI query which runs for ~7 seconds+. The requirement is to bring down the response time within ~1 sec. Now in this plan , If i read this correctly, the below section is consuming a significant amount of resources and should be addressed. i.e. "Full scan of table "orders" and Nested loop with event_audit_log table".
> > >
> > > Below is the query and its complete plan:- 
> > > https://gist.github.com/databasetech0073/f564ac23ee35d1f0413980fe4d00efa9
> > >
> > > I am a bit new to the indexing strategy in postgres. My question is, what suitable index should we create to cater these above?
> > >
> > > 1)For table event_audit_log:- Should we create composite Index on column (request_id,created_at,event_comment_text) or should we create the covering index i.e. just on two column (request_id,created_at) with "include" clause for "event_comment_text". How and when the covering index indexes should be used here in postgres. Want to understand from experts? 
> > > 2)Similarly for table orders:- Should we create a covering index on column (entity_id,due_date,order_type) with include clause (firm_dspt_case_id). Or just a composite index (entity_id,due_date,order_type).
> > > 3)Whether the column used as range operator (here created_at or due_date) should be used as leading column in the composite index or is it fine to keep it as non leading?
> > >
> > > ->  Nested Loop  (cost=50.06..2791551.71 rows=3148 width=19) (actual time=280.735..7065.313 rows=57943 loops=3)
> > >   Buffers: shared hit=10014901
> > >   ->  Hash Join  (cost=49.49..1033247.35 rows=36729 width=8) (actual time=196.407..3805.755 rows=278131 loops=3)
> > >  Hash Cond: ((ord.entity_id)::numeric = e.entity_id)
> > >  Buffers: shared hit=755352
> > >  ->  Parallel Seq Scan on orders ord  (cost=0.00..1022872.54 rows=3672860 width=16) (actual time=139.883..3152.627 rows=2944671 loops=3)
> > >   Filter: ((due_date >= '2024-01-01'::date) AND (due_date <= '2024-04-01'::date) AND (order_type = ANY ('{TYPE_A,TYPE_B}'::text[])))
> > >   Rows Removed by Filter: 6572678
> > >   Buffers: shared hit=755208
> >
> > You are selecting a lot of rows, so the query will never be really cheap.
> > But I agree that an index scan should be a win.
> >
> > If the condition on "order_type" is always the same, a partial index is ideal:
> >
> >    CREATE INDEX ON orders (due_date) WHERE order_type IN ('TYPE_A', 'TYPE_B');
> >
> > Otherwise, I'd create two indexes: one on "order_type" and one on "due_date".
>
> Version is 17.7. Below is the table definitions as i pulled from Dbeaver tool:-
>
> https://gist.github.com/databasetech0073/f22d95de18dc3f1fa54af13e7fd2ce9e 
>
> The Order_type will be TYPE_A and TYPE_B in most of the cases. And below is the distribution.
> So , it looks like the index on this column will not help much. Correct me if I'm wrong.
>
> TYPE_A  25 Million
> TYPE_B  2 Million
> TYPE_C  700K
> TYPE_D  200K
> TYPE_E  6k

No, you are right about that.

> I am wondering why the already existing index on column "due_date" of table "order" is not
> getting used by the optimizer? Should we also add the column "entity_id" to the index too?

Seeing that your execution plan is incomplete, it is hard to say anything about that.
The scans of "entities" are missing, as is the UNION.

> And, Yes there are differences in data types of the "entity_id" for columns of table "order"
> and "entity". We need to fix that after analyzing the data.
>
> Also the highlighted Nested loop above shows ~10M shared hits (which will be ~70GB+ if we
> consider one hit as an 8K block). So does that mean , apart from the Full scan on the "order"
> table , the main resource consuming factor here is the scanning of "event_audit_log".

Correct.

Yours,
Laurenz Albe

Hi,
I have updated the plan below. While trying to replace actual binds and the objects with sample names some lines got missed initially it seems.

Re: Question on execution plan and suitable index

От
Laurenz Albe
Дата:
On Mon, 2026-02-16 at 16:09 +0530, yudhi s wrote:
> I have updated the plan below. While trying to replace actual binds and
> the objects with sample names some lines got missed initially it seems.
>
> https://gist.github.com/databasetech0073/f564ac23ee35d1f0413980fe4d00efa9 

Thanks.

Does the filter on "due_date" eliminate many rows in "orders"?  If yes,
and an index on that column would actually perform better (which you
could test with enable_seqscan = off), perhaps your "random_page_cost"
parameter is set too high.

Where you can certainly make a difference is the repeated scan on
"event_audit_log".  An index on (request_id, event_comment_text, created_at)
should speed up that part.

Yours,
Laurenz Albe



Re: Question on execution plan and suitable index

От
yudhi s
Дата:


On Mon, Feb 16, 2026 at 5:22 PM Laurenz Albe <laurenz.albe@cybertec.at> wrote:
On Mon, 2026-02-16 at 16:09 +0530, yudhi s wrote:
> I have updated the plan below. While trying to replace actual binds and
> the objects with sample names some lines got missed initially it seems.
>
> https://gist.github.com/databasetech0073/f564ac23ee35d1f0413980fe4d00efa9 

Thanks.

Does the filter on "due_date" eliminate many rows in "orders"?  If yes,
and an index on that column would actually perform better (which you
could test with enable_seqscan = off), perhaps your "random_page_cost"
parameter is set too high.

Where you can certainly make a difference is the repeated scan on
"event_audit_log".  An index on (request_id, event_comment_text, created_at)
should speed up that part.

Yours,
Laurenz Albe

Thank you so much. Will try this one.

Regarding the composite index on (request_id, event_comment_text, created_at) for table event_audit_log, is there any advice, which we should follow for keeping "date column"(like column "Created_at" here) in the indexing order (apart from the frequency of usage in the query)?

And to help the table scan of the ORDER table, should we also have "entity_id" added to the index along with "due_date" i.e. a composite index on (entity_id,due_date)?

Regards
Yudhi

Re: Question on execution plan and suitable index

От
Laurenz Albe
Дата:
On Mon, 2026-02-16 at 17:52 +0530, yudhi s wrote:
> Regarding the composite index on (request_id, event_comment_text, created_at) for
> table event_audit_log, is there any advice, which we should follow for keeping
> "date column"(like column "Created_at" here) in the indexing order (apart from
> the frequency of usage in the query)?

The rule is to first have all columns that are compared with equality, then the
others, starting with the most selective one.  Based on the condition, I guessed
that that wouldn't be your "date" column, but if it is, put it second.

> And to help the table scan of the ORDER table, should we also have "entity_id"
> added to the index along with "due_date" i.e. a composite index on (entity_id,due_date)?

I don't know; you'd have to test it on your test system.
The current execution plan has no use for such an index, but adding additional
columns could

1. lead to an efficient index-only scan

2. make PostgreSQL pick an altogether different, better plan

Yours,
Laurenz Albe