Discussion: Top-N Query performance issue and high CPU usage


Top -N Query performance issue and high CPU usage

From
yudhi s
Date:
Hello Experts,
 We have a SELECT query that joins three to five main transaction tables (txn_tbl, txn_status, txn_decision, txn_sale, ath), each holding ~2 million rows (expected to grow to ~50-100 million in future), plus another 6-7 tables, some of which are master tables and others small lookup tables.

A single run of this query takes ~2-3 seconds. However, when we hit it from 10-15 sessions at the same time, CPU on the DB instance spikes to ~50-60%, and it climbs to 90% when we increase the concurrency further to 40-50 sessions.

This query will be called on the first page of a UI screen and is supposed to show the latest 1000 rows based on a certain transaction date. Thousands of users are supposed to be able to hit this same query on the landing page at the same time. The Postgres version is 17. The instance has 2 vCPUs and 16 GB RAM.

I have the following questions.

1) Why is this query causing a high CPU spike? Is there any way in Postgres to see which part of the query is contributing the most to the CPU time?
2) How can we tune this query to further reduce response time and, mainly, CPU consumption? Would an additional index or anything else improve the plan?
3) Is there any guidance or best practice for designing top-N queries for UI scenarios like this, where performance is an important factor?
4) Based on the CPU cores and memory, is there a calculation we can use to say this machine supports at most N concurrent queries of this type, beyond which we need a machine with more CPU cores?

Below is the query and its current plan:

Regards
Yudhi 

Re: Top -N Query performance issue and high CPU usage

From
David Mullineux
Date:


On Sat, 31 Jan 2026, 13:30 yudhi s, <learnerdatabase99@gmail.com> wrote:
[snip]

The plan says it's using temp files for sorting. I would suggest you increase work_mem to avoid temp file creation. Although not the answer to all your problems, it would be a good start.
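[Editor's note: David's suggestion can be tried per session before changing anything globally. A minimal sketch; the 64MB value is illustrative and the query itself is a placeholder for the landing-page query:]

```sql
-- Raise work_mem for the current session only; ALTER SYSTEM or
-- postgresql.conf would change it for every connection at once.
SET work_mem = '64MB';

-- Re-run the query under EXPLAIN and check that the Sort node now reports
-- "Sort Method: ... Memory: ..." instead of "... Disk: ...".
EXPLAIN (ANALYZE, BUFFERS)
SELECT ...;   -- the original landing-page query goes here
```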

Re: Top -N Query performance issue and high CPU usage

From
Adrian Klaver
Date:
On 1/31/26 05:30, yudhi s wrote:
> Hello Experts,

> This query is going to be called in the first page of an UI screen and 
> is supposed to show the latest 1000 rows based on a certain transaction 
> date. This query is supposed to allow thousands of users to hit this 
> same query at the first landing page at the same time. Its postgres 
> version 17.  The instance has 2-VCPU and 16GB RAM.

1) Without even looking at the plan I'm going to say 2-VCPU and 16GB RAM 
are insufficient resources for what you want to do.

2) You will need to provide the schema definitions for the tables involved.

> 4) And based on the CPU core and memory, is there any calculation by
> using which, we can say that this machine can support a maximum N
> number of concurrent queries of such type beyond which we need more
> cpu cores machines?

You already have the beginnings of a chart:

1 session 2-3 secs

10-15 sessions 50-60% usage

40-50 sessions 90% usage

> 
> Regards
> Yudhi


-- 
Adrian Klaver
adrian.klaver@aklaver.com



Re: Top -N Query performance issue and high CPU usage

From
yudhi s
Date:


The plan says it's using temp files for sorting. I would suggest you increase work_mem to avoid temp file creation. Although not the answer to all your problems, it would be a good start.


Even after setting work_mem to 64MB, which removed all the "temp read" entries and now shows only memory reads, we are still seeing a similar CPU spike when executing this query from multiple sessions, and the response time stays the same.

Re: Top -N Query performance issue and high CPU usage

From
yudhi s
Date:
Thank you. 

1) Without even looking at the plan I'm going to say 2-VCPU and 16GB RAM
are insufficient resources for what you want to do.


Can you please explain in a bit more detail how much vCPU and RAM would be enough for this requirement? And how do you normally do that calculation?
 
2) You will need to provide the schema definitions for the tables involved.

Do you mean the table DDL, or would just the index definitions on the tables be enough?

Also, I was trying to understand: just by looking at the EXPLAIN ANALYZE output, is there any way we can tie a specific step in the plan to being the major contributor of CPU usage? That way we could try to fix that part rather than looking through the whole query, as it is a big query.

And any suggestions to improve top-N queries where the base table may have many rows would be appreciated.

Re: Top -N Query performance issue and high CPU usage

From
Ron Johnson
Date:
On Sat, Jan 31, 2026 at 2:47 PM yudhi s <learnerdatabase99@gmail.com> wrote:
[snip]
Also, I was trying to understand: just by looking at the EXPLAIN ANALYZE output, is there any way we can tie a specific step in the plan to being the major contributor of CPU usage? That way we could try to fix that part rather than looking through the whole query, as it is a big query.

It looks like 71% (748ms of a total 1056ms) of elapsed time is taken by the c_1.tran_date external sort on line 150 of the plan.

That, obviously, is what you should work on.

1. You say you increased work_mem.  From what, to what?
2. But that it did not reduce execution time.  Please post the EXPLAIN from after increasing work_mem.
3. Did you remember to run SELECT pg_reload_conf(); after increasing work_mem?
4. Is there an index on APP_schema.txn_tbl.tran_date?
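[Editor's note: Ron's point 4 matters because a top-N query can stop early when an index matches the ORDER BY. A minimal sketch; the index name is invented and the column list is assumed from the thread:]

```sql
-- An index on the sort key lets Postgres walk it in order and stop
-- after 1000 rows instead of sorting the whole table every time.
CREATE INDEX idx_txn_tbl_tran_date
    ON app_schema.txn_tbl (tran_date DESC);

-- Top-N shape that can use the index:
SELECT *
FROM app_schema.txn_tbl
ORDER BY tran_date DESC
LIMIT 1000;
```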


--
Death to <Redacted>, and butter sauce.
Don't boil me, I'm still alive.
<Redacted> lobster!

Re: Top -N Query performance issue and high CPU usage

From
Luigi Nardi
Date:


On Sat, Jan 31, 2026 at 10:05 PM Ron Johnson <ronljohnsonjr@gmail.com> wrote:
[snip]


The DBtune Free Edition can help you find the correct adjustments for work_mem and other server parameters. It's designed to help optimize your PostgreSQL runtime for your current hardware setup.

 

Re: Top -N Query performance issue and high CPU usage

From
"Peter J. Holzer"
Date:
On 2026-02-01 01:16:56 +0530, yudhi s wrote:
> Thank you. 
>
>
>     1) Without even looking at the plan I'm going to say 2-VCPU and 16GB RAM
>     and is insufficient resources for what you want to do.
>
>
>
> Can you please explain a bit in detail, how much minimum VCPU and RAM will be
> enough resources to suffice this requirement? and you normally do that
> calculation?

You wrote:

| This query is supposed to allow thousands of users to hit this same
| query at the first landing page at the same time.

If you meant that literally, you would need thousands of cores to handle
those thousands of simultaneous queries and enough RAM for thousands of
sessions, each performing a rather complex query. So possibly hundreds
or maybe even thousands of gigabytes, not 16.

However, maybe you didn't mean that. There are relatively few
applications where thousands of users log in within a second. Maybe you
just meant that there would be thousands of users logged in in total. If
so, how many simultaneous queries do you really expect?

If you do have that many simultaneous accesses to the landing page, and
you can't speed up the query significantly (I take it you've seen the
suggestion to check whether there's an index on
APP_schema.txn_tbl.tran_date), then maybe you don't need to perform it
for every user? I don't know what the query is supposed to do, but
unless the "ent_id" is really a user id, it doesn't seem to be specific
to the user. So maybe you can cache the result for a minute or an hour
and show the same result to everybody who logs in during that time.

        hjp

--
   _  | Peter J. Holzer    | Story must make more sense than reality.
|_|_) |                    |
| |   | hjp@hjp.at         |    -- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |       challenge!"

Attachments

Re: Top -N Query performance issue and high CPU usage

From
Ron Johnson
Date:
On Sun, Feb 1, 2026 at 4:47 PM Peter J. Holzer <hjp-pgsql@hjp.at> wrote:
[snip] 
If you do have that many simultaneous accesses to the landing page, and
you can't speed up the query significantly (I take it you've seen the
suggestion to check whether there's an index on
APP_schema.txn_tbl.tran_date), then maybe you don't need to perform it
for every user? I don't know what the query is supposed to do, but
unless the "ent_id" is really a user id, it doesn't seem to be specific
to the user. So maybe you can cache the result for a minute or an hour
and show the same result to everybody who logs in during that time.
 
That's what I was thinking, too: an app-server background process continually runs that query in a loop, feeding the results to a shared cache; the end-user connections then read the latest version of the cached results.
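[Editor's note: one way to approximate that shared cache inside the database rather than the app server is a small snapshot table rewritten on a schedule. A sketch only; the table name, cadence, and column list are illustrative:]

```sql
-- Snapshot table holding the latest top-1000 result.
CREATE UNLOGGED TABLE landing_cache AS
SELECT * FROM app_schema.txn_tbl
ORDER BY tran_date DESC LIMIT 1000;

-- A pg_cron job (or the app-server loop Ron describes) re-runs this
-- every minute or so; the transaction keeps readers from seeing an
-- empty table mid-refresh.
BEGIN;
TRUNCATE landing_cache;
INSERT INTO landing_cache
SELECT * FROM app_schema.txn_tbl
ORDER BY tran_date DESC LIMIT 1000;
COMMIT;

-- What every landing-page session actually queries:
SELECT * FROM landing_cache;
```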


Re: Top -N Query performance issue and high CPU usage

From
Adrian Klaver
Date:
On 1/31/26 11:46, yudhi s wrote:
> Thank you.
> 
> 
>     1) Without even looking at the plan I'm going to say 2-VCPU and 16GB
>     RAM
>     and is insufficient resources for what you want to do.
> 
> 
> Can you please explain a bit in detail, how much minimum VCPU and RAM 
> will be enough resources to suffice this requirement? and you normally 
> do that calculation?

Don't know what the minimum requirements are. It would depend on many 
variables: 1) The plan being chosen, which in turn depends on the schema 
information as well as the data turnover. 2) What the VCPU is actually 
emulating. 3) The efficiency of the virtual machines/containers with 
regard to accessing memory and storage. 4) The service limits of the 
virtualization. 5) What the storage system is and how performant it is.

In other words this is something you will need to test and derive your 
own formula for.

> 
>     2) You will need to provide the schema definitions for the tables
>     involved.
> 
> Do you mean table DDL or just the index definitions on the tables should 
> help?

Basically what you get in psql when you do \d some_table.

> 
> Also i was trying to understand , by just looking into the "explain 
> analyze" output, is there any way we can tie the specific step in the 
> plan , which is the major contributor of the cpu resources? Such that we 
> can then try to fix that part rather than looking throughout the query 
> as its big query?
> 
> And if any suggestion to improve the TOP-N queries where the base table 
> may have many rows in it.





Re: Top -N Query performance issue and high CPU usage

From
yudhi s
Date:


On Mon, Feb 2, 2026 at 3:17 AM Peter J. Holzer <hjp-pgsql@hjp.at> wrote:

However, maybe you didn't mean that. There are relatively few
applications where thousands of users log in within a second. Maybe you
just meant that there would be thousands of users logged in in total. If
so, how many simultaneous queries do you really expect?

If you do have that many simultaneous accesses to the landing page, and
you can't speed up the query significantly (I take it you've seen the
suggestion to check whether there's an index on
APP_schema.txn_tbl.tran_date), then maybe you don't need to perform it
for every user? I don't know what the query is supposed to do, but
unless the "ent_id" is really a user id, it doesn't seem to be specific
to the user. So maybe you can cache the result for a minute or an hour
and show the same result to everybody who logs in during that time.


Thank you so much. I need to get back to you on the exact number of such queries that can hit the database. However, as thousands of users will be there, the possibility of all of them landing on the same page at the same time needs to be found out. I will double-check this.

However, when you say caching: the base tables are going to hold ~30-50 million rows. The landing page has filters on it, so there may be 30+ different combinations based on the user's choice. So do you suggest we populate the base data into a materialized view (named something like "landing page data"), which we can refresh (maybe once every ~5 minutes behind the scenes), and then query that directly from the landing page? And we can create suitable indexes on the materialized view based on the dynamic filter criteria?
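[Editor's note: the materialized-view variant could be sketched as below. The view name, join, and filter columns are assumptions, not the thread's actual query:]

```sql
-- Precompute the expensive join once, then index it for the UI filters.
CREATE MATERIALIZED VIEW landing_page_data AS
SELECT t.txn_id, t.tran_date, t.ent_id  -- ...other projected columns
FROM app_schema.txn_tbl t
JOIN app_schema.txn_status s ON s.txn_id = t.txn_id;

-- A unique index is required for REFRESH ... CONCURRENTLY, which
-- avoids locking out readers during the periodic refresh.
CREATE UNIQUE INDEX ON landing_page_data (txn_id);
CREATE INDEX ON landing_page_data (tran_date DESC);

-- Run every ~5 minutes from a scheduler:
REFRESH MATERIALIZED VIEW CONCURRENTLY landing_page_data;
```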

Re: Top -N Query performance issue and high CPU usage

From
yudhi s
Date:


On Mon, 2 Feb, 2026, 11:21 am Rob Sargent, <robjsargent@gmail.com> wrote:

> Thank you so much. I need to get back on the exact number of such queries which can hit the database. However, as 1000 of users will be there, so the possibility of all logging into the system on the same page at same time needs to be found out. Will double check on this.
>
> However,  when you said caching :- The results on the base tables are going to be ~30-50 million. This landing page has filters on it so it may be of 30+ different

I know I read OP’s earlier descriptions to suggest that each login saw the same data. I was wrong and I suspect the suggestion to cache goes out the window.

The need for more resources now comes centre stage, right beside query tuning. You won't get much help here on the latter problem without more DDL for the tables involved. Help on the hardware is just money, though most desktops these days are more powerful than the machine described up-thread.

Won't a materialized view with a minimal delta-refresh frequency (5-10 minutes?) help in such scenarios? The overhead of the query complexity would lie within the materialized view, and it can be indexed according to the dynamic incoming filter conditions.

Re: Top -N Query performance issue and high CPU usage

From
Thiemo Kellner
Date:
Hi

Would it do any good to restrict the transaction date for the LIMIT to something like "current timestamp - 1 day/hour/month"? How about partitioning?

My two dimes

Thiemo
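[Editor's note: Thiemo's two ideas, sketched with assumed table/column names; the interval and partition bounds are illustrative:]

```sql
-- Bound the scan: only look at the last day before taking the top 1000.
SELECT *
FROM app_schema.txn_tbl
WHERE tran_date >= now() - interval '1 day'
ORDER BY tran_date DESC
LIMIT 1000;

-- Range-partition by tran_date so old rows are pruned from the plan.
CREATE TABLE txn_tbl_part (
    txn_id    bigint,
    tran_date timestamptz NOT NULL
    -- ...remaining columns
) PARTITION BY RANGE (tran_date);

CREATE TABLE txn_tbl_2026_02 PARTITION OF txn_tbl_part
    FOR VALUES FROM ('2026-02-01') TO ('2026-03-01');
```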

Re: Top -N Query performance issue and high CPU usage

From
yudhi s
Date:


On Mon, Feb 2, 2026 at 3:17 AM Peter J. Holzer <hjp-pgsql@hjp.at> wrote:

[snip]

There was no index on the column tran_date, so I created one, and it makes the query finish in ~200ms, a lot faster than in the past. Below is the portion of the query, and its plan, that actually consumes most of the resources and time after the new index creation.


1) Now the part that takes time is the "nested loop" join on the "ent_id" column. Can we do anything to make it much better/faster?

2) Another question I had was: with this new index, the table scan of txn_tbl is now fully eliminated by the "Index Scan Backward", even though I have other columns from that table projected in the query. So how is it getting all those column values without visiting the table, just from that backward index scan?


Re: Top -N Query performance issue and high CPU usage

From
Ron Johnson
Date:
On Mon, Feb 2, 2026 at 6:39 AM yudhi s <learnerdatabase99@gmail.com> wrote:


[snip]

There was no index on the column tran_date, so I created one, and it makes the query finish in ~200ms, a lot faster than in the past. Below is the portion of the query, and its plan, that actually consumes most of the resources and time after the new index creation.

1) Now the part that takes time is the "nested loop" join on the "ent_id" column. Can we do anything to make it much better/faster?

2) Another question I had was: with this new index, the table scan of txn_tbl is now fully eliminated by the "Index Scan Backward", even though I have other columns from that table projected in the query. So how is it getting all those column values without visiting the table, just from that backward index scan?

Reading through EXPLAIN output isn't always a mystery.

Search for "actual time" and you'll find row 53, which is the "deepest" (most nested) row with the highest actual time.

That tells you where the time is now spent, and what it's doing.


Re: Top -N Query performance issue and high CPU usage

From
yudhi s
Date:


On Mon, Feb 2, 2026 at 7:04 PM Ron Johnson <ronljohnsonjr@gmail.com> wrote:
[snip]

Reading through EXPLAIN output isn't always a mystery.

Search for "actual time" and you'll find row 53, which is the "deepest" (most nested) row with the highest actual time.

That tells you where the time is now spent, and what it's doing.



My apologies if I misunderstand the plan, but as I see it, it's spending ~140ms (147.7ms total minus the 6.0ms startup), i.e. almost all of the time now, performing the nested loop join below. So my question was: is there any possibility to reduce the resource consumption or response time further here? I hope my understanding is correct.

-> Nested Loop (cost=266.53..1548099.38 rows=411215 width=20) (actual time=6.009..147.695 rows=1049 loops=1)
Join Filter: ((df.ent_id)::numeric = m.ent_id)
Rows Removed by Join Filter: 513436
Buffers: shared hit=1939

 

Re: Top -N Query performance issue and high CPU usage

From
Ron Johnson
Date:
On Mon, Feb 2, 2026 at 8:53 AM yudhi s <learnerdatabase99@gmail.com> wrote:


[snip]
My apologies if I misunderstand the plan, but as I see it, it's spending ~140ms (147.7ms total minus the 6.0ms startup), i.e. almost all of the time now, performing the nested loop join below. So my question was: is there any possibility to reduce the resource consumption or response time further here? I hope my understanding is correct.

-> Nested Loop (cost=266.53..1548099.38 rows=411215 width=20) (actual time=6.009..147.695 rows=1049 loops=1)
Join Filter: ((df.ent_id)::numeric = m.ent_id)
Rows Removed by Join Filter: 513436
Buffers: shared hit=1939

I don't see m.ent_id in the actual query.  Did you only paste a portion of the query?

Also, casting in a JOIN typically brutalizes the ability to use an index.


Re: Top -N Query performance issue and high CPU usage

From
yudhi s
Date:


On Mon, Feb 2, 2026 at 8:57 PM Ron Johnson <ronljohnsonjr@gmail.com> wrote:

[snip]

I don't see m.ent_id in the actual query.  Did you only paste a portion of the query?

Also, casting in a JOIN typically brutalizes the ability to use an index.


Thank you.
Actually, I tried executing just the first two CTEs, where the query was spending most of the time, and the alias has changed. Also, I have changed the real table names before posting it here; hope that is fine.
However, I verified the data type of the ent_id column: in "ent" it's int8, and in table "txn_tbl" it's numeric(12). So do you mean this difference in data types is causing the high response time during the nested loop join? My understanding was that it would be internally castable without additional burden. Also, even though I tried creating an index on "(df.ent_id)::numeric", it still results in the same plan and response time.

Re: Top -N Query performance issue and high CPU usage

From
Ron Johnson
Date:
On Mon, Feb 2, 2026 at 1:39 PM yudhi s <learnerdatabase99@gmail.com> wrote:
[snip]
Thank you.
Actually, I tried executing just the first two CTEs, where the query was spending most of the time, and the alias has changed.

We need to see everything, not just what you think is relevant.
 
Also, here I have changed the real table names before posting; hope that is fine.
However, I verified the data type of the ent_id column: in "ent" it's int8, and in table "txn_tbl" it's numeric(12). So do you mean this difference in data types is causing the high response time during the nested loop join? My understanding was that it would be internally castable without additional burden. Also, even though I tried creating an index on "(df.ent_id)::numeric", it still results in the same plan and response time.

If you'd shown the "\d" table definitions like Adrian asked two days ago, we'd know what indexes are on each table, and not have to beg you to dispense dribs and drabs of information.


Re: Top -N Query performance issue and high CPU usage

From
yudhi s
Date:

On Tue, Feb 3, 2026 at 1:01 AM Ron Johnson <ronljohnsonjr@gmail.com> wrote:
[snip]

If you'd shown the "\d" table definitions like Adrian asked two days ago, we'd know what indexes are on each table, and not have to beg you to dispense dribs and drabs of information.


I am unable to run "\d" from the DBeaver SQL worksheet. However, I have fetched the DDL for the three tables and their selected columns, used in the smaller version of the query and its plan, which I recently posted.



Regards
Yudhi 

Re: Top -N Query performance issue and high CPU usage

From
Ron Johnson
Date:
On Mon, Feb 2, 2026 at 3:43 PM yudhi s <learnerdatabase99@gmail.com> wrote:

On Tue, Feb 3, 2026 at 1:01 AM Ron Johnson <ronljohnsonjr@gmail.com> wrote:
On Mon, Feb 2, 2026 at 1:39 PM yudhi s <learnerdatabase99@gmail.com> wrote:
On Mon, Feb 2, 2026 at 8:57 PM Ron Johnson <ronljohnsonjr@gmail.com> wrote:


Lines 30-32 are where most of the time and effort is spent.

I can't be certain, but changing APP_schema.ent.ent_id from NUMERIC to int8 (with a CHECK constraint to, well, constrain it to 12 digits, if really necessary) is something I'd test.
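As a sketch of the kind of change being described here (table and column names as given in the thread; the USING clause and the CHECK bound are my assumptions, so test on a copy first, and note that rewriting a large table this way takes an exclusive lock):

```sql
-- Make the join keys the same type so the planner can compare them
-- without a cast. The CHECK constraint optionally preserves the
-- 12-digit limit the NUMERIC(12) column had.
ALTER TABLE APP_schema.ent
    ALTER COLUMN ent_id TYPE bigint USING ent_id::bigint;

ALTER TABLE APP_schema.ent
    ADD CONSTRAINT ent_id_max_12_digits
    CHECK (ent_id BETWEEN -999999999999 AND 999999999999);
```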

--
Death to <Redacted>, and butter sauce.
Don't boil me, I'm still alive.
<Redacted> lobster!

Re: Top -N Query performance issue and high CPU usage

From:
yudhi s
Date:


On Tue, Feb 3, 2026 at 4:50 AM Ron Johnson <ronljohnsonjr@gmail.com> wrote:
On Mon, Feb 2, 2026 at 3:43 PM yudhi s <learnerdatabase99@gmail.com> wrote:

On Tue, Feb 3, 2026 at 1:01 AM Ron Johnson <ronljohnsonjr@gmail.com> wrote:
On Mon, Feb 2, 2026 at 1:39 PM yudhi s <learnerdatabase99@gmail.com> wrote:
On Mon, Feb 2, 2026 at 8:57 PM Ron Johnson <ronljohnsonjr@gmail.com> wrote:


--


Yudhi 

Re: Top -N Query performance issue and high CPU usage

From:
Ron Johnson
Date:
On Tue, Feb 3, 2026 at 4:26 AM yudhi s <learnerdatabase99@gmail.com> wrote:
On Tue, Feb 3, 2026 at 4:50 AM Ron Johnson <ronljohnsonjr@gmail.com> wrote:
On Mon, Feb 2, 2026 at 3:43 PM yudhi s <learnerdatabase99@gmail.com> wrote:

On Tue, Feb 3, 2026 at 1:01 AM Ron Johnson <ronljohnsonjr@gmail.com> wrote:
On Mon, Feb 2, 2026 at 1:39 PM yudhi s <learnerdatabase99@gmail.com> wrote:
On Mon, Feb 2, 2026 at 8:57 PM Ron Johnson <ronljohnsonjr@gmail.com> wrote:


Hmm.  What does pg_stat_user_tables say about when you last analyzed and vacuumed APP_schema.txn_tbl and APP_schema.ent?

Beyond "aggressively keep those two tables analyzed, via reducing autovacuum_analyze_scale_factor to something like 0.05, and adding 'vacuumdb -d mumble -j2 --analyze-only -t APP_schema.txn_tbl -t APP_schema.ent' to crontab", I'm out of ideas.  An 85% speed improvement is nothing to sneeze at, though.
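A minimal sketch of both checks (schema and table names from the thread; the 0.05 value is the suggestion above):

```sql
-- When were the hot tables last vacuumed/analyzed?
SELECT relname, last_vacuum, last_autovacuum, last_analyze, last_autoanalyze
FROM pg_stat_user_tables
WHERE relname IN ('txn_tbl', 'ent');

-- Per-table override so autoanalyze triggers after ~5% row churn
-- instead of the default 10%.
ALTER TABLE APP_schema.txn_tbl SET (autovacuum_analyze_scale_factor = 0.05);
ALTER TABLE APP_schema.ent     SET (autovacuum_analyze_scale_factor = 0.05);
```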
 

There is no VARCHAR or CHAR; there is only TEXT.  Thus, this is 100% expected and normal.

--
Death to <Redacted>, and butter sauce.
Don't boil me, I'm still alive.
<Redacted> lobster!

Re: Top -N Query performance issue and high CPU usage

From:
Adrian Klaver
Date:
On 2/3/26 07:59, Ron Johnson wrote:

> 
> 
> There is no VARCHAR or CHAR; there is only TEXT.  Thus, this is 100% 
> expected and normal.

What Ron is saying is that there are varchar and char types, but they 
boil down to text per:

https://www.postgresql.org/docs/current/datatype-character.html

"text is PostgreSQL's native string data type, in that most built-in 
functions operating on strings are declared to take or return text not 
character varying. For many purposes, character varying acts as though 
it were a domain over text."

As to performance see:

"
Tip

There is no performance difference among these three types, apart from 
increased storage space when using the blank-padded type, and a few 
extra CPU cycles to check the length when storing into a 
length-constrained column. While character(n) has performance advantages 
in some other database systems, there is no such advantage in 
PostgreSQL; in fact character(n) is usually the slowest of the three 
because of its additional storage costs. In most situations text or 
character varying should be used instead.
"

> 
> -- 
> Death to <Redacted>, and butter sauce.
> Don't boil me, I'm still alive.
> <Redacted> lobster!


-- 
Adrian Klaver
adrian.klaver@aklaver.com



Re: Top -N Query performance issue and high CPU usage

From:
yudhi s
Date:


On Tue, 3 Feb, 2026, 9:37 pm Adrian Klaver, <adrian.klaver@aklaver.com> wrote:

Thank you. I was looking at those casts (::text) in the explain plan output in a similar way (as was happening in the int8-to-numeric join scenario) and was thinking that maybe it is spending some CPU cycles doing these ::text casts behind the scenes for that column, and wondering whether there is some way (a data type change) to stop them. But from your explanation, it looks like those representations in the query plan are normal and have no performance overhead as such. Thanks again.

Regarding the "nested loop" below having a response time of ~100ms: I understand that the cast is now removed here, after changing the data types of the columns to match on both sides of the join.

So, is it expected for a nested loop over ~500K rows to take 100ms?

->  Nested Loop  (cost=262.77..1342550.91 rows=579149 width=20) (actual time=6.406..107.946 rows=1049 loops=1)
              Join Filter: (df.ent_id = m.ent_id)
              Rows Removed by Join Filter: 514648
              Buffers: shared hit=1972

Regards
Yudhi



Re: Top -N Query performance issue and high CPU usage

From:
Ron Johnson
Date:
On Tue, Feb 3, 2026 at 1:50 PM yudhi s <learnerdatabase99@gmail.com> wrote:


So, is this expected to do a nested loop on 500k rows to take 100ms?

HAVE YOU ANALYZED THE TABLES?
 

->  Nested Loop  (cost=262.77..1342550.91 rows=579149 width=20) (actual time=6.406..107.946 rows=1049 loops=1)
              Join Filter: (df.ent_id = m.ent_id)
              Rows Removed by Join Filter: 514648
              Buffers: shared hit=1972

Decompose the complex problem into a small problem, then start adding stuff back.


In this case, I would run SELECT * FROM limited_txns to get a base EXPLAIN; then strip out all WHERE clauses, the ORDER BY, and the LIMIT, and run it again for another EXPLAIN.

Then add back lines 33-34 and EXPLAIN. Then line 7, etc., etc., saving each EXPLAIN. See what makes it break.
 
--
Death to <Redacted>, and butter sauce.
Don't boil me, I'm still alive.
<Redacted> lobster!

Re: Top -N Query performance issue and high CPU usage

From:
"Peter J. Holzer"
Date:
On 2026-02-04 00:20:20 +0530, yudhi s wrote:
> So, is this expected to do a nested loop on 500k rows to take 100ms?
>
> ->  Nested Loop  (cost=262.77..1342550.91 rows=579149 width=20) (actual time=
> 6.406..107.946 rows=1049 loops=1)
>               Join Filter: (df.ent_id = m.ent_id)
>               Rows Removed by Join Filter: 514648
>               Buffers: shared hit=1972
>

Take a closer look at what that nested loop does:

        ->  Nested Loop  (cost=266.53..1548099.38 rows=411215 width=20) (actual time=6.009..147.695 rows=1049 loops=1)
              Join Filter: ((df.ent_id)::numeric = m.ent_id)
              Rows Removed by Join Filter: 513436
              Buffers: shared hit=1939
              ->  Index Scan Backward using txn_tbl_due_dt_idx on txn_tbl df  (cost=0.43..115471.09 rows=1417983
width=20)(actual time=0.047..20.155 rows=43626 loops=1) 
                    Filter: ((txn_tbl_type_nm)::text = ANY ('{.......}'::text[]))
                    Rows Removed by Filter: 17
                    Buffers: shared hit=1816
              ->  Materialize  (cost=266.10..328.09 rows=58 width=16) (actual time=0.000..0.001 rows=12 loops=43626)
                  [lots of stuff]

It scans backwards through txn_tbl_due_dt_idx which returns 43626 rows
and takes 20 milliseconds.

For each of these rows it performs the "Materialize" node, which in turn
does lots of stuff, but whatever it is, it's fast and probably not worth
optimizing. The problem is that it's done 43626 times, which takes
another 120ms.

So the most promising way to proceed is to try to reduce those 43626
rows. Since the query is already scanning txn_tbl_due_dt_idx from newest
to oldest, is there a cutoff date where it is safe to ignore everything
older? If you can get it to scan only 2000 rows that would be 20 times
faster ...
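If such a cutoff is acceptable to the business, a sketch would be roughly as follows (column and type names taken from the plan above; the 30-day window is a placeholder):

```sql
-- Hypothetical: bound the backward index scan with a recency cutoff so
-- it stops long before 43k rows, while still returning the newest matches.
SELECT df.*
FROM txn_tbl df
WHERE df.tran_date >= now() - interval '30 days'   -- business-defined cutoff
  AND df.txn_tbl_type_nm IN ('TYPE1', 'TYPE2', 'TYPE3')
ORDER BY df.tran_date DESC
LIMIT 1000;
```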

(I'm a bit confused by your naming. I'm guessing that the "Index Scan
Backward using txn_tbl_due_dt_idx" is there because of the "order by
df.tran_date desc", but the name of the index and the column don't
match.)

        hjp

--
   _  | Peter J. Holzer    | Story must make more sense than reality.
|_|_) |                    |
| |   | hjp@hjp.at         |    -- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |       challenge!"

Attachments

Re: Top -N Query performance issue and high CPU usage

From:
felix.quintgz@yahoo.com
Date:





Have you tried adding an index to txn_tbl.txn_type?
And a vacuum on all tables? It seems the visibility map is outdated.

I'm using https://explain.dalibo.com to view the plan visually; it's more convenient.

You could use the option to periodically save the results of queries with common filters to another table, and then
retrieve the results from that table when a user performs a query with their own filters.
You should also store the user's query results somewhere for a while to prevent excessive database access.

I imagine this is some kind of dashboard that each user is taken to after authenticating. It looks nice in
presentations, but after a while in production, it can make the system unusable. I had to remove similar charts from the
homepage of a system because after a year of work, they were taking a minute to load.
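One way to implement that caching idea inside Postgres (a sketch; latest_txns, txn_id, and the refresh cadence are my inventions):

```sql
-- Precompute the landing-page result once, so every session reads a
-- small cached table instead of re-running the full join.
CREATE MATERIALIZED VIEW latest_txns AS
SELECT *                       -- in reality: the landing-page query
FROM txn_tbl
ORDER BY tran_date DESC
LIMIT 1000;

CREATE UNIQUE INDEX ON latest_txns (txn_id);  -- required for CONCURRENTLY

-- Run periodically, e.g. every minute from cron or pg_cron:
REFRESH MATERIALIZED VIEW CONCURRENTLY latest_txns;
```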


 On Saturday, January 31, 2026 at 08:30:33 AM GMT-5, yudhi s <learnerdatabase99@gmail.com> wrote:
Below is the query and its current plan: https://gist.github.com/databasetech0073/6688701431dc4bf4eaab8d345c1dc65f



Re: Top -N Query performance issue and high CPU usage

From:
yudhi s
Date:


On Wed, Feb 4, 2026 at 2:32 AM Peter J. Holzer <hjp-pgsql@hjp.at> wrote:
So the most promising way to proceed is to try to reduce those 43626
rows. Since the query is already scanning txn_tbl_due_dt_idx from newest
to oldest, is there a cutoff date where it is safe to ignore everything
older? If you can get it to scan only 2000 rows that would be 20 times
faster ...

(I'm a bit confused by your naming. I'm guessing that the "Index Scan
Backward using txn_tbl_due_dt_idx" is there because of the "order by
df.tran_date desc", but the name of the index and the column don't
match.)


Got it. Thank you.

Yes, as folks here suggested, I created the new index on "tran_date", which is used in the "order by ... desc" to show only the newest 1000 rows with a "limit" operator. This backward index scan is now getting used and has helped drop the response time to a large extent, as opposed to the earlier sequential table scan.

Now, in this query, as you said, we need to see if we can further filter on tran_date so as to minimize the records read from table txn_tbl, which would minimize the number of loops over the materialize operation. I need to check whether that is possible without impacting business functionality. However, is there any way this materialize operation can happen only once, i.e. in a "hash join" fashion (where the inner side is scanned only once), rather than in the nested loop fashion which currently happens ~43K times?

Another question I had in mind: since there is the filter "Filter: ((txn_tbl_type_nm)::text = ANY ('{.......}'::text[]))", would including this column in the index, i.e. making it a composite index on (tran_date, txn_tbl_type_nm), be a good idea? Mainly in scenarios where txn_tbl_type_nm filters out more rows, i.e. ~100-500K+ rows?
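One way to check whether a hash join would actually be cheaper here is to disable nested loops for a single diagnostic run (enable_nestloop is a real planner setting; SET LOCAL keeps it scoped to one transaction, so this sketch is safe to try but not something to leave on in production):

```sql
-- Diagnostic only: temporarily forbid nested loops and compare the
-- resulting (likely hash join) plan's timing against the current plan.
BEGIN;
SET LOCAL enable_nestloop = off;
EXPLAIN (ANALYZE, BUFFERS)
SELECT ...;        -- the query under test
ROLLBACK;
```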

Regards
Yudhi


Re: Top -N Query performance issue and high CPU usage

From:
yudhi s
Date:

On Wed, Feb 4, 2026 at 9:18 PM <felix.quintgz@yahoo.com> wrote:




As folks suggested, adding an index on "tran_date", combining the CTEs into two, and making the data types equal for "ent_id" have helped reduce the response time to a large extent. Now I am trying to see if we can reduce it any further, as most of the time (~100ms - 20ms = ~80ms) is now in the materialize loop, which happens ~43K times.

I am also wondering whether adding the "txn_tbl_type_nm" column to the index, i.e. a composite index on (tran_date, txn_tbl_type_nm), would be advisable in cases where ~500K+ rows would be filtered out by the txn_tbl_type_nm criteria (currently only 17 rows are filtered out for this case).
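A sketch of that composite index (names from the thread; whether the planner still uses a backward scan to satisfy the ORDER BY, and whether the second column actually pays off, needs testing):

```sql
-- Leading column preserves ORDER BY tran_date DESC via a backward scan;
-- the second column lets the type filter be applied to index tuples
-- before visiting the heap.
CREATE INDEX CONCURRENTLY txn_tbl_trandate_type_idx
    ON APP_schema.txn_tbl (tran_date, txn_tbl_type_nm);
```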


-> Nested Loop (cost=263.20..1680202.56 rows=483106 width=20) (actual time=6.421..111.220 rows=1000 loops=1)
Buffers: shared hit=6168
-> Nested Loop (cost=262.77..1342550.91 rows=579149 width=20) (actual time=6.406..107.946 rows=1049 loops=1)
Join Filter: (df.ent_id = m.ent_id)
Rows Removed by Join Filter: 514648
Buffers: shared hit=1972
-> Index Scan Backward using txn_tbl_due_dt_idx on txn_tbl df (cost=0.43..115879.87 rows=1419195 width=20) (actual time=0.019..20.377 rows=43727 loops=1)
Filter: ((txn_tbl_type_nm)::text = ANY ('{TYPE1,TYPE2,TYPE3}'::text[]))
Rows Removed by Filter: 17
Buffers: shared hit=1839
-> Materialize (cost=262.35..364.01 rows=58 width=8) (actual time=0.000..0.001 rows=12 loops=43727)
Buffers: shared hit=133



Regards
Yudhi

Re: Top -N Query performance issue and high CPU usage

From:
Thiemo Kellner
Date:
A nested loop is not bad per se, at least in Oracle. It depends on the data: if the numbers of rows participating in the join are very unequal, the nested loop is the more efficient join. I would presume that every join of a fact table with a dimension table falls into that category.