Discussion: huge discrepancy between EXPLAIN cost and actual time (but the table has just been ANALYZED)

Hi,

I have a complex query involving over 15 joins and a CTE, and it takes over 17s to complete. The output of EXPLAIN ANALYZE includes (somewhere deep inside):

Index Scan using document_pkey on document document0_  (cost=0.29..8.31 rows=1 width=3027) (actual time=16243.959..16243.961 rows=1 loops=1)

This shows an index scan with a very small cost but a very large actual time. The strange thing is that all the tables have just been analyzed with the ANALYZE command (and document is not a foreign table). Furthermore, if I run a simple query using that index on its own, both the cost and the actual time are small.
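A minimal version of that check might look like this (the column name and value are placeholders, not the real schema):

    -- placeholder primary-key lookup that should hit document_pkey
    EXPLAIN (ANALYZE, BUFFERS)
    SELECT * FROM document WHERE id = 12345;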

Another snippet is:
  ->  CTE Scan on all_related_document p  (cost=1815513.32..3030511.77 rows=241785 width=16) (actual time=203.969..203.976 rows=0 loops=1)

I think the cost vs. actual-time discrepancy here is fine, since it is a recursive CTE and PostgreSQL can't estimate its cost well. It is materialized and a full scan of the CTE result is performed, but the actual time is not that bad. The estimated and actual row counts are also vastly different, but I guess this is fine too, isn't it?
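(Purely as an illustration, not the real all_related_document: the planner cannot know how deep a recursion goes, so the row estimate for a recursive CTE scan is largely a guess.)

    -- illustrative only; parent_id is a made-up column
    EXPLAIN
    WITH RECURSIVE rel(id) AS (
        SELECT id FROM document WHERE id = 1
        UNION ALL
        SELECT d.id FROM document d JOIN rel r ON d.parent_id = r.id
    )
    SELECT * FROM rel;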

Any idea how I should check further?

Many thanks in advance

--
Kent Tong
IT author and consultant, child education coach
On 5/8/23 07:29, Kent Tong wrote:
> I have a complex query involving over 15 joins and a CTE, and it takes over 17s to complete. [...]
> Any idea how I should check further?

ANALYZE just samples the table.  If the data in the relevant indexed columns isn't evenly distributed, the statistics might not reflect the true data distribution.
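If that's what's happening, raising the per-column statistics target and re-analyzing is a cheap experiment (the column name below is only an example; pick the one whose estimates look off):

    ALTER TABLE document ALTER COLUMN some_column SET STATISTICS 1000;
    ANALYZE document;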

--
Born in Arizona, moved to Babylonia.
Kent Tong <kent.tong.mo@gmail.com> writes:
> I have a complex query involving over 15 joins and a CTE query and it takes
> over 17s to complete.

If you want constructive comments about that, you're going to need to show
us the whole thing (and the underlying tables' schemas).  Little bits in
isolation are not very intelligible.

https://wiki.postgresql.org/wiki/Slow_Query_Questions

One thought is that you might need to raise from_collapse_limit and/or
join_collapse_limit to at least 15 to get the planner to do a complete
search of the join-order space.
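For example, at the session level (the exact values are only a starting point; they just need to cover the number of relations in the query):

    SET from_collapse_limit = 20;
    SET join_collapse_limit = 20;
    EXPLAIN (ANALYZE, BUFFERS) <query>;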

            regards, tom lane



On Mon, May 8, 2023 at 8:29 AM Kent Tong <kent.tong.mo@gmail.com> wrote:
> I have a complex query involving over 15 joins and a CTE, and it takes over 17s to complete. [...]
> Any idea how I should check further?

Kent,
  I had a really slow CTE-based query a while back.  The problem went away when I materialized the CTE.
In our case, we had carried the query over from another big DB and used the CTE to force that part to be evaluated once.
It is known to be slow, and its result is only about 1% of the size of the other tables we were operating on.

  PG, by default, optimized this and inlined it as a subquery, causing a massive slowdown.  MATERIALIZED fixed it right up.
That would be a quick test.
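A minimal sketch of that quick test, with entirely made-up names (PostgreSQL 12+, where a single-use CTE is otherwise inlined):

    -- MATERIALIZED forces the CTE to be evaluated once instead of being inlined
    WITH slow_part AS MATERIALIZED (
        SELECT d.id
        FROM document d
        WHERE d.status = 'ACTIVE'      -- placeholder predicate
    )
    SELECT *
    FROM slow_part p
    JOIN other_table t ON t.document_id = p.id;   -- placeholder join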

  Outside of that, I agree with Tom... it's really hard to help without full details.

>>> FWIW, if I had my way, THIS would work:
EXPLAIN (ANALYZE, BUFFERS, SETTINGS, RELATIONS_DDL, SHOW_QUERY) <query>;

It would dump the table structures of all involved tables/views/indexes, and it would repeat the query.
And in a perfect world, it would show you the "rewritten" query (now I am dreaming).
All with the plan...

It will take a while for that level of detail to come out... but the AI/ML tools will go crazy with it!

This way you just copy that output and share it.  (Eventually...)

Anyways, in the meantime, a query and the table structure/row counts would be nice.
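Until something like that exists, the closest you can get with stock tools is roughly this (SETTINGS needs PostgreSQL 12+), plus the table definitions from psql:

    EXPLAIN (ANALYZE, BUFFERS, SETTINGS) <query>;
    -- in psql, for each involved table:
    \d+ document
    SELECT count(*) FROM document;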

Kirk...