Re: Query takes way longer with LIMIT, and EXPLAIN takes way longer than actual query

Поиск
Список
Период
Сортировка
От Justin Pryzby
Тема Re: Query takes way longer with LIMIT, and EXPLAIN takes way longer than actual query
Дата
Msg-id 20200815005532.GL4561@telsasoft.com
обсуждение исходный текст
Ответ на Re: Query takes way longer with LIMIT, and EXPLAIN takes way longer than actual query  (Ken Tanzer <ken.tanzer@gmail.com>)
Список pgsql-performance
On Fri, Aug 14, 2020 at 03:40:40PM -0700, Ken Tanzer wrote:
> On Fri, Aug 14, 2020 at 3:04 PM Justin Pryzby <pryzby@telsasoft.com> wrote:
> > Due to the over-estimated rowcount, the planner believes that (more) rows
> > will be output (sooner) than they actually are:
> >
> >    ->  Nested Loop Semi Join  (cost=47.11..31367302.81 ROWS=611586
> > width=336) (actual time=47.098..97236.123 ROWS=25 loops=1)
> >
> > So it thinks there's something to be saved/gained by using a plan that has a
> > low startup cost.  But instead, it ends up running for a substantial fraction
> > of the total (estimated) cost.
>
> Got it.  Is there any way to address this other than re-writing the
> query?  (Statistics? Or something else?)

A usual trick is to change to write something like:
|ORDER BY added_at + '0 seconds'::interval"
which means an index scan on added_at doesn't match the ORDER BY exactly (the
planner isn't smart enough to know better).

You could try to address the misestimate, which is probably a good idea anyway.

Or make it believe that it's going to work harder to return those 25 rows.
Maybe you could change the "25" to a bind parameter, like LIMIT $N, or however
the ORM wants it.

You could change the index to a BRIN index, which doesn't help with ORDER BY
(but will also affect other queries which want to ORDER BY).

Maybe you could add an index on this expression and ANALYZE the table.  I think
it might help the estimate.  Or it might totally change the shape of the plan,
like allowing indexonly scan (which would probably require VACUUM)..
       Group Key: CASE WHEN (tbl_reference.to_table = 'client'::name) THEN tbl_reference.from_id WHEN
(tbl_reference.from_table= 'client'::name) THEN tbl_reference.to_id ELSE NULL::integer END
 

I know you didn't want to rewrite the query, but it looks to me like adjusting
the schema or query might be desirable.

agency=> EXPLAIN (ANALYZE,VERBOSE,BUFFERS,TIMING) SELECT * FROM Log WHERE log_id IN (SELECT CASE WHEN to_table='client'
THENfrom_id WHEN from_table='client' THEN to_id END FROM reference WHERE ((from_id_field = E'client_id'
 
        AND from_id =  E'34918'
        AND from_table =  E'client'
        AND to_table =  E'log'
        )
        OR  (to_id_field =  E'client_id'
        AND to_id =  E'34918'
        AND to_table =  E'client'
        AND from_table =  E'log'
        ))) ORDER BY added_at DESC;

To me that smells like a UNION (maybe not UNION ALL):
SELECT FROM log WHERE EXISTS (SELECT 1 FROM reference ref WHERE log.log_id=ref.from_id AND to_table='client' AND
from_id_field='client_id'AND from_id=$1 AND from_table='client' AND to_table='log')
 
UNION
SELECT FROM log WHERE EXISTS (SELECT 1 FROM reference ref WHERE log.log_id=ref.to_id AND from_table='client' AND
to_id_field='client_id'AND to_id=$1 AND to_table='client' AND from_table='log')
 

I guess you might know that various indexes are redundant:

    "index_tbl_log_log_type_code" btree (log_type_code)
    "tbl_log_log_type_code" btree (log_type_code)
    "tbl_log_test2" btree (log_type_code, added_at)

    "tbl_log_added_at" btree (added_at)
    "tbl_log_test" btree (added_at, log_type_code)

    "index_tbl_reference_to_table" btree (to_table)
    "index_tbl_reference_to_table_id" btree (to_table, to_id)

    "index_tbl_reference_is_deleted" btree (is_deleted)
=> Maybe that would be better as a WHERE NOT is_deleted clause on various indexes (?)

-- 
Justin



В списке pgsql-performance по дате отправления:

Предыдущее
От: Jeff Janes
Дата:
Сообщение: Re: Query takes way longer with LIMIT, and EXPLAIN takes way longer than actual query
Следующее
От: Satyam Shekhar
Дата:
Сообщение: Replication lag due to lagging restart_lsn