Re: When are index scans used over seq scans?

От: Tom Lane
Тема: Re: When are index scans used over seq scans?
Дата: ,
Msg-id: 15074.1114093502@sss.pgh.pa.us
(см: обсуждение, исходный текст)
Ответ на: Re: When are index scans used over seq scans?  (Richard van den Berg)
Ответы: Re: When are index scans used over seq scans?  (Richard van den Berg)
Список: pgsql-performance

Скрыть дерево обсуждения

When are index scans used over seq scans?  (Richard van den Berg, )
 Re: When are index scans used over seq scans?  (John A Meinel, )
  Re: When are index scans used over seq scans?  (Richard van den Berg, )
   Re: When are index scans used over seq scans?  (John A Meinel, )
    Re: When are index scans used over seq scans?  (Richard van den Berg, )
 Re: When are index scans used over seq scans?  (Tom Lane, )
  Re: When are index scans used over seq scans?  (Richard van den Berg, )
  Re: When are index scans used over seq scans?  (Richard van den Berg, )
   Re: When are index scans used over seq scans?  (Tom Lane, )
    Re: When are index scans used over seq scans?  (Richard van den Berg, )
     Re: When are index scans used over seq scans?  (Tom Lane, )
      Re: When are index scans used over seq scans?  (Richard van den Berg, )
 Re: When are index scans used over seq scans?  (Richard van den Berg, )

Richard van den Berg <> writes:
> This is with the default cpu_tuple_cost = 0.01:

>  Nested Loop  (cost=252.80..233010147.16 rows=1035480320 width=98)
> (actual time=0.369..12672213.137 rows=6171334 loops=1)
>    Join Filter: (("outer".starttimetrunc <= "inner".ts) AND
> ("outer".finishtimetrunc >= "inner".ts))
>    ->  Seq Scan on sessions us  (cost=0.00..26822.36 rows=924536
> width=106) (actual time=0.039..5447.349 rows=924536 loops=1)
>    ->  Materialize  (cost=252.80..353.60 rows=10080 width=8) (actual
> time=0.000..2.770 rows=10080 loops=924536)
>          ->  Seq Scan on duration du  (cost=0.00..252.80 rows=10080
> width=8) (actual time=0.019..13.397 rows=10080 loops=1)
>  Total runtime: 12674486.670 ms

Hmm, that *is* showing rather a spectacularly large amount of time in
the join itself: if I did the arithmetic right,

regression=# select 12672213.137 - (5447.349 + 2.770*924536 + 13.397);
   ?column?
--------------
 10105787.671
(1 row)

which is almost 80% of the entire runtime.  Which is enormous.
What are those column datatypes exactly?  Perhaps you are incurring a
datatype conversion cost?  Straight timestamp-vs-timestamp comparison
is fairly cheap, but any sort of conversion will cost dearly.

The planner's model for the time spent in the join itself is
    (cpu_tuple_cost + 2 * cpu_operator_cost) * n_tuples
(the 2 because you have 2 operators in the join condition)
so you'd have to raise one or the other of these parameters
to model this situation accurately.  But I have a hard time
believing that cpu_tuple_cost is really as high as 0.1.
It seems more likely that the cpu_operator_cost is underestimated,
which leads me to question what exactly is happening in those
comparisons.

            regards, tom lane


В списке pgsql-performance по дате сообщения:

От: Josh Berkus
Дата:
Сообщение: Re: Index bloat problem?
От: Litao Wu
Дата:
Сообщение: Re: index not used