От: Jim C. Nasby
Тема: Re: Sort and index
Дата: ,
Msg-id: 20050420034041.GC58835@decibel.org
(см: обсуждение, исходный текст)
Ответ на: Re: Sort and index  (Tom Lane)
Ответы: Re: Sort and index  ("Jim C. Nasby")
Список: pgsql-performance

Скрыть дерево обсуждения

Sort and index  (Andrei Gaspar, )
 Re: Sort and index  ("Dave Held", )
  Re: Sort and index  (Andrei Gaspar, )
  Re: Sort and index  (Michael Fuhr, )
   Re: Sort and index  (Andrei Gaspar, )
  Re: Sort and index  ("Jim C. Nasby", )
   Re: Sort and index  (Tom Lane, )
    Re: Sort and index  ("Jim C. Nasby", )
     Re: Sort and index  ("Jim C. Nasby", )
      Re: Sort and index  (Tom Lane, )
       Re: Sort and index  ("Jim C. Nasby", )
        Re: Sort and index  (Tom Lane, )
         Re: Sort and index  ("Jim C. Nasby", )
          Re: Sort and index  (Manfred Koizar, )
           Re: Sort and index  ("Jim C. Nasby", )
            Re: Sort and index  (Manfred Koizar, )
             Re: Sort and index  ("Jim C. Nasby", )

On Tue, Apr 19, 2005 at 11:01:26PM -0400, Tom Lane wrote:
> "Jim C. Nasby" <> writes:
> > Actually, the planner (at least in 7.4) isn't smart enough to consider
> > if the sort would fit in memory or not.
>
> Really?  Have you read cost_sort()?
>
> It's certainly possible that the calculation is all wet, but to claim
> that the issue is not considered is just wrong.

To be fair, no, I haven't looked at the code. This is based strictly on
anecdotal evidence on a 120M row table. I'm currently running a test to
see how an index scan compares to a seqscan. I also got the same results
when I added a where clause that would restrict it to about 7% of the
table.

Actually, after running some tests (below), the plan cost does change
when I change sort_mem (it was originally 50000).

stats=# \d email_contrib
   Table "public.email_contrib"
   Column   |  Type   | Modifiers
------------+---------+-----------
 project_id | integer | not null
 id         | integer | not null
 date       | date    | not null
 team_id    | integer |
 work_units | bigint  | not null
Indexes:
    "email_contrib_pkey" primary key, btree (project_id, id, date)
    "email_contrib__pk24" btree (id, date) WHERE (project_id = 24)
    "email_contrib__pk25" btree (id, date) WHERE (project_id = 25)
    "email_contrib__pk8" btree (id, date) WHERE (project_id = 8)
    "email_contrib__project_date" btree (project_id, date)
Foreign-key constraints:
    "fk_email_contrib__id" FOREIGN KEY (id) REFERENCES stats_participant(id) ON UPDATE CASCADE
    "fk_email_contrib__team_id" FOREIGN KEY (team_id) REFERENCES stats_team(team) ON UPDATE CASCADE

stats=# explain select * from email_contrib where project_id=8 order by project_id, id, date;
                                   QUERY PLAN
--------------------------------------------------------------------------------
 Sort  (cost=3613476.05..3635631.71 rows=8862263 width=24)
   Sort Key: project_id, id, date
   ->  Seq Scan on email_contrib  (cost=0.00..2471377.50 rows=8862263 width=24)
         Filter: (project_id = 8)
(4 rows)

stats=# explain select * from email_contrib order by project_id, id, date;
                                    QUERY PLAN
----------------------------------------------------------------------------------
 Sort  (cost=25046060.83..25373484.33 rows=130969400 width=24)
   Sort Key: project_id, id, date
   ->  Seq Scan on email_contrib  (cost=0.00..2143954.00 rows=130969400 width=24)
(3 rows)

stats=# select 8862263::float/130969400;
      ?column?
--------------------
 0.0676666687027657
(1 row)

stats=# explain select * from email_contrib where project_id=8 order by project_id, id, date;
                                             QUERY PLAN
-----------------------------------------------------------------------------------------------------
 Index Scan using email_contrib_pkey on email_contrib  (cost=0.00..6832005.57 rows=8862263 width=24)
   Index Cond: (project_id = 8)
(2 rows)

stats=# explain select * from email_contrib order by project_id, id, date;
                                               QUERY PLAN
---------------------------------------------------------------------------------------------------------
 Index Scan using email_contrib_pkey on email_contrib  (cost=0.00..100055905.62 rows=130969400 width=24)
(1 row)

stats=# set enable_seqscan=on;
SET
stats=# set sort_mem=1000;
SET
stats=# explain select * from email_contrib order by project_id, id, date;
                                    QUERY PLAN
----------------------------------------------------------------------------------
 Sort  (cost=28542316.63..28869740.13 rows=130969400 width=24)
   Sort Key: project_id, id, date
   ->  Seq Scan on email_contrib  (cost=0.00..2143954.00 rows=130969400 width=24)
(3 rows)

stats=#

--
Jim C. Nasby, Database Consultant               
Give your computer some brain candy! www.distributed.net Team #1828

Windows: "Where do you want to go today?"
Linux: "Where do you want to go tomorrow?"
FreeBSD: "Are you guys coming, or what?"


В списке pgsql-performance по дате сообщения:

От: Dawid Kuroczko
Дата:
Сообщение: Re: How to improve db performance with $7K?
От: Richard van den Berg
Дата:
Сообщение: When are index scans used over seq scans?