От: Jim C. Nasby
Тема: Re: Sort and index
Дата: ,
Msg-id: 20050423015404.GV58835@decibel.org
(см: обсуждение, исходный текст)
Ответ на: Re: Sort and index  ("Jim C. Nasby")
Ответы: Re: Sort and index  (Tom Lane)
Список: pgsql-performance

Скрыть дерево обсуждения

Sort and index  (Andrei Gaspar, )
 Re: Sort and index  ("Dave Held", )
  Re: Sort and index  (Andrei Gaspar, )
  Re: Sort and index  (Michael Fuhr, )
   Re: Sort and index  (Andrei Gaspar, )
  Re: Sort and index  ("Jim C. Nasby", )
   Re: Sort and index  (Tom Lane, )
    Re: Sort and index  ("Jim C. Nasby", )
     Re: Sort and index  ("Jim C. Nasby", )
      Re: Sort and index  (Tom Lane, )
       Re: Sort and index  ("Jim C. Nasby", )
        Re: Sort and index  (Tom Lane, )
         Re: Sort and index  ("Jim C. Nasby", )
          Re: Sort and index  (Manfred Koizar, )
           Re: Sort and index  ("Jim C. Nasby", )
            Re: Sort and index  (Manfred Koizar, )
             Re: Sort and index  ("Jim C. Nasby", )

I've run some performance tests. The actual test case is at
http://stats.distributed.net/~decibel/timing.sql, and the results are at
http://stats.distributed.net/~decibel/timing.log. In a nutshell, doing
an index scan appears to be about 2x faster than a sequential scan and a
sort.

Something else of interest is that going from 50M of sort memory to 3G
sped the sort up by 900 seconds. If someone wants to record data about
the effect of sort_mem on on-disk sorts somewhere (maybe in the docs?) I
can run some more tests for that case.

In any case, it's clear that the planner is making the wrong choice
here. BTW, changing random_page_cost to 3 or 4 doesn't change the plan.

On Tue, Apr 19, 2005 at 10:40:41PM -0500, Jim C. Nasby wrote:
> On Tue, Apr 19, 2005 at 11:01:26PM -0400, Tom Lane wrote:
> > "Jim C. Nasby" <> writes:
> > > Actually, the planner (at least in 7.4) isn't smart enough to consider
> > > if the sort would fit in memory or not.
> >
> > Really?  Have you read cost_sort()?
> >
> > It's certainly possible that the calculation is all wet, but to claim
> > that the issue is not considered is just wrong.
>
> To be fair, no, I haven't looked at the code. This is based strictly on
> anecdotal evidence on a 120M row table. I'm currently running a test to
> see how an index scan compares to a seqscan. I also got the same results
> when I added a where clause that would restrict it to about 7% of the
> table.
>
> Actually, after running some tests (below), the plan cost does change
> when I change sort_mem (it was originally 50000).
>
> stats=# \d email_contrib
>    Table "public.email_contrib"
>    Column   |  Type   | Modifiers
> ------------+---------+-----------
>  project_id | integer | not null
>  id         | integer | not null
>  date       | date    | not null
>  team_id    | integer |
>  work_units | bigint  | not null
> Indexes:
>     "email_contrib_pkey" primary key, btree (project_id, id, date)
>     "email_contrib__pk24" btree (id, date) WHERE (project_id = 24)
>     "email_contrib__pk25" btree (id, date) WHERE (project_id = 25)
>     "email_contrib__pk8" btree (id, date) WHERE (project_id = 8)
>     "email_contrib__project_date" btree (project_id, date)
> Foreign-key constraints:
>     "fk_email_contrib__id" FOREIGN KEY (id) REFERENCES stats_participant(id) ON UPDATE CASCADE
>     "fk_email_contrib__team_id" FOREIGN KEY (team_id) REFERENCES stats_team(team) ON UPDATE CASCADE
>
> stats=# explain select * from email_contrib where project_id=8 order by project_id, id, date;
>                                    QUERY PLAN
> --------------------------------------------------------------------------------
>  Sort  (cost=3613476.05..3635631.71 rows=8862263 width=24)
>    Sort Key: project_id, id, date
>    ->  Seq Scan on email_contrib  (cost=0.00..2471377.50 rows=8862263 width=24)
>          Filter: (project_id = 8)
> (4 rows)
>
> stats=# explain select * from email_contrib order by project_id, id, date;
>                                     QUERY PLAN
> ----------------------------------------------------------------------------------
>  Sort  (cost=25046060.83..25373484.33 rows=130969400 width=24)
>    Sort Key: project_id, id, date
>    ->  Seq Scan on email_contrib  (cost=0.00..2143954.00 rows=130969400 width=24)
> (3 rows)
>
> stats=# select 8862263::float/130969400;
>       ?column?
> --------------------
>  0.0676666687027657
> (1 row)
>
> stats=# explain select * from email_contrib where project_id=8 order by project_id, id, date;
>                                              QUERY PLAN
> -----------------------------------------------------------------------------------------------------
>  Index Scan using email_contrib_pkey on email_contrib  (cost=0.00..6832005.57 rows=8862263 width=24)
>    Index Cond: (project_id = 8)
> (2 rows)
>
> stats=# explain select * from email_contrib order by project_id, id, date;
>                                                QUERY PLAN
> ---------------------------------------------------------------------------------------------------------
>  Index Scan using email_contrib_pkey on email_contrib  (cost=0.00..100055905.62 rows=130969400 width=24)
> (1 row)
>
> stats=# set enable_seqscan=on;
> SET
> stats=# set sort_mem=1000;
> SET
> stats=# explain select * from email_contrib order by project_id, id, date;
>                                     QUERY PLAN
> ----------------------------------------------------------------------------------
>  Sort  (cost=28542316.63..28869740.13 rows=130969400 width=24)
>    Sort Key: project_id, id, date
>    ->  Seq Scan on email_contrib  (cost=0.00..2143954.00 rows=130969400 width=24)
> (3 rows)
>
> stats=#
>
> --
> Jim C. Nasby, Database Consultant               
> Give your computer some brain candy! www.distributed.net Team #1828
>
> Windows: "Where do you want to go today?"
> Linux: "Where do you want to go tomorrow?"
> FreeBSD: "Are you guys coming, or what?"
>
> ---------------------------(end of broadcast)---------------------------
> TIP 2: you can get off all lists at once with the unregister command
>     (send "unregister YourEmailAddressHere" to )
>

--
Jim C. Nasby, Database Consultant               
Give your computer some brain candy! www.distributed.net Team #1828

Windows: "Where do you want to go today?"
Linux: "Where do you want to go tomorrow?"
FreeBSD: "Are you guys coming, or what?"


В списке pgsql-performance по дате сообщения:

От: Kevin Brown
Дата:
Сообщение: Re: Joel's Performance Issues WAS : Opteron vs Xeon
От: Greg Stark
Дата:
Сообщение: Re: Bad n_distinct estimation; hacks suggested?