Discussion: Sort and index

From:
Andrei Gaspar
Date:

Hi,

I thought that an index can be used for sorting.
I'm a little confused about the following result:

create index OperationsName on Operations(cOperationName);
explain SELECT * FROM Operations ORDER BY cOperationName;
                              QUERY PLAN
-----------------------------------------------------------------------
 Sort  (cost=185.37..189.20 rows=1532 width=498)
   Sort Key: coperationname
   ->  Seq Scan on operations  (cost=0.00..104.32 rows=1532 width=498)
(3 rows)

Is this supposed to be so?

Andrei


--
No virus found in this outgoing message.
Checked by AVG Anti-Virus.
Version: 7.0.308 / Virus Database: 266.9.15 - Release Date: 4/16/2005


From:
"Dave Held"
Date:

> -----Original Message-----
> From: Andrei Gaspar [mailto:]
> Sent: Monday, April 18, 2005 10:36 AM
> To: 
> Subject: [PERFORM] Sort and index
>
> I thought that an index can be used for sorting.
> I'm a little confused about the following result:
>
> create index OperationsName on Operations(cOperationName);
> explain SELECT * FROM Operations ORDER BY cOperationName;
>                               QUERY PLAN
> -----------------------------------------------------------------------
>  Sort  (cost=185.37..189.20 rows=1532 width=498)
>    Sort Key: coperationname
>    ->  Seq Scan on operations  (cost=0.00..104.32 rows=1532 width=498)
> (3 rows)
>
> Is this supposed to be so?

Since you are fetching the entire table, you are touching all the rows.
If the query were to fetch the rows in index order, it would be seeking
all over the table's tracks.  By fetching in sequence order, it has a
much better chance of fetching rows in a way that minimizes head seeks.
Since disk I/O is generally 10-100x slower than RAM, the in-memory sort
can be surprisingly slow and still beat indexed disk access.  Of course,
this is only true if the table can fit and be sorted entirely in memory
(which, with 1500 rows, probably can).
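A back-of-the-envelope model of this tradeoff can make it concrete. This is an illustrative sketch only, not PostgreSQL's actual cost functions; the page-cost constants and comparison cost are assumptions.

```python
import math

# Rough model of the tradeoff described above: read every page
# sequentially and sort in memory, vs. fetch rows in index order where
# each fetch may be a random read.  Numbers are assumptions.
SEQ_PAGE_COST = 1.0     # cost of one sequential page read
RANDOM_PAGE_COST = 4.0  # cost of one random page read

def seqscan_plus_sort(pages, rows, cmp_cost=0.01):
    """Read every page in order, then sort all rows in memory (n log n)."""
    return pages * SEQ_PAGE_COST + cmp_cost * rows * math.log2(max(rows, 2))

def indexscan_all_rows(rows):
    """Worst case: each row fetched in index order needs a random read."""
    return rows * RANDOM_PAGE_COST

# Numbers roughly matching the EXPLAIN output above (104 pages, 1532 rows)
print(seqscan_plus_sort(104, 1532))   # ~266
print(indexscan_all_rows(1532))       # 6128.0
```

Under these assumptions the seqscan-plus-sort wins by more than an order of magnitude when every row is fetched, which is the planner's reasoning in the plan above.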

__
David B. Held
Software Engineer/Array Services Group
200 14th Ave. East,  Sartell, MN 56377
320.534.3637 320.253.7800 800.752.8129

From:
Andrei Gaspar
Date:

Thanks for the quick response
Andrei




From:
Michael Fuhr
Date:

On Mon, Apr 18, 2005 at 10:44:43AM -0500, Dave Held wrote:
> >
> > I thought that an index can be used for sorting.
> > I'm a little confused about the following result:
> >
> > create index OperationsName on Operations(cOperationName);
> > explain SELECT * FROM Operations ORDER BY cOperationName;
> >                               QUERY PLAN
> > -----------------------------------------------------------------------
> >  Sort  (cost=185.37..189.20 rows=1532 width=498)
> >    Sort Key: coperationname
> >    ->  Seq Scan on operations  (cost=0.00..104.32 rows=1532 width=498)
> > (3 rows)
> >
> > Is this supposed to be so?
>
> Since you are fetching the entire table, you are touching all the rows.
> If the query were to fetch the rows in index order, it would be seeking
> all over the table's tracks.  By fetching in sequence order, it has a
> much better chance of fetching rows in a way that minimizes head seeks.
> Since disk I/O is generally 10-100x slower than RAM, the in-memory sort
> can be surprisingly slow and still beat indexed disk access.  Of course,
> this is only true if the table can fit and be sorted entirely in memory
> (which, with 1500 rows, probably can).

Out of curiosity, what are the results of the following queries?
(Queries run twice to make sure time differences aren't due to
caching.)

SET enable_seqscan TO on;
SET enable_indexscan TO off;
EXPLAIN ANALYZE SELECT * FROM Operations ORDER BY cOperationName;
EXPLAIN ANALYZE SELECT * FROM Operations ORDER BY cOperationName;

SET enable_seqscan TO off;
SET enable_indexscan TO on;
EXPLAIN ANALYZE SELECT * FROM Operations ORDER BY cOperationName;
EXPLAIN ANALYZE SELECT * FROM Operations ORDER BY cOperationName;

SELECT version();

With 1500 rows of random data, I consistently see better performance
with an index scan (about twice as fast as a sequence scan), and
the planner uses an index scan if it has a choice (i.e., when
enable_seqscan and enable_indexscan are both on).  But my test case
and postgresql.conf settings might be different enough from yours
to account for different behavior.

--
Michael Fuhr
http://www.fuhr.org/~mfuhr/

From:
"Jim C. Nasby"
Date:

On Mon, Apr 18, 2005 at 10:44:43AM -0500, Dave Held wrote:
> [...] Of course, this is only true if the table can fit and be sorted
> entirely in memory (which, with 1500 rows, probably can).

Actually, the planner (at least in 7.4) isn't smart enough to consider
if the sort would fit in memory or not. I'm running a test right now to
see if it's actually faster to use an index in this case.
--
Jim C. Nasby, Database Consultant               
Give your computer some brain candy! www.distributed.net Team #1828

Windows: "Where do you want to go today?"
Linux: "Where do you want to go tomorrow?"
FreeBSD: "Are you guys coming, or what?"

From:
Tom Lane
Date:

"Jim C. Nasby" <> writes:
> Actually, the planner (at least in 7.4) isn't smart enough to consider
> if the sort would fit in memory or not.

Really?  Have you read cost_sort()?

It's certainly possible that the calculation is all wet, but to claim
that the issue is not considered is just wrong.

            regards, tom lane

From:
"Jim C. Nasby"
Date:

On Tue, Apr 19, 2005 at 11:01:26PM -0400, Tom Lane wrote:
> "Jim C. Nasby" <> writes:
> > Actually, the planner (at least in 7.4) isn't smart enough to consider
> > if the sort would fit in memory or not.
>
> Really?  Have you read cost_sort()?
>
> It's certainly possible that the calculation is all wet, but to claim
> that the issue is not considered is just wrong.

To be fair, no, I haven't looked at the code. This is based strictly on
anecdotal evidence on a 120M row table. I'm currently running a test to
see how an index scan compares to a seqscan. I also got the same results
when I added a where clause that would restrict it to about 7% of the
table.

Actually, after running some tests (below), the plan cost does change
when I change sort_mem (it was originally 50000).

stats=# \d email_contrib
   Table "public.email_contrib"
   Column   |  Type   | Modifiers
------------+---------+-----------
 project_id | integer | not null
 id         | integer | not null
 date       | date    | not null
 team_id    | integer |
 work_units | bigint  | not null
Indexes:
    "email_contrib_pkey" primary key, btree (project_id, id, date)
    "email_contrib__pk24" btree (id, date) WHERE (project_id = 24)
    "email_contrib__pk25" btree (id, date) WHERE (project_id = 25)
    "email_contrib__pk8" btree (id, date) WHERE (project_id = 8)
    "email_contrib__project_date" btree (project_id, date)
Foreign-key constraints:
    "fk_email_contrib__id" FOREIGN KEY (id) REFERENCES stats_participant(id) ON UPDATE CASCADE
    "fk_email_contrib__team_id" FOREIGN KEY (team_id) REFERENCES stats_team(team) ON UPDATE CASCADE

stats=# explain select * from email_contrib where project_id=8 order by project_id, id, date;
                                   QUERY PLAN
--------------------------------------------------------------------------------
 Sort  (cost=3613476.05..3635631.71 rows=8862263 width=24)
   Sort Key: project_id, id, date
   ->  Seq Scan on email_contrib  (cost=0.00..2471377.50 rows=8862263 width=24)
         Filter: (project_id = 8)
(4 rows)

stats=# explain select * from email_contrib order by project_id, id, date;
                                    QUERY PLAN
----------------------------------------------------------------------------------
 Sort  (cost=25046060.83..25373484.33 rows=130969400 width=24)
   Sort Key: project_id, id, date
   ->  Seq Scan on email_contrib  (cost=0.00..2143954.00 rows=130969400 width=24)
(3 rows)

stats=# select 8862263::float/130969400;
      ?column?
--------------------
 0.0676666687027657
(1 row)

stats=# explain select * from email_contrib where project_id=8 order by project_id, id, date;
                                             QUERY PLAN
-----------------------------------------------------------------------------------------------------
 Index Scan using email_contrib_pkey on email_contrib  (cost=0.00..6832005.57 rows=8862263 width=24)
   Index Cond: (project_id = 8)
(2 rows)

stats=# explain select * from email_contrib order by project_id, id, date;
                                               QUERY PLAN
---------------------------------------------------------------------------------------------------------
 Index Scan using email_contrib_pkey on email_contrib  (cost=0.00..100055905.62 rows=130969400 width=24)
(1 row)

stats=# set enable_seqscan=on;
SET
stats=# set sort_mem=1000;
SET
stats=# explain select * from email_contrib order by project_id, id, date;
                                    QUERY PLAN
----------------------------------------------------------------------------------
 Sort  (cost=28542316.63..28869740.13 rows=130969400 width=24)
   Sort Key: project_id, id, date
   ->  Seq Scan on email_contrib  (cost=0.00..2143954.00 rows=130969400 width=24)
(3 rows)

stats=#

--
Jim C. Nasby, Database Consultant               
Give your computer some brain candy! www.distributed.net Team #1828

Windows: "Where do you want to go today?"
Linux: "Where do you want to go tomorrow?"
FreeBSD: "Are you guys coming, or what?"

From:
Andrei Gaspar
Date:

Michael Fuhr wrote:

>[...]
>
>Out of curiosity, what are the results of the following queries?
>(Queries run twice to make sure time differences aren't due to
>caching.)
>
>SET enable_seqscan TO on;
>SET enable_indexscan TO off;
>EXPLAIN ANALYZE SELECT * FROM Operations ORDER BY cOperationName;
>EXPLAIN ANALYZE SELECT * FROM Operations ORDER BY cOperationName;
>
>SET enable_seqscan TO off;
>SET enable_indexscan TO on;
>EXPLAIN ANALYZE SELECT * FROM Operations ORDER BY cOperationName;
>EXPLAIN ANALYZE SELECT * FROM Operations ORDER BY cOperationName;
>
>SELECT version();
>
>With 1500 rows of random data, I consistently see better performance
>with an index scan (about twice as fast as a sequence scan), and
>the planner uses an index scan if it has a choice (i.e., when
>enable_seqscan and enable_indexscan are both on).  But my test case
>and postgresql.conf settings might be different enough from yours
>to account for different behavior.
>
>
>
Here is the output from the statements above. I know the times seem too
small to care about, but what triggered my question is that the logs
contain a lot of lines like the ones below (I replaced the list of 43
fields with *).
I use ODBC (8.0.1.1), and changing the application to cache the table
isn't feasible.

2005-04-19 10:07:05 LOG:  duration: 937.000 ms  statement: PREPARE "_PLAN35b0068" as SELECT * FROM Operations ORDER BY cOperationName;EXECUTE "_PLAN35b0068"
2005-04-19 10:07:09 LOG:  duration: 1344.000 ms  statement: PREPARE "_PLAN35b0068" as SELECT * FROM Operations ORDER BY cOperationName;EXECUTE "_PLAN35b0068"
2005-04-19 10:07:15 LOG:  duration: 1031.000 ms  statement: PREPARE "_PLAN35b0068" as SELECT * FROM Operations ORDER BY cOperationName;EXECUTE "_PLAN35b0068"
2005-04-19 10:07:19 LOG:  duration: 734.000 ms  statement: PREPARE "_PLAN35b0068" as SELECT * FROM Operations ORDER BY cOperationName;EXECUTE "_PLAN35b0068"

The times reported by EXPLAIN ANALYZE seem too small, though; the
durations reported in pg_log are closer to reality.


tkp=# SET enable_seqscan TO on;
SET
tkp=# SET enable_indexscan TO off;
SET
tkp=# EXPLAIN ANALYZE SELECT * FROM Operations ORDER BY cOperationName;
                              QUERY PLAN
----------------------------------------------------------------------
 Sort  (cost=185.37..189.20 rows=1532 width=498) (actual time=235.000..235.000 rows=1532 loops=1)
   Sort Key: coperationname
   ->  Seq Scan on operations  (cost=0.00..104.32 rows=1532 width=498) (actual time=0.000..124.000 rows=1532 loops=1)
 Total runtime: 267.000 ms
(4 rows)

tkp=# EXPLAIN ANALYZE SELECT * FROM Operations ORDER BY cOperationName;
                              QUERY PLAN
----------------------------------------------------------------------
 Sort  (cost=185.37..189.20 rows=1532 width=498) (actual time=16.000..16.000 rows=1532 loops=1)
   Sort Key: coperationname
   ->  Seq Scan on operations  (cost=0.00..104.32 rows=1532 width=498) (actual time=0.000..0.000 rows=1532 loops=1)
 Total runtime: 31.000 ms
(4 rows)

tkp=#
tkp=# SET enable_seqscan TO off;
SET
tkp=# SET enable_indexscan TO on;
SET
tkp=# EXPLAIN ANALYZE SELECT * FROM Operations ORDER BY cOperationName;
                              QUERY PLAN
----------------------------------------------------------------------
 Index Scan using operationsname on operations  (cost=0.00..350.01 rows=1532 width=498) (actual time=16.000..62.000 rows=1532 loops=1)
 Total runtime: 62.000 ms
(2 rows)

tkp=# EXPLAIN ANALYZE SELECT * FROM Operations ORDER BY cOperationName;
                              QUERY PLAN
----------------------------------------------------------------------
 Index Scan using operationsname on operations  (cost=0.00..350.01 rows=1532 width=498) (actual time=0.000..16.000 rows=1532 loops=1)
 Total runtime: 16.000 ms
(2 rows)

tkp=#
tkp=# SELECT version();
                                 version
--------------------------------------------------------------------------
 PostgreSQL 8.0.2 on i686-pc-mingw32, compiled by GCC gcc.exe (GCC) 3.4.2 (mingw-special)
(1 row)





From:
"Jim C. Nasby"
Date:

I've run some performance tests. The actual test case is at
http://stats.distributed.net/~decibel/timing.sql, and the results are at
http://stats.distributed.net/~decibel/timing.log. In a nutshell, doing
an index scan appears to be about 2x faster than a sequential scan and a
sort.

Something else of interest is that going from 50M of sort memory to 3G
sped the sort up by 900 seconds. If someone wants to record data about
the effect of sort_mem on on-disk sorts somewhere (maybe in the docs?) I
can run some more tests for that case.

In any case, it's clear that the planner is making the wrong choice
here. BTW, changing random_page_cost to 3 or 4 doesn't change the plan.

On Tue, Apr 19, 2005 at 10:40:41PM -0500, Jim C. Nasby wrote:
> [...]

--
Jim C. Nasby, Database Consultant               
Give your computer some brain candy! www.distributed.net Team #1828

Windows: "Where do you want to go today?"
Linux: "Where do you want to go tomorrow?"
FreeBSD: "Are you guys coming, or what?"

From:
Tom Lane
Date:

"Jim C. Nasby" <> writes:
> I've run some performance tests. The actual test case is at
> http://stats.distributed.net/~decibel/timing.sql, and the results are at
> http://stats.distributed.net/~decibel/timing.log. In a nutshell, doing
> an index scan appears to be about 2x faster than a sequential scan and a
> sort.

... for one test case, on one platform, with a pretty strong bias to the
fully-cached state since you ran the test multiple times consecutively.

Past experience has generally been that an explicit sort is quicker,
so you'll have to pardon me for suspecting that this case may be
atypical.  Is the table nearly in order by pkey, by any chance?

> In any case, it's clear that the planner is making the wrong choice
> here. BTW, changing random_page_cost to 3 or 4 doesn't change the plan.

Feel free to propose better cost equations.

            regards, tom lane

From:
"Jim C. Nasby"
Date:

On Fri, Apr 22, 2005 at 10:08:06PM -0400, Tom Lane wrote:
> "Jim C. Nasby" <> writes:
> > I've run some performance tests. The actual test case is at
> > http://stats.distributed.net/~decibel/timing.sql, and the results are at
> > http://stats.distributed.net/~decibel/timing.log. In a nutshell, doing
> > an index scan appears to be about 2x faster than a sequential scan and a
> > sort.
>
> ... for one test case, on one platform, with a pretty strong bias to the
> fully-cached state since you ran the test multiple times consecutively.

The table is 6.5G and the box only has 4G, so I suspect it's not cached.

> Past experience has generally been that an explicit sort is quicker,
> so you'll have to pardon me for suspecting that this case may be
> atypical.  Is the table nearly in order by pkey, by any chance?

It might be, but there's no way I can check that with a multi-key index,
right?

I'll re-run the tests with a single-column index on a column with a
correlation of 16%.

> > In any case, it's clear that the planner is making the wrong choice
> > here. BTW, changing random_page_cost to 3 or 4 doesn't change the plan.
>
> Feel free to propose better cost equations.

Where would I look in code to see what's used now?
--
Jim C. Nasby, Database Consultant               
Give your computer some brain candy! www.distributed.net Team #1828

Windows: "Where do you want to go today?"
Linux: "Where do you want to go tomorrow?"
FreeBSD: "Are you guys coming, or what?"

From:
Tom Lane
Date:

"Jim C. Nasby" <> writes:
>> Feel free to propose better cost equations.

> Where would I look in code to see what's used now?

All the gold is hidden in src/backend/optimizer/path/costsize.c.

            regards, tom lane

From:
"Jim C. Nasby"
Date:

On Sat, Apr 23, 2005 at 01:00:40AM -0400, Tom Lane wrote:
> "Jim C. Nasby" <> writes:
> >> Feel free to propose better cost equations.
>
> > Where would I look in code to see what's used now?
>
> All the gold is hidden in src/backend/optimizer/path/costsize.c.
>
>             regards, tom lane

After setting up a second test that orders the table by a highly
non-correlated column, I think I've found part of the problem. The
estimated index scan cost for (project_id, id, date) is
0.00..100117429.34 while the estimate for work_units is
0.00..103168408.62; almost no difference, even though the project_id
correlation is .657 while the work_units correlation is .116. This is
with random_page_cost set to 1.1; if I set it much higher I can't force
the index scan. (BTW, would it make more sense to set the cost of a
disabled seqscan to either pages or tuples * disable_cost?) But even
with only a 10% overhead on random page fetches, it seems logical that
the two estimates should be much farther apart. If you look at the
results of the initial run
(http://stats.distributed.net/~decibel/timing.log), you'll see that the
cost of the index scan is way overestimated. Looking at the code, the
runcost is calculated as

    run_cost += max_IO_cost + csquared * (min_IO_cost - max_IO_cost);

where csquared is indexCorrelation^2. Why is indexCorrelation squared?
The comments say a linear interpolation between min_IO and max_IO is
used, but ISTM that if it was linear then instead of csquared,
indexCorrelation would just be used.
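To make the question concrete, here is a small sketch of that interpolation. The min/max I/O costs are invented for illustration; only the interpolation formula mirrors the line quoted above.

```python
# Sketch of the run_cost interpolation being questioned above.  The
# min/max I/O costs are made-up numbers; only the formula follows the
# quoted line from costsize.c.

def run_cost(min_io, max_io, correlation, squared=True):
    w = correlation ** 2 if squared else abs(correlation)
    return max_io + w * (min_io - max_io)

min_io, max_io = 1_000.0, 100_000.0

# With csquared, a correlation of 0.219 barely moves the estimate off
# max_IO_cost:
print(run_cost(min_io, max_io, 0.219))                 # ~95252
# A truly linear interpolation in the correlation would pull it much
# further toward min_IO_cost:
print(run_cost(min_io, max_io, 0.219, squared=False))  # ~78319
```

Squaring the correlation makes the estimate cling to the worst case for all but very highly correlated indexes, which is exactly the behavior being questioned.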

By the way, I'm running a test for ordering by work_units right now, and
I included code to allocate and zero 3.3G of memory (out of 4G) between
steps to clear the kernel buffers. This brought the seqscan times up to
~6800 seconds, so it seems there was in fact buffering going on in the
first test. The second test has been running an index scan for over 14
hours now, so clearly a seqscan+sort is the way to go for a highly
uncorrelated index (at least one that won't fit in
effective_cache_size).
--
Jim C. Nasby, Database Consultant               
Give your computer some brain candy! www.distributed.net Team #1828

Windows: "Where do you want to go today?"
Linux: "Where do you want to go tomorrow?"
FreeBSD: "Are you guys coming, or what?"

From:
Manfred Koizar
Date:

On Sun, 24 Apr 2005 17:01:46 -0500, "Jim C. Nasby" <>
wrote:
>> >> Feel free to propose better cost equations.

I did.  More than once.

>estimated index scan cost for (project_id, id, date) is
>0.00..100117429.34 while the estimate for work_units is
>0.00..103168408.62; almost no difference,

~3%

> even though project_id correlation is .657

This is divided by the number of index columns, so the index correlation
is estimated to be 0.219.

> while work_units correlation is .116.

So csquared is 0.048 and 0.013, respectively, and you get a result not
far away from the upper bound in both cases.  The cost estimations
differ by only 3.5% of (max_IO_cost - min_IO_cost).

>you'll see that the cost of the index scan is way overestimated. Looking
>at the code, the runcost is calculated as
>
>    run_cost += max_IO_cost + csquared * (min_IO_cost - max_IO_cost);
>
>where csquared is indexCorrelation^2. Why is indexCorrelation squared?
>The comments say a linear interpolation between min_IO and max_IO is
>used, but ISTM that if it was linear then instead of csquared,
>indexCorrelation would just be used.

In my tests I got much more plausible results with

    1 - (1 - abs(correlation))^2
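Comparing the weight this formula assigns against the current csquared, for the two correlations mentioned earlier in the thread (0.219 and 0.116), shows how much more credit it gives to partial correlation:

```python
# Interpolation weights: current code vs. the alternative above, for the
# two correlations discussed earlier in the thread.  A larger weight
# moves the cost estimate further from max_IO_cost toward min_IO_cost.

def w_csquared(c):
    return c * c                     # current costsize.c behavior

def w_alternative(c):
    return 1 - (1 - abs(c)) ** 2     # formula proposed above

for c in (0.219, 0.116):
    print(c, round(w_csquared(c), 3), round(w_alternative(c), 3))
# 0.219 -> csquared 0.048 vs alternative 0.39
# 0.116 -> csquared 0.013 vs alternative 0.219
```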

Jim, are you willing to experiment with one or two small patches of
mine?  What version of Postgres are you running?

Servus
 Manfred

From:
"Jim C. Nasby"
Date:

First, I've got some updated numbers up at
http://stats.distributed.net/~decibel/

timing2.log shows that the planner actually under-estimates an index
scan by several orders of magnitude. Granted, random_page_cost is set to
an unrealistic 1.1 (otherwise I can't force the index scan), but that
alone isn't enough to explain the difference.

On Wed, May 11, 2005 at 05:59:10PM +0200, Manfred Koizar wrote:
> On Sun, 24 Apr 2005 17:01:46 -0500, "Jim C. Nasby" <>
> wrote:
> >> >> Feel free to propose better cost equations.
>
> I did.  More than once.
>
> >estimated index scan cost for (project_id, id, date) is
> >0.00..100117429.34 while the estimate for work_units is
> >0.00..103168408.62; almost no difference,
>
> ~3%
>
> > even though project_id correlation is .657
>
> This is divided by the number of index columns, so the index correlation
> is estimated to be 0.219.

That seems like a pretty bad assumption to make.

Is there any eta on having statistics for multi-column indexes?

> >you'll see that the cost of the index scan is way overestimated. Looking
> >at the code, the runcost is calculated as
> >
> >    run_cost += max_IO_cost + csquared * (min_IO_cost - max_IO_cost);
> >
> >where csquared is indexCorrelation^2. Why is indexCorrelation squared?
> >The comments say a linear interpolation between min_IO and max_IO is
> >used, but ISTM that if it was linear then instead of csquared,
> >indexCorrelation would just be used.
>
> In my tests I got much more plausible results with
>
>     1 - (1 - abs(correlation))^2

What's the theory behind that?

And I'd still like to know why correlation squared is used.

> Jim, are you willing to experiment with one or two small patches of
> mine?  What version of Postgres are you running?

It depends on the patches, since this is a production machine. Currently
it's running 7.4.*mumble*, though I need to upgrade to 8, which I was
intending to do via slony. Perhaps the best thing would be for me to get
that setup and we can experiment against version 8.0.3.
--
Jim C. Nasby, Database Consultant               
Give your computer some brain candy! www.distributed.net Team #1828

Windows: "Where do you want to go today?"
Linux: "Where do you want to go tomorrow?"
FreeBSD: "Are you guys coming, or what?"

From:
Manfred Koizar
Date:

On Wed, 11 May 2005 16:15:16 -0500, "Jim C. Nasby" <>
wrote:
>> This is divided by the number of index columns, so the index correlation
>> is estimated to be 0.219.
>
>That seems like a pretty bad assumption to make.

Any assumption we make without looking at entire index tuples has to be
bad.  A new GUC variable secondary_correlation introduced by my patch at
least gives you a chance to manually control the effects of additional
index columns.

>> In my tests I got much more plausible results with
>>
>>     1 - (1 - abs(correlation))^2
>
>What's the theory behind that?

The same as for csquared -- pure intuition.  But the numbers presented
in http://archives.postgresql.org/pgsql-hackers/2002-10/msg00072.php
seem to imply that in this case my intuition is better ;-)

Actually, the above formula was not proposed in that mail.  AFAIR it
gives results between p2 and p3.

>And I'd still like to know why correlation squared is used.

On Wed, 02 Oct 2002 18:48:49 -0400, Tom Lane <> wrote:
|The indexCorrelation^2 algorithm was only a quick hack with no theory
|behind it :-(.

>It depends on the patches, since this is a production machine. Currently
>it's running 7.4.*mumble*,

The patch referenced in
http://archives.postgresql.org/pgsql-hackers/2003-08/msg00931.php is
still available.  It doesn't touch too many places and should be easy to
review.  I'm using it and its predecessors in production for more than
two years.  Let me know, if the 74b1 version does not apply cleanly to
your source tree.

Servus
 Manfred

From:
"Jim C. Nasby"
Date:

On Thu, May 12, 2005 at 08:54:48PM +0200, Manfred Koizar wrote:
> On Wed, 11 May 2005 16:15:16 -0500, "Jim C. Nasby" <>
> wrote:
> >> This is divided by the number of index columns, so the index correlation
> >> is estimated to be 0.219.
> >
> >That seems like a pretty bad assumption to make.
>
> Any assumption we make without looking at entire index tuples has to be
> bad.  A new GUC variable secondary_correlation introduced by my patch at
> least gives you a chance to manually control the effects of additional
> index columns.

It seems it would be much better to gather statistics on any
multi-column indexes, but I know that's probably beyond what's
reasonable for your patch.

Also, my data (http://stats.distributed.net/~decibel) indicates that
max_io isn't high enough. Look specifically at timing2.log compared to
timing.log. Though, it is possible that this is because of having
random_page_cost set to 1.1 (if I set it much higher I can't force the
index scan because the index estimate actually exceeds the cost of the
seqscan with the disable cost added in).

> >It depends on the patches, since this is a production machine. Currently
> >it's running 7.4.*mumble*,
>
> The patch referenced in
> http://archives.postgresql.org/pgsql-hackers/2003-08/msg00931.php is
> still available.  It doesn't touch too many places and should be easy to
> review.  I'm using it and its predecessors in production for more than
> two years.  Let me know, if the 74b1 version does not apply cleanly to
> your source tree.

Looks reasonable; I'll give it a shot on 8.0 once I have replication
happening.
--
Jim C. Nasby, Database Consultant               
Give your computer some brain candy! www.distributed.net Team #1828

Windows: "Where do you want to go today?"
Linux: "Where do you want to go tomorrow?"
FreeBSD: "Are you guys coming, or what?"