Discussion: sequence scan on PK

From:
Jeroen van Iddekinge
Date:

Hi,


I understand that when a table contains only a few rows it is better to
do a sequence scan than an index scan. But is this also true for a table
with 99 records?

A table contains:

    id                          integer (primary key)
    name                        varchar(70)
    parent                      integer
    comment                     text
    owner                       integer
    inheritAccess               integer
    defaultAccess               integer
    sequence                    bigint
    contentsinheritaccessmove   integer
    contentsinheritaccessadd    integer


explain select * from tblFolders where id=90;
                        QUERY PLAN
-----------------------------------------------------------
 Seq Scan on tblfolders  (cost=0.00..3.24 rows=1 width=50)
   Filter: (id = 90)


(I have analyzed the table, but it still uses a sequence scan.)

At how many rows does an index scan become better than a sequence scan?
How is this calculated in pg?

Regards
 Jer

From:
John A Meinel
Date:

Jeroen van Iddekinge wrote:
> Hi,
>
>
> I understand that when a table contains only a few rows it is better to
> do a sequence scan than an index scan. But is this also true for a table
> with 99 records?
>

...

> explain select * from tblFolders where id=90;
>                        QUERY PLAN
> -----------------------------------------------------------
> Seq Scan on tblfolders  (cost=0.00..3.24 rows=1 width=50)
>   Filter: (id = 90)
>
>
> (I have analyzed the table, but it still uses a sequence scan.)
>
> At how many rows does an index scan become better than a sequence scan?
> How is this calculated in pg?
>
> Regards
> Jer

It depends on how many pages need to be read. To do an index scan you
need to read the index pages, and then you read the page where the
actual row resides.

The usual rule of thumb is that if you are selecting more than about 5% of
the rows, a seqscan is faster than an index scan. If I'm reading your plan
correctly, it estimates needing to read about 3 pages to get the row you
are asking for. If you used an index, it would probably have to read at
least that many pages, and they would not be a sequential read, so it
should be slower.

If you want to prove it, try:

\timing
EXPLAIN ANALYZE SELECT * FROM tblFolders WHERE id=90;
EXPLAIN ANALYZE SELECT * FROM tblFolders WHERE id=90;
EXPLAIN ANALYZE SELECT * FROM tblFolders WHERE id=90;

SET enable_seqscan TO OFF;

EXPLAIN ANALYZE SELECT * FROM tblFolders WHERE id=90;
EXPLAIN ANALYZE SELECT * FROM tblFolders WHERE id=90;
EXPLAIN ANALYZE SELECT * FROM tblFolders WHERE id=90;


Run each query multiple times to make sure everything is cached, and take
the fastest time. On your machine it might be true that the index scan is
slightly faster than the seqscan in this exact circumstance. But I have
the feeling the difference is trivial, and if you had, say, 70 rows it
would favor the seqscan. Probably somewhere around 150-200 rows it will
switch on its own.

You could tweak several settings to get it to do an index scan earlier,
but those changes would probably break other queries. You don't need to
tune for 100 rows, more like 100k or 100M.

John
=:->

From:
Jeroen van Iddekinge
Date:

>
> You could tweak several settings to get it to do an index scan
> earlier, but those changes would probably break other queries. You don't
> need to tune for 100 rows, more like 100k or 100M.

Thanks for the response.
The index scan was a little bit faster for id=1 and faster for id=99.

Which settings should I change for this? cpu_index_tuple_cost,
cpu_operator_cost, cpu_tuple_cost?


Jer.



From:
Matteo Beccati
Date:

Hi,

> Thanks for the response.
> The index scan was a little bit faster for id=1 and faster for id=99.
>
> Which settings should I change for this? cpu_index_tuple_cost,
> cpu_operator_cost, cpu_tuple_cost?

You should lower random_page_cost to make the planner choose an index
scan over a sequential scan.
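For example, the setting can be changed per session before re-checking the plan (a sketch; the value 2.0 is illustrative, not a recommendation from this thread):

```sql
-- Lower random_page_cost for this session only, then re-test the plan.
SET random_page_cost = 2.0;
EXPLAIN ANALYZE SELECT * FROM tblFolders WHERE id = 90;
RESET random_page_cost;  -- restore the server default
```

Changing it session-locally first avoids disturbing other queries while you experiment.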


Best regards
--
Matteo Beccati
http://phpadsnew.com/
http://phppgads.com/

From:
John A Meinel
Date:

Jeroen van Iddekinge wrote:
>
>>
>> You could tweak several settings to get it to do an index scan
>> earlier, but those changes would probably break other queries. You don't
>> need to tune for 100 rows, more like 100k or 100M.
>
>
> Thanks for the response.
> The index scan was a little bit faster for id=1 and faster for id=99.
>
> Which settings should I change for this? cpu_index_tuple_cost,
> cpu_operator_cost, cpu_tuple_cost?
>
>
> Jer.

Well, I would start with *don't*. You are only looking at one query,
which is already fast and probably not the bottleneck. You are
optimizing the wrong thing.

That being said, because you have everything cached in RAM (since it is
a tiny table), you could probably set random_page_cost = 2.

In theory it should really never be lower than 2, though if you are
trying to force an index scan you can do it.
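To see why random_page_cost is the knob that matters here, compare rough per-plan costs. This is a toy model, not the planner's real formula; the page counts and the index model are assumptions for illustration:

```python
def seqscan_cost(pages, rows, seq_page_cost=1.0, cpu_tuple_cost=0.01):
    """Sequential scan: read every page in order, process every row."""
    return pages * seq_page_cost + rows * cpu_tuple_cost

def indexscan_cost(matching_rows, index_pages, random_page_cost):
    """Toy index scan: index descent plus one random heap fetch per match."""
    return (index_pages + matching_rows) * random_page_cost

# 99-row table on 2 heap pages, single-row lookup through a 2-page index:
for rpc in (4.0, 2.0, 1.0):
    print(f"random_page_cost={rpc}: seq={seqscan_cost(2, 99):.2f} "
          f"index={indexscan_cost(1, 2, rpc):.2f}")
```

Even in this crude model, lowering random_page_cost shrinks the index scan's estimated cost until it competes with the seqscan, which is exactly the effect being discussed.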

John
=:->


From:
Jeroen van Iddekinge
Date:

>
> You should lower random_page_cost to make the planner choose an index
> scan vs sequential scan.
>
Yes, it was a bit too high (18), so I lowered it. It sped up some
pages by about 5%.

Reg. Jer

From:
Tom Lane
Date:

Jeroen van Iddekinge <> writes:
>> You could tweak several settings to get it to do an index scan
>> earlier, but those changes would probably break other queries. You don't
>> need to tune for 100 rows, more like 100k or 100M.

> Which settings should I change for this?

I'd agree with John's response: if you change any settings based on just
this one test case, you're a fool.  But usually random_page_cost is the
best knob to twiddle if you wish to encourage indexscans.

            regards, tom lane

From:
Matteo Beccati
Date:

Hi,

> Yes, it was a bit too high (18), so I lowered it. It sped up some
> pages by about 5%.

18? The default is 4, if I remember correctly. I wonder if your db
has ever seen an index scan ;)


Best regards
--
Matteo Beccati
http://phpadsnew.com/
http://phppgads.com/

From:
Jeroen van Iddekinge
Date:

Matteo Beccati wrote:

> Hi,
>
>> Yes, it was a bit too high (18), so I lowered it. It sped up some
>> pages by about 5%.
>
>
> 18? The default is 4, if I remember correctly. I wonder if your db
> has ever seen an index scan ;)
>

I was experimenting with how much influence various settings have on the
performance of a web application.
I think I forgot to change that setting back, which is why I got some
strange query plans.

Thanks
Jer


From:
Grega Bremec
Date:

Tom Lane wrote:
| Jeroen van Iddekinge <> writes:
|
|>>You could tweak several settings to get it to do an index scan
|>>earlier, but those changes would probably break other queries. You don't
|>>need to tune for 100 rows, more like 100k or 100M.
|
|
|>Which settings should I change for this?
|
|
| I'd agree with John's response: if you change any settings based on just
| this one test case, you're a fool.  But usually random_page_cost is the
| best knob to twiddle if you wish to encourage indexscans.
|

Perhaps just a small comment - before starting the tuning process, you
want to make sure the query planner has the right idea about the nature
of the data contained in your indexed column.

Sometimes, if you insert reasonably sized batches of records containing
the same value for that column (for example in a multicolumn key where
you usually retrieve by only one of the columns), the statistics
collector can get out of sync with reality with regard to the
cardinality of the data, because the default sample is too small to give
it an objective picture. If you know your data well, you probably want
to increase the statistics target on that column and/or do some other
statistics-related magic, and then ANALYZE the table again; that alone
can mean the difference between a sequential and an index scan where
appropriate. Most importantly, you don't need to distort the database's
understanding of your hardware to achieve optimal plans (provided those
settings have sensible values, of course), so you won't get bitten where
you don't expect it. :)

Again, this might not pertain to a 100-row table, but it is a good thing
[tm] to know when optimizing. I personally would look at that aspect of
the optimizer's understanding of the data before anything else.
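As a concrete sketch of that (the column name and target value are illustrative, taken from the table definition earlier in the thread):

```sql
-- Collect a larger statistics sample for a skewed column, then re-analyze.
ALTER TABLE tblFolders ALTER COLUMN parent SET STATISTICS 500;
ANALYZE tblFolders;
```

With a larger sample, the planner's row-count estimates for that column should better match the real distribution.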

Hope this helps.

Regards,
--
Grega Bremec
gregab at p0f dot net