performance discrepancy indexing one column versus two columns

Поиск

Список

Период

Сортировка

От	Mark Dilger
Тема	performance discrepancy indexing one column versus two columns
Дата	11 сентября 2005 г. 07:46:55
Msg-id	4323E016.2060606@markdilger.com обсуждение исходный текст
Ответы	Re: performance discrepancy indexing one column versus
Список	pgsql-performance

Дерево обсуждения

All,

In the psql output below, I would expect the second query to run faster, because
the b-tree index on two columns knows the values of 'b' for any given value of
'a', and hence does not need to fetch a row from the actual table.  I am not
seeing a speed-up, however, so I expect my understanding of the index mechanism
is wrong.  Could anyone enlighten me?

Specifically, I would expect the first query to walk the b-tree looking for
values of 'a' equal to 1, 100001, etc., and then a dereference over to the main
table to fetch the value for column 'b'.

But I would expect the second query to walk the b-tree looking for values of 'a'
equal to 1, 100001, etc., and then find on that same page in the b-tree the
value of 'b', thereby avoiding the dereference and extra page fetch.

Is the problem that the two-column b-tree contains more data, is spread across
more disk pages, and is hence slower to access, canceling out the performance
gain of not having to fetch from the main table?  Or is the query system not
using the second column information from the index and doing the table fetch
anyway?  Or does the index store the entire row from the main table regardless
of the column being indexed?

I am running postgresql 8.0.3 on a Pentium 4 with ide hard drives and the
default configuration file settings.

Thanks in advance,

mark



mark=# create sequence test_id_seq;
CREATE SEQUENCE
mark=# create table test (a integer not null default nextval('test_id_seq'), b
integer not null);
CREATE TABLE
mark=# create function testfunc () returns void as $$
mark$# declare
mark$# i integer;
mark$# begin
mark$# for i in 1..1000000 loop
mark$# insert into test (b) values (i);
mark$# end loop;
mark$# return;
mark$# end;
mark$# $$ language plpgsql;
CREATE FUNCTION
mark=# select * from testfunc();
  testfunc
----------

(1 row)

mark=# select count(*) from test;
   count
---------
  1000000
(1 row)

mark=# create index test_single_idx on test(a);
CREATE INDEX
mark=# vacuum full;
VACUUM
mark=# analyze;
ANALYZE
mark=# explain analyze select b from test where a in (1, 100001, 200001, 300001,
400001, 500001, 600001, 700001, 800001, 900001);

                                                      QUERY PLAN

----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  Index Scan using test_single_idx, test_single_idx, test_single_idx,
test_single_idx, test_single_idx, test_single_idx, test_single_idx,
test_single_idx, test_single_idx, test_single_idx on test  (cost=0.00..30.36
rows=10 width=4) (actual time=0.145..0.917 rows=10 loops=1)
    Index Cond: ((a = 1) OR (a = 100001) OR (a = 200001) OR (a = 300001) OR (a =
400001) OR (a = 500001) OR (a = 600001) OR (a = 700001) OR (a = 800001) OR(a =
900001))
  Total runtime: 1.074 ms
(3 rows)

mark=# drop index test_single_idx;
DROP INDEX
mark=# create index test_double_idx on test(a,b);
CREATE INDEX
mark=# vacuum full;
VACUUM
mark=# analyze;
ANALYZE
mark=# explain analyze select b from test where a in (1, 100001, 200001, 300001,
400001, 500001, 600001, 700001, 800001, 900001);

                                                      QUERY PLAN

----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  Index Scan using test_double_idx, test_double_idx, test_double_idx,
test_double_idx, test_double_idx, test_double_idx, test_double_idx,
test_double_idx, test_double_idx, test_double_idx on test  (cost=0.00..43.48
rows=10 width=4) (actual time=0.283..1.119 rows=10 loops=1)
    Index Cond: ((a = 1) OR (a = 100001) OR (a = 200001) OR (a = 300001) OR (a =
400001) OR (a = 500001) OR (a = 600001) OR (a = 700001) OR (a = 800001) OR(a =
900001))
  Total runtime: 1.259 ms
(3 rows)

В списке pgsql-performance по дате отправления:

Предыдущее

От: Michael Fuhr
Дата: 10 сентября 2005 г., 14:42:17
Сообщение: Re: CHECK vs REFERENCES

Следующее

От: Gavin Sherry
Дата: 11 сентября 2005 г., 08:06:11
Сообщение: Re: performance discrepancy indexing one column versus

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

performance discrepancy indexing one column versus two columns

Предыдущее

Следующее