Discussion: A question about indexes...
Hello,

I have the following tables in my db:

  Painter (id integer, uri varchar(256))
  paints (id1 integer, id2 integer)

I want to optimize the query

  select id from Painter where uri = 'xxxxx';

What kind of index (Btree or Hash) is more efficient to create on the field uri, since it's a string?

I also want to optimize the join between the tables Painter and paints on the fields id and id1 respectively. I can either define the field id as a Primary Key or create a Btree index on it. What is more efficient?? From my test I see that creating a Btree index is a bit faster!!

Would the performance (of the join) be improved if I created indexes on both fields id and id1, or is it sufficient to create one of the two indexes? As far as I can see, the performance is improved if I have a Primary Key on Painter.id and a BTree index on paints.id1. However, when I create a Btree index on Painter.id and a BTree index on paints.id1, performance gets worse.

Thank you in advance for your help,
Sofia Alexaki
alexaki@ics.forth.gr
Alexaki Sofia <alexaki@ics.forth.gr> writes:
> I can either define the field id as a Primary Key or create a Btree index
> on it. What is more efficient??
> From my test I see that creating a Btree index is a bit faster!!
I think you're seeing things. Declaring a field primary key creates
a btree index on it (and also enables UNIQUE and NOT NULL checks, but
those don't affect the speed of lookups). There isn't going to be
any difference between the two ways of doing it --- whatever difference
you measured was due to other factors, eg, disk pages already in cache.
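For reference, the two setups being compared could be written roughly as below. This is only a sketch: the table name painter_plain and the index name painter_plain_id_idx are made up here so that both variants can exist side by side, and nothing in the thread shows the exact statements that were actually run.

```sql
-- Variant 1: declare the key. PostgreSQL builds a unique btree index
-- on id implicitly and also enforces UNIQUE and NOT NULL.
CREATE TABLE painter (
    id  integer PRIMARY KEY,
    uri varchar(256)
);

-- Variant 2 (hypothetical names): same columns, no key,
-- plus an explicit btree index on id.
CREATE TABLE painter_plain (
    id  integer,
    uri varchar(256)
);
CREATE INDEX painter_plain_id_idx ON painter_plain USING btree (id);
```

In both variants a lookup on id goes through a btree index, which is why no systematic speed difference between them is expected.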
As for your other point I'd generally recommend btree over hash indexes.
The btree code is much more thoroughly tested, supports concurrent
updates which hash indexes don't, and allows order-based index scans
which hash doesn't. I don't see any redeeming social value in a hash
index, actually...
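A minimal sketch of the btree option for the uri lookup follows. The index name painter_uri_idx and the range bounds 'a' and 'b' are invented for illustration, and whether the planner really picks the index depends on table size and statistics.

```sql
-- Hypothetical index name; btree is also the default when USING is omitted.
CREATE INDEX painter_uri_idx ON painter USING btree (uri);

-- The equality lookup from the original question.
EXPLAIN SELECT id FROM painter WHERE uri = 'xxxxx';

-- A range condition with ordered output: a btree can serve this kind of
-- order-based scan, a hash index cannot.
EXPLAIN SELECT id, uri FROM painter
WHERE uri >= 'a' AND uri < 'b'
ORDER BY uri;
```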
regards, tom lane
Hello,

I have the following tables in my database:

  Painter(id integer, uri varchar(256))
  paints(id1 integer, id2 integer)

In order to speed up the join

  select * from painter, paints where painter.id = paints.id1

between these two tables, I have created indexes on the field painter.id and/or paints.id1. But as I see from the query plan the indexes are not used; instead a sequential search is done whether I define indexes or not. As you can see below, the query plan remains the same. Is that reasonable??? Shouldn't Postgresql use the indexes in order to optimize the query??? I can't see why it is better to do a sequential search, since the tables are relatively big.

A) No indexes are defined on the tables

  Hash Join (cost=12269.78 rows=60014 width=24)
    -> Seq Scan on painter1 (cost=4234.97 rows=99999 width=16)
    -> Hash (cost=1931.92 rows=50331 width=8)
         -> Seq Scan on paints (cost=1931.92 rows=50331 width=8)

B1) BTree index on painter.id

  Hash Join (cost=12269.78 rows=60014 width=24)
    -> Seq Scan on painter (cost=4234.97 rows=99999 width=16)
    -> Hash (cost=1931.92 rows=50331 width=8)
         -> Seq Scan on paints (cost=1931.92 rows=50331 width=8)

B2) Primary Key on painter.id

  Hash Join (cost=12269.78 rows=60014 width=24)
    -> Seq Scan on painter (cost=4234.97 rows=99999 width=16)
    -> Hash (cost=1931.92 rows=50331 width=8)
         -> Seq Scan on paints (cost=1931.92 rows=50331 width=8)

C1) BTree index on painter.id and Btree on paints.id1

  Hash Join (cost=12269.78 rows=60014 width=24)
    -> Seq Scan on painter (cost=4234.97 rows=99999 width=16)
    -> Hash (cost=1931.92 rows=50331 width=8)
         -> Seq Scan on paints (cost=1931.92 rows=50331 width=8)

C2) Primary Key on painter.id and Btree on paints.id1

  Hash Join (cost=12269.78 rows=60014 width=24)
    -> Seq Scan on painter (cost=4234.97 rows=99999 width=16)
    -> Hash (cost=1931.92 rows=50331 width=8)
         -> Seq Scan on paints (cost=1931.92 rows=50331 width=8)

Regards,
Sofia Alexaki
Alexaki Sofia <alexaki@ics.forth.gr> writes:
> But as I see from the query plan the indexes are not used; instead a
> sequential search is done whether I define indexes or not.
> As you can see below the query plan remains the same.
> Is that reasonable??? Shouldn't Postgresql use the indexes in order
> to optimize the query???
Not necessarily. Since you're just doing a join without restricting
the query to a subset of either table, the indexes would only be
useful as a means of ordering the inputs to a mergejoin --- and an
indexscan over a whole table is *not* particularly fast, because of
all the random seeks involved.
The plausible plans for this sort of query are basically

  Merge Join
    -> Index Scan on t1
    -> Index Scan on t2

  Merge Join
    -> Sort
         -> Seq Scan on t1
    -> Sort
         -> Seq Scan on t2

  Hash Join
    -> Seq Scan on t1
    -> Seq Scan on t2

(Postgres also considers mergejoins with indexscan on one side and
explicit sort on the other, but for brevity I ignore that possibility.)
Any of these might be the best choice depending on number of rows,
width of each row, and harder-to-predict factors like how well-ordered
the tuples are already. The planner's cost models are evidently
predicting that the hash join will be the quickest. You could
experiment, if you're interested, by forcing the choice by setting
ENABLE_HASHJOIN and ENABLE_SORT on or off, and then comparing the
estimated costs shown by EXPLAIN and the actual measured query
runtimes. If the estimated-cost ratios are wildly at variance with
the real runtimes then you have a legitimate gripe. But your gripe
should be that the cost models don't reflect reality, not that Postgres
ignores your indexes.
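The experiment described above might look roughly like the session below. It is a sketch, not a recipe: it assumes the settings can be changed per session with SET and restored with RESET, and which alternative plan the planner falls back to (a merge join with sorts, a merge join over index scans, or something else such as a nested loop) will depend on the installation and the data.

```sql
-- Baseline: the planner's own choice (a hash join in the plans posted above).
EXPLAIN SELECT * FROM painter, paints WHERE painter.id = paints.id1;

-- Discourage hash joins, so a merge join becomes more likely.
SET enable_hashjoin TO off;
EXPLAIN SELECT * FROM painter, paints WHERE painter.id = paints.id1;

-- Additionally discourage explicit sorts, which favors feeding the
-- merge join from index scans if suitable indexes exist.
SET enable_sort TO off;
EXPLAIN SELECT * FROM painter, paints WHERE painter.id = paints.id1;

-- Restore the defaults when done.
RESET enable_hashjoin;
RESET enable_sort;
```

After each EXPLAIN, run and time the query itself, then compare the ratio of estimated costs with the ratio of measured runtimes. Newer PostgreSQL releases offer EXPLAIN ANALYZE, which executes the query and reports actual times next to the estimates; it may not be available in the release discussed in this thread.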
regards, tom lane