Discussion: A question about indexes...
Hello,

I have the following tables in my db:

    Painter (id integer, uri varchar(256))
    paints (id1 integer, id2 integer)

I want to optimize the query

    select id from Painter where uri = 'xxxxx';

What kind of index (Btree or Hash) is more efficient to create on the field uri, given that it is a string?

I also want to optimize the join between the tables Painter and paints on the fields id and id1 respectively. I can either define the field id as a Primary Key or create a Btree index on it. Which is more efficient? From my tests I see that creating a Btree index is a bit faster.

Would the performance of the join be improved if I created indexes on both id and id1, or is it sufficient to create one of the two? As far as I can see, performance improves if I have a Primary Key on Painter.id and a Btree index on paints.id1. However, when I create a Btree index on Painter.id and a Btree index on paints.id1, performance gets worse.

Thank you in advance for your help,
Sofia Alexaki
alexaki@ics.forth.gr

Alexaki Sofia <alexaki@ics.forth.gr> writes:
> I can either define the field id as a Primary Key or create a Btree index
> on it. Which is more efficient?
> From my tests I see that creating a Btree index is a bit faster.

I think you're seeing things. Declaring a field primary key creates a btree index on it (and also enables UNIQUE and NOT NULL checks, but those don't affect the speed of lookups). There isn't going to be any difference between the two ways of doing it --- whatever difference you measured was due to other factors, eg, disk pages already in cache.

As for your other point, I'd generally recommend btree over hash indexes. The btree code is much more thoroughly tested, supports concurrent updates which hash indexes don't, and allows order-based index scans which hash doesn't. I don't see any redeeming social value in a hash index, actually...

regards, tom lane

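For concreteness, the two alternatives compared above might be written roughly as follows. This is only a sketch: the index names (painter_id_idx, painter_uri_idx, paints_id1_idx) are made up for illustration, and the ALTER TABLE form assumes a PostgreSQL release that supports adding a primary key to an existing table.

```sql
-- Tables as described in the original question.
CREATE TABLE painter (id integer, uri varchar(256));
CREATE TABLE paints (id1 integer, id2 integer);

-- Alternative A: an explicit btree index on the join column.
-- (Index name is hypothetical.)
CREATE INDEX painter_id_idx ON painter USING btree (id);

-- Alternative B (use instead of A, not in addition to it): declare the
-- primary key. This also builds a btree index on id and adds the
-- UNIQUE and NOT NULL checks.
-- ALTER TABLE painter ADD PRIMARY KEY (id);

-- Btree index for the equality lookup on uri, and one on the other
-- side of the join. (Index names are hypothetical.)
CREATE INDEX painter_uri_idx ON painter USING btree (uri);
CREATE INDEX paints_id1_idx ON paints USING btree (id1);

-- The lookup from the question; EXPLAIN shows whether the planner
-- chooses the uri index for it.
EXPLAIN SELECT id FROM painter WHERE uri = 'xxxxx';
```

Either alternative leaves a btree index on painter.id, which is why no lookup-speed difference between them is expected.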
Hello,

I have the following tables in my database:

    Painter(id integer, uri varchar(256))
    paints(id1 integer, id2 integer)

In order to speed up the join between these two tables

    select * from painter, paints where painter.id = paints.id1

I have created indexes on the field painter.id and/or paints.id1. But as I see from the query plan, the indexes are not used; a sequential scan is done whether or not I define indexes. As you can see below, the query plan remains the same.

Is that reasonable? Shouldn't PostgreSQL use the indexes in order to optimize the query? I can't see why it is better to do a sequential scan, since the tables are relatively big.

A) No indexes are defined on the tables

    Hash Join  (cost=12269.78 rows=60014 width=24)
      -> Seq Scan on painter1  (cost=4234.97 rows=99999 width=16)
      -> Hash  (cost=1931.92 rows=50331 width=8)
           -> Seq Scan on paints  (cost=1931.92 rows=50331 width=8)

B1) BTree index on painter.id

    Hash Join  (cost=12269.78 rows=60014 width=24)
      -> Seq Scan on painter  (cost=4234.97 rows=99999 width=16)
      -> Hash  (cost=1931.92 rows=50331 width=8)
           -> Seq Scan on paints  (cost=1931.92 rows=50331 width=8)

B2) Primary Key on painter.id

    Hash Join  (cost=12269.78 rows=60014 width=24)
      -> Seq Scan on painter  (cost=4234.97 rows=99999 width=16)
      -> Hash  (cost=1931.92 rows=50331 width=8)
           -> Seq Scan on paints  (cost=1931.92 rows=50331 width=8)

C1) BTree index on painter.id and Btree on paints.id1

    Hash Join  (cost=12269.78 rows=60014 width=24)
      -> Seq Scan on painter  (cost=4234.97 rows=99999 width=16)
      -> Hash  (cost=1931.92 rows=50331 width=8)
           -> Seq Scan on paints  (cost=1931.92 rows=50331 width=8)

C2) Primary Key on painter.id and Btree on paints.id1

    Hash Join  (cost=12269.78 rows=60014 width=24)
      -> Seq Scan on painter  (cost=4234.97 rows=99999 width=16)
      -> Hash  (cost=1931.92 rows=50331 width=8)
           -> Seq Scan on paints  (cost=1931.92 rows=50331 width=8)

Regards,
Sofia Alexaki

Alexaki Sofia <alexaki@ics.forth.gr> writes:
> But as I see from the query plan the indexes are not used, instead
> sequential search is done either I define indexes or not.
> As you can see below the query plan remains the same.
> Is that reasonable??? Shouldn't Postgresql use the indexes in order
> to optimize question???

Not necessarily. Since you're just doing a join without restricting the query to a subset of either table, the indexes would only be useful as a means of ordering the inputs to a mergejoin --- and an indexscan over a whole table is *not* particularly fast, because of all the random seeks involved. The plausible plans for this sort of query are basically

    Merge Join
      -> Index Scan on t1
      -> Index Scan on t2

    Merge Join
      -> Sort
           -> Seq Scan on t1
      -> Sort
           -> Seq Scan on t2

    Hash Join
      -> Seq Scan on t1
      -> Seq Scan on t2

(Postgres also considers mergejoins with indexscan on one side and explicit sort on the other, but for brevity I ignore that possibility.) Any of these might be the best choice depending on number of rows, width of each row, and harder-to-predict factors like how well-ordered the tuples are already. The planner's cost models are evidently predicting that the hash join will be the quickest.

You could experiment, if you're interested, by forcing the choice by setting ENABLE_HASHJOIN and ENABLE_SORT on or off, and then comparing the estimated costs shown by EXPLAIN and the actual measured query runtimes. If the estimated-cost ratios are wildly at variance with the real runtimes then you have a legitimate gripe. But your gripe should be that the cost models don't reflect reality, not that Postgres ignores your indexes.

regards, tom lane
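The experiment suggested above could look something like the following session. Treat it as a sketch under assumptions: it uses the painter/paints tables from the question, and which plan the planner falls back to after each setting is changed depends on the release and the data, so no particular output is claimed here.

```sql
-- Baseline: the plan the planner prefers on its own.
EXPLAIN SELECT * FROM painter, paints WHERE painter.id = paints.id1;

-- Discourage hash joins, so the planner has to cost some other join
-- method (a merge join is one candidate; the choice is not guaranteed).
SET enable_hashjoin TO off;
EXPLAIN SELECT * FROM painter, paints WHERE painter.id = paints.id1;

-- Additionally discourage explicit sorts, which favours feeding a
-- merge join from index scans when suitable indexes exist.
SET enable_sort TO off;
EXPLAIN SELECT * FROM painter, paints WHERE painter.id = paints.id1;

-- On releases that provide EXPLAIN ANALYZE, the actual runtime can be
-- reported next to the estimates (note that it executes the query):
-- EXPLAIN ANALYZE SELECT * FROM painter, paints WHERE painter.id = paints.id1;

-- Restore the defaults for the rest of the session.
RESET enable_hashjoin;
RESET enable_sort;
```

Run the real query and time it under each setting, then compare the ratio of measured runtimes with the ratio of estimated costs. A large mismatch points at the cost model, which is the gripe described above.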