Обсуждение: Pros / cons for indexing a small table
I'm having a simple table for managing hit counts on WWW. The table was setup some year ago using this: CREATE TABLE hit_counts ( page varchar(255), hits integer, last_visit timestamp, id serial primary key ); When updating a hit, I use this PHP-code: $q = "SELECT hits FROM hit_counters WHERE page='$this_uri'"; $res = pg_query($db, $q); if (pg_num_rows($res) == 0) { $q = "INSERT INTO hit_counters (page) VALUES ('$this_uri')"; } else { $row = pg_fetch_object($res); $hits = $row->hits; $hits++; $q = "UPDATE hit_counters SET last_visit = (SELECT localtimestamp(0)), hits = $hits WHERE page='$this_uri'"; } pg_query($db, $q); In other words, if the actual page is not in the table it is inserted. Otherwise the "hits" is increased by one. The id-column is never used, I simply appended it because of "good practise" using a primary key for all my tables. Now, my questions are: 1. Since the id-column is never used, I will probably drop it to save some space. Or, is there any point in keeping it? 2. From what I've read in the manual and what other people have told me, creating an index can speed up table accessing. I tried by adding an index to the page-column in the above table. Then by using ANALYZE and EXPLAIN, the cost got reduced from 259 to 6 which I guess is a pretty big save although I'm not sure what the value really is. Is creating an index on 'page' the best way to speed up this particular table? 3. The page-column is queried using full length strings (no regexps or wildcards). Will there be any performance issues if I have the page-column indexed and then run wildcard-queries on it? What I mean is, if I run wildcard-queries on this indexed column, will the index be of any use at all? -- - Rikard
On Tue, Nov 10, 2009 at 6:25 AM, Rikard Bosnjakovic <rikard.bosnjakovic@gmail.com> wrote: > I'm having a simple table for managing hit counts on WWW. The table > was setup some year ago using this: > > CREATE TABLE hit_counts ( > page varchar(255), > hits integer, > last_visit timestamp, > id serial primary key > ); > > When updating a hit, I use this PHP-code: > > $q = "SELECT hits FROM hit_counters WHERE page='$this_uri'"; > $res = pg_query($db, $q); > if (pg_num_rows($res) == 0) { > $q = "INSERT INTO hit_counters (page) VALUES ('$this_uri')"; > } else { > $row = pg_fetch_object($res); > $hits = $row->hits; > $hits++; > $q = "UPDATE hit_counters SET last_visit = (SELECT > localtimestamp(0)), hits = $hits WHERE page='$this_uri'"; > } > pg_query($db, $q); > [snip] > > 1. Since the id-column is never used, I will probably drop it to save > some space. Or, is there any point in keeping it? Drop it, and make column "page" your PRIMARY KEY. The way you have that table declared now, you're implicitly using "page" as your primary key anyway (i.e. lookups are done against "page", and your code assumes page is unique), but you're not actually enforcing a uniqueness constraint. In case other tables have FK references to "hit_counts"."id", google for "keyvil" to understand the pitfalls of using this surrogate key. > > 2. From what I've read in the manual and what other people have told > me, creating an index can speed up table accessing. I tried by adding > an index to the page-column in the above table. Then by using ANALYZE > and EXPLAIN, the cost got reduced from 259 to 6 which I guess is a > pretty big save although I'm not sure what the value really is. Is > creating an index on 'page' the best way to speed up this particular > table? > A unique index or primary key on page will help enforce database integrity and probably speed up your queries as well. > 3. The page-column is queried using full length strings (no regexps or > wildcards). Will there be any performance issues if I have the > page-column indexed and then run wildcard-queries on it? What I mean > is, if I run wildcard-queries on this indexed column, will the index > be of any use at all? > See here, this should answer your question: http://www.postgresql.org/docs/8.3/interactive/indexes-types.html Some additional comments on your schema and application code: First, you have a race condition in your code, assuming you are using the default READ COMMITTED isolation level. What if another client inserts a row in between when you check for the row's existence, and insert if the row doesn't exist? SELECT FOR UPDATE is probably what you want to use. Second, I really hope that $this_uri is being properly escaped before your queries are executed... Third, I recommend using "timestamp with time zone" instead of your timestamp type for last_visit -- and then just use CURRENT_TIMESTAMP instead of calling localtimestamp(0). Your table right now looks like it'll have problems during daylight savings switches. Josh
On point 2. When adding an index u need to consider how u will be querying the table. In the example provided indicating that column Id was added for best practice is in my opinion incorrect. Index is added to avoid full sequential table scan. Could be wrong but Id as you have defined its use is going to have 0 positive impact unless u plan to reference it in where clause. Thanks Kris On 10-Nov-09, at 6:25, Rikard Bosnjakovic <rikard.bosnjakovic@gmail.com> wrote: > I'm having a simple table for managing hit counts on WWW. The table > was setup some year ago using this: > > CREATE TABLE hit_counts ( > page varchar(255), > hits integer, > last_visit timestamp, > id serial primary key > ); > > When updating a hit, I use this PHP-code: > > $q = "SELECT hits FROM hit_counters WHERE page='$this_uri'"; > $res = pg_query($db, $q); > if (pg_num_rows($res) == 0) { > $q = "INSERT INTO hit_counters (page) VALUES ('$this_uri')"; > } else { > $row = pg_fetch_object($res); > $hits = $row->hits; > $hits++; > $q = "UPDATE hit_counters SET last_visit = (SELECT > localtimestamp(0)), hits = $hits WHERE page='$this_uri'"; > } > pg_query($db, $q); > > In other words, if the actual page is not in the table it is inserted. > Otherwise the "hits" is increased by one. The id-column is never used, > I simply appended it because of "good practise" using a primary key > for all my tables. > > Now, my questions are: > > 1. Since the id-column is never used, I will probably drop it to save > some space. Or, is there any point in keeping it? > > 2. From what I've read in the manual and what other people have told > me, creating an index can speed up table accessing. I tried by adding > an index to the page-column in the above table. Then by using ANALYZE > and EXPLAIN, the cost got reduced from 259 to 6 which I guess is a > pretty big save although I'm not sure what the value really is. Is > creating an index on 'page' the best way to speed up this particular > table? > > 3. The page-column is queried using full length strings (no regexps or > wildcards). Will there be any performance issues if I have the > page-column indexed and then run wildcard-queries on it? What I mean > is, if I run wildcard-queries on this indexed column, will the index > be of any use at all? > > > -- > - Rikard > > -- > Sent via pgsql-novice mailing list (pgsql-novice@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-novice