Обсуждение: Pros / cons for indexing a small table

Поиск
Список
Период
Сортировка

Pros / cons for indexing a small table

От
Rikard Bosnjakovic
Дата:
I'm having a simple table for managing hit counts on WWW. The table
was setup some year ago using this:

CREATE TABLE hit_counts (
  page varchar(255),
  hits integer,
  last_visit timestamp,
  id serial primary key
);

When updating a hit, I use this PHP-code:

  $q = "SELECT hits FROM hit_counters WHERE page='$this_uri'";
  $res = pg_query($db, $q);
  if (pg_num_rows($res) == 0) {
    $q = "INSERT INTO hit_counters (page) VALUES ('$this_uri')";
  } else {
    $row = pg_fetch_object($res);
    $hits = $row->hits;
    $hits++;
    $q = "UPDATE hit_counters SET last_visit = (SELECT
localtimestamp(0)), hits = $hits WHERE page='$this_uri'";
  }
  pg_query($db, $q);

In other words, if the actual page is not in the table it is inserted.
Otherwise the "hits" is increased by one. The id-column is never used,
I simply appended it because of "good practise" using a primary key
for all my tables.

Now, my questions are:

1. Since the id-column is never used, I will probably drop it to save
some space. Or, is there any point in keeping it?

2. From what I've read in the manual and what other people have told
me, creating an index can speed up table accessing. I tried by adding
an index to the page-column in the above table. Then by using ANALYZE
and EXPLAIN, the cost got reduced from 259 to 6 which I guess is a
pretty big save although I'm not sure what the value really is. Is
creating an index on 'page' the best way to speed up this particular
table?

3. The page-column is queried using full length strings (no regexps or
wildcards). Will there be any performance issues if I have the
page-column indexed and then run wildcard-queries on it? What I mean
is, if I run wildcard-queries on this indexed column, will the index
be of any use at all?


--
- Rikard

Re: Pros / cons for indexing a small table

От
Josh Kupershmidt
Дата:
On Tue, Nov 10, 2009 at 6:25 AM, Rikard Bosnjakovic
<rikard.bosnjakovic@gmail.com> wrote:
> I'm having a simple table for managing hit counts on WWW. The table
> was setup some year ago using this:
>
> CREATE TABLE hit_counts (
>  page varchar(255),
>  hits integer,
>  last_visit timestamp,
>  id serial primary key
> );
>
> When updating a hit, I use this PHP-code:
>
>  $q = "SELECT hits FROM hit_counters WHERE page='$this_uri'";
>  $res = pg_query($db, $q);
>  if (pg_num_rows($res) == 0) {
>    $q = "INSERT INTO hit_counters (page) VALUES ('$this_uri')";
>  } else {
>    $row = pg_fetch_object($res);
>    $hits = $row->hits;
>    $hits++;
>    $q = "UPDATE hit_counters SET last_visit = (SELECT
> localtimestamp(0)), hits = $hits WHERE page='$this_uri'";
>  }
>  pg_query($db, $q);
>
[snip]
>
> 1. Since the id-column is never used, I will probably drop it to save
> some space. Or, is there any point in keeping it?

Drop it, and make column "page" your PRIMARY KEY. The way you have
that table declared now, you're implicitly using "page" as your
primary key anyway (i.e. lookups are done against "page", and your
code assumes page is unique), but you're not actually enforcing a
uniqueness constraint. In case other tables have FK references to
"hit_counts"."id", google for "keyvil" to understand the pitfalls of
using this surrogate key.

>
> 2. From what I've read in the manual and what other people have told
> me, creating an index can speed up table accessing. I tried by adding
> an index to the page-column in the above table. Then by using ANALYZE
> and EXPLAIN, the cost got reduced from 259 to 6 which I guess is a
> pretty big save although I'm not sure what the value really is. Is
> creating an index on 'page' the best way to speed up this particular
> table?
>

A unique index or primary key on page will help enforce database
integrity and probably speed up your queries as well.


> 3. The page-column is queried using full length strings (no regexps or
> wildcards). Will there be any performance issues if I have the
> page-column indexed and then run wildcard-queries on it? What I mean
> is, if I run wildcard-queries on this indexed column, will the index
> be of any use at all?
>

See here, this should answer your question:
http://www.postgresql.org/docs/8.3/interactive/indexes-types.html


Some additional comments on your schema and application code:

First, you have a race condition in your code, assuming you are using
the default READ COMMITTED isolation level. What if another client
inserts a row in between when you check for the row's existence, and
insert if the row doesn't exist? SELECT FOR UPDATE is probably what
you want to use.

Second, I really hope that $this_uri is being properly escaped before
your queries are executed...

Third, I recommend using "timestamp with time zone" instead of your
timestamp type for last_visit -- and then just use CURRENT_TIMESTAMP
instead of calling localtimestamp(0). Your table right now looks like
it'll have problems during daylight savings switches.

  Josh

Re: Pros / cons for indexing a small table

От
Kris Kewley
Дата:
On point 2.

When adding an index u need to consider how u will be querying the
table.

In the example provided indicating that column Id was added for best
practice is in my opinion incorrect.

Index is added to avoid full sequential table scan. Could be wrong but
Id as you have defined its use is going to have 0 positive impact
unless u plan to reference it in where clause.

Thanks
Kris

On 10-Nov-09, at 6:25, Rikard Bosnjakovic
<rikard.bosnjakovic@gmail.com> wrote:

> I'm having a simple table for managing hit counts on WWW. The table
> was setup some year ago using this:
>
> CREATE TABLE hit_counts (
>  page varchar(255),
>  hits integer,
>  last_visit timestamp,
>  id serial primary key
> );
>
> When updating a hit, I use this PHP-code:
>
>  $q = "SELECT hits FROM hit_counters WHERE page='$this_uri'";
>  $res = pg_query($db, $q);
>  if (pg_num_rows($res) == 0) {
>    $q = "INSERT INTO hit_counters (page) VALUES ('$this_uri')";
>  } else {
>    $row = pg_fetch_object($res);
>    $hits = $row->hits;
>    $hits++;
>    $q = "UPDATE hit_counters SET last_visit = (SELECT
> localtimestamp(0)), hits = $hits WHERE page='$this_uri'";
>  }
>  pg_query($db, $q);
>
> In other words, if the actual page is not in the table it is inserted.
> Otherwise the "hits" is increased by one. The id-column is never used,
> I simply appended it because of "good practise" using a primary key
> for all my tables.
>
> Now, my questions are:
>
> 1. Since the id-column is never used, I will probably drop it to save
> some space. Or, is there any point in keeping it?
>
> 2. From what I've read in the manual and what other people have told
> me, creating an index can speed up table accessing. I tried by adding
> an index to the page-column in the above table. Then by using ANALYZE
> and EXPLAIN, the cost got reduced from 259 to 6 which I guess is a
> pretty big save although I'm not sure what the value really is. Is
> creating an index on 'page' the best way to speed up this particular
> table?
>
> 3. The page-column is queried using full length strings (no regexps or
> wildcards). Will there be any performance issues if I have the
> page-column indexed and then run wildcard-queries on it? What I mean
> is, if I run wildcard-queries on this indexed column, will the index
> be of any use at all?
>
>
> --
> - Rikard
>
> --
> Sent via pgsql-novice mailing list (pgsql-novice@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-novice