Re: Optimising SELECT on a table with one million rows

Поиск

Список

Период

Сортировка

От	Bryan Murphy
Тема	Re: Optimising SELECT on a table with one million rows
Дата	30 июля 2007 г. 14:14:18
Msg-id	bd8531800707301014m6ca15207meb9efe684725b2d1@mail.gmail.com обсуждение исходный текст
Ответ на	Optimising SELECT on a table with one million rows (Cultural Sublimation <cultural_sublimation@yahoo.com>)
Список	pgsql-general

Дерево обсуждения

First question... did you create the appropriate indexes on the appropriate columns for these tables? Foreign keys do not implicitly create indexes in postgres.

Bryan

On 7/30/07, Cultural Sublimation <cultural_sublimation@yahoo.com> wrote:

Hi,

I'm fairly new with Postgresql, so I am not sure if the performance
problems I'm having are due to poorly constructed queries/indices,
or if I bumped into more fundamental problems requiring a design of
my database structure.  That's why I'm requesting your help.

Here's the situation: I have three tables: Users, Stories, and Comments.
Stories have an author (a user), and a comment is associated with a
story and with the user who posted it.  The structure of the database
is therefore fairly simple:  (there are also some sequences, which I've
omitted for clarity)

CREATE TABLE users
        (
        user_id                 int UNIQUE NOT NULL,
        user_name               text,
        PRIMARY KEY (user_id)
        );

CREATE TABLE stories
        (
        story_id                int UNIQUE NOT NULL,
        story_title             text,
        story_body              text,
        story_timestamp         timestamptz,
        story_author            int REFERENCES users (user_id) NOT NULL,
        PRIMARY KEY (story_id)
        );

CREATE TABLE comments
        (
        comment_id              int UNIQUE NOT NULL,
        comment_title           text,
        comment_body            text,
        comment_timestamp       timestamptz,
        comment_story           int REFERENCES stories (story_id) NOT NULL,
        comment_author          int REFERENCES users (user_id) NOT NULL,
        PRIMARY KEY (comment_id)
        );

I've also populated the database with some test data, comprising 1,000
users, 1,000 stories (one per user), and 1,000,000 comments (one comment
per user per story).

Now, the query I wish to optimise is also simple: get me all comments (the
comment_id suffices) and corresponding user *names* for a given story.
If for example the story_id is 100, the query looks like this:

SELECT comments.comment_id, users.user_name
FROM comments, users
WHERE comments.comment_story = 100 AND comments.comment_author = users.user_id ;

The problem is that this query takes a *very* long time.  With the said
1,000,000 comments, it needs at least 1100ms on my system.  "Explain
analyze" tells me that a sequential scan is being performed on both
users and comments:

Hash Join  (cost=28.50..21889.09 rows=988 width=14) (actual
time=3.674..1144.779 rows=1000 loops=1)
   Hash Cond: ((comments.comment_author)::integer = (users.user_id)::integer)
   ->  Seq Scan on comments  (cost= 0.00..21847.00 rows=988 width=8) (actual
time=0.185..1136.067 rows=1000 loops=1)
         Filter: ((comment_story)::integer = 100)
   ->  Hash  (cost=16.00..16.00 rows=1000 width=14) (actual time=3.425..3.425
rows=1000 loops=1)
         ->  Seq Scan on users  (cost=0.00..16.00 rows=1000 width=14) (actual
time=0.068..1.845 rows=1000 loops=1)
Total runtime: 1146.424 ms

On the long run, I guess one possible solution to this problem will be
to partition the comments table into a number of sub-tables, most likely
based on the timestamp attribute (by having current versus historic data).
Nevertheless, I am wondering if there are other more straightforward ways
to optimise this query.  Some clever use of indices, perhaps?  Or is
the way I am now constructing the select non-optimal?  Or do I need
some pixie-magic on the Postgresql settings?  Anyway, any suggestions
are welcome!  (and thanks in advance)

Regards,
C.S.

____________________________________________________________________________________
Yahoo! oneSearch: Finally, mobile search
that gives answers, not web links.
http://mobile.yahoo.com/mobileweb/onesearch?refer=1ONXIC

---------------------------(end of broadcast)---------------------------
TIP 4: Have you searched our list archives?

               http://archives.postgresql.org/

В списке pgsql-general по дате отправления:

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Re: Optimising SELECT on a table with one million rows