Re: Questions about indexes?

Поиск

Список

Период

Сортировка

От	Ryan Bradetich
Тема	Re: Questions about indexes?
Дата	17 февраля 2003 г. 02:05:34
Msg-id	1045465530.30629.26.camel@beavis.ybsoft.com обсуждение
Ответ на	Re: Questions about indexes? (Tom Lane <tgl@sss.pgh.pa.us>)
Ответы	Re: Questions about indexes?
Список	pgsql-hackers

Дерево обсуждения

On Sun, 2003-02-16 at 23:34, Tom Lane wrote:
> Ryan Bradetich <rbradetich@uswest.net> writes:
> > Although the table schema is immaterial, I will provide it so we have a
> > common framework for this discussion:
> 
> >     host_id        integer     (not null)
> >     timestamp    datetime    (not null)
> >     category    text        (not null)    [<=    5 chars]
> >     anomaly        text        (not null)    [<= 1024 chars]
> 
> > This table is used to store archived data, so each row in the table must
> > be unique.  Currently I am using a primary key across each column to
> > enforce this uniqueness.
> 
> It's not real clear to me why you bother enforcing a constraint that the
> complete row be unique.  Wouldn't a useful constraint be that the first
> three columns be unique?  Even if that's not correct, what's wrong with
> tolerating a few duplicates?  You can't tell me it's to save on storage
> ;-)

The table holds system policy compliance data.  The catagory is
basically the policy, and the anomaly is the detailed text explaining
why the system is out of compliance.  So the anomaly data is important
(and often the reason why the key is unique).  The reason we are
archiving the data is to generate reports and graphs showing policy
compliance over time.  Duplicated rows will artifically inflate the
numbers in the reports and graphs.  The other option we had was to
perform a DISTINCT select at report / graph time, we chose no to go this
route bacause of the sort added to the query.  (Also it just seemed
tidier to only store good data :))

The disk storage is a minor concern :), but I was actually looking at it
as a possible performance enhancement.  I am curious how it affects the
shared buffer cache, and also there should be less average pages to read
since the index size was smaller.

Does this make sense? Or am I out in left field again? :)

> > I am not sure why all the data is duplicated in the index ... but i bet
> > it has to do with performance since it would save a lookup in the main
> > table.
> 
> An index that can't prevent looking into the main table wouldn't be
> worth anything AFAICS ...

Ok, scratch that idea then :)  I will continue looking at other ideas
like the MD5 data hashing etc.  

Thanks for your input Tom!

- Ryan
        regards, tom lane
-- 
Ryan Bradetich <rbradetich@uswest.net>

В списке pgsql-hackers по дате отправления:

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Re: Questions about indexes?