general question on two-partition table

Поиск
Список
Период
Сортировка
От Janet Jacobsen
Тема general question on two-partition table
Дата
Msg-id 4A6E3DDA.2080606@lbl.gov
обсуждение исходный текст
Ответы Re: general question on two-partition table
Re: general question on two-partition table
Список pgsql-general
Hi.  We have a table with 30 M records that is growing by
about 100 K records per day.

The experimentalists, whose data are in the table, have
decided that they will focus on the records for which the
value of one field, rbscore, is greater than a cut-off.
However, they want to continue to store all of the data
- even the records for which rbscore is less than the cutoff
- in the database.

For the current table, there are about 400 K (out of 30 M)
records that meet the 'good' criteria.

Each record in the table has about 40 fields, and the
experimentalists do in fact write queries that use many of
those fields, some more than others.  (They are building a
model and have not pinned down exactly which fields are
more important than others, so that's why they store and
query by so many fields.)

If they are going to spend 95% of their time querying the
records that meet the 'good' criteria, what are the good
strategies for ensuring good performance for those queries?
(1) Should I partition the table into two partitions based on
the value of rbscore?
(2) Should I create two separate tables?

Are (1) and (2) more or less equivalent in terms of
performance?

I think that partitioning the table is a more flexible option
(i.e., what if the cutoff value changes, no need to change
the name of the table being queried, etc.), but would (2)
give better performance given that 95% of their queries
are for rbscores greater than a threshold value?

Can you suggest other strategies?

Thank you,
Janet




В списке pgsql-general по дате отправления:

Предыдущее
От: Tom Lane
Дата:
Сообщение: Re: For production: 8.4 or 8.3?
Следующее
От: Christophe Pettus
Дата:
Сообщение: Building from source vs RPMs