Обсуждение: help required in design of database

Поиск
Список
Период
Сортировка

help required in design of database

От
david drummard
Дата:
Hi,

I have an unique requirement. I have a feed of 2.5 - 3 million rows of data which arrives every 1/2 an hour. Each row has 2 small string values  (about 50 chars each) and 10 int values. I need searcheability and running arbitrary queries on any of these values. This means i have to create an index on every column. The feed comes in as a text file comma separated. Here is what i am planning to do

1) create a new table every time a new feed file comes in. Create table with indexes. Use the copy command to dump the data into the table.
2) rename the current table to some old table name and rename the new table to current table name so that applications can access them directly.

Note that these are read only tables and it is fine if the step 2 takes a small amount of time (it is not a mission critical table hence, a small downtime of some secs is fine).

My question is what is the best way to do step (1) so that after the copy is done, the table is fully indexed  and properly balanced and optimized for query.
Should i create indexes before or after import ? I need to do this in shortest period of time so that the data is always uptodate. Note that incremental updates are not possible since almost every row will be changed in the new file.

my table creation script looks like this

create table datatablenew(fe varchar(40), va varchar(60), a int, b int, c int, d int, e int, f int, g int, h int, i int, j int, k int, l int, m int, n int, o int, p int, q real);
create index fe_idx on datatablenew using hash (fe);
create index va_idx on datatablenew using hash(va);
create index a_idx on datatablenew (a);
......
create index q_idx on datatablenew(q);


please advice.

thanks
vijay

Re: help required in design of database

От
"Steinar H. Gunderson"
Дата:
On Fri, Feb 10, 2006 at 12:20:34PM -0800, david drummard wrote:
> 1) create a new table every time a new feed file comes in. Create table with
> indexes. Use the copy command to dump the data into the table.
> 2) rename the current table to some old table name and rename the new table
> to current table name so that applications can access them directly.

That sounds like a working plan.

> Should i create indexes before or after import ? I need to do this in
> shortest period of time so that the data is always uptodate. Note that
> incremental updates are not possible since almost every row will be changed
> in the new file.

You should create indexes after the import. Remember to pump up your memory
settings (maintenance_work_mem) if you want this to be quick.

/* Steinar */
--
Homepage: http://www.sesse.net/

Re: help required in design of database

От
Markus Schaber
Дата:
Hi, David,

david drummard wrote:

> 1) create a new table every time a new feed file comes in. Create table
> with indexes. Use the copy command to dump the data into the table.

Its faster to obey the following order:

- Create the table
- COPY the data into the table
- Create the indices
- ANALYZE the table.

and probably CLUSTER the table on the most-used index, between index
creation and ANALYZE. You also might want to increase the statistics
target on some columns before ANALYZE, depending on your data.

> 2) rename the current table to some old table name and rename the new
> table to current table name so that applications can access them directly.

You can also use a view, and then use CREATE OR REPLACE VIEW to switch
between the tables.

But two table renames inside a transaction should do as well, and
shorten the outage time, as with the transaction encapsulation, no
external app should see the change inside their transaction.

HTH,
Markus


--
Markus Schaber | Logical Tracking&Tracing International AG
Dipl. Inf.     | Software Development GIS

Fight against software patents in EU! www.ffii.org www.nosoftwarepatents.org