Обсуждение: performance question related to pgsql

Поиск
Список
Период
Сортировка

performance question related to pgsql

От
david drummard
Дата:
I have an unique requirement. I have a feed of 2.5 - 3 million rows of data which arrives every 1/2 an hour. Each row has 2 small string values  (about 50 chars each) and 10 int values. I need searcheability and running arbitrary queries on any of these values. This means i have to create an index on every column. The feed comes in as a text file comma separated. Here is what i am planning to do

1) create a new table every time a new feed file comes in. Create table with indexes. Use the copy command to dump the data into the table.
2) rename the current table to some old table name and rename the new table to current table name so that applications can access them directly.

Note that these are read only tables and it is fine if the step 2 takes a small amount of time (it is not a mission critical table hence, a small downtime of some secs is fine).

My question is what is the best way to do step (1) so that after the copy is done, the table is fully indexed  and properly balanced and optimized for query.
Should i create indexes before or after import ? I need to do this in shortest period of time so that the data is always uptodate. Note that incremental updates are not possible since almost every row will be changed in the new file.

my table creation script looks like this

create table datatablenew(fe varchar(40), va varchar(60), a int, b int, c int, d int, e int, f int, g int, h int, i int, j int, k int, l int, m int, n int, o int, p int, q real);
create index fe_idx on datatablenew using hash (fe);
create index va_idx on datatablenew using hash(va);
create index a_idx on datatablenew (a);
......
create index q_idx on datatablenew(q);


please advice.

thanks
vijay

Re: performance question related to pgsql

От
Stephen Frost
Дата:
* david drummard (vijayspam@gmail.com) wrote:
> My question is what is the best way to do step (1) so that after the copy is
> done, the table is fully indexed  and properly balanced and optimized for
> query.
> Should i create indexes before or after import ? I need to do this in
> shortest period of time so that the data is always uptodate. Note that
> incremental updates are not possible since almost every row will be changed
> in the new file.

Create indexes after import, definitely.  analyze after indexes have
been created.  You don't need to vacuum if you're creating a whole new
table each time.  Remember that if you need to adjust analyze parameters
for some reason that you'll need to do that on every table creation as
well.  Using COPY is a good approach.  Not doing incremental updates
when most of the rows are changing also makes sense.

    Thanks,

        Stephen

Вложения

Re: performance question related to pgsql

От
david drummard
Дата:
hi stephen,

thanks very much for the response. Are there any special commands to analyze the index before using the table. If i rename the table, will the indexes still stay with the table  ( i hope so).

best regards
vijay eranti

On 2/10/06, Stephen Frost <sfrost@snowman.net> wrote:
* david drummard (vijayspam@gmail.com) wrote:
> My question is what is the best way to do step (1) so that after the copy is
> done, the table is fully indexed  and properly balanced and optimized for
> query.
> Should i create indexes before or after import ? I need to do this in
> shortest period of time so that the data is always uptodate. Note that
> incremental updates are not possible since almost every row will be changed
> in the new file.

Create indexes after import, definitely.  analyze after indexes have
been created.  You don't need to vacuum if you're creating a whole new
table each time.  Remember that if you need to adjust analyze parameters
for some reason that you'll need to do that on every table creation as
well.  Using COPY is a good approach.  Not doing incremental updates
when most of the rows are changing also makes sense.

        Thanks,

                Stephen


-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2 (GNU/Linux)

iD8DBQFD7Pu9rzgMPqB3kigRArthAJ4vgBzbUL/0YbeWu/luVFIGaP43rwCgkkf2
2Y7ADi6UEbhpl8CyDNC2jA8=
=WhhZ
-----END PGP SIGNATURE-----



Re: performance question related to pgsql

От
Stephen Frost
Дата:
* david drummard (vijayspam@gmail.com) wrote:
> thanks very much for the response. Are there any special commands to analyze
> the index before using the table. If i rename the table, will the indexes
> still stay with the table  ( i hope so).

You want to run 'analyze <tablename>' after you've created the index.
Renaming the table will retain the indexes (and should retain everything
else, analyze statistics, etc).  The only issue you might need to be
careful about is that renaming the table might not rename the indexes,
in which case you might get errors when you try to create the new
indexes about an index with that name already existing.

Just be sure to test your scripts out w/ empty tables to make sure
everything in the rotation works correctly so that when you add the data
it all works correctly.

By the way, there's a Postgres mailing list dedicated to Performance
issues.  These questions would probably be more appropriate there.

    Enjoy,

        Stephen

Вложения