Thread: Large Table Performance


Large Table Performance

From:
Edoceo Lists
Date:
List,
   I've got a problem where I need to make a table that is going to grow by an average of 230,000 records per day.
There are only 20 columns in the table, mostly char and integer.  It's FK'd in two places to another table for
import/export transaction id's, and I have a serial primary key and an index on a date column for when I need to
search (every search is done inside a date range).  I thought it would be OK, but after a few weeks of operation I
have more than five million records in there.  Some queries take more than five minutes to complete and I'm sad
about that.  How can I make this faster?  I could munge dates into integers if they're faster; I'm OK with that.
What can I tweak in the configuration file to speed things up?  What about some drastic schema change that more
experience would have shown me?  I cannot show the full schema, but it's like this:

-- My import/export data information table
ie_data (id serial primary key, date date, [12 other columns here])

big_transaction_table (id serial primary key, import_id int w/FK, export_id int w/FK, date date, [20 other necessary transaction detail columns])
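
Spelled out as DDL, that shorthand would look roughly like this (a sketch — I'm guessing at the FK targets and index name, and the elided columns are placeholders):

```sql
-- Rough DDL for the shorthand above; elided columns are placeholders
CREATE TABLE ie_data (
    id   serial PRIMARY KEY,
    date date
    -- [12 other columns here]
);

CREATE TABLE big_transaction_table (
    id                 serial PRIMARY KEY,
    import_id          integer REFERENCES ie_data (id),
    export_id          integer REFERENCES ie_data (id),
    date               date,
    transaction_status integer
    -- [20 other necessary transaction detail columns]
);

-- The date index used by the range searches
CREATE INDEX big_transaction_table_date_idx ON big_transaction_table (date);
```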

So when I say

    select x, y, z from big_transaction_table
    where date >= '10/2/2005' and date <= '10/4/2005'
      and transaction_status in (1, 2, 3)
    order by date;

it takes five+ minutes.

TIA for any suggestions.

/djb

Re: Large Table Performance

From:
Alex Stapleton
Date:
On 22 Oct 2005, at 01:25, Edoceo Lists wrote:

> List,
>   I've got a problem where I need to make a table that is going to
> grow by an average of 230,000 records per day. There are only 20
> columns in the table, mostly char and integer.  It's FK'd in two
> places to another table for import/export transaction id's and I
> have a serial primary key and an index on a date column for when I
> need to search (every search is done inside a date range).  I
> thought it would be OK but after a few weeks of operation I have
> more than five million records in there.  Some queries take more
> than five minutes to complete and I'm sad about that.  How can I
> make this faster?  I could munge dates into integers if they're
> faster; I'm OK with that.  What can I tweak in the configuration
> file to speed things up?  What about some drastic schema change
> that more experience would have shown me?  I cannot show the full
> schema but it's like this:
>
> -- My import/export data information table
> ie_data (id serial primary key, date date, [12 other columns here])
>
> big_transaction_table(id serial primary key, import_id int w/FK,
> export_id int w/FK, date date, [20 other necessary transaction
> detail columns])
>
> So when I say
> select x,y,z from big_transaction_table where date>='10/2/2005' and
> date<='10/4/2005' and transaction_status in (1,2,3) order by date;
> it takes five+ minutes.
>
> TIA for any suggestions.

What hardware are you on? What query plans (output from explain) do
your queries give you? What PG version?

We do about 100,000 rows a minute (300 MB+ a day), so I suspect your
queries are doing full table scans or something.  Of course, we don't
use any FKs, so I suppose they could be biting you.
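
For instance, the plan for your slow query is just (adjust the column names as needed):

```sql
-- Show the planner's strategy for the slow query; EXPLAIN ANALYZE
-- additionally runs it and reports actual times and row counts.
EXPLAIN
SELECT x, y, z
FROM big_transaction_table
WHERE date >= '10/2/2005' AND date <= '10/4/2005'
  AND transaction_status IN (1, 2, 3)
ORDER BY date;
```

If that shows "Seq Scan on big_transaction_table" rather than an index scan on the date index, that would explain the five-minute runtimes.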

Re: Large Table Performance

From:
Michael Fuhr
Date:
On Fri, Oct 21, 2005 at 05:25:22PM -0700, Edoceo Lists wrote:

[summary of situation]

> Some queries take more than five minutes to complete and I'm sad
> about that.  How can I make this faster?

You might get more help on pgsql-performance, which is specifically
for discussions of performance issues.

> I could munge dates into integers if their faster, I'm OK with that.

Let's identify the problem before thinking about possible solutions.

> What can I tweak in the configuration file to speed things up?

What version of PostgreSQL are you using?  What operating system?
What kind of hardware (CPU, disks, amount of memory, etc.)?  Are
you regularly vacuuming and analyzing all tables in the database?
Have you tuned your configuration at all?  What are the following
settings?

shared_buffers
work_mem (8.x) or sort_mem (7.x)
effective_cache_size
random_page_cost

You might want to read through a tuning guide like this one:

http://www.powerpostgresql.com/PerfList
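
For reference, those settings live in postgresql.conf.  The numbers below are placeholders only, not recommendations — sensible values depend on your RAM and workload (and 7.x spells work_mem as sort_mem):

```
# postgresql.conf -- illustrative values, not recommendations
shared_buffers = 10000         # in 8 kB pages, so ~80 MB here
work_mem = 8192                # in kB, per sort/hash step (sort_mem on 7.x)
effective_cache_size = 65536   # in 8 kB pages; what the OS is likely caching
random_page_cost = 3           # lower it when data is mostly cached
```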

> So when I say
> select x,y,z from big_transaction_table where date>='10/2/2005' and
> date<='10/4/2005' and transaction_status in (1,2,3) order by date;
> it takes five+ minutes.

Please post the EXPLAIN ANALYZE output (not just EXPLAIN) of this
query; that'll show us what the query planner is doing and how
accurate its row count estimates are.  If the row counts are way
off then you might see an improvement by increasing the statistics
target for the date and maybe the transaction_status columns.  Or
you might just need to analyze the table to update its statistics,
and possibly vacuum it to get rid of dead tuples.
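
Concretely, that diagnostic sequence would look something like this (the statistics target of 100 is just an example value — the default is 10):

```sql
-- Actual plan with real timings and row counts
EXPLAIN ANALYZE
SELECT x, y, z
FROM big_transaction_table
WHERE date >= '10/2/2005' AND date <= '10/4/2005'
  AND transaction_status IN (1, 2, 3)
ORDER BY date;

-- If estimates are far off, collect finer-grained statistics
ALTER TABLE big_transaction_table
    ALTER COLUMN date SET STATISTICS 100;
ALTER TABLE big_transaction_table
    ALTER COLUMN transaction_status SET STATISTICS 100;
ANALYZE big_transaction_table;

-- Reclaim dead tuples while updating statistics
VACUUM ANALYZE big_transaction_table;
```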

--
Michael Fuhr