Re: Performance with Large Volumes of Data

From: John Coers
Subject: Re: Performance with Large Volumes of Data
Date:
Msg-id: 3AEDB916.BE3123A0@intrinsity.com
List: pgsql-general
I recently posted this on the admin list and got no response. Could
anyone here help me?


 Hi,

 I am not a "real" sysadmin or DBA, but "got stuck" doing it and am trying to learn via a fairly
 difficult problem that my group must deal with: a LARGE volume of data.  I have been working
 from scratch on this for about 3 weeks and have run lots of tests.

 I am running PostgreSQL 7.1 on Solaris 5.7 with 1GB RAM, two 300MHz processors, and a 6GB partition.

 The application I am using it for is to COPY a LARGE amount of data (an average of 15k rows
 of 3 ints every 15 minutes or so, from 170 machines by day / 250 at night) into a db
 and then do a query after the fact.  The COPYs are done via the libpq PQputline()
 subroutine.  The after-the-fact query will postprocess the data, reduce its
 amount and granularity, and then load it into a new table.  Ultimately, I will have
 1 db with a table of about 250M rows and several other dbs each with tens of millions...
 Multiple GBs of data.

 Here are the options I run with postmaster:

 postmaster -D /evsx/aus16/coers -o "-S 32768" -i -B 8192 -N 2

 Here are my IPC params:
 set shmsys:shminfo_shmmax=524288000
 set shmsys:shminfo_shmmin=16
 set shmsys:shminfo_shmmni=200
 set shmsys:shminfo_shmseg=200
 set semsys:seminfo_semmni=500
 set semsys:seminfo_semmns=500
 set semsys:seminfo_semmsl=500
 set semsys:seminfo_semume=100

 Here are my questions:

 1)  NUMBER OF CONNECTIONS: I use -N 2 because that seems to optimize performance.  If I allow more connections,
 the server bogs down, ultimately to a near-standstill if I allow too many.  I assume
 this is because the competing connections are all trying to COPY to the same database,
 so all but one block, and all the semaphore chasing slows everything down.  The weird
 thing is that the CPU, iowait and swap figures in top do not rise very much.  What is slowing
 things down in this case?  Currently, I have each client try for a connection and, if it fails,
 wait rand()%4+1 seconds and then try again.  This actually works pretty well, but it seems to me
 that the server should be handling this and doing a more efficient job.  Also, am I correct
 in assuming there is no way to keep multiple COPYs to the same db & table from blocking?  When
 I tried to set nonblocking mode, data got dropped.

 2)  BOTTLENECK:  I was running 2 queries on the 65M rows of data I had collected after I had finished loading.
 I had not indexed the tables.  Based on the top reading below, what is the bottleneck that is
 slowing the query down?  The same phenomenon occurs when COPYing data into the table.

 last pid: 15973;  load averages:  0.98,  0.92,  0.78
 41 processes:  38 sleeping, 1 running, 2 on cpu
 CPU states: 48.5% idle, 43.8% user,  5.2% kernel,  2.6% iowait,  0.0% swap
 Memory: 1024M real, 17M free, 109M swap in use, 2781M swap free

   PID USERNAME THR PRI NICE  SIZE   RES STATE   TIME    CPU COMMAND
 15919 postgres   1  40    0   71M   69M cpu0   30:58 22.12% postgres
 15966 postgres   1  30    0   71M   69M run    12:02 23.63% postgres

 3)  CONFIGURATION/SETTINGS:  Are my IPC params and postmaster options set right for my application?
 My thinking is that I need lots of shared memory to reduce disk access.  Am I missing something?  Are
 there any other configurable kernel params that I need to know about?  What exactly will the sort
 mem (-o "-S") buy me?

 Thanks for your patience!



--
John Coers            Intrinsity, Inc.
coers@intrinsity.com  Austin, Texas
