Handling large volumes of data
От | Johann Spies |
---|---|
Тема | Handling large volumes of data |
Дата | |
Msg-id | 20080408094233.GG13409@sun.ac.za обсуждение исходный текст |
Ответы |
Re: Handling large volumes of data
("Shoaib Mir" <shoaibmir@gmail.com>)
Re: Handling large volumes of data (Tino Schwarze <postgresql@tisc.de>) Re: Handling large volumes of data (Michael Monnerie <michael.monnerie@it-management.at>) |
Список | pgsql-admin |
I want to set up a system where logs of all kinds can be put in tables so that queries and reports can be generated from there. Our Firewall logs alone generate about 600,000,000 lines per month and that will increase as we get more bandwidth. I am testing postgresql's ability to handle large datasets. I have loaded about 4,900,000,000 in one of two tables with 7200684 in the second table in database 'firewall', built one index using one date-field (which took a few days) and used that index to copy about 3,800,000,000 of those records from the first to a third table, deleted those copied record from the first table and dropped the third table. This took about a week on a 2xCPU quadcore server with 8Gb RAM. During and after that exercise the server was lethargic and constantly ran at a load average of at least 5. So I decided maybe it is a good thing to run 'autovacuum analyse' to clean up things. After a few days the process is still running and the load average still constantly about 5. I then decided to just drop the tables. I did that about 18 hours ago and still there is no sign of Postgresql finishing that (pid 18614 below). The machine is sluggish. A 'ps fanx' shows: 12501 ? S 0:00 /usr/lib/postgresql/8.1/bin/postmaster -D /var/lib/postgresql/8.1/main -c config_file=/etc/postgresql/8.1/main/postgresql.conf 12504 ? D 0:54 \_ postgres: writer process 12505 ? S 1:42 \_ postgres: stats buffer process 12506 ? S 1:03 | \_ postgres: stats collector process 15918 ? D 1214:28 \_ postgres: autovacuum process firewall 18613 ? D 130:08 \_ postgres: exilog exilog 146.232.128.197(53907) SELECT 18614 ? S 12:42 \_ postgres: exilog exilog 146.232.128.197(53908) idle 31932 ? D 16:11 \_ postgres: exilog exilog 146.232.128.106(36547) INSERT 10380 ? S 0:00 \_ postgres: log firewall [local] DROP TABLE waiting 20753 ? D 5:11 \_ postgres: exilog exilog 146.232.128.197(43581) INSERT 16370 ? S 5:04 \_ postgres: exilog exilog 146.232.128.59(40620) idle 3483 ? S 0:50 \_ postgres: exilog exilog 146.232.128.49(33803) INSERT 3484 ? S 0:04 \_ postgres: exilog exilog 146.232.128.49(33804) idle 3485 ? S 0:02 \_ postgres: exilog exilog 146.232.128.49(33805) idle dstat-output shows a lot of IO. It is actually better now. Earlier today it was constantly about 50M per line: ----total-cpu-usage---- -dsk/total- -net/total- ---paging-- ---system-- usr sys idl wai hiq siq|_read _writ|_recv _send|__in_ _out_|_int_ _csw_ 4 1 87 9 0 0| 24M 15M| 0 0 | 96B 10k|1108 1902 3 1 85 11 0 0| 38M 32M|7242B 8796B| 0 0 |1175 1501 2 0 88 10 0 0| 27M 33M|9949B 10k| 0 0 | 737 714 1 1 87 10 0 0| 24M 28M|2412B 2948B| 0 0 | 637 533 2 0 87 11 0 0| 28M 38M|2507B 2306B| 0 0 | 789 765 2 1 87 9 0 0| 40M 38M|2268B 2450B| 0 0 | 900 795 2 1 85 12 0 0| 33M 25M|4753B 3376B| 36k 0 | 950 1217 2 2 80 16 0 0| 24M 28M|2590B 2738B| 0 0 | 899 1487 2 1 84 12 0 0| 32M 40M|2603B 3025B| 0 0 |1042 1377 2 0 86 11 0 0| 28M 30M|8530B 9116B| 0 0 |1054 1302 1 0 77 22 0 0|8412k 12M| 12k 12k| 0 0 | 854 1286 Apparently the best approach is not to have very large tables. I am thinking of making (as far as the firewall is concerned) a different table for each day and then drop the older tables as necessary. Any advice on how to best handle this kind of setup will be appreciated. Regards Johann -- Johann Spies Telefoon: 021-808 4036 Informasietegnologie, Universiteit van Stellenbosch "Be careful for nothing; but in every thing by prayer and supplication with thanksgiving let your requests be made known unto God. And the peace of God, which passeth all understanding, shall keep your hearts hearts and minds through Christ Jesus." Philippians 4:6,7
В списке pgsql-admin по дате отправления: