Re: database design question, new to postgresql

Поиск
Список
Период
Сортировка
От Fei Liu
Тема Re: database design question, new to postgresql
Дата
Msg-id 46326234.7000709@aepnetworks.com
обсуждение исходный текст
Ответ на Re: database design question, new to postgresql  (Fei Liu <fei.liu@aepnetworks.com>)
Список pgsql-novice
Fei Liu wrote:
> Fei Liu wrote:
>> Hello group, I need to design and develop a web reporting system to
>> let users query/view syslog files on a unix host. For now, I am
>> concentrating on the authentication file that has user logon
>> (success/failure) and logoff records. The log file is logrotated
>> every week or so. My reporting system parses the log entries and put
>> the result into a postgresql database (I am proposing to use
>> postgresql as the backend). Since this deals with multi-year archive
>> and I believe 'partitioing' is an ideal feature to handle this
>> problem. So here is the design scheme:
>>
>> CREATE TABLE logon_success(
>>   name varchar(32) not null,
>>   srcip inet not null,
>>   date date not null,
>>   time time not null,
>>   ...
>> );
>>
>>
>> CREATE TABLE logon_success_yy${year}mm${month}(
>>   CHECK (date >= DATE '$year-$month-01' AND date < DATE
>> '$next_year-$next_month-1')
>> )
>> INHERITS ($tname)
>> ;
>>
>> As you can see from the sample code, I am using perl to dynamically
>> generate children tables as I parse log files in a daily cron job
>> script. Once the log file is analyzed and archived in the database, I
>> have a simple web UI that sysadmin can select and view user logon
>> events. I have built a sample framework and it works so far. Keep in
>> mind, this reporting system is not limited to just user logon, it
>> should also work with system events such as services
>> failures/startup, hardware failures, etc
>>
>> Now here are my questions:
>> 1) Should I use database to implement such a reporting system? Are
>> there any alternatives, architects, designs?
>> 2) Is partitioning a good approach to speed up log query/view? The
>> user comment in partitioning in pgsql manual seems to indicate
>> partitioning may be slower than non-partitioned table under certain
>> circumstances.
>> 3) How to avoid repetitive log entry scanning since my cron job
>> script is run daily but logrotate runs weekly? This means everytime
>> my script will be parsing duplicate entries.
>> 4) When parsing log files, it's quite possible that there are
>> identical entries, for example a user logins really fast, resulting 2
>> or more identical entries..In this case can I still use primary
>> key/index at all? If I can, how do I design primary key or index to
>> speed up query?
>> 3) What are the most glaring limitations and flaws in my design?
>>
>> Thank you for taking time to review and answer my questions! Let me
>> know if I am not clear on any specific detail..
>>
>> Fei
>>
>> ---------------------------(end of broadcast)---------------------------
>> TIP 6: explain analyze is your friend
> Let me add one more question, what are the best approaches to analyze
> postgresql query performance and how to improve postgresql query
> performance?
> Fei
My initial testing has not shown any significant difference between a
partitioning approach and a plain (all entries in master) database
approach...
 2005-01-01 | 00:27:55 | firewood | ssh | Login Successful | None |
local | user9819 | 192.168.1.31

My test was based on two artificial tables that has 1700 records per day
from 2004-02-01 to 2007-04-27, around 2 million entries that are
identical in both tables.
My test script:
echo Testing database $t1 time based
time psql -p 5583 netilla postgres << EOF
select count(date) from $t1 where date > '2005-03-01' and date <
'2006-12-11';
\q
EOF

echo Testing database $t2 time based
time psql -p 5583 netilla postgres << EOF
select count(date) from $t2 where date > '2005-03-01' and date <
'2006-12-11';
\q
EOF

Result:
./timing_test.sh
Testing database logon_test time based
  count
---------
 1121472
(1 row)

0.00user 0.00system 0:02.92elapsed 0%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+456minor)pagefaults 0swaps
Testing database logon_test2 time based
  count
---------
 1121472
(1 row)

0.00user 0.00system 0:02.52elapsed 0%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+456minor)pagefaults 0swaps

But the numbers are really not static and logon_test2 (with
partitioning) sometimes behave worse than logon_test...
Fei

В списке pgsql-novice по дате отправления:

Предыдущее
От: Fei Liu
Дата:
Сообщение: Re: database design question, new to postgresql
Следующее
От: Mark Deering
Дата:
Сообщение: need good editor for postgreSQL