Discussion: DB design advice


DB design advice

From: Toni Alfirević
Date:
Hi,

I'm in need of some advice, and here is my situation.
I've started putting things on paper, sketching my application, and everything
seems simple enough. My application will store a large amount of string-type
entries (let's call them web server logs for the sake of this mail), and by
"large amount" I mean a couple of gigabytes a day (it could be more, but at
first it will be several hundred megabytes a day).

Relevant data structure looks something like this:

Logs
-----------------
Id - bigserial
Value - text
Timestamp - timestamp
UserId - FK_Users_Id

Tags
-----------------
Id - bigserial
Value - text
Timestamp - timestamp
LogId - FK_Logs_Id

So, pretty simple stuff there. A log entry can have multiple related tags, and those logs are related to a user; nothing fancy or even complex.
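Spelled out as DDL, the structure above might look roughly like this (an
untested sketch; the users table and the not null choices are my assumptions):

create table logs (
    id bigserial primary key,
    value text not null,
    "timestamp" timestamp not null,
    user_id bigint not null references users (id)
);

create table tags (
    id bigserial primary key,
    value text not null,
    "timestamp" timestamp not null,
    log_id bigint not null references logs (id)
);

("timestamp" has to be quoted because it is also a type name, so a name like
ts or created_at would be more convenient.)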

The app should allow a user to search through his logs, filtered by one or more
tags and a time interval. Here is one more constraint: since those messages are
not really web server logs, all messages have the same relevance, and by "same
relevance" I mean that messages created 2 years ago are not less likely to be
searched for. This translates to the following situation: if this system is to
survive for, let's say, 3 years and the DB size were to reach 1 TB, the system
should still be able to process a search request within a couple of seconds (an
arbitrary number that should be as low as possible, and not high enough for
users to perceive the application as slow or sluggish).

The app will be used by, let's say, hundreds of users, possibly more.
Read/search speeds are rather important, and writes should be reliable; it would not be acceptable to lose a log or two every now and then.

Now that you have a general idea of what I plan to do, here is the actual advice I need.

I know it will seem like a pretty vague question, but what advice would you
give me at this point, before I start coding and prototyping my app? Let me be
more specific: I'm interested in things like index types on columns, or maybe
modifying the table relations. Maybe even experiences using search servers like
Solr or Sphinx, or something like PGStrom (http://wiki.postgresql.org/wiki/PGStrom).
I'm open to all suggestions, even to something like "Toni, you're stupid, you
should be using Cassandra for this, not PostgreSQL". And keep in mind that my
general concern is to design the DB and app layer so that it can handle TBs of
data at some point, so please try to give me advice that would result in a
performance increase.

Thank you,

--
Toni Alfirevic

Re: DB design advice

From: Sergey Konoplev
Date:
On Thu, Jun 6, 2013 at 2:29 AM, Toni Alfirević <toni.alfirevic@gmail.com> wrote:
> The app should allow a user to search through his logs, filtered by one or more
> tags and a time interval. Here is one more constraint: since those messages

Is it supposed to return all the data from this interval at once, or
to be implemented as a kind of pagination/slider that gets a limited
number of entries from the beginning of the interval, and then, after
the user requests more, gets the entries that come after the last one
from the previous batch, and so on?

--
Kind regards,
Sergey Konoplev
PostgreSQL Consultant and DBA

Profile: http://www.linkedin.com/in/grayhemp
Phone: USA +1 (415) 867-9984, Russia +7 (901) 903-0499, +7 (988) 888-1979
Skype: gray-hemp
Jabber: gray.ru@gmail.com


Re: DB design advice

From: Toni Alfirević
Date:
On 7. 6. 2013., at 07:39, Sergey Konoplev <gray.ru@gmail.com> wrote:

> On Thu, Jun 6, 2013 at 2:29 AM, Toni Alfirević <toni.alfirevic@gmail.com> wrote:
>> The app should allow a user to search through his logs, filtered by one or more
>> tags and a time interval. Here is one more constraint: since those messages
>
> Is it supposed to return all the data from this interval at once, or
> to be implemented as a kind of pagination/slider that gets a limited
> number of entries from the beginning of the interval, and then, after
> the user requests more, gets the entries that come after the last one
> from the previous batch, and so on?

Hi Sergey,

It will most likely return subsets of data, just like you said (pagination).

But there will be situations where all the data will need to be analyzed for,
let's call it, a report. On the other hand, those reports are not that
sensitive when it comes to performance. It won't be an issue if it takes a
minute, two or three to generate one.

Re: DB design advice

From: Sergey Konoplev
Date:
On Thu, Jun 6, 2013 at 11:43 PM, Toni Alfirević
<toni.alfirevic@gmail.com> wrote:
> It will most likely return subsets of data just like you said (pagination).

Okay. What do we have?

Table log (id bigserial primary key, ts timestamptz, value text,
user_id bigint) with a (user_id) foreign key.

Table tag (id bigserial primary key, value text).

Table log_tag (log_id bigint, log_ts timestamptz, log_user_id bigint,
tag_id bigint) with foreign keys (log_id, log_ts, log_user_id) and
(tag_id).

Index log_tag_pk (log_user_id, tag_id, log_ts, log_id) on log_tag.
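
In DDL this might look like the following (an untested sketch; I use
single-column foreign keys and assume a users table here, while a composite
(log_id, log_ts, log_user_id) key would additionally require a matching
unique constraint on log):

create table log (
    id bigserial primary key,
    ts timestamptz not null,
    value text,
    user_id bigint not null references users (id)
);

create table tag (
    id bigserial primary key,
    value text
);

create table log_tag (
    log_id bigint not null references log (id),
    log_ts timestamptz not null,
    log_user_id bigint not null references users (id),
    tag_id bigint not null references tag (id)
);

create index log_tag_pk on log_tag (log_user_id, tag_id, log_ts, log_id);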

If it is not so critical, you can make it without foreign keys to gain
some more performance, of course, but I wouldn't recommend that in the
early stages of your software, until it is mature and covered well
enough by tests that you can trust it won't break your data integrity.
You can always do this later.

We also have a plpgsql stored function get_log_by_tags(i_log_user_id
bigint, i_tag_ids bigint[], i_ts timestamptz, i_limit) returning a
set of log.

This function initializes an empty result array _result_logs log[].

Then it takes each _tag_id from i_tag_ids in a for loop.

For each tag it selects all log entries for i_log_user_id and _tag_id
inserted after or before i_ts, depending on the order you want to
output your log records in, limiting them by i_limit. Then it
concatenates these entries with the result array, orders the result
by ts and limits it by i_limit.

This query is supposed to be inside the for loop and represents the
paragraph above. I will use descending order as an example.

select into _result_logs array(
    select ll from (
        (
            select log.* from log_tag join log on log.id = log_id
            where log_user_id = i_log_user_id
            and tag_id = _tag_id and log_ts > i_ts
            order by log_ts desc limit i_limit
        ) union (
            select (l::log).* from unnest(_result_logs) as l
        )
    ) as ll
    order by ts desc limit i_limit
)

I did not check this query, so it might contain some typos, etc.

Finally, it returns the result of unnesting _result_logs.

select (l::log).* from unnest(_result_logs) as l

If you need to be able to go back and forth you can, for example,
add an i_back boolean parameter to the function and, according to its
value, choose either the log_ts > i_ts query with desc or the
log_ts < i_ts query with asc, using if/else.

That is it.
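
Putting it all together, the whole function might look like this
(untested as well; the integer type of i_limit and the composite-array
handling are my assumptions):

create or replace function get_log_by_tags(
    i_log_user_id bigint,
    i_tag_ids bigint[],
    i_ts timestamptz,
    i_limit integer
) returns setof log language plpgsql as $function$
declare
    -- the accumulated result, merged tag by tag
    _result_logs log[] := '{}';
    _tag_id bigint;
begin
    foreach _tag_id in array i_tag_ids loop
        select array(
            select ll from (
                (
                    -- the newest entries for this user and tag after i_ts
                    select log as ll
                    from log_tag join log on log.id = log_tag.log_id
                    where log_tag.log_user_id = i_log_user_id
                        and log_tag.tag_id = _tag_id
                        and log_tag.log_ts > i_ts
                    order by log_tag.log_ts desc
                    limit i_limit
                ) union (
                    -- what has been collected for the previous tags
                    select l as ll from unnest(_result_logs) as l
                )
            ) as s
            order by (ll).ts desc
            limit i_limit
        ) into _result_logs;
    end loop;

    return query select * from unnest(_result_logs);
end;
$function$;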

> But there will be situations where all the data will need to be analyzed
> for, let's call it, a report. On the other hand, those reports are not
> that sensitive when it comes to performance. It won't be an issue if it
> takes a minute, two or three to generate one.

Then you can use simple queries like this one.

select count(1) from log_tag
where
    log_user_id = 987 and
    tag_id in (123, 456) and
    log_ts between now() - '1 week'::interval and now()

It will be okay for these kinds of reports in most cases.

--
Kind regards,
Sergey Konoplev
PostgreSQL Consultant and DBA

Profile: http://www.linkedin.com/in/grayhemp
Phone: USA +1 (415) 867-9984, Russia +7 (901) 903-0499, +7 (988) 888-1979
Skype: gray-hemp
Jabber: gray.ru@gmail.com


Re: DB design advice

From: Toni Alfirević
Date:
On 8. 6. 2013., at 09:17, Sergey Konoplev <gray.ru@gmail.com> wrote:

> On Thu, Jun 6, 2013 at 11:43 PM, Toni Alfirević
> <toni.alfirevic@gmail.com> wrote:
> [...]

Sergey, you've been more than helpful. Thank you.

I'll take all of this into consideration.

At some point I'll give you beta access so you can check out what you helped create ;)

Cheers!
T.

Re: DB design advice

From: Sergey Konoplev
Date:
On Tue, Jun 11, 2013 at 2:48 AM, Toni Alfirević
<toni.alfirevic@gmail.com> wrote:
> Sergey, you've been more than helpful. Thank you.
> I'll take all of this into consideration.

Good to know.

If you want, I could check that everything is optimal once you finish
all the DDL and stored function(s) for this part of the schema.

> At one point I'll give you beta access so you can check out what you helped to create ;)

Thank you :)

--
Kind regards,
Sergey Konoplev
PostgreSQL Consultant and DBA

Profile: http://www.linkedin.com/in/grayhemp
Phone: USA +1 (415) 867-9984, Russia +7 (901) 903-0499, +7 (988) 888-1979
Skype: gray-hemp
Jabber: gray.ru@gmail.com