Обсуждение: sql (Stored procedure) design question

Поиск
Список
Период
Сортировка

sql (Stored procedure) design question

От
Assad Jarrahian
Дата:
Problem:
  A column of type text contains username's and groupname's followed
by comma (eg. 'jarraa, mooreg3, keith') [it is stored this way
because it will be displayed in this format].

What is needed is a constrain check. To make sure that every time a
row is inserted into the table, this column (text) gets checked to
make sure that all the username/group name in the text exists in the
db (they are primary key columns of two tables).

I would like to use pgsql to write a stored procedure, but I dont know
how to split the text field up and then send them to do a constrain
check with the other tables.

I initially get this in java and then use jdbc, so if there is a
better way you can think of, feel free to comment.

Any help would be much appreciated. Thanks.


-assad

Re: sql (Stored procedure) design question

От
Tom Lane
Дата:
Assad Jarrahian <jarraa@gmail.com> writes:
>   A column of type text contains username's and groupname's followed
> by comma (eg. 'jarraa, mooreg3, keith') [it is stored this way
> because it will be displayed in this format].

You should *not* allow display concerns to drive your database layout.
The pain you are now experiencing is just the first taste of why that's
a bad idea.

Store the usernames and groupnames separately.  If you have an
application that's too dumb to concatenate them for itself, you
can make a view on the table that provides the display representation
the application needs.

            regards, tom lane

Re: sql (Stored procedure) design question

От
Assad Jarrahian
Дата:
what drove me to store it that way was more of a performance issue.

So if I store a documentID and then have a seperate table names, to_field
I will have to store the username/groupname as such
to_field:
documentID username
docuementID groupname
docuementID usernamename

...

based on that it seems that to read (Which occurs a lot more than
writing) a document I will have to go an find all the rows with
documentID in the to field and then cat them somehow and then return
the whole document. Even with an index built on the documentID of the
to field, it seems to be resource intensive (correct me if I am
wrong). Even if I were to do this, I would still have to check the
constraints by hand (since the entry can either be a username or a
groupname (and if its a group, the user has to belong to that group).


So coming back to the original synopsis. Is there a way I can just
send the whole document to a stored procedure and then the stored
procedure takes the text component of it and tokenizes it and then I
do constraint checks on it?

thanks.
-assad





On 1/11/06, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Assad Jarrahian <jarraa@gmail.com> writes:
> >   A column of type text contains username's and groupname's followed
> > by comma (eg. 'jarraa, mooreg3, keith') [it is stored this way
> > because it will be displayed in this format].
>
> You should *not* allow display concerns to drive your database layout.
> The pain you are now experiencing is just the first taste of why that's
> a bad idea.
>
> Store the usernames and groupnames separately.  If you have an
> application that's too dumb to concatenate them for itself, you
> can make a view on the table that provides the display representation
> the application needs.
>
>                         regards, tom lane
>

Re: sql (Stored procedure) design question

От
Tino Wildenhain
Дата:
Assad Jarrahian schrieb:
> what drove me to store it that way was more of a performance issue.
>
> So if I store a documentID and then have a seperate table names, to_field
> I will have to store the username/groupname as such
> to_field:
> documentID username
> docuementID groupname
> docuementID usernamename
>
From what I see below (btw, try to follow the way people cite on
mailinglists)

Your schema could rather look like this:

documentid,username,groupname  (as real fields)

>
> based on that it seems that to read (Which occurs a lot more than
> writing) a document I will have to go an find all the rows with
> documentID in the to field and then cat them somehow and then return

cat them? Why? (There is text concenation btw)

> the whole document. Even with an index built on the documentID of the
> to field, it seems to be resource intensive (correct me if I am
> wrong). Even if I were to do this, I would still have to check the
> constraints by hand (since the entry can either be a username or a
> groupname (and if its a group, the user has to belong to that group).

Well no, as above, you would have the usual foreign keys.
>
> So coming back to the original synopsis. Is there a way I can just
> send the whole document to a stored procedure and then the stored
> procedure takes the text component of it and tokenizes it and then I
> do constraint checks on it?

There is ;) But I doubt Tom likes to show you the dark side [tm] ;)


...
>
>
> On 1/11/06, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
>>Assad Jarrahian <jarraa@gmail.com> writes:
>>
>>>  A column of type text contains username's and groupname's followed
>>>by comma (eg. 'jarraa, mooreg3, keith') [it is stored this way
>>>because it will be displayed in this format].

++Tino

Re: sql (Stored procedure) design question

От
Assad Jarrahian
Дата:
Tino, thanks for your response
>
> Your schema could rather look like this:
>
> documentid,username,groupname  (as real fields)
>

Okay, so a typical document can be addressed to any number of users/groups.  so
according to an example with the to field = 'jarraa, postgres, keith',
some rows could be like

(1, jarraa, )
(1,  , postgres)
(1, keith, )

so I am not sure if I can make foreign keys constraints here (since
the username or group can be blank). Additionally the to field can be
to any user, but can only be to a group that the user has permissions
to write to (aka subscribed).

> >
> > based on that it seems that to read (Which occurs a lot more than
> > writing) a document I will have to go an find all the rows with
> > documentID in the to field and then cat them somehow and then return
>
> cat them? Why? (There is text concenation btw)

cat them because originally I wanted really fast retrieval (with the
method you describe, I will have to go to two tables), so I would
store it in the order it was to be shipped out (much like an email to
field when you recieve it)

Where can I find this text concatenation stuff?


>

> There is ;) But I doubt Tom likes to show you the dark side [tm] ;)

hehe

maybe I can do a hybrid, storing the string completely in the doc
table and storing all the to fields as a document_to table.

Awaiting your thoughts.
thanks!

-Assad

>
>
> ...
> >
> >
> > On 1/11/06, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >
> >>Assad Jarrahian <jarraa@gmail.com> writes:
> >>
> >>>  A column of type text contains username's and groupname's followed
> >>>by comma (eg. 'jarraa, mooreg3, keith') [it is stored this way
> >>>because it will be displayed in this format].
>
> ++Tino
>

Re: sql (Stored procedure) design question

От
Alban Hertroys
Дата:
Assad Jarrahian wrote:
> what drove me to store it that way was more of a performance issue.
>
> So if I store a documentID and then have a seperate table names, to_field

Why a seperate table? From what you showed us you don't seem to need
that. Just use 3 columns for the seperate entries instead of 1 that
combines them. There's really no big deal to it. And you gain the
benefit of being able to put foreign key constraints on those columns.

Now you need to split a string, risking splitting it the wrong way (if
there happen to be comma's in names) and using some of that
computational power you seem to be so concerned about. It seems unlikely
you would gain any measurable performance this way, if at all.

I can't say I fully understand what you're trying to do, your example
and explanations were somewhat confusing.

Regards,
--
Alban Hertroys
alban@magproductions.nl

magproductions b.v.

T: ++31(0)534346874
F: ++31(0)534346876
M:
I: www.magproductions.nl
A: Postbus 416
    7500 AK Enschede

//Showing your Vision to the World//

Re: sql (Stored procedure) design question

От
Tino Wildenhain
Дата:
Assad Jarrahian schrieb:
> Tino, thanks for your response
>
>>Your schema could rather look like this:
>>
>>documentid,username,groupname  (as real fields)
>>
>
>
> Okay, so a typical document can be addressed to any number of users/groups.  so
> according to an example with the to field = 'jarraa, postgres, keith',
> some rows could be like
>
> (1, jarraa, )
> (1,  , postgres)
> (1, keith, )
>

No, it would look like this:

(1,'jarraa','postgres','keith') to match your original schema.

But it really depends on what you want to solve here.
What data exactly do you have and how do you want to look
for data? I mean, what are the keys?

Do you have a specific "group" and want to see all "docs" for
that group? And then you have a user and want to see all "docs"
for this user? Or is it the other way round and you have
a "doc" and want to see responsible users and/or groups?
And are groups independend from users? Or are users always
in one or many groups? ...


Re: sql (Stored procedure) design question

От
Assad Jarrahian
Дата:
Alban and Tino,
 More explanations lie below.
A document contains to field, from field, subject, message, etc

> >>documentid,username,groupname  (as real fields)
> No, it would look like this:
>
> (1,'jarraa','postgres','keith') to match your original schema.

I dont understand it. what type would it be (the second column) and
what constraint would it have.
And how can I get away with storing just one table. According to my
thougths, I seem to need two tables for certain (could be wrong).
Schemas as follows:

(documentID, From,subject, message ....)
(doucumentID, OneOfTheManyToFieldReciepient)


Keep in mind that I dont know how many entries are in the To field
(its just one string coming from the client!). IT would be helpful to
think of this as just like an email system (but a closed one, as in
you have everybody's username in the db). Furhtermore think of social
networks (thats what I am building).


>
> But it really depends on what you want to solve here.
> What data exactly do you have and how do you want to look
> for data? I mean, what are the keys?
>
I rarely analyze data I recieve (besides for constraints). We have
lucene (java search engine) indexing the db contents and almost all
*SELECT*  queries goto that first and then I get an ID list (since
lucene does not store the data) and I retrieve the information in the
format as the same way it came in.
Lucene, being in java can split up tokens fairly easily and quickly
and our Lucene engineer likes it in the format of string '  name,
name, name'

> Do you have a specific "group" and want to see all "docs" for
> that group? And then you have a user and want to see all "docs"
> for this user? Or is it the other way round and you have
> a "doc" and want to see responsible users and/or groups?
see above explanation of lucene.

> And are groups independend from users? Or are users always
> in one or many groups? ...

A user can belong to many groups. A user can write a "document" To
users, and To groups. The constrainfo user being the user is in the db
and for groups, he/she has to be a member of it.

Users Schema
(username, userInfo)
Group Schema (2 tables)
(groupID, groupname, groupInfo)
(groupID, oneOfTheManyGroupMembers)

Hope this helps and that I am making sense now. Sorry for the confusion.

-assad

On 1/12/06, Tino Wildenhain <tino@wildenhain.de> wrote:
> Assad Jarrahian schrieb:
> > Tino, thanks for your response
> >
> >>Your schema could rather look like this:
> >>
> >>documentid,username,groupname  (as real fields)
> >>
> >
> >
> > Okay, so a typical document can be addressed to any number of users/groups.  so
> > according to an example with the to field = 'jarraa, postgres, keith',
> > some rows could be like
> >
> > (1, jarraa, )
> > (1,  , postgres)
> > (1, keith, )
> >
>
> No, it would look like this:
>
> (1,'jarraa','postgres','keith') to match your original schema.
>
> But it really depends on what you want to solve here.
> What data exactly do you have and how do you want to look
> for data? I mean, what are the keys?
>
> Do you have a specific "group" and want to see all "docs" for
> that group? And then you have a user and want to see all "docs"
> for this user? Or is it the other way round and you have
> a "doc" and want to see responsible users and/or groups?
> And are groups independend from users? Or are users always
> in one or many groups? ...
>
>

Re: sql (Stored procedure) design question

От
Assad Jarrahian
Дата:
So is it still advisable not to store it in a string format like
'jarraa, mooreg3, flowerpower' [read my comments below please]

awaiting some feedback.
-assad


On 1/12/06, Assad Jarrahian <jarraa@gmail.com> wrote:
> Alban and Tino,
>  More explanations lie below.
> A document contains to field, from field, subject, message, etc
>
> > >>documentid,username,groupname  (as real fields)
> > No, it would look like this:
> >
> > (1,'jarraa','postgres','keith') to match your original schema.
>
> I dont understand it. what type would it be (the second column) and
> what constraint would it have.
> And how can I get away with storing just one table. According to my
> thougths, I seem to need two tables for certain (could be wrong).
> Schemas as follows:
>
> (documentID, From,subject, message ....)
> (doucumentID, OneOfTheManyToFieldReciepient)
>
>
> Keep in mind that I dont know how many entries are in the To field
> (its just one string coming from the client!). IT would be helpful to
> think of this as just like an email system (but a closed one, as in
> you have everybody's username in the db). Furhtermore think of social
> networks (thats what I am building).
>
>
> >
> > But it really depends on what you want to solve here.
> > What data exactly do you have and how do you want to look
> > for data? I mean, what are the keys?
> >
> I rarely analyze data I recieve (besides for constraints). We have
> lucene (java search engine) indexing the db contents and almost all
> *SELECT*  queries goto that first and then I get an ID list (since
> lucene does not store the data) and I retrieve the information in the
> format as the same way it came in.
> Lucene, being in java can split up tokens fairly easily and quickly
> and our Lucene engineer likes it in the format of string '  name,
> name, name'
>
> > Do you have a specific "group" and want to see all "docs" for
> > that group? And then you have a user and want to see all "docs"
> > for this user? Or is it the other way round and you have
> > a "doc" and want to see responsible users and/or groups?
> see above explanation of lucene.
>
> > And are groups independend from users? Or are users always
> > in one or many groups? ...
>
> A user can belong to many groups. A user can write a "document" To
> users, and To groups. The constrainfo user being the user is in the db
> and for groups, he/she has to be a member of it.
>
> Users Schema
> (username, userInfo)
> Group Schema (2 tables)
> (groupID, groupname, groupInfo)
> (groupID, oneOfTheManyGroupMembers)
>
> Hope this helps and that I am making sense now. Sorry for the confusion.
>
> -assad
>
> On 1/12/06, Tino Wildenhain <tino@wildenhain.de> wrote:
> > Assad Jarrahian schrieb:
> > > Tino, thanks for your response
> > >
> > >>Your schema could rather look like this:
> > >>
> > >>documentid,username,groupname  (as real fields)
> > >>
> > >
> > >
> > > Okay, so a typical document can be addressed to any number of users/groups.  so
> > > according to an example with the to field = 'jarraa, postgres, keith',
> > > some rows could be like
> > >
> > > (1, jarraa, )
> > > (1,  , postgres)
> > > (1, keith, )
> > >
> >
> > No, it would look like this:
> >
> > (1,'jarraa','postgres','keith') to match your original schema.
> >
> > But it really depends on what you want to solve here.
> > What data exactly do you have and how do you want to look
> > for data? I mean, what are the keys?
> >
> > Do you have a specific "group" and want to see all "docs" for
> > that group? And then you have a user and want to see all "docs"
> > for this user? Or is it the other way round and you have
> > a "doc" and want to see responsible users and/or groups?
> > And are groups independend from users? Or are users always
> > in one or many groups? ...
> >
> >
>