Обсуждение: sql (Stored procedure) design question
Problem: A column of type text contains username's and groupname's followed by comma (eg. 'jarraa, mooreg3, keith') [it is stored this way because it will be displayed in this format]. What is needed is a constrain check. To make sure that every time a row is inserted into the table, this column (text) gets checked to make sure that all the username/group name in the text exists in the db (they are primary key columns of two tables). I would like to use pgsql to write a stored procedure, but I dont know how to split the text field up and then send them to do a constrain check with the other tables. I initially get this in java and then use jdbc, so if there is a better way you can think of, feel free to comment. Any help would be much appreciated. Thanks. -assad
Assad Jarrahian <jarraa@gmail.com> writes: > A column of type text contains username's and groupname's followed > by comma (eg. 'jarraa, mooreg3, keith') [it is stored this way > because it will be displayed in this format]. You should *not* allow display concerns to drive your database layout. The pain you are now experiencing is just the first taste of why that's a bad idea. Store the usernames and groupnames separately. If you have an application that's too dumb to concatenate them for itself, you can make a view on the table that provides the display representation the application needs. regards, tom lane
what drove me to store it that way was more of a performance issue. So if I store a documentID and then have a seperate table names, to_field I will have to store the username/groupname as such to_field: documentID username docuementID groupname docuementID usernamename ... based on that it seems that to read (Which occurs a lot more than writing) a document I will have to go an find all the rows with documentID in the to field and then cat them somehow and then return the whole document. Even with an index built on the documentID of the to field, it seems to be resource intensive (correct me if I am wrong). Even if I were to do this, I would still have to check the constraints by hand (since the entry can either be a username or a groupname (and if its a group, the user has to belong to that group). So coming back to the original synopsis. Is there a way I can just send the whole document to a stored procedure and then the stored procedure takes the text component of it and tokenizes it and then I do constraint checks on it? thanks. -assad On 1/11/06, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Assad Jarrahian <jarraa@gmail.com> writes: > > A column of type text contains username's and groupname's followed > > by comma (eg. 'jarraa, mooreg3, keith') [it is stored this way > > because it will be displayed in this format]. > > You should *not* allow display concerns to drive your database layout. > The pain you are now experiencing is just the first taste of why that's > a bad idea. > > Store the usernames and groupnames separately. If you have an > application that's too dumb to concatenate them for itself, you > can make a view on the table that provides the display representation > the application needs. > > regards, tom lane >
Assad Jarrahian schrieb: > what drove me to store it that way was more of a performance issue. > > So if I store a documentID and then have a seperate table names, to_field > I will have to store the username/groupname as such > to_field: > documentID username > docuementID groupname > docuementID usernamename > From what I see below (btw, try to follow the way people cite on mailinglists) Your schema could rather look like this: documentid,username,groupname (as real fields) > > based on that it seems that to read (Which occurs a lot more than > writing) a document I will have to go an find all the rows with > documentID in the to field and then cat them somehow and then return cat them? Why? (There is text concenation btw) > the whole document. Even with an index built on the documentID of the > to field, it seems to be resource intensive (correct me if I am > wrong). Even if I were to do this, I would still have to check the > constraints by hand (since the entry can either be a username or a > groupname (and if its a group, the user has to belong to that group). Well no, as above, you would have the usual foreign keys. > > So coming back to the original synopsis. Is there a way I can just > send the whole document to a stored procedure and then the stored > procedure takes the text component of it and tokenizes it and then I > do constraint checks on it? There is ;) But I doubt Tom likes to show you the dark side [tm] ;) ... > > > On 1/11/06, Tom Lane <tgl@sss.pgh.pa.us> wrote: > >>Assad Jarrahian <jarraa@gmail.com> writes: >> >>> A column of type text contains username's and groupname's followed >>>by comma (eg. 'jarraa, mooreg3, keith') [it is stored this way >>>because it will be displayed in this format]. ++Tino
Tino, thanks for your response > > Your schema could rather look like this: > > documentid,username,groupname (as real fields) > Okay, so a typical document can be addressed to any number of users/groups. so according to an example with the to field = 'jarraa, postgres, keith', some rows could be like (1, jarraa, ) (1, , postgres) (1, keith, ) so I am not sure if I can make foreign keys constraints here (since the username or group can be blank). Additionally the to field can be to any user, but can only be to a group that the user has permissions to write to (aka subscribed). > > > > based on that it seems that to read (Which occurs a lot more than > > writing) a document I will have to go an find all the rows with > > documentID in the to field and then cat them somehow and then return > > cat them? Why? (There is text concenation btw) cat them because originally I wanted really fast retrieval (with the method you describe, I will have to go to two tables), so I would store it in the order it was to be shipped out (much like an email to field when you recieve it) Where can I find this text concatenation stuff? > > There is ;) But I doubt Tom likes to show you the dark side [tm] ;) hehe maybe I can do a hybrid, storing the string completely in the doc table and storing all the to fields as a document_to table. Awaiting your thoughts. thanks! -Assad > > > ... > > > > > > On 1/11/06, Tom Lane <tgl@sss.pgh.pa.us> wrote: > > > >>Assad Jarrahian <jarraa@gmail.com> writes: > >> > >>> A column of type text contains username's and groupname's followed > >>>by comma (eg. 'jarraa, mooreg3, keith') [it is stored this way > >>>because it will be displayed in this format]. > > ++Tino >
Assad Jarrahian wrote: > what drove me to store it that way was more of a performance issue. > > So if I store a documentID and then have a seperate table names, to_field Why a seperate table? From what you showed us you don't seem to need that. Just use 3 columns for the seperate entries instead of 1 that combines them. There's really no big deal to it. And you gain the benefit of being able to put foreign key constraints on those columns. Now you need to split a string, risking splitting it the wrong way (if there happen to be comma's in names) and using some of that computational power you seem to be so concerned about. It seems unlikely you would gain any measurable performance this way, if at all. I can't say I fully understand what you're trying to do, your example and explanations were somewhat confusing. Regards, -- Alban Hertroys alban@magproductions.nl magproductions b.v. T: ++31(0)534346874 F: ++31(0)534346876 M: I: www.magproductions.nl A: Postbus 416 7500 AK Enschede //Showing your Vision to the World//
Assad Jarrahian schrieb: > Tino, thanks for your response > >>Your schema could rather look like this: >> >>documentid,username,groupname (as real fields) >> > > > Okay, so a typical document can be addressed to any number of users/groups. so > according to an example with the to field = 'jarraa, postgres, keith', > some rows could be like > > (1, jarraa, ) > (1, , postgres) > (1, keith, ) > No, it would look like this: (1,'jarraa','postgres','keith') to match your original schema. But it really depends on what you want to solve here. What data exactly do you have and how do you want to look for data? I mean, what are the keys? Do you have a specific "group" and want to see all "docs" for that group? And then you have a user and want to see all "docs" for this user? Or is it the other way round and you have a "doc" and want to see responsible users and/or groups? And are groups independend from users? Or are users always in one or many groups? ...
Alban and Tino, More explanations lie below. A document contains to field, from field, subject, message, etc > >>documentid,username,groupname (as real fields) > No, it would look like this: > > (1,'jarraa','postgres','keith') to match your original schema. I dont understand it. what type would it be (the second column) and what constraint would it have. And how can I get away with storing just one table. According to my thougths, I seem to need two tables for certain (could be wrong). Schemas as follows: (documentID, From,subject, message ....) (doucumentID, OneOfTheManyToFieldReciepient) Keep in mind that I dont know how many entries are in the To field (its just one string coming from the client!). IT would be helpful to think of this as just like an email system (but a closed one, as in you have everybody's username in the db). Furhtermore think of social networks (thats what I am building). > > But it really depends on what you want to solve here. > What data exactly do you have and how do you want to look > for data? I mean, what are the keys? > I rarely analyze data I recieve (besides for constraints). We have lucene (java search engine) indexing the db contents and almost all *SELECT* queries goto that first and then I get an ID list (since lucene does not store the data) and I retrieve the information in the format as the same way it came in. Lucene, being in java can split up tokens fairly easily and quickly and our Lucene engineer likes it in the format of string ' name, name, name' > Do you have a specific "group" and want to see all "docs" for > that group? And then you have a user and want to see all "docs" > for this user? Or is it the other way round and you have > a "doc" and want to see responsible users and/or groups? see above explanation of lucene. > And are groups independend from users? Or are users always > in one or many groups? ... A user can belong to many groups. A user can write a "document" To users, and To groups. The constrainfo user being the user is in the db and for groups, he/she has to be a member of it. Users Schema (username, userInfo) Group Schema (2 tables) (groupID, groupname, groupInfo) (groupID, oneOfTheManyGroupMembers) Hope this helps and that I am making sense now. Sorry for the confusion. -assad On 1/12/06, Tino Wildenhain <tino@wildenhain.de> wrote: > Assad Jarrahian schrieb: > > Tino, thanks for your response > > > >>Your schema could rather look like this: > >> > >>documentid,username,groupname (as real fields) > >> > > > > > > Okay, so a typical document can be addressed to any number of users/groups. so > > according to an example with the to field = 'jarraa, postgres, keith', > > some rows could be like > > > > (1, jarraa, ) > > (1, , postgres) > > (1, keith, ) > > > > No, it would look like this: > > (1,'jarraa','postgres','keith') to match your original schema. > > But it really depends on what you want to solve here. > What data exactly do you have and how do you want to look > for data? I mean, what are the keys? > > Do you have a specific "group" and want to see all "docs" for > that group? And then you have a user and want to see all "docs" > for this user? Or is it the other way round and you have > a "doc" and want to see responsible users and/or groups? > And are groups independend from users? Or are users always > in one or many groups? ... > >
So is it still advisable not to store it in a string format like 'jarraa, mooreg3, flowerpower' [read my comments below please] awaiting some feedback. -assad On 1/12/06, Assad Jarrahian <jarraa@gmail.com> wrote: > Alban and Tino, > More explanations lie below. > A document contains to field, from field, subject, message, etc > > > >>documentid,username,groupname (as real fields) > > No, it would look like this: > > > > (1,'jarraa','postgres','keith') to match your original schema. > > I dont understand it. what type would it be (the second column) and > what constraint would it have. > And how can I get away with storing just one table. According to my > thougths, I seem to need two tables for certain (could be wrong). > Schemas as follows: > > (documentID, From,subject, message ....) > (doucumentID, OneOfTheManyToFieldReciepient) > > > Keep in mind that I dont know how many entries are in the To field > (its just one string coming from the client!). IT would be helpful to > think of this as just like an email system (but a closed one, as in > you have everybody's username in the db). Furhtermore think of social > networks (thats what I am building). > > > > > > But it really depends on what you want to solve here. > > What data exactly do you have and how do you want to look > > for data? I mean, what are the keys? > > > I rarely analyze data I recieve (besides for constraints). We have > lucene (java search engine) indexing the db contents and almost all > *SELECT* queries goto that first and then I get an ID list (since > lucene does not store the data) and I retrieve the information in the > format as the same way it came in. > Lucene, being in java can split up tokens fairly easily and quickly > and our Lucene engineer likes it in the format of string ' name, > name, name' > > > Do you have a specific "group" and want to see all "docs" for > > that group? And then you have a user and want to see all "docs" > > for this user? Or is it the other way round and you have > > a "doc" and want to see responsible users and/or groups? > see above explanation of lucene. > > > And are groups independend from users? Or are users always > > in one or many groups? ... > > A user can belong to many groups. A user can write a "document" To > users, and To groups. The constrainfo user being the user is in the db > and for groups, he/she has to be a member of it. > > Users Schema > (username, userInfo) > Group Schema (2 tables) > (groupID, groupname, groupInfo) > (groupID, oneOfTheManyGroupMembers) > > Hope this helps and that I am making sense now. Sorry for the confusion. > > -assad > > On 1/12/06, Tino Wildenhain <tino@wildenhain.de> wrote: > > Assad Jarrahian schrieb: > > > Tino, thanks for your response > > > > > >>Your schema could rather look like this: > > >> > > >>documentid,username,groupname (as real fields) > > >> > > > > > > > > > Okay, so a typical document can be addressed to any number of users/groups. so > > > according to an example with the to field = 'jarraa, postgres, keith', > > > some rows could be like > > > > > > (1, jarraa, ) > > > (1, , postgres) > > > (1, keith, ) > > > > > > > No, it would look like this: > > > > (1,'jarraa','postgres','keith') to match your original schema. > > > > But it really depends on what you want to solve here. > > What data exactly do you have and how do you want to look > > for data? I mean, what are the keys? > > > > Do you have a specific "group" and want to see all "docs" for > > that group? And then you have a user and want to see all "docs" > > for this user? Or is it the other way round and you have > > a "doc" and want to see responsible users and/or groups? > > And are groups independend from users? Or are users always > > in one or many groups? ... > > > > >