Обсуждение: schema design question

Поиск
Список
Период
Сортировка

schema design question

От
"mark overmeer"
Дата:
Hi,

Maybe not completely the wright place to ask but... I have this schema design question (db is postgres of course). I have a couple of classes with attributes. The only goal is to search the object that I want to find (which is stored on the harddrive).

I have hundreds of classes that are similar but not the same. They all have attributes/properties (type is probably String), e.g. (in pseudo code):

class A_version_1 {
   attribute1, attribute2, attribute3, ..., attributeN
}

class A_version_2 {
   attribute1, attribute3, ..., attributeN, attributeN+1, attributeN+2
}

class B_version_1 {
   attribute3, attribute4, attribute7, attributeN+3, ..., attributeN+M
}


Class A will have attributes from class B, class B will have attributes from class C and so on. My initial thought was to use the (sometimes dreaded) EAV model: class_id, object_id, attribute_id and attribute_value. In this way I can make queries like:

SELECT CLASS_ID,
OBJECT_ID
FROM EAV_TABLE EAV
WHERE EAV.ATTRIBUTE_ID = X
AND EAV.ATTRIBUTE_VALUE = 'searchstring'
AND EXISTS (SELECT OBJECT_ID
FROM EAV_TABLE EAV2
WHERE EAV.OBJECT_ID = EAV2.OBJECT_ID
AND EAV.CLASS_ID = EAV2.CLASS_ID
AND EAV2.ATTRIBUTE_ID = Y
AND EAV2.ATTRIBUTE_VALUE= 'searchstring2')
Results from this query could be entities from multiple classes!

The alternative is, as many people say: make a proper table for each class which would lead to hundreds of unions. Is that good/performant? I thought it would not...  To put all attributes of all classes (as columns) in one table is impossible. The number of total attributes  should be in the thousands.

A third alternative I came up with is the entity/value schema design where each attribute would have its own table. A query would look like this:

SELECT CLASS_ID,
OBJECT_ID
FROM EV_X EAV
WHERE EAV.ATTRIBUTE_VALUE = 'searchstring'
AND EXISTS (SELECT OBJECT_ID
FROM EV_Y EAV2
WHERE EAV.OBJECT_ID = EAV2.OBJECT_ID
AND EAV.CLASS_ID = EAV2.CLASS_ID
AND EAV2.ATTRIBUTE_VALUE = 'searchstring2')
Which would be a nice way to partition the otherwise large table (but there would be thousands of smaller tables).

The app I'm writing has to scale to about 1 billion attributes/value-pairs in total. A normal search query would imply about 5 search terms (but there could be 20). Any suggestions/remarks (I think the EXISTS should be replaced by an IN, something else)? Did anyone implement such a search method (or did they decide to make a different design)? Did it work/scale?

Thanks in advance,

Mark O.

Re: schema design question

От
Adam Tauno Williams
Дата:
> Maybe not completely the wright place to ask but... I have this schema
> design question (db is postgres of course). I have a couple of classes
> with attributes. The only goal is to search the object that I want to
> find (which is stored on the harddrive).
> I have hundreds of classes that are similar but not the same. They all
> have attributes/properties (type is probably String), e.g. (in pseudo
> code):

Use table inheritance.

--
Adam Tauno Williams, Network & Systems Administrator
Consultant - http://www.whitemiceconsulting.com
Developer - http://www.opengroupware.org


Re: schema design question

От
"mark overmeer"
Дата:
Hi Adam,

Thanks for the fast reply. What should inherit from what? Class A (e.g. 'todo item') is certainly not derived from property X (e.g. 'startdate'). Class A version 2 has different properties (some are removed, others are added). Can you elaborate / say I'm wrong / give an example ? Thanks,

Mark


2007/8/19, Adam Tauno Williams <adamtaunowilliams@gmail.com>:

> Maybe not completely the wright place to ask but... I have this schema
> design question (db is postgres of course). I have a couple of classes
> with attributes. The only goal is to search the object that I want to
> find (which is stored on the harddrive).
> I have hundreds of classes that are similar but not the same. They all
> have attributes/properties (type is probably String), e.g. (in pseudo
> code):

Use table inheritance.

--
Adam Tauno Williams, Network & Systems Administrator
Consultant - http://www.whitemiceconsulting.com
Developer - http://www.opengroupware.org


---------------------------(end of broadcast)---------------------------
TIP 5: don't forget to increase your free space map settings

Re: schema design question

От
David Fetter
Дата:
On Sun, Aug 19, 2007 at 03:19:52PM +0200, mark overmeer wrote:
>  Hi,
>
> Maybe not completely the wright place to ask but... I have this
> schema design question (db is postgres of course). I have a couple
> of classes with attributes.

Danger, Will Robinson!  Danger!

The DBMS way of looking at things is fundamentally different from OO
coding, and if you try to make them fit together naïvely as you do
below, you only get grief.

> The only goal is to search the object
> that I want to find (which is stored on the harddrive).
>
> I have hundreds of classes that are similar but not the same. They all have
> attributes/properties (type is probably String), e.g. (in pseudo code):
>
> class A_version_1 {
>    attribute1, attribute2, attribute3, ..., attributeN
> }
>
> class A_version_2 {
>    attribute1, attribute3, ..., attributeN, attributeN+1, attributeN+2
> }
>
> class B_version_1 {
>    attribute3, attribute4, attribute7, attributeN+3, ..., attributeN+M
> }
>
>
> Class A will have attributes from class B, class B will have
> attributes from class C and so on. My initial thought was to use the
> (sometimes dreaded) EAV model: class_id, object_id, attribute_id and
> attribute_value. In this way I can make queries like:
>
> SELECT CLASS_ID,
>        OBJECT_ID
> FROM   EAV_TABLE EAV

There's your mistake.  EAV is not performant, and won't become so.

Decide what your database will and won't do, and design your schema
around that.  I know it takes a little extra helping of courage, but
it's worth it in the long run.

Cheers,
David.
--
David Fetter <david@fetter.org> http://fetter.org/
phone: +1 415 235 3778        AIM: dfetter666
                              Skype: davidfetter

Remember to vote!
Consider donating to PostgreSQL: http://www.postgresql.org/about/donate

Re: schema design question

От
"Steinar H. Gunderson"
Дата:
On Sun, Aug 19, 2007 at 11:12:16AM -0700, David Fetter wrote:
> There's your mistake.  EAV is not performant, and won't become so.

It sort of depends. I put all the EXIF information for my image gallery into
an EAV table -- it was the most logical format at the time, although I'm not
sure I need all the information. Anyhow, with clustering and indexes,
Postgres zips through the five million records easily enough for my use -- at
least fast enough that I can live with it without feeling the need for a
redesign.

As a general database design paradigm, though, I fully agree with you.
Databases are databases, not glorified OO data stores or hash tables.

/* Steinar */
--
Homepage: http://www.sesse.net/

Re: schema design question

От
David Fetter
Дата:
On Sun, Aug 19, 2007 at 08:26:58PM +0200, Steinar H. Gunderson wrote:
> On Sun, Aug 19, 2007 at 11:12:16AM -0700, David Fetter wrote:
> > There's your mistake.  EAV is not performant, and won't become so.
>
> It sort of depends. I put all the EXIF information for my image
> gallery into an EAV table -- it was the most logical format at the
> time, although I'm not sure I need all the information. Anyhow, with
> clustering and indexes, Postgres zips through the five million
> records easily enough for my use -- at least fast enough that I can
> live with it without feeling the need for a redesign.

Unless your records are huge, that's a tiny database, where tiny is
defined to mean that the whole thing fits in main memory with plenty
of room to spare.  I guarantee that performance will crash right
through the floor as soon as any table no longer fits in main memory.

> As a general database design paradigm, though, I fully agree with
> you.  Databases are databases, not glorified OO data stores or hash
> tables.

Exactly :)

Cheers,
David.
--
David Fetter <david@fetter.org> http://fetter.org/
phone: +1 415 235 3778        AIM: dfetter666
                              Skype: davidfetter

Remember to vote!
Consider donating to PostgreSQL: http://www.postgresql.org/about/donate

Re: schema design question

От
"mark overmeer"
Дата:
Hi,

2007/8/19, Steinar H. Gunderson <sgunderson@bigfoot.com>:
As a general database design paradigm, though, I fully agree with you.
Databases are databases, not glorified OO data stores or hash tables.
I don't want to use it as an OO data store, I use the filesystem for that. The intended use is to search for the right object. Since it has separate data structures for searching (indexes) I guess that is one of its functions.

However, it still doesn't answer my question about the EV model (where each attribute is given its own table).
 
Mark

/* Steinar */
--
Homepage: http://www.sesse.net/

Re: schema design question

От
David Fetter
Дата:
On Sun, Aug 19, 2007 at 10:13:08PM +0200, mark overmeer wrote:
> Hi,
>
> 2007/8/19, Steinar H. Gunderson <sgunderson@bigfoot.com>:
> >
> > As a general database design paradigm, though, I fully agree with
> > you.  Databases are databases, not glorified OO data stores or
> > hash tables.
>
> I don't want to use it as an OO data store, I use the filesystem for
> that.  The intended use is to search for the right object. Since it
> has separate data structures for searching (indexes) I guess that is
> one of its functions.
>
> However, it still doesn't answer my question about the EV model
> (where each attribute is given its own table).

The answer to EAV modeling, is, "DON'T!"

Cheers,
David (who, if he were greedy, would be encouraging EAV modeling
because it would cause guaranteed large consulting income later)
--
David Fetter <david@fetter.org> http://fetter.org/
phone: +1 415 235 3778        AIM: dfetter666
                              Skype: davidfetter

Remember to vote!
Consider donating to PostgreSQL: http://www.postgresql.org/about/donate

Re: schema design question

От
"Steinar H. Gunderson"
Дата:
On Sun, Aug 19, 2007 at 11:41:15AM -0700, David Fetter wrote:
> Unless your records are huge, that's a tiny database, where tiny is
> defined to mean that the whole thing fits in main memory with plenty
> of room to spare.  I guarantee that performance will crash right
> through the floor as soon as any table no longer fits in main memory.

Sure, it fits into memory; however, it isn't used so often, though, so it's
frequently not in the cache when it's needed. You are completely right in
that it's much slower from disk than from RAM :-)

The question is, of course, how to best store something like the EXIF
information _without_ using EAV. I could separate out the few fields I
normally use into a horizontal (ie. standard relational) table, but it seems
sort of... lossy? Another possible approach is to keep the EAV table around
for completeness in addition to the few fields I need, but then you do of
course get into normalization issues.

/* Steinar */
--
Homepage: http://www.sesse.net/

Re: schema design question

От
Adam Tauno Williams
Дата:

> However, it still doesn't answer my question about the EV model (where
> each attribute is given its own table).

Do a TABLE(object_id INT, attribute STRING, value STRING) if you just
want to be able to search for objects by an attribute.  But better yet
look at one of the thousand object persistence systems out there,  not
much to be gained from re-inventing the wheel.

--
Adam Tauno Williams, Network & Systems Administrator
Consultant - http://www.whitemiceconsulting.com
Developer - http://www.opengroupware.org