Обсуждение: paradigm sanity check needed [long]

Поиск
Список
Период
Сортировка

paradigm sanity check needed [long]

От
will trillich
Дата:
i'd like to run our concept up the flagpole and see what kind of
shots go plumb through it--

you know how the standard recommendation is to have every field
across the database with its own unique name? right?

like school_student_certificate.school_student_certificate_year,
long_enough_to_choke_a_horse.long_enough_to_choke_a_horse_thing...

well, we're not doing that. not even close. all the literature
says it's a good idea, but they don't say why. and we want to
find out. :)

here goes--

EVERY table of ours has
    *.id       serial primary key,
    *.created  date default current_date,
    *.modified timestamp(0),
    *.by       references person(id),

and when one table links to another (aside from the "edited by
field above, that is) the name of the foreign key matches that of
the table whose id we're after:

    team.person      => person.id
    team.org         => org.id
    dates.date_types => date_types.id
    loc.state        => state.id
    loc.nation       => nation.id
    project.org      => org.id
    ...etc...

programmatically -- at first glance -- this looks like a DREAM to
manage. (this is how we know there must be something seriously
wrong with it. red alert!)

using this setup, we can easily link up, using sql, which tables
refer (defer) to others -- detecting a parent/child relationship
is a snap:

    -- here's the main "subroutine" used below:
    CREATE VIEW sys_tables as
    SELECT
        c.relname   AS class,
        oid         AS c_oid
    FROM
        pg_class c
    WHERE
        c.relname !~ '^pg_' -- not a postgres system table
        AND
        c.relname !~ '^_' -- not a table-behind-a-view
        AND
        c.relkind IN ('v','r') -- view or relation/table
    ORDER BY
        c.relname
    ;

    COMMENT ON COLUMN sys_tables.class is 'System Table (class) name';



then once we specify our lookup/validation table pattern, we can
figure out the rest:
    -- linking tables (for many-to-many)
    -- subtables (many-to-one)
    -- data tables (the nacells of the enterprise)

for the validation/lookup tables, they're all named *_TYPE -- and
then there's the other two: "state" and "nation"; we have some
special-case hard-coding for those two:

    -- VALIDATION TABLES (having *_TYPE names)
    CREATE VIEW sys_lookup_tables as
    SELECT
        t.class
    FROM
        sys_tables t
    WHERE
        (
            t.class ~ '_type$'
        ) OR (
            -- don't forget these!
            t.class IN ('state','nation') -- special cases
        )
    ORDER BY
        t.class
    ;

    COMMENT ON COLUMN sys_lookup_tables.class is 'Name of verification/lookup table';

if we eventually add others, this is the ONE place we'd have to
mention them. which normal form is THAT? :)



this next view shows ALL subsidiary relationships, both subtables
and linking (many-to-many) tables:

    -- SUB TABLES (TABLES having field names matching other table names)
    CREATE VIEW sys_sub_tables as
    SELECT
        s.class as sub,
        p.class as parent
    FROM
        pg_attribute a,
        sys_tables s,
        sys_tables p
    WHERE
        a.attrelid=s.c_oid -- field from Sub
        AND
        a.attname=p.class -- matches table Parent
    ORDER BY
        s.class,
        p.class
    ;

    COMMENT ON COLUMN sys_sub_tables.sub    is 'Subsidiary table name';
    COMMENT ON COLUMN sys_sub_tables.parent is 'Parent table name';



and here, we find the linking tables:

    CREATE VIEW sys_link_tables as
    SELECT
        s.sub as class
    FROM
        sys_sub_tables s
            left join
        sys_lookup_tables l
            on (l.class = s.parent)
    WHERE -- they're subtables, but not lookup tables
        l.class is null
    GROUP BY
        s.sub
    HAVING
        count(*) > 1
    ;

    COMMENT ON COLUMN sys_link_tables.class is 'Name of linking (many-to-many) table';

(explain select * from sys_link_tables -- quite amusing!)



pulling field descriptors out (comment) is pretty easy, too.
(internationalization is a long way off, of course...)

what do y'all think? "man the lifeboats"? should we be expecting
sea water in the scuppers anytime soon? and why (or why not)...

--
There are 10 kinds of people:
ones that get binary, and ones that don't.

will@serensoft.com
http://sourceforge.net/projects/newbiedoc -- we need your brain!
http://www.dontUthink.com/ -- your brain needs us!

Looking for a firewall? Do you think smoothwall sucks? You're
probably right... Try the folks at http://clarkconnect.org/ !

Re: paradigm sanity check needed [long]

От
"Ayhan Ulusoy"
Дата:
Hi,

I played around with a similar “paradigm shift” for some time (except
for the lookup tables, which I prefer to keep seperate).

It’s true that it would be ALICE IN WONDERLAND -- only if it were
practical.

what we hit against is SQL syntax... (just syntax, not conceptual clash)

Now, SQL  lets you spell out your fields when you use them,
such as : person.id, person.created, ...  That’s a good thing.

BUT, the column names that are OUTPUT from a SELECT don’t have the table
name prefixed.
Which can be a good or a bad thing, depending...

Consider a query like this:
SELECT * FROM person;

The column names you will get out of this will not have the table name
prefixed.
Even the following won’t work as you sometimes wished it did:

SELECT person.id, person.created FROM person;

It is on the otherhand possible to use “AS” with each and every column
name :
SELECT person.id AS person.id , person.created AS person.created FROM
person;

That should have the prefixes in... What a waste of finger energy though
...
Besides, it won’t ever work with the “*”  (yeah I know they say “*” is a
bad thing, but I don’t think it is in all situations).

When you have JOINS, it’s even worse...

What is really needed is an other SELECT option (which could also be
just an extension to “AS”) that will cause the output to have the origin
table name prefixed.

Maybe something like :
SELECT person.* AS person.* FROM person;

Or maybe better...
SELECT * QUALIFIED  FROM person;

which could then be :
SELECT person.* QUALIFIED, department.* QUALIFIED FROM person,
department
       WHERE person.department = department.id;


OR may be just a GLOBAL option for the entire SELECT, such as :

SELECT QUALIFIED person.*, department.* FROM person, department
       WHERE person.department = department.id;

Further name qualification that exteds to the SCHEMA and DATABASE (why
not UNIVERSE?  --:) could be
opted with :
       SELECT [TABLE|SCHEMA|DATABASE] QUALIFIED * FROM person;

Anyway, I am sure someone else could come up with a better syntax, but
you get the idea ...

Cheers,
Ayhan




-----Original Message-----
From: pgsql-general-owner@postgresql.org
[mailto:pgsql-general-owner@postgresql.org] On Behalf Of will trillich
Sent: samedi 1 février 2003 05:28
To: pgsql-general@postgresql.org
Subject: [GENERAL] paradigm sanity check needed [long]

i'd like to run our concept up the flagpole and see what kind of
shots go plumb through it--

you know how the standard recommendation is to have every field
across the database with its own unique name? right?

like school_student_certificate.school_student_certificate_year,
long_enough_to_choke_a_horse.long_enough_to_choke_a_horse_thing...

well, we're not doing that. not even close. all the literature
says it's a good idea, but they don't say why. and we want to
find out. :)

here goes--

EVERY table of ours has
    *.id       serial primary key,
    *.created  date default current_date,
    *.modified timestamp(0),
    *.by       references person(id),

and when one table links to another (aside from the "edited by
field above, that is) the name of the foreign key matches that of
the table whose id we're after:

    team.person      => person.id
    team.org         => org.id
    dates.date_types => date_types.id
    loc.state        => state.id
    loc.nation       => nation.id
    project.org      => org.id
    ...etc...

programmatically -- at first glance -- this looks like a DREAM to
manage. (this is how we know there must be something seriously
wrong with it. red alert!)

using this setup, we can easily link up, using sql, which tables
refer (defer) to others -- detecting a parent/child relationship
is a snap:

    -- here's the main "subroutine" used below:
    CREATE VIEW sys_tables as
    SELECT
        c.relname   AS class,
        oid         AS c_oid
    FROM
        pg_class c
    WHERE
        c.relname !~ '^pg_' -- not a postgres system table
        AND
        c.relname !~ '^_' -- not a table-behind-a-view
        AND
        c.relkind IN ('v','r') -- view or relation/table
    ORDER BY
        c.relname
    ;

    COMMENT ON COLUMN sys_tables.class is 'System Table (class)
name';



then once we specify our lookup/validation table pattern, we can
figure out the rest:
    -- linking tables (for many-to-many)
    -- subtables (many-to-one)
    -- data tables (the nacells of the enterprise)

for the validation/lookup tables, they're all named *_TYPE -- and
then there's the other two: "state" and "nation"; we have some
special-case hard-coding for those two:

    -- VALIDATION TABLES (having *_TYPE names)
    CREATE VIEW sys_lookup_tables as
    SELECT
        t.class
    FROM
        sys_tables t
    WHERE
        (
            t.class ~ '_type$'
        ) OR (
            -- don't forget these!
            t.class IN ('state','nation') -- special cases
        )
    ORDER BY
        t.class
    ;

    COMMENT ON COLUMN sys_lookup_tables.class is 'Name of
verification/lookup table';

if we eventually add others, this is the ONE place we'd have to
mention them. which normal form is THAT? :)



this next view shows ALL subsidiary relationships, both subtables
and linking (many-to-many) tables:

    -- SUB TABLES (TABLES having field names matching other table
names)
    CREATE VIEW sys_sub_tables as
    SELECT
        s.class as sub,
        p.class as parent
    FROM
        pg_attribute a,
        sys_tables s,
        sys_tables p
    WHERE
        a.attrelid=s.c_oid -- field from Sub
        AND
        a.attname=p.class -- matches table Parent
    ORDER BY
        s.class,
        p.class
    ;

    COMMENT ON COLUMN sys_sub_tables.sub    is 'Subsidiary table
name';
    COMMENT ON COLUMN sys_sub_tables.parent is 'Parent table name';



and here, we find the linking tables:

    CREATE VIEW sys_link_tables as
    SELECT
        s.sub as class
    FROM
        sys_sub_tables s
            left join
        sys_lookup_tables l
            on (l.class = s.parent)
    WHERE -- they're subtables, but not lookup tables
        l.class is null
    GROUP BY
        s.sub
    HAVING
        count(*) > 1
    ;

    COMMENT ON COLUMN sys_link_tables.class is 'Name of linking
(many-to-many) table';

(explain select * from sys_link_tables -- quite amusing!)



pulling field descriptors out (comment) is pretty easy, too.
(internationalization is a long way off, of course...)

what do y'all think? "man the lifeboats"? should we be expecting
sea water in the scuppers anytime soon? and why (or why not)...

--
There are 10 kinds of people:
ones that get binary, and ones that don't.

will@serensoft.com
http://sourceforge.net/projects/newbiedoc -- we need your brain!
http://www.dontUthink.com/ -- your brain needs us!

Looking for a firewall? Do you think smoothwall sucks? You're
probably right... Try the folks at http://clarkconnect.org/ !

---------------------------(end of broadcast)---------------------------
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/users-lounge/docs/faq.html



Re: paradigm sanity check needed [long]

От
will trillich
Дата:
discussion on using this field-naming mechanism:

    TOPTABLE:        UNDERTABLE:
       id -- pkey       id -- pkey
       data             data
       otherdata        DATTABLE -- points to toptable.id

instead of the usual undertable.undertable_id and
undertable.toptable_id --

On Sat, Feb 01, 2003 at 04:40:39PM +0100, Ayhan Ulusoy wrote:
> I played around with a similar “paradigm shift” for some
> time (except for the lookup tables, which I prefer to keep
> seperate).
>
> It’s true that it would be ALICE IN WONDERLAND -- only if it
> were practical.
>
> what we hit against is SQL syntax... (just syntax, not
> conceptual clash)

hmm. show me what you mean--

> Now, SQL  lets you spell out your fields when you use them,
> such as : person.id, person.created, ...  That’s a good
> thing.
>
> BUT, the column names that are OUTPUT from a SELECT don’t
> have the table name prefixed.  Which can be a good or a bad
> thing, depending...
>
> Consider a query like this:
> SELECT * FROM person;
>
> The column names you will get out of this will not have the
> table name prefixed.

i don't yet see the evil there. YOU specified which table, so you
ALREADY KNOW what the table is.

> Even the following won’t work as you sometimes wished it did:
>
> SELECT person.id, person.created FROM person;
>
> It is on the otherhand possible to use “AS” with each and
> every column name :
>
> SELECT person.id AS person.id , person.created AS
> person.created FROM person;
>
> That should have the prefixes in... What a waste of finger
> energy though ...

why is it crucial to have the table.* prefix if you're selecting
from one table?

> When you have JOINS, it’s even worse...

so your objection is primarily in the joins, then:

    select
        project.id, org.id -- error: can't result in two "id" fields!
    from
        project,
        org
    where
        project.org = org.id
    ;

hmm, yes. badness that way. i can see doing

    select
        -- other fields,
        project.id AS PROJECT_ID,
        org.id     AS ORG_ID
        -- "wasted^H^H^H^H^H^Hinvested finger effort"
    from
        project,
        org
    where
        project.org = org.id
    ;

to get a list of ALL the projects and all their related
enterprises; then we just split/_/ to get table.field back.

but more commonly we will want projects for a certain
institution, so we'll already know the org.id--

    select
        -- other fields,
        project.id
    from
        project,
        org
    where
        project.org = org.id
    AND
        ORG.ID = $THIS_VALUE
    ;

here since we know which org we're after, we might only be
interested in the project id's related to it. is there
other nefariousity lurking in there somewhere?

i can also see creating views for each "these_from_those"
relation which would keep that part of the logic out of the
middle-layer of the application:

    create view project_from_org as
    select
        p.*
    from
        project p,
        org     o
    where
        o.id = p.org
    ;
    select * from project_from_org where org=$some_org_id;

and if we somehow forget which org this was, we can look in the
resulting "org" field pulled from the project table.

===

the drawback i see in using

    ORG                PROJECT
       org_id             project_id
       org_data           org_id

is that you're just about going to have to hard-code every
crosslink (we *know* project.org_id links to org.org_id) or do a
bunch of split/_/ anyhow (when fieldname ends "_id" find table
mentioned in before underscore then refer to its field of the
same name). no?

in some ways it's six-of-one-and-a-half-dozeon-of-the-other.
but in other ways the "tablename.linkedtable_id" approach seems a
hair more tedious than "tablename.linkedtable [as id]" one.

--
There are 10 kinds of people:
ones that get binary, and ones that don't.

will@serensoft.com
http://sourceforge.net/projects/newbiedoc -- we need your brain!
http://www.dontUthink.com/ -- your brain needs us!

Looking for a firewall? Do you think smoothwall sucks? You're
probably right... Try the folks at http://clarkconnect.org/ !

Re: paradigm sanity check needed [long]

От
"Matthew V." <
Дата:
On Fri, 31 Jan 2003 22:30:45 -0600, will trillich wrote:

> i'd like to run our concept up the flagpole and see what kind of shots
> go plumb through it--
>
> you know how the standard recommendation is to have every field across
> the database with its own unique name? right?
>
> like school_student_certificate.school_student_certificate_year,
> long_enough_to_choke_a_horse.long_enough_to_choke_a_horse_thing...
>
> well, we're not doing that. not even close. all the literature says it's
> a good idea, but they don't say why. and we want to find out. :)
>
> here goes--
>
> EVERY table of ours has
>     *.id       serial primary key,
>     *.created  date default current_date, *.modified timestamp(0), *.by
>       references person(id),
>
> and when one table links to another (aside from the "edited by field
> above, that is) the name of the foreign key matches that of the table
> whose id we're after:
>
>     team.person      => person.id
>     team.org         => org.id
>     dates.date_types => date_types.id
>     loc.state        => state.id
>     loc.nation       => nation.id
>     project.org      => org.id
>     ...etc...
>
>
It's pretty much up to you and your organization.  What's most
maintainable for you?  Theory only goes so far--that's why we work so hard
to normalize relations only to struggle to find ways of de-normalizing to
views.  Programatically, logically, using your naming scheme is just that:
a naming scheme.  I personally like it, and I really *don't* like typing
60-character names (or even 30, with an alias, or whatever).

What you name your tables and columns should represent the data
accurately.  That way, when you hire a consultant, you don't have to pay
him for the time he (or she) takes figuring out an arcane and
non-representative naming scheme.  And your successors will bless your
name for intelligently naming you fields. :)

As far as the data is concerned, Postgresql doesn't care what you call it.
 I think as long as you avoid keywords you'll be alright.

Just my opinion, I could be wrong. ;)

--
Matthew Vanecek
perl -e 'print $i=pack(c5,(41*2),sqrt(7056),(unpack(c,H)-2),oct(115),10);'
********************************************************************************
For 93 million miles, there is nothing between the sun and my shadow except me.
I'm always getting in the way of something...

Re: paradigm sanity check needed [long]

От
will trillich
Дата:
On Fri, Jan 31, 2003 at 11:33:35PM -0600, pgsql-general-owner+M37147=will=serensoft.com@postgresql.org wrote:
> On Fri, 31 Jan 2003 22:30:45 -0600, will trillich wrote:
> What you name your tables and columns should represent the data
> accurately.  That way, when you hire a consultant, you don't have to pay
> him for the time he (or she) takes figuring out an arcane and
> non-representative naming scheme.  And your successors will bless your
> name for intelligently naming you fields. :)
>
> As far as the data is concerned, Postgresql doesn't care what you call it.
>  I think as long as you avoid keywords you'll be alright.

true.

you know how many time i wish i could use a "date" field? or
"on"? or "table"? :( (without having to use the blasted quotes,
of course...)

--
There are 10 kinds of people:
ones that get binary, and ones that don't.

will@serensoft.com
http://sourceforge.net/projects/newbiedoc -- we need your brain!
http://www.dontUthink.com/ -- your brain needs us!

Looking for a firewall? Do you think smoothwall sucks? You're
probably right... Try the folks at http://clarkconnect.org/ !