Обсуждение: Problem with UPDATE and UNIQUE

Поиск
Список
Период
Сортировка

Problem with UPDATE and UNIQUE

От
"Frank Millman"
Дата:
Hi all

I have a problem, which I suspect stems from bad design.

If I explain what I am doing, perhaps someone can suggest a better approach.

I want to store data in a 'tree' form, with a fixed number of levels, so
that each level has a defined role.

I have the following (simplified) table -

CREATE TABLE treedata (
  rowid serial primary key,
  levelno int not null,
  parentid int references treedata,
  seq int not null,
  code varchar not null,
  description varchar not null
  );

The 'root' item has a parentid of null, all other items must have a valid
parent. Items with a levelno of 0 represent raw data, higher levelno's
represent grouping levels. The seq indicator is used to display data in a
defined order.

To describe each of the levels in the tree, I have the following table -

CREATE TABLE treelevels (
  levelno int primary key,
  code varchar unique not null,
  description varchar not null
  );

Typical values for this table could be -
  (0,'Prod','Product code')
  (1,'Cat','Product category')
  (2,'*','All products')

Now for the problem. I want to insert or delete levels dynamically. I can
insert or delete levels in 'treedata' without a problem. However, I also
want to insert or delete a level in 'treelevels'.

Say I want to insert a level between 'code' and 'category' called 'group' -

INSERT INTO treelevels VALUES (1,'Group','Product group');

Obviously this will fail with a duplicate levelno. Therefore before the
insert statement I want to do this -

UPDATE treelevels SET levelno = (levelno+1) WHERE levelno >= 1;

The problem is that if there are a number of levels, and they are in
indeterminate order, I can get duplicate level numbers while the command is
being executed.

My workaround at present is the following -

UPDATE treelevels SET levelno = (levelno+10001) WHERE levelno >= 1;
UPDATE treelevels SET levelno = (levelno-10000) WHERE levelno >= 1;

It works, but it feels very ugly.

Any suggestions will be much appreciated.

Thanks

Frank Millman


Re: Problem with UPDATE and UNIQUE

От
Michael Glaesemann
Дата:
On Aug 22, 2007, at 1:02 , Frank Millman wrote:

> I want to store data in a 'tree' form, with a fixed number of
> levels, so
> that each level has a defined role.

First thought: fixed, predetermined levels, separate tables for each
level. If a more general approach is desired, your options are
generally adjacency list, nested sets, or contrib/ltree. Each has
their own strengths and weaknesses.

> I have the following (simplified) table -
>
> CREATE TABLE treedata (
>   rowid serial primary key,
>   levelno int not null,
>   parentid int references treedata,
>   seq int not null,
>   code varchar not null,
>   description varchar not null
>   );

rowid + parentid looks like adjacency list to me. Note that you're
storing redundant data (the levelno, which can be derived from the
rowid/parentid relationships), which you may want to do for
performance reasons, but does make things more complicated: you're
essentially caching data which brings with it problems of cache
invalidation. In this case, you need to make sure you're updating
levelno whenever it needs to be updated. (Which I'm sure you've
already thought of.)

> To describe each of the levels in the tree, I have the following
> table -
>
> CREATE TABLE treelevels (
>   levelno int primary key,
>   code varchar unique not null,
>   description varchar not null
>   );

Having each level as its own table would make this redundant, but
again, that might not fit with what you're modeling.

> Typical values for this table could be -
>   (0,'Prod','Product code')
>   (1,'Cat','Product category')
>   (2,'*','All products')

This makes me think you'll want to rethink your schema a bit, as
you're mixing different types of data: categories and products. I'd
at least separate this out into a products table and a categories
table. The categories table may in fact still require some kind of
tree structure, but I don't think products belongs as part of it.

CREATE TABLE products
(
     product_code text PRIMARY KEY
     , product_name text NOT NULL UNIQUE
     , product_description TEXT NOT NULL
);

CREATE TABLE categories
(
     category_id INTEGER PRIMARY KEY
     , category_name TEXT NOT NULL UNIQUE
     , category_description TEXT NOT NULL
     , parent_category_id INTEGER NOT NULL
         REFERENCES categories
     , level INTEGER NOT NULL
     , seq INTEGER NOT NULL
);

One difference here is that I have a NOT NULL constraint on
parent_category_id, and I'd define the root as a category_id that has
a parent_category = category_id. You should be able to add a check
constraint that makes sure only level = 0 has parent_category_id =
category_id and a unique index so only one category can have level 0.
Something like:

-- check constraint
CHECK (
     CASE WHEN category_id = parent_category_id
         THEN level = 0
     ELSE category_id <> parent_category_id
     END)
-- unique index
CREATE UNIQUE INDEX categories_root_uniq_idx
ON categories (level)
WHERE level = 0;

These are untested, but I think they should work.

Given that you'd like to determine the order of the items, I think
you might be interested in using nested sets rather than adjacency
list, as this information is automatically encoded for you. This
would look something like:

CREATE FUNCTION valid_nested_interval_bounds(INTEGER, INTEGER)
RETURNS BOOLEAN
IMMUTABLE
LANGUAGE SQL AS $body$
SELECT $1 < $2 -- make sure they're properly ordered
     AND ($2 - $1 - 1) % 2 = 0 -- useful if using the more strict
versions of nested set (where there no gaps)
     -- and you can easily calculate the total number of descendants
     -- from just the upper and lower bounds of the parent node
$body$;

I generally separate the validity test into a separate function as I
often have a couple different nested set hierarchies in the same
database, and it makes the table definition clearer. I believe the
function is inlined in the check constraint,  but I'm not sure. You
can, of course, move the SQL in the function straight into table
definition.

CREATE TABLE categories
(
     , category_name TEXT PRIMARY KEY
     , category_description TEXT NOT NULL
     , category_lower INTEGER NOT NULL UNIQUE -- often called "left"
     , category_upper INTEGER NOT NULL UNIQUE -- often called "right"
     , CHECK (valid_nested_interval_bounds(category_lower,
category_upper))
);

Another advantage of nested sets is that selecting all of the
descendants of a particular node is very easy. The downside of the
nested set model is that inserts and updates are a bit more
complicated and db intensive. If your categories aren't going to
change much, this shouldn't be a problem.

Again, you could add level information into the categories table, but
you'd need to make sure to update it appropriately.

CREATE TABLE category_products
(
     category_name TEXT NOT NULL REFERENCES categories
     , product_code TEXT NOT NULL REFERENCES products
     , UNIQUE (category_name, product_code)
);

My comments below refer to your original design.

> Now for the problem. I want to insert or delete levels dynamically.
> I can
> insert or delete levels in 'treedata' without a problem. However, I
> also
> want to insert or delete a level in 'treelevels'.
>
> Say I want to insert a level between 'code' and 'category' called
> 'group' -
>
> INSERT INTO treelevels VALUES (1,'Group','Product group');

It's a good habit to *always* explicitly list your columns: it's self-
documenting and more robust in the face of schema changes.

> Obviously this will fail with a duplicate levelno. Therefore before
> the
> insert statement I want to do this -
>
> UPDATE treelevels SET levelno = (levelno+1) WHERE levelno >= 1;
>
> The problem is that if there are a number of levels, and they are in
> indeterminate order, I can get duplicate level numbers while the
> command is
> being executed.
>
> My workaround at present is the following -
>
> UPDATE treelevels SET levelno = (levelno+10001) WHERE levelno >= 1;
> UPDATE treelevels SET levelno = (levelno-10000) WHERE levelno >= 1;

This is a general problem with nested sets and your situation where
you're caching the levelno, and you're work around is similar to the
two generally recommended solutions. One is to make updates using an
offset such as what you're doing, and the other is to utilize
negative levels. I'm keen on the latter, as I feel it's a bit more
flexible: you don't need to make sure your offset is large enough.
I'd do something like this:

BEGIN;

UPDATE treelevels
SET levelno = -1 * (levelno + 1)
WHERE levelno >= 1;

INSERT INTO treelevels (levelno, code, description) VALUES (1,
'Group', 'Product group');

UPDATE treelevels
SET levelno = -1 * levelno
WHERE levelno < 0;

COMMIT;

And you can wrap the common operations (insert, move, delete) in
functions for convenience.

Anyway, hope this gives you something to think about.

Michael Glaesemann
grzm seespotcode net



Re: Problem with UPDATE and UNIQUE

От
"Frank Millman"
Дата:
Michael Glaesemann wrote:
>
> On Aug 22, 2007, at 1:02 , Frank Millman wrote:
>
> > I want to store data in a 'tree' form, with a fixed number
> of levels,
> > so that each level has a defined role.
>

Thanks very much for the in-depth response, Michael. Plenty for the little
grey cells to work on.

> First thought: fixed, predetermined levels, separate tables
> for each level. If a more general approach is desired, your
> options are generally adjacency list, nested sets, or
> contrib/ltree. Each has their own strengths and weaknesses.
>

I am writing a general-purpose business/accounting application. If
successful, I hope to have a number of different companies using it. I want
to provide the ability for the end-user to to define their own,
multi-dimensional, views of various core tables (general ledger, products,
etc). I foresee that it will only be used for reporting purposes
(particularly WHERE, ORDER BY and GROUP BY). Therefore I do need a general
approach.

> > I have the following (simplified) table -
> >
> > CREATE TABLE treedata (
> >   rowid serial primary key,
> >   levelno int not null,
> >   parentid int references treedata,
> >   seq int not null,
> >   code varchar not null,
> >   description varchar not null
> >   );
>
> rowid + parentid looks like adjacency list to me. Note that
> you're storing redundant data (the levelno, which can be
> derived from the rowid/parentid relationships), which you may
> want to do for performance reasons, but does make things more
> complicated: you're essentially caching data which brings
> with it problems of cache invalidation. In this case, you
> need to make sure you're updating levelno whenever it needs
> to be updated. (Which I'm sure you've already thought of.)
>

I read up on 'adjency list' and 'nested sets', and I agree, the scheme I
have come up with is an adjency list. It had not occurred to me that levelno
is redundant, but I can see that this is so. I will have to check to see if
there are any implications if I remove it.

> > To describe each of the levels in the tree, I have the
> following table
> > -
> >
> > CREATE TABLE treelevels (
> >   levelno int primary key,
> >   code varchar unique not null,
> >   description varchar not null
> >   );
>
> Having each level as its own table would make this redundant,
> but again, that might not fit with what you're modeling.
>
> > Typical values for this table could be -
> >   (0,'Prod','Product code')
> >   (1,'Cat','Product category')
> >   (2,'*','All products')
>
> This makes me think you'll want to rethink your schema a bit,
> as you're mixing different types of data: categories and
> products. I'd at least separate this out into a products
> table and a categories table. The categories table may in
> fact still require some kind of tree structure, but I don't
> think products belongs as part of it.
>

Very good point. I will give this some serious thought.

[...]

> >
> > Say I want to insert a level between 'code' and 'category' called
> > 'group' -
> >
> > INSERT INTO treelevels VALUES (1,'Group','Product group');
>
> It's a good habit to *always* explicitly list your columns:
> it's self- documenting and more robust in the face of schema changes.
>
> > Obviously this will fail with a duplicate levelno. Therefore before
> > the insert statement I want to do this -
> >
> > UPDATE treelevels SET levelno = (levelno+1) WHERE levelno >= 1;
> >
> > The problem is that if there are a number of levels, and
> they are in
> > indeterminate order, I can get duplicate level numbers while the
> > command is being executed.
> >
> > My workaround at present is the following -
> >
> > UPDATE treelevels SET levelno = (levelno+10001) WHERE levelno >= 1;
> > UPDATE treelevels SET levelno = (levelno-10000) WHERE levelno >= 1;
>
> This is a general problem with nested sets and your situation
> where you're caching the levelno, and you're work around is
> similar to the two generally recommended solutions. One is to
> make updates using an offset such as what you're doing, and
> the other is to utilize negative levels. I'm keen on the
> latter, as I feel it's a bit more
> flexible: you don't need to make sure your offset is large enough.

I also like the idea of 'negating' the level. It is neat and effective.
Thanks for the tip, I will use it.

One trivial point. I use 'negating' quite a bit, and instead of -
    SET levelno = -1 * (levelno + 1)

I have adopted the habit of using -
    SET levelno = -(levelno + 1)

It just feels a bit neater.

[...]

>
> Anyway, hope this gives you something to think about.
>

It certainly does. Thanks again for all the valuable advice.

Frank


Re: Problem with UPDATE and UNIQUE

От
Michael Glaesemann
Дата:
On Aug 23, 2007, at 8:59 , Frank Millman wrote:

> It certainly does. Thanks again for all the valuable advice.

Glad you found it helpful. Good luck!

Michael Glaesemann
grzm seespotcode net