Обсуждение: finding reusable ids

Поиск
Список
Период
Сортировка

finding reusable ids

От
Kenji Morishige
Дата:
I have a table that creates "check-out" records that stores information when
a particular resource is being utilized.  I want to maintain a friendly
shortened ID so people can reference these check outs.

At any given time, there should not be more than 999999 or so check-outs, so
as the check-outs get checked in, the old IDs would become available.  What
is the best method to query for these resusable IDs that would not be
assigned to checked out items?  It seems that it would be quite inefficient
to look at the entire table to see which ids exist, then increment
accordingly.  For some reason, I feel that there would be something already
available to solve this.

example set:

uid  co-id  checked-in?
1    1      n
2    2      n
3    3      y
4    4      n
5    3      n

obviously, this is a small sample set, but the id 3 can be reused, so I'd
like to reuse it without using a external tracking mechansm.  My table has
1,000,000+ records.

Kenji

Re: finding reusable ids

От
Nis Jørgensen
Дата:
Kenji Morishige skrev:
> I have a table that creates "check-out" records that stores information when
> a particular resource is being utilized.  I want to maintain a friendly
> shortened ID so people can reference these check outs.
>
> At any given time, there should not be more than 999999 or so check-outs, so
> as the check-outs get checked in, the old IDs would become available.  What
> is the best method to query for these resusable IDs that would not be
> assigned to checked out items?  It seems that it would be quite inefficient
> to look at the entire table to see which ids exist, then increment
> accordingly.  For some reason, I feel that there would be something already
> available to solve this.
>
> example set:
>
> uid  co-id  checked-in?
> 1    1      n
> 2    2      n
> 3    3      y
> 4    4      n
> 5    3      n
>
> obviously, this is a small sample set, but the id 3 can be reused, so I'd
> like to reuse it without using a external tracking mechansm.  My table has
> 1,000,000+ records.

Do you need the co-id once the item is checked in? If not, I would split
this into two tables:

resources
uid <more data>
1
2
3
4
5

checked_out
uid co_id
1   1
2   2
4   4
5   3

Where the existence of the row in the second table doubles as the
checked-in flag.

This doesn't solve your problem, but might simplify the query to find a
new id - something like this (untested):

SELECT min(q.co_id) +1
FROM (
SELECT (co_id + 1) as co_id FROM checked_out
EXCEPT
SELECT co_id FROM checked_out
) q;

(you need a special case when the table is empty)

The same method can of course be used with your original table layout.

Nis

Re: finding reusable ids

От
Kenji Morishige
Дата:
Actually, I already have a resource table that stores the uid of the item in
question.  The checkout table does double duty as a history mechanism and a
check-out mechanism.  I think you are on the right track, I should seperate
these two tasks and possibly create another table.  The actual design is a
bit more complicated as we actually don't have a a checked-in flag, but a
start and finish time where users can actually store multiple overlapping
records.

Kenji

On Tue, Aug 07, 2007 at 12:23:00PM +0200, Nis Jørgensen wrote:
> Kenji Morishige skrev:
> > I have a table that creates "check-out" records that stores information when
> > a particular resource is being utilized.  I want to maintain a friendly
> > shortened ID so people can reference these check outs.
> >
> > At any given time, there should not be more than 999999 or so check-outs, so
> > as the check-outs get checked in, the old IDs would become available.  What
> > is the best method to query for these resusable IDs that would not be
> > assigned to checked out items?  It seems that it would be quite inefficient
> > to look at the entire table to see which ids exist, then increment
> > accordingly.  For some reason, I feel that there would be something already
> > available to solve this.
> >
> > example set:
> >
> > uid  co-id  checked-in?
> > 1    1      n
> > 2    2      n
> > 3    3      y
> > 4    4      n
> > 5    3      n
> >
> > obviously, this is a small sample set, but the id 3 can be reused, so I'd
> > like to reuse it without using a external tracking mechansm.  My table has
> > 1,000,000+ records.
>
> Do you need the co-id once the item is checked in? If not, I would split
> this into two tables:
>
> resources
> uid <more data>
> 1
> 2
> 3
> 4
> 5
>
> checked_out
> uid co_id
> 1   1
> 2   2
> 4   4
> 5   3
>
> Where the existence of the row in the second table doubles as the
> checked-in flag.
>
> This doesn't solve your problem, but might simplify the query to find a
> new id - something like this (untested):
>
> SELECT min(q.co_id) +1
> FROM (
> SELECT (co_id + 1) as co_id FROM checked_out
> EXCEPT
> SELECT co_id FROM checked_out
> ) q;
>
> (you need a special case when the table is empty)
>
> The same method can of course be used with your original table layout.
>
> Nis
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 1: if posting/reading through Usenet, please send an appropriate
>        subscribe-nomail command to majordomo@postgresql.org so that your
>        message can get through to the mailing list cleanly

Re: finding reusable ids

От
Michael Glaesemann
Дата:
[Please don't top post as it makes the discussion more difficult to
follow.]

On Aug 7, 2007, at 9:05 , Kenji Morishige wrote:


> On Tue, Aug 07, 2007 at 12:23:00PM +0200, Nis Jørgensen wrote:
>>
>> This doesn't solve your problem, but might simplify the query to
>> find a
>> new id - something like this (untested):
>>
>> SELECT min(q.co_id) +1
>> FROM (
>> SELECT (co_id + 1) as co_id FROM checked_out
>> EXCEPT
>> SELECT co_id FROM checked_out
>> ) q;

I don't believe this is concurrency safe, even if wrapped in a
transaction. Two concurrent transactions could end up calculating the
same next co_id. This could be caught by having appropriate
constraints on checked_out and retrying on error.

> Actually, I already have a resource table that stores the uid of
> the item in
> question.  The checkout table does double duty as a history
> mechanism and a
> check-out mechanism.  I think you are on the right track, I should
> seperate
> these two tasks and possibly create another table.  The actual
> design is a
> bit more complicated as we actually don't have a a checked-in flag,
> but a
> start and finish time where users can actually store multiple
> overlapping
> records.

I agree that you should probably tweak the schema a bit. Also, as you
want to reuse your checkout ids, you're actually considering them a
separate resoure, so you might consider putting them in a separate
table. Here's what I came up with:

As an aside, I wouldn't call them checkout_ids (even though I
did :)), as id often connotes a unique identifier something (like
your uids), and you're reusing them. I might call them
checkout_reference or checkout_number or something.

CREATE TABLE checkout_ids
(
     checkout_id INTEGER PRIMARY KEY
     , is_checked_out BOOLEAN NOT NULL
         DEFAULT FALSE
     , UNIQUE (checkout_id, is_checked_out)
);
-- populate the table with the values you'll use
INSERT INTO checkout_ids (checkout_id)
SELECT generate_series(1,999999);

CREATE TABLE checkouts
(
     checkout_id INTEGER PRIMARY KEY
     , is_checked_out BOOLEAN NOT NULL
         CHECK (is_checked_out)
         DEFAULT TRUE
     , FOREIGN KEY (checkout_id, is_checked_out)
         REFERENCES checkout_ids (checkout_id, is_checked_out)
     , uid INTEGER NOT NULL -- with some fk
);

-- Of course, you can add the checkout start and end dates/timestamps
to this table:
-- they're independent of managing the checkout_id resource
-- I've added is_checked_out to this table to ensure that all
checkouts (checkout_id) have
-- is_checked_out set to true (via the CHECK constraint). This could
also be done with a
-- trigger.

-- And a couple quick functions to handle the process of checking in
and checking out.
-- The SELECT ... FOR UPDATE in checkout should ensure that
concurrent transactions
-- aren't grabbing the same checkout_id.

CREATE FUNCTION checkout (p_uid INTEGER)
RETURNS INTEGER -- checkout_id
LANGUAGE plpgsql AS $body$
DECLARE
     v_checkout_id INTEGER;
BEGIN
     SELECT INTO v_checkout_id
         checkout_id
     FROM checkout_ids
     WHERE NOT is_checked_out
     LIMIT 1
     FOR UPDATE;

     UPDATE checkout_ids
     SET is_checked_out = TRUE
     WHERE checkout_id = v_checkout_id;

     INSERT INTO checkouts (checkout_id, uid)
     VALUES (v_checkout_id, p_uid);
     RETURN v_checkout_id;
END;
$body$;

CREATE FUNCTION checkin (p_checkout_id INTEGER)
RETURNS VOID
LANGUAGE plpgsql AS $body$
BEGIN

     DELETE FROM checkouts
     WHERE checkout_id = p_checkout_id;

     UPDATE checkout_ids
     SET is_checked_out = FALSE
     WHERE checkout_id = p_checkout_id;

     RETURN;
END;
$body$;

Hope this helps.

Michael Glaesemann
grzm seespotcode net