Обсуждение: 1 Sequence per Row i.e. each customer's first order starts at 1

Поиск
Список
Период
Сортировка

1 Sequence per Row i.e. each customer's first order starts at 1

От
Merrick
Дата:
I have been using postgresql for 8 years in web projects and ran into
a problem that I could not find a solution for in the archives or
through Google.

Here is a generalized example of what I want to happen. I have a
customers table, and an orders table. I would like for each customer
to have orders that start at 1 and move up sequentially. I realize
it's probably not efficient to create a new sequence for each
customer, so am looking for alternate ways to accomplish the same
thing. Below is an illustrated example of the outcome I would like. I
would also like similar functionality to a sequence so duplicate
order_id's are not generated. Please keep in mind that for what I am
developing, having each customer's orders start at 1 is more of a need
than a want.

Thank you.


Customers
customer_id     customer_name
----------------------------------------------------------------
1            acme corp
2            hacker corp
3            dixie corp


Orders
order_id    customer_id        timestamp
----------------------------------------------------------------
1            1            2009-07-01 14:42
1            2            2009-07-01 14:43
2            2            2009-07-01 14:44
1            3            2009-07-01 14:44
2            1            2009-07-01 14:44
3            1            2009-07-01 14:44

Re: 1 Sequence per Row i.e. each customer's first order starts at 1

От
Greg Stark
Дата:
On Thu, Jul 2, 2009 at 1:04 AM, Merrick<merrick@gmail.com> wrote:
> I would like for each customer
> to have orders that start at 1 and move up sequentially. I realize
> it's probably not efficient to create a new sequence for each
> customer, so am looking for alternate ways to accomplish the same
> thing.

You could have a last_order_num in the customer's main record and when you
issue a new order do something like

UPDATE customer
       SET last_order_num = last_order_num+1
 WHERE customer_id = :1
RETURNING last_order_num

Then use that value in a subsequent insert -- preferrably in the same
transaction so if it rolls back you restore the old last_order_num.

You should realize this will lock the customer record so if there are
other queries in the transaction which use the customer record you have to
be careful about deadlock risks. The other downside is it could create
a lot of update traffic on the customer table so it could require a lot of
careful vacuuming there.

--
greg
http://mit.edu/~gsstark/resume.pdf

Re: 1 Sequence per Row i.e. each customer's first order starts at 1

От
Scott Marlowe
Дата:
On Wed, Jul 1, 2009 at 6:04 PM, Merrick<merrick@gmail.com> wrote:
> I have been using postgresql for 8 years in web projects and ran into
> a problem that I could not find a solution for in the archives or
> through Google.
>
> Here is a generalized example of what I want to happen. I have a
> customers table, and an orders table. I would like for each customer
> to have orders that start at 1 and move up sequentially. I realize
> it's probably not efficient to create a new sequence for each

Yeah, plus sequences aren't guaranteed to always give a gapless
sequence due to rollbacks etc.

> customer, so am looking for alternate ways to accomplish the same
> thing. Below is an illustrated example of the outcome I would like. I
> would also like similar functionality to a sequence so duplicate
> order_id's are not generated. Please keep in mind that for what I am
> developing, having each customer's orders start at 1 is more of a need
> than a want.

The simplest method is to do something like:

begin;
select * from sometable where cust_id=99 order by order_id desc for update;

to lock all the customer records for cust_id 99, then take the first
record, which should have the highest order_id, grab that increment it
and then insert the new record  and commit; the transaction.  Assuming
your customers aren't ordering dozens of things a second, this should
work with minimal locking contention.

Re: 1 Sequence per Row i.e. each customer's first order starts at 1

От
Merrick
Дата:
I was hoping there would be a way to add a field the sequence table
postgresql automatically generates so I could rely on whatever
mechanism postgresql uses to avoid the problems described thus far.

I should have included more info, it's highly likely that multiple
users will be accessing using same customer_id when creating orders
thus deadlocks would be an issue I would like to avoid.

Having the sequence be gapless would not be a requirement.

Thank you.

Merrick

On Wed, Jul 1, 2009 at 6:01 PM, Scott Marlowe<scott.marlowe@gmail.com> wrote:
> On Wed, Jul 1, 2009 at 6:04 PM, Merrick<merrick@gmail.com> wrote:
>> I have been using postgresql for 8 years in web projects and ran into
>> a problem that I could not find a solution for in the archives or
>> through Google.
>>
>> Here is a generalized example of what I want to happen. I have a
>> customers table, and an orders table. I would like for each customer
>> to have orders that start at 1 and move up sequentially. I realize
>> it's probably not efficient to create a new sequence for each
>
> Yeah, plus sequences aren't guaranteed to always give a gapless
> sequence due to rollbacks etc.
>
>> customer, so am looking for alternate ways to accomplish the same
>> thing. Below is an illustrated example of the outcome I would like. I
>> would also like similar functionality to a sequence so duplicate
>> order_id's are not generated. Please keep in mind that for what I am
>> developing, having each customer's orders start at 1 is more of a need
>> than a want.
>
> The simplest method is to do something like:
>
> begin;
> select * from sometable where cust_id=99 order by order_id desc for update;
>
> to lock all the customer records for cust_id 99, then take the first
> record, which should have the highest order_id, grab that increment it
> and then insert the new record  and commit; the transaction.  Assuming
> your customers aren't ordering dozens of things a second, this should
> work with minimal locking contention.
>

Re: 1 Sequence per Row i.e. each customer's first order starts at 1

От
Scott Marlowe
Дата:
I'm sure a trigger could be written to do what I just said.

On Wed, Jul 1, 2009 at 7:46 PM, Merrick<merrick@gmail.com> wrote:
> I was hoping there would be a way to add a field the sequence table
> postgresql automatically generates so I could rely on whatever
> mechanism postgresql uses to avoid the problems described thus far.
>
> I should have included more info, it's highly likely that multiple
> users will be accessing using same customer_id when creating orders
> thus deadlocks would be an issue I would like to avoid.
>
> Having the sequence be gapless would not be a requirement.
>
> Thank you.
>
> Merrick
>
> On Wed, Jul 1, 2009 at 6:01 PM, Scott Marlowe<scott.marlowe@gmail.com> wrote:
>> On Wed, Jul 1, 2009 at 6:04 PM, Merrick<merrick@gmail.com> wrote:
>>> I have been using postgresql for 8 years in web projects and ran into
>>> a problem that I could not find a solution for in the archives or
>>> through Google.
>>>
>>> Here is a generalized example of what I want to happen. I have a
>>> customers table, and an orders table. I would like for each customer
>>> to have orders that start at 1 and move up sequentially. I realize
>>> it's probably not efficient to create a new sequence for each
>>
>> Yeah, plus sequences aren't guaranteed to always give a gapless
>> sequence due to rollbacks etc.
>>
>>> customer, so am looking for alternate ways to accomplish the same
>>> thing. Below is an illustrated example of the outcome I would like. I
>>> would also like similar functionality to a sequence so duplicate
>>> order_id's are not generated. Please keep in mind that for what I am
>>> developing, having each customer's orders start at 1 is more of a need
>>> than a want.
>>
>> The simplest method is to do something like:
>>
>> begin;
>> select * from sometable where cust_id=99 order by order_id desc for update;
>>
>> to lock all the customer records for cust_id 99, then take the first
>> record, which should have the highest order_id, grab that increment it
>> and then insert the new record  and commit; the transaction.  Assuming
>> your customers aren't ordering dozens of things a second, this should
>> work with minimal locking contention.
>>
>



--
When fascism comes to America, it will be intolerance sold as diversity.

Re: 1 Sequence per Row i.e. each customer's first order starts at 1

От
Scott Marlowe
Дата:
On Wed, Jul 1, 2009 at 7:46 PM, Merrick<merrick@gmail.com> wrote:
> I was hoping there would be a way to add a field the sequence table
> postgresql automatically generates so I could rely on whatever
> mechanism postgresql uses to avoid the problems described thus far.
>
> I should have included more info, it's highly likely that multiple
> users will be accessing using same customer_id when creating orders
> thus deadlocks would be an issue I would like to avoid.
>
> Having the sequence be gapless would not be a requirement.

Hmmm.  Well, are those users gonna be holding a lock while they do a
lot of hand processing work?  Do they need that id before they start a
lot of complex work?  In that case, you could do things two fold.
Lock the rows, create an empty but ready to go one, commit, then come
back later and update the row.

Re: 1 Sequence per Row i.e. each customer's first order starts at 1

От
Merrick
Дата:
Thank you, this helps get me on the right path.

On Wed, Jul 1, 2009 at 7:12 PM, Scott Marlowe<scott.marlowe@gmail.com> wrote:
> On Wed, Jul 1, 2009 at 7:46 PM, Merrick<merrick@gmail.com> wrote:
>> I was hoping there would be a way to add a field the sequence table
>> postgresql automatically generates so I could rely on whatever
>> mechanism postgresql uses to avoid the problems described thus far.
>>
>> I should have included more info, it's highly likely that multiple
>> users will be accessing using same customer_id when creating orders
>> thus deadlocks would be an issue I would like to avoid.
>>
>> Having the sequence be gapless would not be a requirement.
>
> Hmmm.  Well, are those users gonna be holding a lock while they do a
> lot of hand processing work?  Do they need that id before they start a
> lot of complex work?  In that case, you could do things two fold.
> Lock the rows, create an empty but ready to go one, commit, then come
> back later and update the row.
>

Re: 1 Sequence per Row i.e. each customer's first order starts at 1

От
Greg Stark
Дата:
On Thu, Jul 2, 2009 at 2:46 AM, Merrick<merrick@gmail.com> wrote:
> I was hoping there would be a way to add a field the sequence table
> postgresql automatically generates so I could rely on whatever
> mechanism postgresql uses to avoid the problems described thus far.

Hm, well you could create a sequence for every customer. I don't think
that's the way I would go but in theory it should work. Having
thousands or millions of sequences could make dealing with things like
pg_dump kind of a challenge though.

> I should have included more info, it's highly likely that multiple
> users will be accessing using same customer_id when creating orders
> thus deadlocks would be an issue I would like to avoid.

Well deadlocks are only a problem if you're not careful how you
acquire the locks. If you keep the transaction short and this is the
only reason you lock the customer record then you won't get a
deadlock. Just make sure you acquire the lock before you acquire any
weaker lock such as from referential integrity checks from inserting a
record which refers to that customer.

If it's part of a long transaction which acquires locks on multiple
customers then you have to be concerned with what order the locks are
acquired.


> On Wed, Jul 1, 2009 at 6:01 PM, Scott Marlowe<scott.marlowe@gmail.com> wrote:
>> The simplest method is to do something like:
>>
>> begin;
>> select * from sometable where cust_id=99 order by order_id desc for update;
>>
>> to lock all the customer records for cust_id 99, then take the first
>> record, which should have the highest order_id, grab that increment it
>> and then insert the new record  and commit; the transaction.  Assuming
>> your customers aren't ordering dozens of things a second, this should
>> work with minimal locking contention.

I don't like the idea of locking all the order records. That sounds
like it would lead to even more contention than locking just the
customer record. There could be thousands of order records to lock all
over the order table.

Also, this seems like it would have deadlocks risks no matter how you
wrote the code. Since you can't control the order that the locking
would take place. If you happened to get a customer with a significant
number of orders you'll get a bitmap heap scan or even a sequential
scan and that will potentially lock the records in a different order
than your order by. If someone else previously locked them when the
stats reflected fewer records they would have taken the locks in a
different order.

It seems to me that locking all the order records here is really just
a proxy for locking their parent customer record and that would work
better anyways.

You could avoid the update to the customer record by combining these
two strategies though. Instead of updating a last_order_num field in
customer do something like this:

begin;
select * from customer where customer_id=:0 for update;
select max(order_num) from orders where customer_id = :0
insert into orders (order_num,...) values (:0 + 1, ...)
commit;

This uses the lock on customer to protect your insert against someone
else inserting the same order_num but doesn't actually update the
customer table.

--
greg
http://mit.edu/~gsstark/resume.pdf

Re: 1 Sequence per Row i.e. each customer's first order starts at 1

От
Scott Marlowe
Дата:
On Thu, Jul 2, 2009 at 3:28 AM, Greg Stark<gsstark@mit.edu> wrote:
>> On Wed, Jul 1, 2009 at 6:01 PM, Scott Marlowe<scott.marlowe@gmail.com> wrote:
>>> The simplest method is to do something like:
>>>
>>> begin;
>>> select * from sometable where cust_id=99 order by order_id desc for update;
>>>
>>> to lock all the customer records for cust_id 99, then take the first
>>> record, which should have the highest order_id, grab that increment it
>>> and then insert the new record  and commit; the transaction.  Assuming
>>> your customers aren't ordering dozens of things a second, this should
>>> work with minimal locking contention.
>
> I don't like the idea of locking all the order records. That sounds
> like it would lead to even more contention than locking just the
> customer record. There could be thousands of order records to lock all
> over the order table.

True.  Then again, if you're only locking it long enough to get the
next sequence, creating an empty record with that sequence, then
committing the transaction, it's a short lived lock.  Actually, since
you're only incrementing from the highest one, you could just lock the
id from a select max(orderid) where custid=xyz and you'd only have to
lock one row.

> It seems to me that locking all the order records here is really just
> a proxy for locking their parent customer record and that would work
> better anyways.
>
> You could avoid the update to the customer record by combining these
> two strategies though. Instead of updating a last_order_num field in
> customer do something like this:
>
> begin;
> select * from customer where customer_id=:0 for update;
> select max(order_num) from orders where customer_id = :0
> insert into orders (order_num,...) values (:0 + 1, ...)
> commit;
>
> This uses the lock on customer to protect your insert against someone
> else inserting the same order_num but doesn't actually update the
> customer table.

Good point.

Re: 1 Sequence per Row i.e. each customer's first order starts at 1

От
Greg Stark
Дата:
On Thu, Jul 2, 2009 at 10:35 AM, Scott Marlowe<scott.marlowe@gmail.com> wrote:
> Actually, since
> you're only incrementing from the highest one, you could just lock the
> id from a select max(orderid) where custid=xyz and you'd only have to
> lock one row.

Not really because you would have a race condition between selecting
the max() and then locking that record (you can't do FOR UPDATE
directly on the aggregate and in any case if you could it would have
the same problem).

Actually any scheme involving locking the orders would have the same
problems. Two transactions can start and try to lock some or all the
records. One will wait behind the other and only one transaction will
go ahead at a time but when the waiting transaction proceeds it still
won't see the newly inserted record and will get the same maximum. You
could get it to work as long as you're prepared to retry if you get
that race condition.

But then if you're prepared to retry you don't need locks at all. Just
"select max(id) from customer where customer_id = :0" and try to
insert -- if you get a unique violation start over and try again. As
long as you have an index on <customer_id,order_id> -- which would
presumably be your primary key anyways -- that should actually perform
just fine.



--
greg
http://mit.edu/~gsstark/resume.pdf