Re: Race condition in resetting a sequence
От | Scott Marlowe |
---|---|
Тема | Re: Race condition in resetting a sequence |
Дата | |
Msg-id | dcc563d10708101400i406d4e79qdcfe5bd8668d0bcc@mail.gmail.com обсуждение исходный текст |
Ответ на | Re: Race condition in resetting a sequence (Lew <lew@lewscanon.nospam>) |
Список | pgsql-sql |
On 8/4/07, Lew <lew@lewscanon.nospam> wrote: > Steve Midgley writes: > >> The code I provided to reset a primary key sequence is actually part of > >> Ruby on Rails core library - actually they use something very similar > >> to what I originally sent: > ... > >> SELECT setval('#{sequence}', (SELECT COALESCE(MAX(#{pk})+(SELECT > >> increment_by FROM #{sequence}), (SELECT min_value FROM #{sequence})) > >> FROM #{table}), false) > > Tom Lane wrote: > > Ugh. That's completely unsafe/broken, unless they also use locking that > > you didn't show. > ... > > It doesn't have a race condition "all by itself": it will do what it's > > told. The problem with commands such as the above is that there's a > > time window between calculating the max() and executing the setval(), > > and that window is more than large enough to allow someone else to > > insert a row that invalidates your max() computation. (Because of MVCC > > snapshotting, the risk window is in fact as long as the entire > > calculation of the max --- it's not just a few instructions as some > > might naively think.) > > > > Now it is possible to make this brute-force approach safe: you can lock > > the table against all other modifications until you've applied your own > > changes. But you pay a high price in loss of concurrency if you do > > that. > > All this trouble over semantically-significant ID columns seems to support the > camp that excoriates use of artificial ID columns and autoincrementation > altogether. I don't that it does that at all. If properly implemented, sequences provide a race free way to assign unique ids to a table that otherwise would have no natural primary key. I think that having a religious preference one way or the other means you might use a suboptimal method in some cases. In the db I work on, we have artificial pks of sequences, artificial pks made from a large list of pre-created ids (6 alphanum char locator codes, common in the travel industry) and still other tables that have natural primary keys. Each has a reason for being the way it is. I can't see this saying natural pks, which are often open to interpretation and changes over time are any less are any better than artificial pks from a sequence at all. I can see it saying that the people who developed RoR framework were NOT database specialists. > The usual argument in their favor is that they speed up performance, but this > epicyclic dance to accomodate FK references to autoincremented keys makes the > case that there is also a performance penalty, and in the more critical > performance area of code development and correctness than in the less critical > search speed area. Sorry, but that's only one of the two arguments. The other one is that natural pks are often impossible due to the dataset not being able to be unique because of the business constraints. In an airline reservation system, you might start with last and first name for a pk. Then add phone number. except some people don't wanna give their phone numbers, so you use steet address, only to find out that John Smith with no phone lives on 123 Acorn St in Portland while another John Smith with no phone lives on 123 Acorn St in Phillidelphia. And what do you do when they call back about their ticket? Seriously, the natural primary key method has as many problems as the artificial one. In the USAF, they used to use a system to keep track of your medical records. The first letter of your last name, followed by the last four of your SSN. So, Ken Johnson might have an SSN of 123-45-6789 While Patricia Jackson might have an SSN of 453-89-6789. Imagine Ken's surprise when his doctor tells him he's pregnant. Or Patricia's surprise to have a limb amputated when she went in for a problem with heart murmers. By the time you add enough fields together to get a reliable primary key, you might have a multi-dimensional monster that kills performance. You can't just religiously say that one or the other is the only answer. Finally the cost of software development is only a small part of the overall cost of creating and deploying a piece of code. Last place I worked had a transactional database developed really quickly by non-db people that used isam tables (no, not mysql with myisam, just isam) that corrupted records dozens of times a day and required a team of support people to keep it running semi-smoothly and hours of downtime each night to be cleaned up. The initial savings on development costs were eaten up rather quickly by the ongoing support costs.