Обсуждение: Help with pre-loaded arbitrary key sequences
I am prototyping a system migration that is to employ Ruby, Rails and PostgreSQL. Rails has the convention that the primary key of a row is an arbitrary integer value assigned by the database manager through a sequence. As it turns out, the legacy application employs essentially the same convention in most instances. My question is this: Can one assign an id number to a sequenced key column on create and override the sequencer? If one does this then can and, if so, how does the sequencer in Postgresql handle the eventuality of running into a block of keys holding previously assigned numbers? For example. The existing client master dataset employs an eight digit account number as primary key. The values in use tend to cluster in groups at each thousand increment, thus 1..375, 1001..1288, 2001..2225, 3001..3312, ..., 2001001..2001476, ..., etc. Assuming that these existing entries were all loaded into the new table with the values given as their primary keys and given that one could not simply start the sequencer at a value above the highest existing value: If one was to add a record and auto-generate a sequence number then can the sequencer handle looping from 1 to 375 and returning 376 and then continue until it reaches 1001, when it needs be loop again until 1288 and then return 1289 and so forth? During the load of the initial table data it would probably be necessary to disable the sequencer for this column. Is this in fact the case? If so, how is this done and how is the sequencer restored after the initial migration of data is complete? I presume that I can write my own sequencer function to accomplish this in any case but I wish to know if the existing method handles this case. Another solution is to simply decouple the existing key value from the new and treat the exiting client number as a piece of data (with or without an index), but that seems redundant since the new arbitrary key value might just as well be the existing arbitrary key value. Dispensing with the existing arbitrary number is another option but, the fact is that present business practice is for employees to refer to their clients and vendors by account number. The existing computer system is 25 years old but employs account numbers that predate automation. The firm is well past the century mark and some of these numbers have been in use with a few clients from the end of 1800's. So, while not strictly a business case, both the firm and some of its clients have a strong, if irrational, attachment to preserving the existing scheme. I regret if these questions appear naive but I am struggling with a lot of new information on a number of fronts and as usual wish to get quick answers to questions that may be far more involved than I realize. Sincerely, -- *** E-Mail is NOT a SECURE channel *** James B. Byrne mailto:ByrneJB@Harte-Lyne.ca Harte & Lyne Limited http://www.harte-lyne.ca 9 Brockley Drive vox: +1 905 561 1241 Hamilton, Ontario fax: +1 905 561 0757 Canada L8E 3C3
On Fri, Jan 11, 2008 at 11:43:54AM -0500, James B. Byrne wrote: > My question is this: Can one assign an id number to a sequenced key column > on create and override the sequencer? If one does this then can and, if > so, how does the sequencer in Postgresql handle the eventuality of running > into a block of keys holding previously assigned numbers? You can set the counter during create, or at any time later. However, the counter is not defined by the column as such and will happily return numbers already in the table if you screw it up. The usual process is to insert normally when loading the data and then do a setval() on the sequence to past the values already stored. Have a nice day, -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > Those who make peaceful revolution impossible will make violent revolution inevitable. > -- John F Kennedy
Вложения
On Jan 11, 2008 10:43 AM, James B. Byrne <byrnejb@harte-lyne.ca> wrote: > I am prototyping a system migration that is to employ Ruby, Rails and > PostgreSQL. Rails has the convention that the primary key of a row is an > arbitrary integer value assigned by the database manager through a > sequence. As it turns out, the legacy application employs essentially the > same convention in most instances. > > My question is this: Can one assign an id number to a sequenced key column > on create and override the sequencer? If one does this then can and, if > so, how does the sequencer in Postgresql handle the eventuality of running > into a block of keys holding previously assigned numbers? > > For example. The existing client master dataset employs an eight digit > account number as primary key. The values in use tend to cluster in > groups at each thousand increment, thus 1..375, 1001..1288, 2001..2225, > 3001..3312, ..., 2001001..2001476, ..., etc. > > Assuming that these existing entries were all loaded into the new table > with the values given as their primary keys and given that one could not > simply start the sequencer at a value above the highest existing value: If > one was to add a record and auto-generate a sequence number then can the > sequencer handle looping from 1 to 375 and returning 376 and then continue > until it reaches 1001, when it needs be loop again until 1288 and then > return 1289 and so forth? You're essentially wanting to fill in the blanks here. If you need good performance, then what you'll need to do is to preallocate all the numbers that haven't been assigned somewhere. So, we make a table something like: create table locatorcodes (i int, count_id serial); Then we insert an id into that table for everyone that's missing from the main table: insert into locatorcodes (i) select b.i from ( select * from generate_series(1,1000000)as i ) as b left join main_table a on (b.i=a.i) where a.i is null; Or something like that. Now, we've got a table with all the unused ids, and a serial count assigned to them. Create another sequence: create checkout_sequence; and use that to "check out" numbers from locatorcodes: select i from locatorcodes where count_id=nextval('checkout_sequence'); And since the sequence will just count up, there's little or no problems with performance. There's lots of ways of handling this. That's just one of the ones that doesn't slow your database down a lot. If you need to, you can shuffle the numbers going into the locatorcodes table with an order by random() when you create it.
On Wed, January 16, 2008 18:40, Scott Marlowe wrote: > You're essentially wanting to fill in the blanks here. If you need > good performance, then what you'll need to do is to preallocate all > the numbers that haven't been assigned somewhere. So, we make a table > something like: > > create table locatorcodes (i int, count_id serial); > > Then we insert an id into that table for everyone that's missing from > the main table: > > insert into locatorcodes (i) > select b.i from ( > select * from generate_series(1,1000000)as i > ) as b > left join main_table a on (b.i=a.i) > where a.i is null; > > Or something like that. > > Now, we've got a table with all the unused ids, and a serial count > assigned to them. Create another sequence: > > create checkout_sequence; > > and use that to "check out" numbers from locatorcodes: > > select i from locatorcodes where count_id=nextval('checkout_sequence'); > > And since the sequence will just count up, there's little or no > problems with performance. > > There's lots of ways of handling this. That's just one of the ones > that doesn't slow your database down a lot. > > If you need to, you can shuffle the numbers going into the > locatorcodes table with an order by random() when you create it. > Martin and Scott, Thank you both for your responses. If the entries involved numbered in the millions then Scott's approach has considerable merit. In my case, as the rate of additions is very low and the size of the existing blocks is in the hundreds rather than hundreds of thousands then I believe that I will simply write my own iterator and do a repetitive select when on the incrementally proposed values until an opening is found then insert the new entry and update the iterator next value accordingly. Regards, -- *** E-Mail is NOT a SECURE channel *** James B. Byrne mailto:ByrneJB@Harte-Lyne.ca Harte & Lyne Limited http://www.harte-lyne.ca 9 Brockley Drive vox: +1 905 561 1241 Hamilton, Ontario fax: +1 905 561 0757 Canada L8E 3C3
On Jan 17, 2008 9:05 AM, James B. Byrne <byrnejb@harte-lyne.ca> wrote: > > If the entries involved numbered in the millions then Scott's approach has > considerable merit. In my case, as the rate of additions is very low and > the size of the existing blocks is in the hundreds rather than hundreds of > thousands then I believe that I will simply write my own iterator and do a > repetitive select when on the incrementally proposed values until an > opening is found then insert the new entry and update the iterator next > value accordingly. If race conditions are a possible issue, you use a sequence and increment that until you get a number that isn't used. That way two clients connecting at the same time can get different, available numbers.
On Thu, January 17, 2008 10:15, Scott Marlowe wrote: > > If race conditions are a possible issue, you use a sequence and > increment that until you get a number that isn't used. That way two > clients connecting at the same time can get different, available > numbers. > That is close to the idea that I originally had. I was simply wondering if the built-in sequencer could handle this case or whether I need to roll my own. -- *** E-Mail is NOT a SECURE channel *** James B. Byrne mailto:ByrneJB@Harte-Lyne.ca Harte & Lyne Limited http://www.harte-lyne.ca 9 Brockley Drive vox: +1 905 561 1241 Hamilton, Ontario fax: +1 905 561 0757 Canada L8E 3C3
On Jan 17, 2008 9:19 AM, James B. Byrne <byrnejb@harte-lyne.ca> wrote: > > On Thu, January 17, 2008 10:15, Scott Marlowe wrote: > > > > If race conditions are a possible issue, you use a sequence and > > increment that until you get a number that isn't used. That way two > > clients connecting at the same time can get different, available > > numbers. > > > > That is close to the idea that I originally had. I was simply wondering > if the built-in sequencer could handle this case or whether I need to roll > my own. Yeah, the built in sequencer just increments by one, nothing else. But it should be pretty easy to write a pl/pgsql function that grabs the next value and loop until it finds one that's available.
On Jan 17, 2008 9:19 AM, James B. Byrne <byrnejb@harte-lyne.ca> wrote: > > On Thu, January 17, 2008 10:15, Scott Marlowe wrote: > > > > If race conditions are a possible issue, you use a sequence and > > increment that until you get a number that isn't used. That way two > > clients connecting at the same time can get different, available > > numbers. > > > > That is close to the idea that I originally had. I was simply wondering > if the built-in sequencer could handle this case or whether I need to roll > my own. Got bored, hacked this aggregious pl/pgsql routine up. It looks horrible, but I wanted it to be able to use indexes. Seems to work. Test has ~750k rows and returns in it and returns a new id in < 1ms on my little server. File attached.
Вложения
On Thu, January 17, 2008 11:48, Scott Marlowe wrote: > Got bored, hacked this aggregious pl/pgsql routine up. It looks > horrible, but I wanted it to be able to use indexes. Seems to work. > Test has ~750k rows and returns in it and returns a new id in < 1ms > on my little server. > > File attached. > Many thanks. -- *** E-Mail is NOT a SECURE channel *** James B. Byrne mailto:ByrneJB@Harte-Lyne.ca Harte & Lyne Limited http://www.harte-lyne.ca 9 Brockley Drive vox: +1 905 561 1241 Hamilton, Ontario fax: +1 905 561 0757 Canada L8E 3C3