Обсуждение: Random-looking primary keys in the range 100000..999999

Поиск
Список
Период
Сортировка

Random-looking primary keys in the range 100000..999999

От
Kynn Jones
Дата:
I'm looking for a way to implement pseudorandom primary keys in the range 100000..999999.

The randomization scheme does not need to be cryptographically strong.  As long as it is not easy to figure out in a few minutes it's good enough.

My starting point for this is the following earlier message to this list:

http://www.postgresql.org/message-id/49F96730.4000706@postnewspapers.com.au

The answer given to it here

http://www.postgresql.org/message-id/448163db-cac5-4e99-8c4c-57cbc6f6af78@mm

...is really cool, but I don't see how to modify it for the case where the domain of the permutation has a cardinality that is not a power of 2, as it is in my case (cardinality = 900000).

---

(In the crypto world there are "format preserving encryption" techniques that probably could do what I want to do, but their focus on cryptographic strength makes learning and implementing them tough going, plus, the performance will probably be poor, since high workloads are an asset for such crypto applications.  Since cryptographic strength is not something I need, I'm trying to find non-crypt-grade alternatives.)

Thanks in advance!

kynn

Re: Random-looking primary keys in the range 100000..999999

От
hubert depesz lubaczewski
Дата:
How many rows do you plan on having in this table? Why this particular key range?

depesz


On Fri, Jul 4, 2014 at 3:24 PM, Kynn Jones <kynnjo@gmail.com> wrote:
I'm looking for a way to implement pseudorandom primary keys in the range 100000..999999.

The randomization scheme does not need to be cryptographically strong.  As long as it is not easy to figure out in a few minutes it's good enough.

My starting point for this is the following earlier message to this list:

http://www.postgresql.org/message-id/49F96730.4000706@postnewspapers.com.au

The answer given to it here

http://www.postgresql.org/message-id/448163db-cac5-4e99-8c4c-57cbc6f6af78@mm

...is really cool, but I don't see how to modify it for the case where the domain of the permutation has a cardinality that is not a power of 2, as it is in my case (cardinality = 900000).

---

(In the crypto world there are "format preserving encryption" techniques that probably could do what I want to do, but their focus on cryptographic strength makes learning and implementing them tough going, plus, the performance will probably be poor, since high workloads are an asset for such crypto applications.  Since cryptographic strength is not something I need, I'm trying to find non-crypt-grade alternatives.)

Thanks in advance!

kynn


Re: Random-looking primary keys in the range 100000..999999

От
Kynn Jones
Дата:
On Fri, Jul 4, 2014 at 10:13 AM, hubert depesz lubaczewski <depesz@gmail.com> wrote:
How many rows do you plan on having in this table?

Currently, only around 10K, but there's expectation that the number will grow.  It's hard to predict how much, hence the generous extra space.
 
Why this particular key range?

The requirements I've been given for the keys is that they be numeric, reasonably easy to type (hence, no 40-digit keys), never beginning with 0, and carrying no additional information content (or even suggesting it).  Among the pieces of information that the key should not include is the relative time of entry into the DB (hence, the keys should be more or less evenly distributed over the 100K-1M range).

k

Re: Random-looking primary keys in the range 100000..999999

От
Tom Lane
Дата:
Kynn Jones <kynnjo@gmail.com> writes:
> The requirements I've been given for the keys is that they be numeric,
> reasonably easy to type (hence, no 40-digit keys), never beginning with 0,
> and carrying no additional information content (or even suggesting it).

Why not just

(random()*899999)::int + 100000

This is unlikely to be cryptographically secure, but you didn't say you
needed that.  You will need to check for collisions, but that seems like
something you really ought to do anyway.

            regards, tom lane


Re: Random-looking primary keys in the range 100000..999999

От
Gavin Flower
Дата:
On 05/07/14 01:24, Kynn Jones wrote:
I'm looking for a way to implement pseudorandom primary keys in the range 100000..999999.

The randomization scheme does not need to be cryptographically strong.  As long as it is not easy to figure out in a few minutes it's good enough.

My starting point for this is the following earlier message to this list:

http://www.postgresql.org/message-id/49F96730.4000706@postnewspapers.com.au

The answer given to it here

http://www.postgresql.org/message-id/448163db-cac5-4e99-8c4c-57cbc6f6af78@mm

...is really cool, but I don't see how to modify it for the case where the domain of the permutation has a cardinality that is not a power of 2, as it is in my case (cardinality = 900000).

---

(In the crypto world there are "format preserving encryption" techniques that probably could do what I want to do, but their focus on cryptographic strength makes learning and implementing them tough going, plus, the performance will probably be poor, since high workloads are an asset for such crypto applications.  Since cryptographic strength is not something I need, I'm trying to find non-crypt-grade alternatives.)

Thanks in advance!

kynn

Hi Kynn,

How about  (note that 'payload' could be any set of valid columns):

-- using a crude Linear Congruential Generator
-- not very random, but does NOT create duplicates


DROP TABLE IF EXISTS rtab;
DROP SEQUENCE IF EXISTS rseq;

CREATE SEQUENCE rseq;

CREATE TABLE rtab
(
    id       int PRIMARY KEY default(100000 + (nextval('rseq') * 543537 + 997) % 900000),
    payload  int NOT NULL
);

INSERT INTO rtab (payload) VALUES (generate_series(1, 100000));

TABLE rtab;
Sample output:

   id   | payload
--------+---------
 644534 |       1
 288071 |       2
 831608 |       3
 475145 |       4
 118682 |       5
 662219 |       6
 305756 |       7
 849293 |       8
 492830 |       9
 136367 |      10
 679904 |      11
 323441 |      12
 866978 |      13
 510515 |      14
 154052 |      15
 697589 |      16
 341126 |      17
 884663 |      18
 528200 |      19
 171737 |      20


Cheers,
Gavin

Re: Random-looking primary keys in the range 100000..999999

От
Gavin Flower
Дата:
On 05/07/14 15:48, Gavin Flower wrote:
On 05/07/14 01:24, Kynn Jones wrote:
I'm looking for a way to implement pseudorandom primary keys in the range 100000..999999.

The randomization scheme does not need to be cryptographically strong.  As long as it is not easy to figure out in a few minutes it's good enough.

My starting point for this is the following earlier message to this list:

http://www.postgresql.org/message-id/49F96730.4000706@postnewspapers.com.au

The answer given to it here

http://www.postgresql.org/message-id/448163db-cac5-4e99-8c4c-57cbc6f6af78@mm

...is really cool, but I don't see how to modify it for the case where the domain of the permutation has a cardinality that is not a power of 2, as it is in my case (cardinality = 900000).

---

(In the crypto world there are "format preserving encryption" techniques that probably could do what I want to do, but their focus on cryptographic strength makes learning and implementing them tough going, plus, the performance will probably be poor, since high workloads are an asset for such crypto applications.  Since cryptographic strength is not something I need, I'm trying to find non-crypt-grade alternatives.)

Thanks in advance!

kynn

Hi Kynn,

How about  (note that 'payload' could be any set of valid columns):

-- using a crude Linear Congruential Generator
-- not very random, but does NOT create duplicates


DROP TABLE IF EXISTS rtab;
DROP SEQUENCE IF EXISTS rseq;

CREATE SEQUENCE rseq;

CREATE TABLE rtab
(
    id       int PRIMARY KEY default(100000 + (nextval('rseq') * 543537 + 997) % 900000),
    payload  int NOT NULL
);

INSERT INTO rtab (payload) VALUES (generate_series(1, 100000));

TABLE rtab;
Sample output:

   id   | payload
--------+---------
 644534 |       1
 288071 |       2
 831608 |       3
 475145 |       4
 118682 |       5
 662219 |       6
 305756 |       7
 849293 |       8
 492830 |       9
 136367 |      10
 679904 |      11
 323441 |      12
 866978 |      13
 510515 |      14
 154052 |      15
 697589 |      16
 341126 |      17
 884663 |      18
 528200 |      19
 171737 |      20


Cheers,
Gavin
Hmm...

for a 10 times larger range
    id       int PRIMARY KEY default(1000000 + (nextval('rseq') * 543537 + 997) % 9000000),
also works!

Re: Random-looking primary keys in the range 100000..999999

От
Martijn van Oosterhout
Дата:
On Fri, Jul 04, 2014 at 09:24:31AM -0400, Kynn Jones wrote:
> I'm looking for a way to implement pseudorandom primary keys in the range
> 100000..999999.
>
> The randomization scheme does not need to be cryptographically strong.  As
> long as it is not easy to figure out in a few minutes it's good enough.

Well, a trick that produces a not too easy to guess sequence is:

X(n) = p^n mod q

where q is prime. Pick the largest prime that will fit, in this case
899981 (I beleive) and some random p, say 2345.

Then 100000 + (2345^n) mod 899981

should be a sequence fitting your purpose. Unfortunatly, the pow()
function in Postgres can't be used here (too slow and it overflows),
but python has a helpful function:

In [113]: len( set( pow(2345, n, 899981) for n in range(899981)  ) )
Out[113]: 899980

You could probably write an equivalent function in Postgres if
necessary.

Hope this helps,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> He who writes carelessly confesses thereby at the very outset that he does
> not attach much importance to his own thoughts.
   -- Arthur Schopenhauer

Вложения

Re: Random-looking primary keys in the range 100000..999999

От
Kynn Jones
Дата:
Thanks to Gavin and Martijn for their suggestions.  They're both simple good-ol' LCGs, and both avoid the need to check for collisions.

I ultimately went with a multiplicative LCG, like Martijn's, mostly because I understand better how it avoids collisions, so it was easier for me to tweak it in various ways.

In particular, I changed the prime number from 899981 to the very lucky prime 900001.  This happens to work *perfectly*, because the range of such a generator is p-1, not p.  (BTW, Martijn's choice of the "random" 2345 for the multiplier was a somewhat lucky one, since such generators are not full for arbitrary multipliers; for example, the one with modulus 899981 is not full for a multiplier of 3456, say.)

I also followed Martijn's pointer regarding the 3-argument form of python's pow function, and implemented a 3-argument pow for PL/PgSQL.  I include all the code below, including a snippet borrowed from Gavin's post, and modified here and there.  (I'm not very experienced with PL/PgSQL, so please feel free to point out ways in which my PL/PgSQL code can be improved.)

First the functions:

    CREATE OR REPLACE FUNCTION pow_mod(bigx bigint, n bigint, m bigint) returns bigint AS $$
    DECLARE
    x  bigint;
    xx bigint;
    BEGIN
      IF n = 0 THEN RETURN 1; END IF;

      x := bigx % m;
      xx := (x * x) % m;

      IF n % 2 = 0 THEN
        RETURN pow_mod(xx, n/2, m);
      ELSE
        RETURN (x * pow_mod(xx, (n-1)/2, m)) % m;
      END IF;

    END;
    $$ LANGUAGE plpgsql strict immutable;


    -- "mcg" = "multiplicative congruential generator"
    CREATE OR REPLACE FUNCTION mcg_900001(i bigint) returns int AS $$
    BEGIN
      -- CHECK (0 < i AND i < 900001)                                                                                                                                                                             
      RETURN 99999 + pow_mod(<INSERT YOUR MULTIPLIER HERE>, i, 900001);
    END;
    $$ LANGUAGE plpgsql strict immutable;


And here's a small demo:

    DROP TABLE IF EXISTS rtab;
    DROP SEQUENCE IF EXISTS rseq;

    CREATE SEQUENCE rseq;

    CREATE TABLE rtab
    (
        id       int PRIMARY KEY DEFAULT mcg_900001(nextval('rseq')),
        payload  int NOT NULL
    );

    \timing on \\ INSERT INTO rtab (payload) VALUES (generate_series(1, 900000)); \timing off
    -- Timing is on.
    -- INSERT 0 900000
    -- Time: 201450.781 ms
    -- Timing is off.

    SELECT * FROM rtab WHERE 449990 < payload AND payload < 450011;
    --    id   | payload
    -- --------+---------
    --  539815 |  449991
    --  901731 |  449992
    --  878336 |  449993
    --  564275 |  449994
    --  863664 |  449995
    --  720159 |  449996
    --  987833 |  449997
    --  999471 |  449998
    --  999977 |  449999
    --  999999 |  450000
    --  921739 |  450001
    --  722684 |  450002
    --  596638 |  450003
    --  121592 |  450004
    --  687895 |  450005
    --  477734 |  450006
    --  585988 |  450007
    --  942869 |  450008
    --  175776 |  450009
    --  377207 |  450010
    -- (20 rows)

kj



On Sat, Jul 5, 2014 at 4:35 AM, Martijn van Oosterhout <kleptog@svana.org> wrote:
On Fri, Jul 04, 2014 at 09:24:31AM -0400, Kynn Jones wrote:
> I'm looking for a way to implement pseudorandom primary keys in the range
> 100000..999999.
>
> The randomization scheme does not need to be cryptographically strong.  As
> long as it is not easy to figure out in a few minutes it's good enough.

Well, a trick that produces a not too easy to guess sequence is:

X(n) = p^n mod q

where q is prime. Pick the largest prime that will fit, in this case
899981 (I beleive) and some random p, say 2345.

Then 100000 + (2345^n) mod 899981

should be a sequence fitting your purpose. Unfortunatly, the pow()
function in Postgres can't be used here (too slow and it overflows),
but python has a helpful function:

In [113]: len( set( pow(2345, n, 899981) for n in range(899981)  ) )
Out[113]: 899980

You could probably write an equivalent function in Postgres if
necessary.

Hope this helps,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> He who writes carelessly confesses thereby at the very outset that he does
> not attach much importance to his own thoughts.
   -- Arthur Schopenhauer

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.10 (GNU/Linux)

iQIVAwUBU7e450vt++dL5i1EAQhwew/9Fps1rkjMl85kAhD4nj9i5Gy+Y6T71vbS
gXkHyJOVHr9r9kT8/1shG8OtTDEIKI1FEjDD5wdkTvj+K//wswPcpCIcj5eJVu5K
56v8ITYnc3/YeYthoBI829adAreP7kjBAJlB8lENTAbxkdJmRBEGA3KjEnSLj7I/
pdqlrrbkUq7r/OBFlJYFnv/YXLAFeOWQRAk+Be+UorAUmkrvoA0g7gW4VEFnQ1Qk
k1kTYIEU3HUXVDHUeYTC2jjLj7cFVhYaQ52FA950MzkpkqFAej34gpitcOFC8yf+
KSglMq4nAFNF6VCU50RwPLjMIXXbHTSYxjJ5n3qYo4CExlg0wBLcmuu25GHc69qP
wEx71lPvXb4yfI3YNNHcH3Cwgl46u5M5Dt2aqWDcr+haAy8Hmhm5zqjTcfpUhyD+
efi8B512YDr4HoDV6qEKx0MdjHUFptX34L8tjkmnNYQlXj89ATE82lUoTusiIxts
axwJwbjl81cg3ZbtfoWPQ3LXXSRNI0NhIkHX0k2xp3uJ+TX76UmPZYSzQ3M2QrhX
atFCkcE4RqY26zwtxCp27yjnKMsMkEeo4z7JIQKjkwLzHGPavNd2PFV1EfCXNhwz
aDXXZHzwymJjhgP1BH0mXrL6VD0UiQb3RTRH82RpG0MaNBRImcncCAjKlN5UzDur
dwQY8zHxuOQ=
=IJFu
-----END PGP SIGNATURE-----


Re: Random-looking primary keys in the range 100000..999999

От
Gavin Flower
Дата:
Please don't top post!

See below for my comments.

On 09/07/14 07:04, Kynn Jones wrote:
Thanks to Gavin and Martijn for their suggestions.  They're both simple good-ol' LCGs, and both avoid the need to check for collisions.

I ultimately went with a multiplicative LCG, like Martijn's, mostly because I understand better how it avoids collisions, so it was easier for me to tweak it in various ways.

In particular, I changed the prime number from 899981 to the very lucky prime 900001.  This happens to work *perfectly*, because the range of such a generator is p-1, not p.  (BTW, Martijn's choice of the "random" 2345 for the multiplier was a somewhat lucky one, since such generators are not full for arbitrary multipliers; for example, the one with modulus 899981 is not full for a multiplier of 3456, say.)

I also followed Martijn's pointer regarding the 3-argument form of python's pow function, and implemented a 3-argument pow for PL/PgSQL.  I include all the code below, including a snippet borrowed from Gavin's post, and modified here and there.  (I'm not very experienced with PL/PgSQL, so please feel free to point out ways in which my PL/PgSQL code can be improved.)

First the functions:

    CREATE OR REPLACE FUNCTION pow_mod(bigx bigint, n bigint, m bigint) returns bigint AS $$
    DECLARE
    x  bigint;
    xx bigint;
    BEGIN
      IF n = 0 THEN RETURN 1; END IF;

      x := bigx % m;
      xx := (x * x) % m;

      IF n % 2 = 0 THEN
        RETURN pow_mod(xx, n/2, m);
      ELSE
        RETURN (x * pow_mod(xx, (n-1)/2, m)) % m;
      END IF;

    END;
    $$ LANGUAGE plpgsql strict immutable;


    -- "mcg" = "multiplicative congruential generator"
    CREATE OR REPLACE FUNCTION mcg_900001(i bigint) returns int AS $$
    BEGIN
      -- CHECK (0 < i AND i < 900001)                                                                                                                                                                             
      RETURN 99999 + pow_mod(<INSERT YOUR MULTIPLIER HERE>, i, 900001);
    END;
    $$ LANGUAGE plpgsql strict immutable;


And here's a small demo:

    DROP TABLE IF EXISTS rtab;
    DROP SEQUENCE IF EXISTS rseq;

    CREATE SEQUENCE rseq;

    CREATE TABLE rtab
    (
        id       int PRIMARY KEY DEFAULT mcg_900001(nextval('rseq')),
        payload  int NOT NULL
    );

    \timing on \\ INSERT INTO rtab (payload) VALUES (generate_series(1, 900000)); \timing off
    -- Timing is on.
    -- INSERT 0 900000
    -- Time: 201450.781 ms
    -- Timing is off.

    SELECT * FROM rtab WHERE 449990 < payload AND payload < 450011;
    --    id   | payload
    -- --------+---------
    --  539815 |  449991
    --  901731 |  449992
    --  878336 |  449993
    --  564275 |  449994
    --  863664 |  449995
    --  720159 |  449996
    --  987833 |  449997
    --  999471 |  449998
    --  999977 |  449999
    --  999999 |  450000
    --  921739 |  450001
    --  722684 |  450002
    --  596638 |  450003
    --  121592 |  450004
    --  687895 |  450005
    --  477734 |  450006
    --  585988 |  450007
    --  942869 |  450008
    --  175776 |  450009
    --  377207 |  450010
    -- (20 rows)

kj



On Sat, Jul 5, 2014 at 4:35 AM, Martijn van Oosterhout <kleptog@svana.org> wrote:
On Fri, Jul 04, 2014 at 09:24:31AM -0400, Kynn Jones wrote:
> I'm looking for a way to implement pseudorandom primary keys in the range
> 100000..999999.
>
> The randomization scheme does not need to be cryptographically strong.  As
> long as it is not easy to figure out in a few minutes it's good enough.

Well, a trick that produces a not too easy to guess sequence is:

X(n) = p^n mod q

where q is prime. Pick the largest prime that will fit, in this case
899981 (I beleive) and some random p, say 2345.

Then 100000 + (2345^n) mod 899981

should be a sequence fitting your purpose. Unfortunatly, the pow()
function in Postgres can't be used here (too slow and it overflows),
but python has a helpful function:

In [113]: len( set( pow(2345, n, 899981) for n in range(899981)  ) )
Out[113]: 899980

You could probably write an equivalent function in Postgres if
necessary.

Hope this helps,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> He who writes carelessly confesses thereby at the very outset that he does
> not attach much importance to his own thoughts.
   -- Arthur Schopenhauer

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.10 (GNU/Linux)

iQIVAwUBU7e450vt++dL5i1EAQhwew/9Fps1rkjMl85kAhD4nj9i5Gy+Y6T71vbS
gXkHyJOVHr9r9kT8/1shG8OtTDEIKI1FEjDD5wdkTvj+K//wswPcpCIcj5eJVu5K
56v8ITYnc3/YeYthoBI829adAreP7kjBAJlB8lENTAbxkdJmRBEGA3KjEnSLj7I/
pdqlrrbkUq7r/OBFlJYFnv/YXLAFeOWQRAk+Be+UorAUmkrvoA0g7gW4VEFnQ1Qk
k1kTYIEU3HUXVDHUeYTC2jjLj7cFVhYaQ52FA950MzkpkqFAej34gpitcOFC8yf+
KSglMq4nAFNF6VCU50RwPLjMIXXbHTSYxjJ5n3qYo4CExlg0wBLcmuu25GHc69qP
wEx71lPvXb4yfI3YNNHcH3Cwgl46u5M5Dt2aqWDcr+haAy8Hmhm5zqjTcfpUhyD+
efi8B512YDr4HoDV6qEKx0MdjHUFptX34L8tjkmnNYQlXj89ATE82lUoTusiIxts
axwJwbjl81cg3ZbtfoWPQ3LXXSRNI0NhIkHX0k2xp3uJ+TX76UmPZYSzQ3M2QrhX
atFCkcE4RqY26zwtxCp27yjnKMsMkEeo4z7JIQKjkwLzHGPavNd2PFV1EfCXNhwz
aDXXZHzwymJjhgP1BH0mXrL6VD0UiQb3RTRH82RpG0MaNBRImcncCAjKlN5UzDur
dwQY8zHxuOQ=
=IJFu
-----END PGP SIGNATURE-----


Didn't I mention that my code & variable names are copyrighted and my methods patented?  :-)

More seriously, its great that you picked out what you want from multiple replies!  Understanding what is given to you is far better than blindly cutting & pasting.

Actually I just realized that having an additive constant is essentially meaningless in this Use Case, as we are feeding in a sequence of consecutive integers.


Cheers,
Gavin