Обсуждение: murmur3 hash binary data migration from Oracle to PostgreSQL
Hi Team,
We are doing Migration from Oracle to PostgreSQL. In SOurce database we have Binary data stored using murmur3 hashing function. In Oracle this data is being generated from the Java code and inserted into the Oracle database.
As part of Migration processes the reference data on which this murmur3 is generated is also getting changed while migrating to PostgreSQL.
In PostgreSQL do we have any mechanism for fetching this murmur3 hash function for any UUID.
Please let me know, what ever solution is available for the implementation.
Best Regards,
Jagmohan
> On 25/04/2023 03:21 CEST Jagmohan Kaintura <jagmohan@tecorelabs.com> wrote: > > We are doing Migration from Oracle to PostgreSQL. In SOurce database we have > Binary data stored using murmur3 hashing function. In Oracle this data is > being generated from the Java code and inserted into the Oracle database. Do you store the hash and the binary data? The hash is a key to the binary data? > As part of Migration processes the reference data on which this murmur3 is > generated is also getting changed while migrating to PostgreSQL. Why is the data changing during migration? Shouldn't a migration preserve the data and only adapt it if the database model needs to change? > In PostgreSQL do we have any mechanism for fetching this murmur3 hash > function for any UUID. I don't understand what you mean by that. What does it have to do with UUID? Do you want to generate the MurmurHash in Postgres? Postgres has no builtin support for that hash function and I can't find any extension in a quick online search. Or do you want to just look up rows by the MurmurHash? That's a trivial SELECT statement. Store the hash in an indexed column of type bytea to have performant lookups. -- Erik
On Mon, Apr 24, 2023 at 9:22 PM Jagmohan Kaintura <jagmohan@tecorelabs.com> wrote: > > We are doing Migration from Oracle to PostgreSQL. In SOurce database we have Binary data stored using murmur3 hashing function.In Oracle this data is being generated from the Java code and inserted into the Oracle database. > > As part of Migration processes the reference data on which this murmur3 is generated is also getting changed while migratingto PostgreSQL. > > In PostgreSQL do we have any mechanism for fetching this murmur3 hash function for any UUID. > > Please let me know, what ever solution is available for the implementation. My apologies if I misparsed a couple of statements. I am having trouble determining if you are migrating away from Murmur3. If you are selecting a new digest, then SipHash would be an excellent choice. It was designed to avoid collisions and be fast. Plus it was designed by Jean-Philippe Aumasson and Daniel J. Bernstein. It doesn't get much better than those two fellows. Jeff
Hi , No we want to generate murmur3 format only. BR, Jagmohan > On 25-Apr-2023, at 8:52 AM, Jeffrey Walton <noloader@gmail.com> wrote: > > On Mon, Apr 24, 2023 at 9:22 PM Jagmohan Kaintura > <jagmohan@tecorelabs.com> wrote: >> >> We are doing Migration from Oracle to PostgreSQL. In SOurce database we have Binary data stored using murmur3 hashingfunction. In Oracle this data is being generated from the Java code and inserted into the Oracle database. >> >> As part of Migration processes the reference data on which this murmur3 is generated is also getting changed while migratingto PostgreSQL. >> >> In PostgreSQL do we have any mechanism for fetching this murmur3 hash function for any UUID. >> >> Please let me know, what ever solution is available for the implementation. > > My apologies if I misparsed a couple of statements. I am having > trouble determining if you are migrating away from Murmur3. > > If you are selecting a new digest, then SipHash would be an excellent > choice. It was designed to avoid collisions and be fast. Plus it was > designed by Jean-Philippe Aumasson and Daniel J. Bernstein. It doesn't > get much better than those two fellows. > > Jeff
> On 25/04/2023 12:44 CEST Jagmohan Kaintura <jagmohan@tecorelabs.com> wrote: > > No we want to generate murmur3 format only. > > > On 25-Apr-2023, at 8:52 AM, Jeffrey Walton <noloader@gmail.com> wrote: > > > > My apologies if I misparsed a couple of statements. I am having > > trouble determining if you are migrating away from Murmur3. > > > > If you are selecting a new digest, then SipHash would be an excellent > > choice. It was designed to avoid collisions and be fast. Plus it was > > designed by Jean-Philippe Aumasson and Daniel J. Bernstein. It doesn't > > get much better than those two fellows. I don't understand. In your original message you write: > On 25/04/2023 03:21 CEST Jagmohan Kaintura <jagmohan@tecorelabs.com> wrote: > > In Oracle this data is being generated from the Java code and inserted into > the Oracle database. Do you generate the hash in Java or in Oracle? I don't know if Oracle Database provides a MurmurHash function but in Java you could use apache-commons[0] and change your Java code to also insert the hash (if you don't already do so). [0] https://commons.apache.org/proper/commons-codec/apidocs/org/apache/commons/codec/digest/MurmurHash3.html PS: Please don't top post. https://wiki.postgresql.org/wiki/Mailing_Lists#Email_etiquette_mechanics -- Erik
No we want to generate murmur3 format only.
If you need a server-side murmur3 function
- one alternative is the https://github.com/markokr/pghashlib extension.
psql (15.2 (Debian 15.2-2))
Type "help" for help.
db=# create extension hashlib;
CREATE EXTENSION
db=#
Type "help" for help.
db=# create extension hashlib;
CREATE EXTENSION
db=#
select hash_string('', 'murmur3');
select hash_string('a', 'murmur3');
select hash_string('abcdefg', 'murmur3');
select encode(hash128_string('abcdefg', 'murmur3'), 'hex');
+-------------+
| hash_string |
+-------------+
| 0 |
+-------------+
(1 row)
+-------------+
| hash_string |
+-------------+
| 1009084850 |
+-------------+
(1 row)
+-------------+
| hash_string |
+-------------+
| -2009294074 |
+-------------+
(1 row)
+----------------------------------+
| encode |
+----------------------------------+
| 069b3c88000000000000000000000000 |
+----------------------------------+
(1 row)
select hash_string('a', 'murmur3');
select hash_string('abcdefg', 'murmur3');
select encode(hash128_string('abcdefg', 'murmur3'), 'hex');
+-------------+
| hash_string |
+-------------+
| 0 |
+-------------+
(1 row)
+-------------+
| hash_string |
+-------------+
| 1009084850 |
+-------------+
(1 row)
+-------------+
| hash_string |
+-------------+
| -2009294074 |
+-------------+
(1 row)
+----------------------------------+
| encode |
+----------------------------------+
| 069b3c88000000000000000000000000 |
+----------------------------------+
(1 row)
In my test, I have used this fork: https://github.com/bgdevlab/pghashlib/tree/bgdevlab/builds
RUN mkdir -p /pghashlib \
&& git clone --depth 1 --branch bgdevlab/builds https://github.com/bgdevlab/pghashlib.git /pghashlib \
&& cd /pghashlib \
&& make USE_PGXS=1 \
&& make USE_PGXS=1 install \
&& rm -rf /pghashlib
&& git clone --depth 1 --branch bgdevlab/builds https://github.com/bgdevlab/pghashlib.git /pghashlib \
&& cd /pghashlib \
&& make USE_PGXS=1 \
&& make USE_PGXS=1 install \
&& rm -rf /pghashlib
Regards,
Imre