Discussion: bigint out of range
Hello team,

We are getting ERROR: bigint out of range. Please help with this.
kbdb=# INSERT INTO kb_dar_ran_url_check (url_hash,stat_date,topic_id,site_id,url,status,user_comments,review_comments) VALUES (72894677781133866997924561390146294513,E'19-04-2019',32793,1035,E'https://bikez.com/search/index.php',1,NULL,NULL);
ERROR: bigint out of range
The table structure is:
Table "kb_test.kb_dar_ran_url_check"
Column | Type | Collation | Nullable | Default
-----------------+-------------------------+-----------+----------+---------
status | bigint | | |
url_hash | bigint | | not null |
url | character varying(4000) | | not null |
review_comments | character varying(4000) | | |
user_comments | character varying(4000) | | |
stat_date | character varying(128) | | not null |
topic_id | numeric(38,0) | | not null |
site_id | numeric(38,0) | | not null |
Partition key: LIST (stat_date)
Thanks,
Daulat
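For reference, bigint in PostgreSQL is a signed 64-bit integer, so its maximum value is 9223372036854775807 (about 9.2 * 10^18), while the url_hash value in the failing INSERT has 38 digits. The comparison is pure arithmetic, so a quick sketch in Python shows why the INSERT fails:

```python
# bigint is a signed 64-bit integer: range [-2**63, 2**63 - 1]
BIGINT_MAX = 2**63 - 1          # 9223372036854775807

# value from the failing INSERT above
url_hash = 72894677781133866997924561390146294513

print(len(str(url_hash)))       # 38 digits
print(url_hash > BIGINT_MAX)    # True -> PostgreSQL raises "bigint out of range"
```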
On 2019-05-16 08:48:51 -0700, David G. Johnston wrote:
> On Thu, May 16, 2019 at 8:31 AM Daulat Ram <Daulat.Ram@exponential.com> wrote:
> > url_hash | bigint | | not null |
>
> Change the type of url_hash; make it text instead of bigint.

Or numeric(38, 0). I think it isn't a coincidence that he tries to store
a 38-digit number in it.

> As a bonus:
>
> Use text instead of arbitrary varchar(4000) fields and add, e.g.,
> check (length(url) < 4000) or something better.
>
> Needing 38 digit integers for "id" fields seems odd.

I guess the schema was copied from Oracle. In Oracle, all numbers are
really 38-digit decimal floating point numbers and the limit for
varchar2 is 4000.

hp

--
   _  | Peter J. Holzer    | we build much bigger, better disasters now
|_|_) |                    | because we have much more sophisticated
| |   | hjp@hjp.at         | management tools.
__/   | http://www.hjp.at/ | -- Ross Anderson <https://www.edge.org/>
On 2019-05-18 10:49:53 -0700, David G. Johnston wrote:
> On Saturday, May 18, 2019, Peter J. Holzer <hjp-pgsql@hjp.at> wrote:
> > Or numeric(38, 0). I think it isn't a coincidence that he tries to
> > store a 38-digit number in it.
>
> You don’t perform math on a hash

That's not generally true. Hashes are used for further computation, for
example in hash tables or in cryptography.

> thus its not a number

This is just silly. All hash functions I have ever encountered compute a
single fixed-size integer from a stream of integers. The result may be
larger than a machine word, in which case the representation in C (or a
similar low-level language) may be an array of words (or bytes), but
it's still an integer.

hp
On 5/18/19 2:27 PM, Peter J. Holzer wrote:
> On 2019-05-18 10:49:53 -0700, David G. Johnston wrote:
> > You don’t perform math on a hash
>
> That's not generally true. Hashes are used for further computation for
> example in hash tables or in cryptography.

How is it "using math" to use a hash key in a hash lookup table?

--
Angular momentum makes the world go 'round.
On 2019-05-18 15:19:22 -0500, Ron wrote:
> On 5/18/19 2:27 PM, Peter J. Holzer wrote:
> > That's not generally true. Hashes are used for further computation for
> > example in hash tables or in cryptography.
>
> How is it "using math" to use a hash key in a hash lookup table?

hash modulo table size.

hp
On 5/18/19 3:49 PM, Peter J. Holzer wrote:
> On 2019-05-18 15:19:22 -0500, Ron wrote:
> > How is it "using math" to use a hash key in a hash lookup table?
>
> hash modulo table size.
I've seen that used when the tablespace is pre-allocated, and you hash modulo the tablespace page number. (Yes, performance tanks when you start filling up pages.) How do you hash on the (ever growing) table size?
On 2019-05-18 17:14:59 -0500, Ron wrote:
> On 5/18/19 3:49 PM, Peter J. Holzer wrote:
> > hash modulo table size.
>
> I've seen that used when the tablespace is pre-allocated, and you hash
> modulo the tablespace page number. (Yes, performance tanks when you
> start filling up pages.) How do you hash on the (ever growing) table
> size?

The hash function returns a number in a range much larger than the
possible number of buckets. 64 bits is a good choice today.

To determine the bucket you need to reduce this number to something in
the range [0, nr_buckets). This is where modulo comes in:

    i = h % nr_buckets

If the table fills up, you increase nr_buckets, reallocate and rehash
all entries.

(If nr_buckets is a power of two, the modulo operation can be
efficiently implemented by using bitwise and.)

hp
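The generic scheme described above can be sketched in a few lines (a toy chained hash table in Python, not PostgreSQL's implementation; the name nr_buckets follows the text):

```python
class HashTable:
    """Toy chained hash table: bucket = hash % nr_buckets, full rehash on growth."""

    def __init__(self, nr_buckets=8):
        self.nr_buckets = nr_buckets
        self.buckets = [[] for _ in range(nr_buckets)]
        self.size = 0

    def insert(self, key, value):
        if self.size >= self.nr_buckets:   # load factor reached 1.0: grow
            self._grow()
        b = self.buckets[hash(key) % self.nr_buckets]
        for i, (k, _) in enumerate(b):
            if k == key:                   # key already present: update in place
                b[i] = (key, value)
                return
        b.append((key, value))
        self.size += 1

    def lookup(self, key):
        for k, v in self.buckets[hash(key) % self.nr_buckets]:
            if k == key:
                return v
        raise KeyError(key)

    def _grow(self):
        # Double nr_buckets (keeping it a power of two, so h % n == h & (n - 1)),
        # then reallocate and rehash every entry -- the expensive step under discussion.
        old = [kv for b in self.buckets for kv in b]
        self.nr_buckets *= 2
        self.buckets = [[] for _ in range(self.nr_buckets)]
        for k, v in old:
            self.buckets[hash(k) % self.nr_buckets].append((k, v))
```

Note the growth step rehashes every entry at once, which is exactly the worst-case cost discussed in the rest of the thread.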
On 5/18/19 5:39 PM, Peter J. Holzer wrote:
> The hash function returns a number in a range much larger than the
> possible number of buckets. 64 bits is a good choice today.
>
> To determine the bucket you need to reduce this number to something in
> the range [0, nr_buckets). This is where modulo comes in:
>
>     i = h % nr_buckets
>
> If the table fills up, you increase nr_buckets, reallocate and rehash
> all entries.

Ouch. Response time on a big table would take a serious hit if that
rehash happened in the middle of the day on a big OLTP system. Even
worse if it were a 24x365 system, because you couldn't schedule an
enlargement/rehash during a down period.
On 2019-05-18 19:16:19 -0500, Ron wrote:
> On 5/18/19 5:39 PM, Peter J. Holzer wrote:
> > If the table fills up, you increase nr_buckets, reallocate and
> > rehash all entries.
>
> Ouch. Response time on a big table would take a serious hit if that
> rehash happened in the middle of the day on a big OLTP system.

So that might be a reason not to use hash indexes on tables where the
worst-case insert time is a concern (the average insert time is still
O(1)).

But please be aware that I answered your question 'How is it "using
math" to use a hash key?', not 'How are hash indexes in PostgreSQL
implemented?'. So my answer covered the most simple and generic
implementation.

You can split buckets lazily, and in fact PostgreSQL does this. I just
looked at the code briefly, but I don't really understand how it works.
Guess I would have to read Margo Seltzer's paper for that.

Still, even with that optimization (or tradeoff - it may make the
lookup slower) the README notes that splits are expensive.

hp
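For the curious, the lazy-splitting idea (Litwin-style linear hashing, the family of techniques Seltzer's work builds on) can be sketched in miniature: only one bucket is split per growth step, so no single insert has to rehash the whole table. This is a toy illustration under that assumption, not PostgreSQL's actual hash-index code:

```python
class LinearHashTable:
    """Toy linear hashing: buckets are split one at a time (lazily),
    instead of rehashing everything at once."""

    def __init__(self, initial=4):
        self.n0 = initial                     # bucket count at start of this round
        self.split = 0                        # next bucket to split this round
        self.buckets = [[] for _ in range(initial)]
        self.size = 0

    def _index(self, key):
        h = hash(key)
        i = h % self.n0
        if i < self.split:                    # that bucket was already split
            i = h % (2 * self.n0)             # use the wider modulus
        return i

    def insert(self, key, value):
        self.buckets[self._index(key)].append((key, value))
        self.size += 1
        if self.size > len(self.buckets):     # load factor > 1: split ONE bucket
            self._split_one()

    def _split_one(self):
        self.buckets.append([])               # new bucket at index split + n0
        old = self.buckets[self.split]
        self.buckets[self.split] = []
        self.split += 1
        if self.split == self.n0:             # round complete: double and restart
            self.n0 *= 2
            self.split = 0
        for k, v in old:                      # rehash only this one bucket
            self.buckets[self._index(k)].append((k, v))

    def lookup(self, key):
        for k, v in self.buckets[self._index(key)]:
            if k == key:
                return v
        raise KeyError(key)
```

Each insert does at most one small split, which bounds the worst case far better than the stop-the-world rehash in the simple scheme, at the cost of the slightly more involved two-modulus lookup.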
On 5/19/19 5:43 AM, Peter J. Holzer wrote:
[snip]
> But please be aware that I answered your question 'How is it "using
> math" to use a hash key?', not 'How are hash indexes in PostgreSQL
> implemented?'. So my answer covered the most simple and generic
> implementation.

I understand.
Greetings,

* David G. Johnston (david.g.johnston@gmail.com) wrote:
> On Thu, May 16, 2019 at 8:31 AM Daulat Ram <Daulat.Ram@exponential.com> wrote:
> > url_hash | bigint | | not null |
>
> Change the type of url_hash; make it text instead of bigint.

Making it text wastes a bunch of space actually, since it's really a
binary value. I tend to recommend using a bytea for storing hashes.

Given that hashes are really fixed width, it'd be nice if we had a set
of proper datatypes for them, perhaps not so much to avoid the 1-byte
overhead from storing as a variable-length bytea, but because we could
then avoid having a 7-byte hole due to padding if the hash is followed
by a bigint or such.

Thanks,

Stephen
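The space argument is easy to see with any concrete hash (a Python sketch; sha256 merely stands in for whatever function produced url_hash, and the URL is the one from the thread):

```python
import hashlib

url = b"https://bikez.com/search/index.php"
digest = hashlib.sha256(url).digest()   # raw binary -- what a bytea column stores
hexstr = digest.hex()                   # hex text -- what a text column would store

print(len(digest))   # 32 bytes
print(len(hexstr))   # 64 characters: twice the size, before any per-value overhead
```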
Greetings,

* Peter J. Holzer (hjp-pgsql@hjp.at) wrote:
> On 2019-05-18 19:16:19 -0500, Ron wrote:
> > > If the table fills up, you increase nr_buckets, reallocate and
> > > rehash all entries.
> >
> > Ouch. Response time on a big table would take a serious hit if that
> > rehash happened in the middle of the day on a big OLTP system.

As noted below, that isn't actually how it works with PG's hash indexes.

> So that might be a reason not to use hash indexes on tables where the
> worst-case insert time is a concern (the average insert time is still
> O(1)).

The worst-case insert time is certainly worse than the average, but it's
not nearly as bad as being imagined here.

> You can split buckets lazily, and in fact PostgreSQL does this. I just
> looked at the code briefly, but I don't really understand how it works.
> Guess I would have to read Margo Seltzer's paper for that.
>
> Still, even with that optimization (or tradeoff - it may make the
> lookup slower) the README notes that splits are expensive.

There are multiple different levels, really, from "best case" where the
new item can just be placed directly on to a page that has free space,
to "slightly worse case" where an overflow page has to be added, to
"even worse case" where a prior page split has to be finished, to
"probably worst case" where a full page split has to happen. There
might even be some pathological cases where a prior page split has to
be finished and then an overflow page has to be added and then another
page split has to be done, or some such.

Even so, for equality-based lookups (where the data set has few or zero
duplicates), hash indexes can work quite well, but it's of course good
to understand that inserts aren't always going to be exactly the same
speed every time.

Thanks,

Stephen