Re: Unexpected interval comparison

Поиск
Список
Период
Сортировка
От Kyotaro HORIGUCHI
Тема Re: Unexpected interval comparison
Дата
Msg-id 20170405.200723.102041631.horiguchi.kyotaro@lab.ntt.co.jp
обсуждение исходный текст
Ответ на Re: Unexpected interval comparison  (Tom Lane <tgl@sss.pgh.pa.us>)
Ответы Re: Unexpected interval comparison  (Tom Lane <tgl@sss.pgh.pa.us>)
Список pgsql-general
Mmm. It's shameful.

At Tue, 04 Apr 2017 18:06:38 -0400, Tom Lane <tgl@sss.pgh.pa.us> wrote in <5084.1491343598@sss.pgh.pa.us>
> Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> writes:
> > The first attached is the revised patch and the second is
> > temporary sanity check code for non-128bit environment code. (but
> > works only on 128 bit environment)
>
> This seemed to me to be probably even less correct, so I extracted
> the addition and multiplication logic into a standalone test program
> (attached), which compares the result of a multiplication to that
> of native int128 arithmetic.  I changed the order of the LinearInterval
> fields to be LS-first so that I could overlay them onto an int128
> result (on a little-endian machine); this is just for testing purposes
> not something we must do in the finished code.  I soon found cases
> where it indeed fails, eg
>
> $ ./a.out 0x7ffffffff 0x7ffffffff
> 7FFFFFFFF * 7FFFFFFFF
> result = 62 18446744004990074881
> result = 3E FFFFFFF000000001
> MISMATCH!
> result = 63 18446744004990074881
> result = 3F FFFFFFF000000001

I admit that I was careless about that.

> After fooling with it for awhile, I decided that the cause of the
> problems was basically not thinking carefully about the lower half
> of the value being unsigned: that affects when to do carries in
> the addition macro, and we also have to be careful about whether
> or not to sign-extend the partial product terms.  The second
> attached file is a version that I can't break anymore, though I'm
> not quite sure it's bug-free.

In the first version, I converted all operands to positive
numbers and finally correct the sign of the result. This was
straightforward but doesn't looked clean so I tried direct
handling of singed values. It was the beginning of this mess.

There was a time when the lower bits is in uint64 but I was
confused by arithmetics mixing signed and unsigned. I realized
that I misunderstood composing 64 bit value from decomposition of
a negative int64.


I reconsidered this based on Tom's test program and try to give
reasoning for the algorithm of the attached new version.

1. Now INT64_AL32 returns explicitly casted into uint64 (for
  safety).  the upper and lower 32 bit values surely makes the
  original value just by INT64_AU32 << 32 + INT64_AL32 because

  1.1. The arithmetic is done assuming that both operands of the
     addition are in signed int64 but the MSB of the right
     operand is always 0 so no difference by reading it as
     singed.

- As mentioned in added comment, all terms (stored in the
  variable tmp) are the products of two signed/unsigned 32-bit
  values expanded to singed int64 variables. This is safe.

- The second and third terms should be left-shifted by 32 bit in
  virtually-128-bit storage then added to exiting 128 bit
  value. This can be said as adding INT128_AU64(tmp<<32) into hi
  part. If INT128_AU64(tmp<<32) is equivalent to
  INT64_AU32(tmp)>>32, "span.hi += INT64_AU32(tmp)" is safe.

   INT128_AU64(tmp << 32)  ; tmp is assumed signed int128 here
    = ((tmp << 32) >> 64)
    = tmp >>32
    = INT64_AU32(tmp)      ; here, tmp is safe even if singed int64

  Similary,

   INT128_AL64(tmp << 32)
    = (uint128)(tmp << 32) & 0xFFFFFFFF_FFFFFFFF  (lower 32 bits are always 0)
    = ((uint64)(tmp) & 0xFFFFFFFF) << 32
    = INT64_AL32(tmp) << 32

- The last thing I should confirm is that
  LINEARINTERVAL_ADD_UINT64 is correct. This is analogous to 1
  above.

    (int128)x + (uint64)y
  = (int128)x + (int128)y   ; safely extended without change
  =  (INT128AU64(x) << 64 + INT128AL64(x))
   + (INT128AU64(y) << 64 + INT128AL64(y))
  =  (INT128AU64(x) + INT128AU64(y)) << 64  ; performed as signed
   + (INT128AL64(x) + INT128AL64(y))        ; performed as unsigned

  Where (INT128AL64(x) + INT128AL64(y)) is performed as unsigned
  and can be overflow, and the carry can be just pushed to the
  upper 64bit.

  2. Adding two values with MSB of 0 doesn't overflow.

  3. If at least one of the two has MSB of 1, it can be overflow.

  3.1. Both have MSB of 1, it must overflow.

  3.2. If one of the two has MSB of 1, the maximum overflowed
     value is made by 0xFFF...FF + 0x7FF...FF and result is
     0x(1)7FF..FE so "MSB of the result is 0" and "overflowed" is
     equivalent for the case.

Addition to all of the above, dayfraction should be positive so
that LINEARINTERVAL_ADD_UINT64 can be used.

The attached patch is the revised version.

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

В списке pgsql-general по дате отправления:

Предыдущее
От: Albe Laurenz
Дата:
Сообщение: Re: expensive function in select list vs limit clause
Следующее
От: Bill Moran
Дата:
Сообщение: Re: Keycloak and Postgres