Обсуждение: SELECT '(1, nan, 3)'::cube;

Поиск
Список
Период
Сортировка

SELECT '(1, nan, 3)'::cube;

От
"Robert Brewer"
Дата:
Hi folks,

I'm working on a hypercube implementation in Postgres using contrib/cube

and need to insert 80,000 rows in one go from Python. Doing so with=20
INSERT, even multiple statements in one call, is pretty slow. I've been=20
investigating if using COPY is faster. It is, but there's a problem:=20
some of the cubes should include NaN. Writing:

    INSERT INTO foo (coords) VALUES (cube(ARRAY[1, 'nan', 3]::float[]));

...works fine. But I can't find the magic incantation to do the same=20
thing using COPY FROM. Indeed, even SELECT chokes on this:

     # SELECT '(1, nan, 3)'::cube;
     ERROR:  bad cube representation
     LINE 1: select '(1, nan, 3)'::cube;
                ^
     DETAIL:  syntax error at or near "n"

Note that the 'cube' datatype also doesn't have a 'binary' repr, and=20
COPY FROM exhibits the same problem with CSV as it does with text.

I actually found out I could trick the parser into storing 'Inf' by just

giving it a very large number, like '(1, 1e+309, 3)::cube', but I can't=20
figure out an equivalent trick for NaN.

Also note that 'nan' and 'inf' are not a problem for float[]

     # select '{1, nan, inf}'::float[];
       float8
     -----------
      {1,NaN,Infinity}
     (1 row)

...but 1e+309 raises an error for float (where it does not for cube):

     # select '{1, 1e+309, 3}'::float[];
     ERROR:  "1e+309" is out of range for type double precision
     LINE 1: select '{1, 1e+309, 3}'::float[];

So:

  1. It would be good to have cube parsing in PG behave more like float,

allowing 'nan' and 'inf', and once those are working, deny using large=20
numbers to fake the parser into returning Infinity.
  2. If anyone can think of a way to trick the cube parser in the short=20
term into returning NaN, I'd love to hear it.
  3. If anyone understands cubeparse.c better than I do and can shoot me

a quick patch, I'd appreciate it.


Robert Brewer
fumanchu@aminus.org

Re: SELECT '(1, nan, 3)'::cube;

От
Tom Lane
Дата:
"Robert Brewer" <fumanchu@aminus.org> writes:
> I'm working on a hypercube implementation in Postgres using contrib/cube

> and need to insert 80,000 rows in one go from Python. Doing so with
> INSERT, even multiple statements in one call, is pretty slow. I've been
> investigating if using COPY is faster. It is, but there's a problem:
> some of the cubes should include NaN. Writing:

>     INSERT INTO foo (coords) VALUES (cube(ARRAY[1, 'nan', 3]::float[]));

> ...works fine. But I can't find the magic incantation to do the same
> thing using COPY FROM.

cube_in doesn't accept either 'nan' or 'inf'.  It's probably a bug that
you can get those things into a cube value via cube(float8[]).  Or we
could see about upgrading the datatype to allow them, but that would
require looking at all its operations not just cube_in.  It seems pretty
likely to me that there are some other things in that module that won't
behave sanely with NaN, because the original author clearly never
thought about it.

I'd suggest rethinking your design to avoid needing NaN in a cube.

            regards, tom lane

Re: SELECT '(1, nan, 3)'::cube;

От
"Robert Brewer"
Дата:
Tom Lane wrote:
> "Robert Brewer" <fumanchu@aminus.org> writes:
> > I'm working on a hypercube implementation in Postgres using
contrib/cube
> > and need to insert 80,000 rows in one go from Python. Doing so with=20
> > INSERT, even multiple statements in one call, is pretty slow. I've
been=20
> > investigating if using COPY is faster. It is, but there's a problem:

> > some of the cubes should include NaN. Writing:
>
> >     INSERT INTO foo (coords) VALUES (cube(ARRAY[1, 'nan',
3]::float[]));
>
> > ...works fine. But I can't find the magic incantation to do the same

> > thing using COPY FROM.
>
> cube_in doesn't accept either 'nan' or 'inf'.  It's probably a bug
that
> you can get those things into a cube value via cube(float8[]).  Or we
> could see about upgrading the datatype to allow them, but that would
> require looking at all its operations not just cube_in.  It seems
pretty
> likely to me that there are some other things in that module that
won't
> behave sanely with NaN, because the original author clearly never
> thought about it.
>
> I'd suggest rethinking your design to avoid needing NaN in a cube.

Thanks Tom,

I could, of course, use a sentinel value like -1 or even -Inf to mean
"indeterminate", but I'd like future versions of this app (wink) to be
able to use qnan's as IEEE 754 intended. Note that cube doesn't take
NULL either. It may be surprising and underspecified, but using NaN and
Inf in other cube operations (including gist indices) does indeed work
quite well--probably because they are, after all, just floats. I believe
it's a bug to handle only some subset of values of an extremely
well-known and well-specified datatype like float, but of course I
understand if contrib modules aren't high priority.


Robert Brewer
fumanchu@aminus.org

Re: SELECT '(1, nan, 3)'::cube;

От
Tom Lane
Дата:
"Robert Brewer" <fumanchu@aminus.org> writes:
> Tom Lane wrote:
>> I'd suggest rethinking your design to avoid needing NaN in a cube.

> I could, of course, use a sentinel value like -1 or even -Inf to mean
> "indeterminate", but I'd like future versions of this app (wink) to be
> able to use qnan's as IEEE 754 intended. Note that cube doesn't take
> NULL either. It may be surprising and underspecified, but using NaN and
> Inf in other cube operations (including gist indices) does indeed work
> quite well--probably because they are, after all, just floats. I believe
> it's a bug to handle only some subset of values of an extremely
> well-known and well-specified datatype like float, but of course I
> understand if contrib modules aren't high priority.

Well, this is the sort of thing that'll only get fixed when someone who
actually uses the datatype is motivated to improve it.  IOW, if you
submitted a patch it'd likely get accepted.

            regards, tom lane

Re: SELECT '(1, nan, 3)'::cube;

От
bricklen
Дата:
On Tue, Mar 15, 2011 at 9:08 AM, Robert Brewer <fumanchu@aminus.org> wrote:
> I'm working on a hypercube implementation in Postgres using contrib/cube
>
> and need to insert 80,000 rows in one go from Python. Doing so with
> INSERT, even multiple statements in one call, is pretty slow. I've been
> investigating if using COPY is faster.

When you say "multiple statements", do you mean

INSERT INTO foo (coords) VALUES
(cube(ARRAY[1, 'nan', 3]::float[])),
(cube(ARRAY[2, 'nan', 4]::float[])),
(cube(ARRAY[3, 'nan', 5]::float[])),
(cube(ARRAY[4, 'nan', 6]::float[]));

I was going to suggest trying that method, but if you already have
then please ignore me!

Re: SELECT '(1, nan, 3)'::cube;

От
"Robert Brewer"
Дата:
YnJpY2tsZW4gd3JvdGU6DQo+IE9uIFR1ZSwgTWFyIDE1LCAyMDExIGF0IDk6
MDggQU0sIFJvYmVydCBCcmV3ZXIgPGZ1bWFuY2h1QGFtaW51cy5vcmc+DQo+
IHdyb3RlOg0KPiA+IEknbSB3b3JraW5nIG9uIGEgaHlwZXJjdWJlIGltcGxl
bWVudGF0aW9uIGluIFBvc3RncmVzIHVzaW5nDQo+IGNvbnRyaWIvY3ViZQ0K
PiA+DQo+ID4gYW5kIG5lZWQgdG8gaW5zZXJ0IDgwLDAwMCByb3dzIGluIG9u
ZSBnbyBmcm9tIFB5dGhvbi4gRG9pbmcgc28gd2l0aA0KPiA+IElOU0VSVCwg
ZXZlbiBtdWx0aXBsZSBzdGF0ZW1lbnRzIGluIG9uZSBjYWxsLCBpcyBwcmV0
dHkgc2xvdy4gSSd2ZQ0KPiBiZWVuDQo+ID4gaW52ZXN0aWdhdGluZyBpZiB1
c2luZyBDT1BZIGlzIGZhc3Rlci4NCj4gDQo+IFdoZW4geW91IHNheSAibXVs
dGlwbGUgc3RhdGVtZW50cyIsIGRvIHlvdSBtZWFuDQo+IA0KPiBJTlNFUlQg
SU5UTyBmb28gKGNvb3JkcykgVkFMVUVTDQo+IChjdWJlKEFSUkFZWzEsICdu
YW4nLCAzXTo6ZmxvYXRbXSkpLA0KPiAoY3ViZShBUlJBWVsyLCAnbmFuJywg
NF06OmZsb2F0W10pKSwNCj4gKGN1YmUoQVJSQVlbMywgJ25hbicsIDVdOjpm
bG9hdFtdKSksDQo+IChjdWJlKEFSUkFZWzQsICduYW4nLCA2XTo6ZmxvYXRb
XSkpOw0KPiANCj4gSSB3YXMgZ29pbmcgdG8gc3VnZ2VzdCB0cnlpbmcgdGhh
dCBtZXRob2QsIGJ1dCBpZiB5b3UgYWxyZWFkeSBoYXZlDQo+IHRoZW4gcGxl
YXNlIGlnbm9yZSBtZSENCg0KWWVzLCBJJ20gdXNpbmcgdGhlIGFib3ZlIG5v
dy4gTG9va2luZyBmb3Igc29tZXRoaW5nIGZhc3Rlci4gSSdsbCBwcm9iYWJs
eSBzZXR0bGUgb24gdXNpbmcgLUluZiBmb3IgdGhlIHNob3J0IHRlcm0gc28g
SSBjYW4gdXNlIENPUFkuDQoNCg0KQm9iDQo=