Re: Strange behavior with polygon and NaN

Поиск
Список
Период
Сортировка
От Kyotaro Horiguchi
Тема Re: Strange behavior with polygon and NaN
Дата
Msg-id 20200826.172525.155200749215780647.horikyota.ntt@gmail.com
обсуждение исходный текст
Ответ на Re: Strange behavior with polygon and NaN  (Bruce Momjian <bruce@momjian.us>)
Ответы Re: Strange behavior with polygon and NaN  (Tom Lane <tgl@sss.pgh.pa.us>)
Список pgsql-hackers
At Tue, 25 Aug 2020 19:03:50 -0400, Bruce Momjian <bruce@momjian.us> wrote in 
> 
> I can confirm that this two-month old email report still produces
> different results with indexes on/off in git master, which I don't think
> is ever correct behavior.

I agree to that the behavior is broken. 

> ---------------------------------------------------------------------------
> 
> On Wed, Jun 24, 2020 at 03:11:03PM -0700, Jesse Zhang wrote:
> > Hi hackers,
> > 
> > While working with Chris Hajas on merging Postgres 12 with Greenplum
> > Database we stumbled upon the following strange behavior in the geometry
> > type polygon:
> > 
> > ------ >8 --------
> > 
> > CREATE TEMP TABLE foo (p point);
> > CREATE INDEX ON foo USING gist(p);
> > 
> > INSERT INTO foo VALUES ('0,0'), ('1,1'), ('NaN,NaN');
> > 
> > SELECT $q$
> > SELECT * FROM foo WHERE p <@ polygon '(0,0), (0, 100), (100, 100), (100, 0)'
> > $q$ AS qry \gset
> > 
> > BEGIN;
> > SAVEPOINT yolo;
> > SET LOCAL enable_seqscan TO off;
> > :qry;
> > 
> > ROLLBACK TO SAVEPOINT yolo;
> > SET LOCAL enable_indexscan TO off;
> > SET LOCAL enable_bitmapscan TO off;
> > :qry;
> > 
> > ------ 8< --------
> > 
> > If you run the above repro SQL in HEAD (and 12, and likely all older
> > versions), you get the following output:
> > 
> > CREATE TABLE
> > CREATE INDEX
> > INSERT 0 3
> > BEGIN
> > SAVEPOINT
> > SET
> >    p
> > -------
> >  (0,0)
> >  (1,1)
> > (2 rows)
> > 
> > ROLLBACK
> > SET
> > SET
> >      p
> > -----------
> >  (0,0)
> >  (1,1)
> >  (NaN,NaN)
> > (3 rows)
> > 
> > 
> > At first glance, you'd think this is the gist AM's bad, but on a second
> > thought, something else is strange here. The following query returns
> > true:
> > 
> > SELECT point '(NaN, NaN)' <@ polygon '(0,0), (0, 100), (100, 100), (100, 0)'
> > 
> > The above behavior of the "contained in" operator is surprising, and
> > it's probably not what the GiST AM is expecting. I took a look at
> > point_inside() in geo_ops.c, and it doesn't seem well equipped to handle
> > NaN. Similary ill-equipped is dist_ppoly_internal() which underlies the
> > distnace operator for polygon. It gives the following interesting
> > output:
> > 
> > SELECT *, c <-> polygon '(0,0),(0,100),(100,100),(100,0)' as distance
> > FROM (
> >   SELECT circle(point(100 * i, 'NaN'), 50) AS c
> >   FROM generate_series(-2, 4) i
> > ) t(c)
> > ORDER BY 2;
> > 
> >         c        | distance
> > -----------------+----------
> >  <(-200,NaN),50> |        0
> >  <(-100,NaN),50> |        0
> >  <(0,NaN),50>    |        0
> >  <(100,NaN),50>  |        0
> >  <(200,NaN),50>  |      NaN
> >  <(300,NaN),50>  |      NaN
> >  <(400,NaN),50>  |      NaN
> > (7 rows)
> > 
> > Should they all be NaN? Am I alone in thinking the index is right but
> > the operators are wrong? Or should we call the indexes wrong here?


There may be other places that NaN is not fully considered. For
example, the following (perpendicular op) returns true.

SELECT lseg '[(0,0),(0,NaN)]' ?-| lseg '[(0,0),(10,0)]';

It is quite hard to fix all of the defect..

For the above cases, it's enough to make sure that point_inside don't
pass NaN's to lseg_crossing(), but it's much of labor to fill all
holes reasonable and principled way..

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
diff --git a/src/backend/utils/adt/geo_ops.c b/src/backend/utils/adt/geo_ops.c
index a7db783958..1875f8d436 100644
--- a/src/backend/utils/adt/geo_ops.c
+++ b/src/backend/utils/adt/geo_ops.c
@@ -5350,6 +5350,9 @@ point_inside(Point *p, int npts, Point *plist)
     x0 = float8_mi(plist[0].x, p->x);
     y0 = float8_mi(plist[0].y, p->y);
 
+    if (isnan(x0) || isnan(y0) || isnan(p->x) || isnan(p->y))
+        return 0;
+
     prev_x = x0;
     prev_y = y0;
     /* loop over polygon points and aggregate total_cross */
@@ -5359,6 +5362,9 @@ point_inside(Point *p, int npts, Point *plist)
         x = float8_mi(plist[i].x, p->x);
         y = float8_mi(plist[i].y, p->y);
 
+        if (isnan(x) || isnan(y))
+            return 0;
+
         /* compute previous to current point crossing */
         if ((cross = lseg_crossing(x, y, prev_x, prev_y)) == POINT_ON_POLYGON)
             return 2;

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Andy Fan
Дата:
Сообщение: Re: Improve planner cost estimations for alternative subplans
Следующее
От: John Naylor
Дата:
Сообщение: Re: Problems with the FSM, heap fillfactor, and temporal locality