Обсуждение: INET datatype and '<<' operator results in bad row estimate

Поиск
Список
Период
Сортировка

INET datatype and '<<' operator results in bad row estimate

От
Mike Porter
Дата:
Postgres version: 9.3.4

Note: This behavior appears to have been fixed in 9.4-beta1.  I am
reporting it in case it can be fixed in 9.3.x.  Perhaps it should be
mentioned in the 9.4.x release notes (or did I miss it?)

A simple query of the form:

mike=# explain select * from swmon_internal.arpmon a join swmon_internal.macaddr_ip mi on( a.macaddr_ip = mi.id )
whereip << '128.175.10.0/25'; 
                                                     QUERY PLAN
-------------------------------------------------------------------------------------------------------------------
  Hash Join  (cost=5215.15..112839.99 rows=1591378 width=53)
    Hash Cond: (a.macaddr_ip = mi.id)
    ->  Seq Scan on arpmon a  (cost=0.00..59883.53 rows=3182753 width=32)
    ->  Hash  (cost=8.45..8.45 rows=416536 width=21)
          ->  Index Scan using macaddr_ip_ip_macaddr_index on macaddr_ip mi  (cost=0.42..8.45 rows=416536 width=21)
                Index Cond: ((ip > '128.175.10.0/25'::inet) AND (ip <= '128.175.10.127'::inet))
                Filter: (ip << '128.175.10.0/25'::inet)

Results in a bad row guess (416536) and hence a slower plan is
chosen since postgres is expecting far too many rows.  The number
chosen is usually 1/2 the number of rows in the macaddr_ip table, by
the way.

Notice that if I manually add the index condition to the query:

mike=# explain select * from swmon_internal.arpmon a join swmon_internal.macaddr_ip mi on( a.macaddr_ip = mi.id )
whereip << '128.175.10.0/25' and ((ip > '128.175.10.0/25'::inet) AND (ip <= '128.175.10.127'::inet)); 
                                                                         QUERY PLAN

-----------------------------------------------------------------------------------------------------------------------------------------------------------
  Nested Loop  (cost=0.85..97.81 rows=4 width=53)
    ->  Index Scan using macaddr_ip_ip_macaddr_index on macaddr_ip mi  (cost=0.42..8.45 rows=1 width=21)
          Index Cond: ((ip > '128.175.10.0/25'::inet) AND (ip <= '128.175.10.127'::inet) AND (ip >
'128.175.10.0/25'::inet)AND (ip <= '128.175.10.127'::inet)) 
          Filter: (ip << '128.175.10.0/25'::inet)
    ->  Index Scan using arpmon_macaddr_ip_index on arpmon a  (cost=0.43..89.13 rows=22 width=32)
          Index Cond: (macaddr_ip = mi.id)

The number of estimated rows from macaddr_ip is about right (22) and a
much better plan is used.

vacuum analyze has been done and the histogram for the table is reasonable.

mike=# \d swmon_internal.macaddr_ip
                             Table "swmon_internal.macaddr_ip"
  Column  |  Type   |                              Modifiers
---------+---------+---------------------------------------------------------------------
  id      | bigint  | not null default nextval('swmon_internal.macaddr_ip_seq'::regclass)
  macaddr | macaddr | not null
  ip      | inet    | not null
Indexes:
     "macaddr_ip_id_key" UNIQUE CONSTRAINT, btree (id)
     "macaddr_ip_ip_macaddr_index" UNIQUE, btree (ip, macaddr)
     "macaddr_ip_host_ip_index" btree (host(ip))
     "macaddr_ip_macaddr_index" btree (macaddr)
Referenced by:
     TABLE "swmon_internal.arpmon" CONSTRAINT "arpmon_macaddr_ip_fkey" FOREIGN KEY (macaddr_ip) REFERENCES
swmon_internal.macaddr_ip(id)ON UPDATE CASCADE 

mike=# \d swmon_internal.arpmon
                                       Table "swmon_internal.arpmon"
    Column   |           Type           |                            Modifiers
------------+--------------------------+-----------------------------------------------------------------
  id         | bigint                   | not null default nextval('swmon_internal.arpmon_seq'::regclass)
  macaddr_ip | bigint                   | not null
  start      | timestamp with time zone | not null default now()
  finish     | timestamp with time zone | not null default now()
Indexes:
     "arpmon_id_key" UNIQUE CONSTRAINT, btree (id)
     "arpmon_finish_index" btree (finish)
     "arpmon_macaddr_ip_index" btree (macaddr_ip)
     "arpmon_start_index" btree (start)
Foreign-key constraints:
     "arpmon_macaddr_ip_fkey" FOREIGN KEY (macaddr_ip) REFERENCES swmon_internal.macaddr_ip(id) ON UPDATE CASCADE
Referenced by:
     TABLE "swmon_internal.link_swmon_arpmon" CONSTRAINT "link_swmon_arpmon_arpmon_id_fkey" FOREIGN KEY (arpmon_id)
REFERENCESswmon_internal.arpmon(id) ON UPDATE CASCADE 

Again, this works correctly in 9.4-beta1.

Thanks,

Mike

-
Mike Porter
PGP Fingerprint: F4 AE E1 9F 67 F7 DA EA  2F D2 37 F3 99 ED D1 C2

Re: INET datatype and '<<' operator results in bad row estimate

От
Tom Lane
Дата:
Mike Porter <mike@udel.edu> writes:
> Postgres version: 9.3.4
> Note: This behavior appears to have been fixed in 9.4-beta1.  I am
> reporting it in case it can be fixed in 9.3.x.  Perhaps it should be
> mentioned in the 9.4.x release notes (or did I miss it?)

Before 9.4, inet<<inet had no selectivity estimator at all, so you got
a default selectivity estimate of 0.5.  In 9.4 there is an estimator
function, but it's only a stub returning a constant estimate of 0.001;
so that makes this example better but only by luck.  There's some work in
progress to replace the stub with real selectivity estimation, but it
won't appear in any release before 9.5:

http://www.postgresql.org/message-id/flat/CAE2gYzwNBfGe9+1RP4UU6be9q2_m7Bfe6T18JgBSV4qJrkmxEQ@mail.gmail.com

If you're interested, please help test that patch.

            regards, tom lane