Re: Simple query, 10 million records...MySQL ten times faster

Поиск
Список
Период
Сортировка
От Alexander Staubo
Тема Re: Simple query, 10 million records...MySQL ten times faster
Дата
Msg-id 88daf38c0704261612o741b67bvd380484cb38b814d@mail.gmail.com
обсуждение исходный текст
Ответ на Simple query, 10 million records...MySQL ten times faster  (zardozrocks <zardozrocks@gmail.com>)
Ответы Re: Simple query, 10 million records...MySQL ten times faster  ("Alexander Staubo" <alex@purefiction.net>)
Список pgsql-performance
On 24 Apr 2007 14:26:46 -0700, zardozrocks <zardozrocks@gmail.com> wrote:
> I have this table:
>
> CREATE TABLE test_zip_assoc (
>     id serial NOT NULL,
>     f_id integer DEFAULT 0 NOT NULL,
>     lat_radians numeric(6,5) DEFAULT 0.00000 NOT NULL,
>     long_radians numeric(6,5) DEFAULT 0.00000 NOT NULL
> );
> CREATE INDEX lat_radians ON test_zip_assoc USING btree (lat_radians);
> CREATE INDEX long_radians ON test_zip_assoc USING btree
> (long_radians);

This is a spatial search -- B-tree indexes are much less efficient
than this than certain other data structures. The R-tree and its many
variants are based on subdividing the space in regions, allowing you
to do efficient checks on containment, intersection, etc., based on
points or bounding boxes.

PostgreSQL implements R-trees natively as well as through a mechanism
called GiST, a framework for implementing pluggable tree-like indexes.
It also provides some geometric data types. However, as far as I know,
PostgreSQL's R-tree/GiST indexes do *not* provide the operators to do
bounding box searches. For this you need PostGIS.

PostGIS implements the whole GIS stack, and it's so good at this that
it's practically the de facto tool among GIS analysts. Installing
PostGIS into a database is simple, and once you have done this, you
can augment your table with a geometry (*):

  alter table test_zip_assoc add column lonlat geometry;
  update test_zip_assoc set lonlat = makepoint(
    long_radians / (3.14159265358979 / 180),
    lat_radians / (3.14159265358979 / 180));

The division is to convert your radians into degrees; PostGIS works
with degrees, at least out of the box.

Now you can query on a bounding box (although, are you sure you got
your lons and lats in order? That's Antarctica, isn't it?):

  select * from test_zip_assoc
  where lonlat && makebox2d(
    makepoint(-77.6049721697096, 39.5425768302107),
    makepoint(-76.1592790300818,  40.9882699698386))

This is bound to be blazingly fast. Next you can order by geographic
distance if you like:

  order by distance_sphere(lonlat,
    makepoint(-77.6049721697096, 39.5425768302107))

Nobody has mentioned PostGIS so far, so I hope I'm not missing some
crucial detail, like "no spatial indexes allowed!".

(*) I cheated here. The PostGIS manual recommends that you use a
function to create geometric column, because it will set up some
auxilary data structures for you that are needed for certain
operations. The recommended syntax is:

  select AddGeometryColumn('', 'test_zip_assoc', 'geom', -1, 'POINT', 2);

Alexander.

В списке pgsql-performance по дате отправления:

Предыдущее
От: Scott Marlowe
Дата:
Сообщение: Re: postgres: 100% CPU utilization
Следующее
От: "Alexander Staubo"
Дата:
Сообщение: Re: Simple query, 10 million records...MySQL ten times faster