Обсуждение: Any ideas how can I speed up this query?

Поиск
Список
Период
Сортировка

Any ideas how can I speed up this query?

От
Priyank Tiwari
Дата:

Hi,

I have following table definition with 6209888 rows in it. It stores the occurrences of species in various regions.

TABLE DEFINITION

    Column    |          Type          |                        Modifiers                         

--------------+------------------------+----------------------------------------------------------

 id           | integer                | not null default nextval('occurrences_id_seq'::regclass)

 gbifid       | integer                | not null

 sname        | character varying(512) | 

 cname        | character varying(512) | 

 species      | character varying(512) | 

 location     | geometry               | not null

 month        | integer                | 

 year         | integer                | 

 event_date   | date                   | 

 dataset_key  | character varying(512) | 

 taxon_key    | character varying(512) | 

 taxon_rank   | character varying(512) | 

 record_basis | character varying(512) | 

 category_id  | integer                | 

 country      | character varying(512) | 

 lat          | double precision       | 

 lng          | double precision       | 

Indexes:

    "occurrences_pkey" PRIMARY KEY, btree (id)

    "unique_occurrences_gbifid" UNIQUE, btree (gbifid)

    "index_occurences_taxon_key" btree (taxon_key)

    "index_occurrences_category_id" btree (category_id)

    "index_occurrences_cname" btree (cname)

    "index_occurrences_country" btree (country)

    "index_occurrences_lat" btree (lat)

    "index_occurrences_lng" btree (lng)

    "index_occurrences_month" btree (month)

    "index_occurrences_sname" btree (sname)

    "occurrence_location_gix" gist (location)


I am trying to fetch the count of number of occurrences within a certain region. I save the location of each occurrence as a geometric field as well as lat, lng combination. Both fields are indexed. The query that is issued is as follows.

QUERY

SELECT COUNT(*) FROM "occurrences" WHERE ("lat" >= -27.91550355958 AND "lat" <= -27.015680440420002 AND "lng" >= 152.13307044728307 AND "lng" <= 153.03137355271693 AND "category_id" = 1 AND (ST_Intersects( ST_Buffer(ST_PointFromText('POINT(152.582222 -27.465592)')::geography, 50000)::geography, location::geography)));

The problem is it takes more than acceptable time to execute the query. Below is the explain analyze output for the same query.

EXPLAIN ANALYZE QUERY OUTPUT  (http://explain.depesz.com/s/p2a)

Aggregate  (cost=127736.06..127736.07 rows=1 width=0) (actual time=13491.678..13491.679 rows=1 loops=1)

   Buffers: shared hit=3 read=56025

   ->  Bitmap Heap Scan on occurrences  (cost=28249.46..127731.08 rows=1995 width=0) (actual time=528.053..13388.458 rows=167511 loops=1)

         Recheck Cond: ((lat >= (-27.91550355958)::double precision) AND (lat <= (-27.01568044042)::double precision) AND (lng >= 152.133070447283::double precision) AND (lng <= 153.031373552717::double precision))

         Rows Removed by Index Recheck: 748669

         Filter: ((category_id = 1) AND ('0103000020E6100000010000002100000090D8AD28D32263403905504558773BC0CADDAF0384226340E7AD43F4E38D3BC0B559D93A98216340B7BE554692A33BC0C904C18C18206340DF8EA9338DB73BC052F75181131E6340A1D9E30E0FC93BC00BDCB5E39C1B6340A1A40E496AD73BC074D30D03CD1863405DD7BF5110E23BC05DD3A2C0BF156340439B784797E83BC078E9287593126340EF9E5C37BEEA3BC072EB40B9670F63409E25A1BA6FE83BC06964481F5C0C6340B331B5D2C2E13BC08F6785ED8E0963409979FBF9F9D63BC0135DB3E71B0763402F78807480C83BC0E321E8351B0563405E96CB00E6B63BC0672CD874A00363403BC84B1BD9A23BC018FAE8F8B90263400314CD15208D3BC039DE8C4A70026340653B324F91763BC0F5BA5CDFC502634086322DDE0A603BC0E90E8A10B7036340C5FA1C046A4A3BC01ECCC14C3A056340CE4011BC82363BC022F38481400763407655DBB517253BC05B3B5AB6B50963404C9AA306D3163BC079EF01D3810C634010A732D03F0C3BC05152F188890F634044CE5D16C5053BC0EDDABA57AF126340926E14EFA1033BC08D7E9CA3D41563401EFB9E2DEB053BC071A929D5DA186340A3F1F29C8A0C3BC049A7E478A41B63404F8BD2CF3F173BC04B409855161E634057CC1080A2253BC0F9C65E70182063404BEB146926373BC0349D83F596216340449EEA7C204B3BC07803D7FD82226340A16F1747CD603BC090D8AD28D32263403905504558773BC0'::geography && (location)::geography) AND (_st_distance('0103000020E6100000010000002100000090D8AD28D32263403905504558773BC0CADDAF0384226340E7AD43F4E38D3BC0B559D93A98216340B7BE554692A33BC0C904C18C18206340DF8EA9338DB73BC052F75181131E6340A1D9E30E0FC93BC00BDCB5E39C1B6340A1A40E496AD73BC074D30D03CD1863405DD7BF5110E23BC05DD3A2C0BF156340439B784797E83BC078E9287593126340EF9E5C37BEEA3BC072EB40B9670F63409E25A1BA6FE83BC06964481F5C0C6340B331B5D2C2E13BC08F6785ED8E0963409979FBF9F9D63BC0135DB3E71B0763402F78807480C83BC0E321E8351B0563405E96CB00E6B63BC0672CD874A00363403BC84B1BD9A23BC018FAE8F8B90263400314CD15208D3BC039DE8C4A70026340653B324F91763BC0F5BA5CDFC502634086322DDE0A603BC0E90E8A10B7036340C5FA1C046A4A3BC01ECCC14C3A056340CE4011BC82363BC022F38481400763407655DBB517253BC05B3B5AB6B50963404C9AA306D3163BC079EF01D3810C634010A732D03F0C3BC05152F188890F634044CE5D16C5053BC0EDDABA57AF126340926E14EFA1033BC08D7E9CA3D41563401EFB9E2DEB053BC071A929D5DA186340A3F1F29C8A0C3BC049A7E478A41B63404F8BD2CF3F173BC04B409855161E634057CC1080A2253BC0F9C65E70182063404BEB146926373BC0349D83F596216340449EEA7C204B3BC07803D7FD82226340A16F1747CD603BC090D8AD28D32263403905504558773BC0'::geography, (location)::geography, 0::double precision, false) < 1e-05::double precision))

         Rows Removed by Filter: 6357

         Heap Blocks: exact=29947 lossy=22601

         Buffers: shared hit=3 read=56025

         ->  BitmapAnd  (cost=28249.46..28249.46 rows=32476 width=0) (actual time=519.091..519.091 rows=0 loops=1)

               Buffers: shared read=3477

               ->  Bitmap Index Scan on index_occurrences_lat  (cost=0.00..11691.20 rows=365877 width=0) (actual time=218.999..218.999 rows=392415 loops=1)

                     Index Cond: ((lat >= (-27.91550355958)::double precision) AND (lat <= (-27.01568044042)::double precision))

                     Buffers: shared read=1444

               ->  Bitmap Index Scan on index_occurrences_lng  (cost=0.00..16557.01 rows=517658 width=0) (actual time=285.211..285.211 rows=550523 loops=1)

                     Index Cond: ((lng >= 152.133070447283::double precision) AND (lng <= 153.031373552717::double precision))

                     Buffers: shared read=2033

 Planning time: 2.812 ms

 Execution time: 13493.617 ms

(19 rows)


It seems that the planner is underestimating the number of rows returned in Bitmap Heap Scan on occurrences. I have run vacuum analyze on this table couple of times, but it still produces the same result. Any idea how I can speed up this query? How I can assist planner in providing better row estimates for Bitmap Heap Scan section?

POSTGRESQL VERSION INFO

                                               version                                                

------------------------------------------------------------------------------------------------------

 PostgreSQL 9.4.4 on x86_64-unknown-linux-gnu, compiled by gcc (Ubuntu 4.8.2-19ubuntu1) 4.8.2, 64-bit


HARDWARE

I am running the Postgresql instance on a digital ocean vm with 1 core, SSD disk and 1 GB of ram.


Appreciate your help.

Thanks,
Priyank

Re: Any ideas how can I speed up this query?

От
"Graeme B. Bell"
Дата:
>
> QUERY
>
> SELECT COUNT(*) FROM "occurrences" WHERE ("lat" >= -27.91550355958 AND "lat" <= -27.015680440420002 AND "lng" >=
152.13307044728307AND "lng" <= 153.03137355271693 AND "category_id" = 1 AND (ST_Intersects(
ST_Buffer(ST_PointFromText('POINT(152.582222-27.465592)')::geography, 50000)::geography, location::geography))); 

> How I can assist planner in providing better row estimates for Bitmap Heap Scan section?

By googling this phrase from your EXPLAIN:   "Rows Removed by Index Recheck: 748669"  - you can find this explanation:

http://stackoverflow.com/questions/26418715/postgresql-rows-removed-by-index

"The inner Bitmap Index Scan node is producing a bitmap, putting 1 to all the places where records that match your
searchkey are found, and 0 otherwise. As your table is quite big, the size of the bitmap is getting bigger, then
availablememory for these kind of operations, configured via work_mem, becomes small to keep the whole bitmap. 

When in lack of a memory, inner node will start producing 1 not for records, but rather for blocks that are known to
containmatching records. This means, that outer node Bitmap Heap Scan has to read all records from such block and
re-checkthem. Obiously, there'll be some non-matching ones, and their number is what you see as Rows Removed by Index
Recheck."

Therefore, try substantially increasing your work_mem (use set.....  so that it's on a per-session basis, not global)
sothat you don't have to read in all the rows to re-check them. 
This is why Googling phrases from your explain before list-posting is always a good idea :-)

BTW - what are your statistics set to? If you have a huge table, it can be worth raising them from the default.
  http://www.postgresql.org/docs/9.4/static/planner-stats.html
ALTER TABLE SET STATISTICS, try raising this to 1000.


> POSTGRESQL VERSION INFO

For postgis-related questions, remember to also include the postgis version.

Hope this helps and good luck

Graeme Bell.

Re: Any ideas how can I speed up this query?

От
林士博
Дата:
1 GB of ram is quite small. 
I think it is worth to try creating an index on a combination of columns(lat, lng).
So that Bitmap Heap Scan would be omitted.