Обсуждение: JDBC/Stored procedure performance issue
Hi All,
I am experiencing a strange performance issue with Postgresql (7.4.19)
+ PostGIS. (I posted to the PostGIS list but got no response, so am
trying here.)
We have a table of entries that contains latitude, longitude values
and I have a simple query to retrieve all entries within a specified 2-
D box.
The latitude, longitude are stored as decimals, plus a trigger stores
the corresponding geometry object.
When I do an EXPLAIN ANALYZE on one query that returns 3261 rows, it
executes in a reasonable 159ms:
EXPLAIN ANALYZE SELECT DISTINCT latitude, longitude, color FROM
NewEntries
WHERE groupid = 57925 AND
location @ SetSRID(MakeBox2D(SetSRID(MakePoint(-123.75,
36.597889), 4326),
SetSRID(MakePoint(-118.125,
40.979898), 4326)), 4326);
Unique (cost=23.73..23.74 rows=1 width=30) (actual
time=143.648..156.081 rows=3261 loops=1)
-> Sort (cost=23.73..23.73 rows=1 width=30) (actual
time=143.640..146.214 rows=3369 loops=1)
Sort Key: latitude, longitude, color
-> Index Scan using group_index on newentries
(cost=0.00..23.72 rows=1 width=30) (actual time=0.184..109.346
rows=3369 loops=1)
Index Cond: (groupid = 57925)
Filter: ("location" @
'0103000020E610000001000000050000000000000000F05EC0000000A0874C42400000000000F05EC0000000406D7D44400000000000885DC0000000406D7D44400000000000885DC0000000A0874C42400000000000F05EC0000000A0874C4240
'::geometry)
Total runtime: 159.430 ms
(7 rows)
If I issue the same query over JDBC or use a PSQL stored procedure, it
takes over 3000 ms, which, of course is unacceptable!
Function Scan on gettilelocations (cost=0.00..12.50 rows=1000
width=30) (actual time=3311.368..3319.265 rows=3261 loops=1)
Total runtime: 3322.529 ms
(2 rows)
The function gettilelocations is defined as:
CREATE OR REPLACE FUNCTION GetTileLocations(Integer, real, real, real,
real)
RETURNS SETOF TileLocation
AS
'
DECLARE
R TileLocation;
BEGIN
FOR R IN SELECT DISTINCT latitude, longitude, color FROM
NewEntries
WHERE groupid = $1 AND
location @ SetSRID(MakeBox2D(SetSRID(MakePoint($2,
$3), 4326),
SetSRID(MakePoint($4, $5), 4326)),
4326) LOOP
RETURN NEXT R;
END LOOP;
RETURN;
END;
'
LANGUAGE plpgsql STABLE RETURNS NULL ON NULL INPUT;
Can someone please tell me what we are doing wrong? Any help would be
greatly appreciated.
Thanks
Claire
--
Claire McLister mclister@zeesource.net
21060 Homestead Road Suite 150
Cupertino, CA 95014 408-733-2737(fax)
http://www.zeemaps.com
Claire McLister <mclister@zeesource.net> writes:
> When I do an EXPLAIN ANALYZE on one query that returns 3261 rows, it
> executes in a reasonable 159ms:
> ...
> If I issue the same query over JDBC or use a PSQL stored procedure, it
> takes over 3000 ms, which, of course is unacceptable!
I suspect that the problem is with "groupid = $1" instead of
"groupid = 57925". The planner is probably avoiding an indexscan
in the parameterized case because it's guessing the actual value will
match so many rows as to make a seqscan faster. Is the distribution
of groupid highly skewed? You might get better results if you increase
the statistics target for that column.
Switching to something newer than 7.4.x might help too. 8.1 and up
support "bitmap" indexscans which work much better for large numbers
of hits, and correspondingly the planner will use one in cases where
it wouldn't use a plain indexscan.
regards, tom lane
Hi Tom,
Is there any way to work out what plan the query is using in side the function? I think I have a similar problem with a query taking much longer from inside a function than it does as a select statement.
Regards
Matthew
Tom Lane wrote:
Is there any way to work out what plan the query is using in side the function? I think I have a similar problem with a query taking much longer from inside a function than it does as a select statement.
Regards
Matthew
Tom Lane wrote:
Claire McLister <mclister@zeesource.net> writes:When I do an EXPLAIN ANALYZE on one query that returns 3261 rows, it executes in a reasonable 159ms: ... If I issue the same query over JDBC or use a PSQL stored procedure, it takes over 3000 ms, which, of course is unacceptable!I suspect that the problem is with "groupid = $1" instead of "groupid = 57925". The planner is probably avoiding an indexscan in the parameterized case because it's guessing the actual value will match so many rows as to make a seqscan faster. Is the distribution of groupid highly skewed? You might get better results if you increase the statistics target for that column. Switching to something newer than 7.4.x might help too. 8.1 and up support "bitmap" indexscans which work much better for large numbers of hits, and correspondingly the planner will use one in cases where it wouldn't use a plain indexscan. regards, tom lane ---------------------------(end of broadcast)--------------------------- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to majordomo@postgresql.org so that your message can get through to the mailing list cleanly _____________________________________________________________________ This e-mail has been scanned for viruses by Verizon Business Internet Managed Scanning Services - powered by MessageLabs. For further information visit http://www.verizonbusiness.com/uk
Matthew Lunnon <mlunnon@rwa-net.co.uk> writes:
> Is there any way to work out what plan the query is using in side the
> function? I think I have a similar problem with a query taking much
> longer from inside a function than it does as a select statement.
Standard approach is to PREPARE a statement that has parameters in the
same places where the function uses variables/parameters, and then use
EXPLAIN [ANALYZE] EXECUTE to test it.
regards, tom lane
Thanks, Tom. Looks like that was the issue. I changed the function to use groupid = 57925 instead of groupid = $1 (I can do the same change in the JDBC prepare statement), and the performance is much better. It is still more than twice that of the simple query: 401.111 ms vs. 155.544 ms, which, however, is more acceptable than 3000ms. Will upgrade to 8.1 at some point, but would like to get reasonable performance with 7.4 until then. I did increase the statistics target to 1000. Claire On Jan 28, 2008, at 12:51 PM, Tom Lane wrote: > Claire McLister <mclister@zeesource.net> writes: >> When I do an EXPLAIN ANALYZE on one query that returns 3261 rows, it >> executes in a reasonable 159ms: >> ... >> If I issue the same query over JDBC or use a PSQL stored procedure, >> it >> takes over 3000 ms, which, of course is unacceptable! > > I suspect that the problem is with "groupid = $1" instead of > "groupid = 57925". The planner is probably avoiding an indexscan > in the parameterized case because it's guessing the actual value will > match so many rows as to make a seqscan faster. Is the distribution > of groupid highly skewed? You might get better results if you > increase > the statistics target for that column. > > Switching to something newer than 7.4.x might help too. 8.1 and up > support "bitmap" indexscans which work much better for large numbers > of hits, and correspondingly the planner will use one in cases where > it wouldn't use a plain indexscan. > > regards, tom lane > > ---------------------------(end of > broadcast)--------------------------- > TIP 1: if posting/reading through Usenet, please send an appropriate > subscribe-nomail command to majordomo@postgresql.org so that > your > message can get through to the mailing list cleanly