Обсуждение: seq scan in the case of max() on the primary key column
Hi everybody,
I am running PostgreSQL 9.0 which performs well in most of the cases. I would skip all the parameters if these are not necessary.
I need to frequently (every min) get the max value of the primary key column on some tables, like this case which works perfectly well:
explain analyze select max(id) from appqosdata.tcpsessions;
-------------------------------------------------------------------------------------------------------------------------------------------------------------------
Result (cost=0.49..0.50 rows=1 width=0) (actual time=45.316..45.317 rows=1 loops=1) InitPlan 1 (returns $0)
-> Limit (cost=0.00..0.49 rows=1 width=8) (actual time=45.302..45.303 rows=1 loops=1)
-> Index Scan Backward using idx_tcpsessions_id on tcpsessions (cost=0.00..6633362.76 rows=13459023 width=8) (actual time=45.296..45.296 rows=1 loops=1)
Index Cond: (id IS NOT NULL)
Total runtime: 45.399 ms
-> Limit (cost=0.00..0.49 rows=1 width=8) (actual time=45.302..45.303 rows=1 loops=1)
-> Index Scan Backward using idx_tcpsessions_id on tcpsessions (cost=0.00..6633362.76 rows=13459023 width=8) (actual time=45.296..45.296 rows=1 loops=1)
Index Cond: (id IS NOT NULL)
Total runtime: 45.399 ms
But I have the following similar case which surprises me quite a lot:
explain analyze select max(createdtime) from appqosdata.tcpsessiondata;
-------------------------------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=1123868.30..1123868.31 rows=1 width=8) (actual time=376932.636..376932.637 rows=1 loops=1)
-> Append (cost=0.00..965113.04 rows=63502104 width=8) (actual time=0.020..304844.944 rows=63501281 loops=1)
-> Seq Scan on tcpsessiondata (cost=0.00..12.80 rows=780 width=8) (actual time=0.002..0.002 rows=0 loops=1)
-> Seq Scan on tcpsessiondata_default tcpsessiondata (cost=0.00..965100.24 rows=63501324 width=8) (actual time=0.015..173159.505 rows=63501281 loops=1)
Total runtime: 376980.975 ms
Aggregate (cost=1123868.30..1123868.31 rows=1 width=8) (actual time=376932.636..376932.637 rows=1 loops=1)
-> Append (cost=0.00..965113.04 rows=63502104 width=8) (actual time=0.020..304844.944 rows=63501281 loops=1)
-> Seq Scan on tcpsessiondata (cost=0.00..12.80 rows=780 width=8) (actual time=0.002..0.002 rows=0 loops=1)
-> Seq Scan on tcpsessiondata_default tcpsessiondata (cost=0.00..965100.24 rows=63501324 width=8) (actual time=0.015..173159.505 rows=63501281 loops=1)
Total runtime: 376980.975 ms
I have the following table definitions:
CREATE TABLE appqosdata.tcpsessiondata_default
(
Primary key(createdtime), --bigint
check (sessionid >= 0),
Foreign key(detectorid, sessionid) References appqosdata.tcpsessions(detectorid,id)
) inherits (appqosdata.tcpsessiondata);
(
Primary key(createdtime), --bigint
check (sessionid >= 0),
Foreign key(detectorid, sessionid) References appqosdata.tcpsessions(detectorid,id)
) inherits (appqosdata.tcpsessiondata);
CREATE TABLE appqosdata.tcpsessions
(
detectorid smallint not null default(0) references appqosdata.detectors(id),
id bigint not null,
...
primary key(detectorid, id)
);
As you can see I have tens of millions of rows in both tables which would be ten times more in production. So seq scan is not acceptable at all to get one single value.
);
As you can see I have tens of millions of rows in both tables which would be ten times more in production. So seq scan is not acceptable at all to get one single value.
Why that difference and what can I do to make the first query use its index on the primary key.
Thank you,
Svetlin Manavski
On 2011-06-16 15:55, Svetlin Manavski wrote: > Hi everybody, > > I am running PostgreSQL 9.0 which performs well in most of the cases. I > would skip all the parameters if these are not necessary. > > I need to frequently (every min) get the max value of the primary key column > on some tables, like this case which works perfectly well: > > explain analyze select max(id) from appqosdata.tcpsessions; Typically this is due to "batch load" and failing to run "analyze" manually afterwards.. is this the case? -- Jesper
On Thu, Jun 16, 2011 at 15:55, Svetlin Manavski <svetlin.manavski@gmail.com> wrote: > Hi everybody, > > I am running PostgreSQL 9.0 which performs well in most of the cases. I > would skip all the parameters if these are not necessary. > I need to frequently (every min) get the max value of the primary key column > on some tables, like this case which works perfectly well: > explain analyze select max(id) from appqosdata.tcpsessions; > ------------------------------------------------------------------------------------------------------------------------------------------------------------------- > Result (cost=0.49..0.50 rows=1 width=0) (actual time=45.316..45.317 rows=1 > loops=1) InitPlan 1 (returns $0) > -> Limit (cost=0.00..0.49 rows=1 width=8) (actual time=45.302..45.303 rows=1 > loops=1) > -> Index Scan Backward using idx_tcpsessions_id on tcpsessions > (cost=0.00..6633362.76 rows=13459023 width=8) (actual time=45.296..45.296 > rows=1 loops=1) > Index Cond: (id IS NOT NULL) > Total runtime: 45.399 ms > > But I have the following similar case which surprises me quite a lot: > explain analyze select max(createdtime) from appqosdata.tcpsessiondata; > ------------------------------------------------------------------------------------------------------------------------------------------------------------------- > Aggregate (cost=1123868.30..1123868.31 rows=1 width=8) (actual > time=376932.636..376932.637 rows=1 loops=1) > -> Append (cost=0.00..965113.04 rows=63502104 width=8) (actual > time=0.020..304844.944 rows=63501281 loops=1) > -> Seq Scan on tcpsessiondata (cost=0.00..12.80 rows=780 width=8) (actual > time=0.002..0.002 rows=0 loops=1) > -> Seq Scan on tcpsessiondata_default tcpsessiondata (cost=0.00..965100.24 > rows=63501324 width=8) (actual time=0.015..173159.505 rows=63501281 loops=1) > Total runtime: 376980.975 ms > > I have the following table definitions: > CREATE TABLE appqosdata.tcpsessiondata_default > ( > Primary key(createdtime), --bigint > check (sessionid >= 0), > > Foreign key(detectorid, sessionid) References > appqosdata.tcpsessions(detectorid,id) > > ) inherits (appqosdata.tcpsessiondata); > CREATE TABLE appqosdata.tcpsessions > ( > detectorid smallint not null default(0) references appqosdata.detectors(id), > id bigint not null, > ... > primary key(detectorid, id) > ); > > As you can see I have tens of millions of rows in both tables which would be > ten times more in production. So seq scan is not acceptable at all to get > one single value. > Why that difference and what can I do to make the first query use its index > on the primary key. Looks like the first table is not partitioned, but the second one is? PostgreSQL 9.0 is unable to use an index scan to find min/max on a partitioned table. 9.1, however, can do that. -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
On 06/16/2011 12:25 PM, Magnus Hagander wrote: > PostgreSQL 9.0 is unable to use an index scan to find min/max on a > partitioned table. 9.1, however, can do that. Unfortunately this is true. You can fake it this way though: /** * Return the Maximum INT Value for a Partitioned Table Column * * @param string Name of Schema of the base partition table. * @param string Name of the base partition table. * @param string Name of column to search. */ CREATE OR REPLACE FUNCTION spc_max_part_int(VARCHAR, VARCHAR, VARCHAR) RETURNS INT AS $$ DECLARE sSchema ALIAS FOR $1; sTable ALIAS FOR $2; sColName ALIAS FOR $3; sChild VARCHAR; nMax INT; nTemp INT; nParent OID; BEGIN EXECUTE ' SELECT max(' || sColName ||') FROM ONLY ' || sSchema || '.' || quote_ident(sTable) INTO nMax; SELECT INTO nParent t.oid FROM pg_class t JOIN pg_namespace n ON (t.relnamespace=n.oid) WHERE n.nspname = sSchema AND t.relname = sTable; FOR sChild IN SELECT t.relname FROM pg_class t JOIN pg_inherits c ON (c.inhrelid=t.oid AND c.inhparent=nParent) LOOP nTemp := utility.spc_max_part_int(sSchema, sChild, sColName); nMax := greatest(nTemp, nMax); END LOOP; RETURN nMax; END; $$ LANGUAGE plpgsql STABLE; You can call that instead of max, and it'll be much faster. You can create an analog for min if you need it. So for this, you'd call: SELECT spc_max_part_int('appqosdata', 'tcpsessions', 'id'); Someone probably has a better solution. :) -- Shaun Thomas OptionsHouse | 141 W. Jackson Blvd. | Suite 800 | Chicago IL, 60604 312-676-8870 sthomas@peak6.com ______________________________________________ See http://www.peak6.com/email_disclaimer.php for terms and conditions related to this email
On 06/17/2011 06:22 AM, Svetlin Manavski wrote: > Shaun, that solution is brilliant. Don't thank me. I actually got the basic idea from a post here a couple years ago. The only difference is I formalized it somewhat and put it in our utility schema, where I put lots of other random useful stored procs I've accumulated over the years. I have another one that works with dates. :) I assume you already modified it by removing the 'utility' schema prefix from the recursive call. The recursive call is in case the child tables are themselves used as a template for further inheritance. It's rare, but possible. This function will always get you the max value on a column in a series of partitioned tables, and quickly so long as it's indexed. It's a bit of a hack, but it's worked fine for us while we wait for the planner to catch up. :) -- Shaun Thomas OptionsHouse | 141 W. Jackson Blvd. | Suite 800 | Chicago IL, 60604 312-676-8870 sthomas@peak6.com ______________________________________________ See http://www.peak6.com/email_disclaimer.php for terms and conditions related to this email
Yes, confirmed that the problem is in the partitioned table.
Shaun, that solution is brilliant.
Thank you,
Svetlin Manavski
On Thu, Jun 16, 2011 at 7:36 PM, Shaun Thomas <sthomas@peak6.com> wrote:
On 06/16/2011 12:25 PM, Magnus Hagander wrote:Unfortunately this is true. You can fake it this way though:PostgreSQL 9.0 is unable to use an index scan to find min/max on a
partitioned table. 9.1, however, can do that.
/**
* Return the Maximum INT Value for a Partitioned Table Column
*
* @param string Name of Schema of the base partition table.
* @param string Name of the base partition table.
* @param string Name of column to search.
*/
CREATE OR REPLACE FUNCTION spc_max_part_int(VARCHAR, VARCHAR, VARCHAR)
RETURNS INT AS
$$
DECLARE
sSchema ALIAS FOR $1;
sTable ALIAS FOR $2;
sColName ALIAS FOR $3;
sChild VARCHAR;
nMax INT;
nTemp INT;
nParent OID;
BEGIN
EXECUTE '
SELECT max(' || sColName ||')
FROM ONLY ' || sSchema || '.' || quote_ident(sTable)
INTO nMax;
SELECT INTO nParent t.oid
FROM pg_class t
JOIN pg_namespace n ON (t.relnamespace=n.oid)
WHERE n.nspname = sSchema
AND t.relname = sTable;
FOR sChild IN
SELECT t.relname
FROM pg_class t
JOIN pg_inherits c ON (c.inhrelid=t.oid AND c.inhparent=nParent)
LOOP
nTemp := utility.spc_max_part_int(sSchema, sChild, sColName);
nMax := greatest(nTemp, nMax);
END LOOP;
RETURN nMax;
END;
$$ LANGUAGE plpgsql STABLE;
You can call that instead of max, and it'll be much faster. You can create an analog for min if you need it. So for this, you'd call:
SELECT spc_max_part_int('appqosdata', 'tcpsessions', 'id');
Someone probably has a better solution. :)
--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd. | Suite 800 | Chicago IL, 60604
312-676-8870
sthomas@peak6.com
______________________________________________
See http://www.peak6.com/email_disclaimer.php
for terms and conditions related to this email
On 06/17/2011 08:43 AM, Shaun Thomas wrote: > It's a bit of a hack, but it's worked fine for us while we wait for > the planner to catch up. :) Right. In situations where people can modify their application to redirect MIN/MAX() calls over to directly query the individual partitions, that's a great workaround. Your function is the slickest such solution I've seen for that, so filing it away in case this pops up in that situation. But if you can't touch the application code and just need it to work as desired, you either need to use PostgreSQL 9.1 (not yet released) or figure out how to backport that fix into an earlier version (not easy). A babbled a bit about this specific case at http://blog.2ndquadrant.com/en/2011/06/max-partitioning-with-min-pain.html if anyone wants more information, or a specific simple test case to play with. -- Greg Smith 2ndQuadrant US greg@2ndQuadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us "PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books
On Jun 16, 2011, at 1:36 PM, Shaun Thomas wrote: > /** > * Return the Maximum INT Value for a Partitioned Table Column > * > * @param string Name of Schema of the base partition table. > * @param string Name of the base partition table. > * @param string Name of column to search. > */ > CREATE OR REPLACE FUNCTION spc_max_part_int(VARCHAR, VARCHAR, VARCHAR) > RETURNS INT AS > $$ > DECLARE > <snip> > SELECT INTO nParent t.oid > FROM pg_class t > JOIN pg_namespace n ON (t.relnamespace=n.oid) > WHERE n.nspname = sSchema > AND t.relname = sTable; FWIW, instead of that, I would do this: CREATE FUNCTION ...( p_parent_schema text , p_parent_table text ) ... DECLARE c_parent_oid CONSTANT oid := (p_parent_schema || '.' || p_parent_table )::regclass; ... or ... CREATE FUNCTION( p_parent text ) DECLARE c_parent_oid CONSTANT oid := p_parent::regclass; Advantages: - ::regclass is search_path-aware, so you're not forced into providing a schema if you don't want to - it will throw an error if it doesn't find a regclass entry - you can cast the oid back to text: EXECUTE 'SELECT max(' ... 'FROM ' || c_parent_oid::regclass - you can also query directly with the OID: SELECT relkind = 't' AS is_table FROM pg_class WHERE oid = c_parent_oid -- Jim C. Nasby, Database Architect jim@nasby.net 512.569.9461 (cell) http://jim.nasby.net
On 06/17/2011 03:31 PM, Jim Nasby wrote: > c_parent_oid CONSTANT oid := (p_parent_schema || '.' || > p_parent_table )::regclass; Well isn't *that* a handy bit of magic. How did I not know about that? Thanks! -- Shaun Thomas OptionsHouse | 141 W. Jackson Blvd. | Suite 800 | Chicago IL, 60604 312-676-8870 sthomas@peak6.com ______________________________________________ See http://www.peak6.com/email_disclaimer.php for terms and conditions related to this email
On Thu, Jun 16, 2011 at 21:36, Shaun Thomas <sthomas@peak6.com> wrote: > You can call that instead of max, and it'll be much faster. You can create > an analog for min if you need it. So for this, you'd call: Cool, I've needed this function sometimes but never bothered enough to write it myself. Now I created a wiki snippet page for this handy feature here: https://wiki.postgresql.org/wiki/Efficient_min/max_over_partitioned_table With Jim Nasby's idea to use regclass instead of relation names, the function is now half its length and probably more reliable. There's no need to touch pg_class directly at all. I also changed it to return bigint instead of integer, as that's more versatile, and the performance loss is probably negligible. Regards, Marti
On 06/22/2011 04:55 AM, Marti Raudsepp wrote: > With Jim Nasby's idea to use regclass instead of relation names, the > function is now half its length and probably more reliable. There's no > need to touch pg_class directly at all. Sadly until we upgrade to EDB 9.0, I have to use my function. :) EDB 8.3 (which is really PostgreSQL 8.2) doesn't have a regclass->text conversion. But I'll bookmark the wiki page anyway, so I can update my function after upgrading. Heh. -- Shaun Thomas OptionsHouse | 141 W. Jackson Blvd. | Suite 800 | Chicago IL, 60604 312-676-8870 sthomas@peak6.com ______________________________________________ See http://www.peak6.com/email_disclaimer.php for terms and conditions related to this email
On 06/22/2011 05:55 AM, Marti Raudsepp wrote: > Now I created a wiki snippet page for this handy > feature here: > https://wiki.postgresql.org/wiki/Efficient_min/max_over_partitioned_table > I just tweaked this a bit to document the version compatibility issues around it and make it easier to follow. I think that's now the page we should point people toward when this pops up again. Between that and my blog post I reference in it, they can find all the details and a workaround in one place. -- Greg Smith 2ndQuadrant US greg@2ndQuadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us "PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books
On Wed, Jun 22, 2011 at 7:12 AM, Shaun Thomas <sthomas@peak6.com> wrote: > On 06/22/2011 04:55 AM, Marti Raudsepp wrote: > >> With Jim Nasby's idea to use regclass instead of relation names, the >> function is now half its length and probably more reliable. There's no >> need to touch pg_class directly at all. > > Sadly until we upgrade to EDB 9.0, I have to use my function. :) EDB 8.3 > (which is really PostgreSQL 8.2) doesn't have a regclass->text conversion. > But I'll bookmark the wiki page anyway, so I can update my function after > upgrading. Heh. > Given that many folks still run < 9.0 in production, the wiki page should really have a version of that function for older versions, whether it's long or not.
On 06/22/2011 01:12 PM, Scott Marlowe wrote: > Given that many folks still run< 9.0 in production, the wiki page > should really have a version of that function for older versions, > whether it's long or not. This version does work on anything 8.3 and above. I just lamented on 9.0 because we decided to skip 8.4 in favor of 9.0. And as we use EDB instead of PostgreSQL directly, our 8.3 is actually 8.2. Got that? ;) Sorry for the confusion. -- Shaun Thomas OptionsHouse | 141 W. Jackson Blvd. | Suite 800 | Chicago IL, 60604 312-676-8870 sthomas@peak6.com ______________________________________________ See http://www.peak6.com/email_disclaimer.php for terms and conditions related to this email
On 06/22/2011 02:12 PM, Scott Marlowe wrote: > Given that many folks still run < 9.0 in production, the wiki page > should really have a version of that function for older versions, > whether it's long or not. > I updated the page already to be clear about what versions of PostgreSQL it works on, and it directs people to Shaun's original message if they are running 8.2. The only people who might get confused now are the ones running EDB's versions, where the exact features you get in particular versions can be slightly different than the community version. But that problem both exists in other parts of the wiki, and is a bit outside of its scope to try and address. -- Greg Smith 2ndQuadrant US greg@2ndQuadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us "PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books