Обсуждение: Efficiency of timestamps
As I keep looking through code to see where I can make things more efficient, I noticed that in some cases timestamps seem horribly inefficient. This leads to very long run times for certain queries. Here is an example: -- USING TIMESTAMPS TO NARROW DOWN -- SELECT Post.PostIDNumber, Post.PuppeteerLogin, Post.PuppetName, Post.PostCmd, Post.PostClass FROM ethereal.Post WHERE Post.PostTimeStamp > (LOCALTIMESTAMP - INTERVAL '10 Minutes') AND Post.RealmName='Amalgam' AND (Post.PostTo='all' OR Post.PostTo='root') AND (NOT EXISTS (SELECT PuppetIgnore.PuppetLogin FROM ethereal.PuppetIgnore WHERE PuppetIgnore.PuppetIgnore='global' AND PuppetIgnore.PuppeteerLogin='root' AND PuppetIgnore.PuppetLogin=Post.PuppeteerLogin) OR Post.PuppeteerLogin IS NULL) AND (NOT EXISTS (SELECT PuppetIgnore.PuppetName FROM ethereal.PuppetIgnore WHERE PuppetIgnore='single' AND PuppetIgnore.PuppeteerLogin='root' AND PuppetIgnore.PuppetName=Post.PuppetName) OR Post.PuppetName IS NULL) ORDER BY Post.PostIDNumber LIMIT 100 -- Explain of Above-- Limit (cost=0.00..260237.32 rows=100 width=48) -> Index Scan using pkpost on post (cost=0.00..3020594.00 rows=1161 width=48) Filter: ((posttimestamp > (('now'::text)::timestamp(6) without time zone - '00:10'::interval)) AND (realmname = 'Amalgam'::character varying) AND ((postto = 'all'::character varying) OR (postto = 'root'::character varying)) AND ((NOT (subplan)) OR (puppeteerlogin IS NULL)) AND ((NOT (subplan)) OR (puppetname IS NULL))) SubPlan -> Index Scan using pkpuppetignore on puppetignore (cost=0.00..13.31 rows=1 width=10) Index Cond: (puppeteerlogin = 'root'::character varying) Filter: ((puppetignore = 'global'::character varying) AND (puppetlogin = $0)) -> Index Scan using pkpuppetignore on puppetignore (cost=0.00..5.84 rows=1 width=15) Index Cond: ((puppeteerlogin = 'root'::character varying) AND (puppetname = $1)) Filter: (puppetignore = 'single'::character varying) Result : 22 rows fetched (17.21 sec) -- USING A GENERATED ID NUMBER -- SELECT Post.PostIDNumber, Post.PuppeteerLogin, Post.PuppetName, Post.PostCmd, Post.PostClass FROM ethereal.Post WHERE Post.PostIDNumber > 1 AND Post.RealmName='Amalgam' AND (Post.PostTo='all' OR Post.PostTo='root') AND (NOT EXISTS (SELECT PuppetIgnore.PuppetLogin FROM ethereal.PuppetIgnore WHERE PuppetIgnore.PuppetIgnore='global' AND PuppetIgnore.PuppeteerLogin='root' AND PuppetIgnore.PuppetLogin=Post.PuppeteerLogin) OR Post.PuppeteerLogin IS NULL) AND (NOT EXISTS (SELECT PuppetIgnore.PuppetName FROM ethereal.PuppetIgnore WHERE PuppetIgnore='single' AND PuppetIgnore.PuppeteerLogin='root' AND PuppetIgnore.PuppetName=Post.PuppetName) OR Post.PuppetName IS NULL) ORDER BY Post.PostIDNumber LIMIT 100 -- Explain of Above-- Limit (cost=0.00..86712.10 rows=100 width=48) -> Index Scan using pkpost on post (cost=0.00..3019119.56 rows=3482 width=48) Index Cond: (postidnumber > 1) Filter: ((realmname = 'Amalgam'::character varying) AND ((postto = 'all'::character varying) OR (postto = 'root'::character varying)) AND ((NOT (subplan)) OR (puppeteerlogin IS NULL)) AND ((NOT (subplan)) OR (puppetname IS NULL))) SubPlan -> Index Scan using pkpuppetignore on puppetignore (cost=0.00..13.31 rows=1 width=10) Index Cond: (puppeteerlogin = 'root'::character varying) Filter: ((puppetignore = 'global'::character varying) AND (puppetlogin = $0)) -> Index Scan using pkpuppetignore on puppetignore (cost=0.00..5.84 rows=1 width=15) Index Cond: ((puppeteerlogin = 'root'::character varying) AND (puppetname = $1)) Filter: (puppetignore = 'single'::character varying) Result : 100 rows fetched ( 0.19 sec) -- USING A MIXTURE OF BOTH -- SELECT Post.PostIDNumber, Post.PuppeteerLogin, Post.PuppetName, Post.PostCmd, Post.PostClass FROM ethereal.Post WHERE Post.PostIDNumber > (SELECT MIN(PostIDNumber) FROM ethereal.Post WHERE Post.PostTimeStamp > (LOCALTIMESTAMP - INTERVAL '10 minutes'))::INT AND Post.RealmName='Amalgam' AND (Post.PostTo='all' OR Post.PostTo='root') AND (NOT EXISTS (SELECT PuppetIgnore.PuppetLogin FROM ethereal.PuppetIgnore WHERE PuppetIgnore.PuppetIgnore='global' AND PuppetIgnore.PuppeteerLogin='root' AND PuppetIgnore.PuppetLogin=Post.PuppeteerLogin) OR Post.PuppeteerLogin IS NULL) AND (NOT EXISTS (SELECT PuppetIgnore.PuppetName FROM ethereal.PuppetIgnore WHERE PuppetIgnore='single' AND PuppetIgnore.PuppeteerLogin='root' AND PuppetIgnore.PuppetName=Post.PuppetName) OR Post.PuppetName IS NULL) ORDER BY Post.PostIDNumber LIMIT 100 -- Explain of Above-- Limit (cost=0.00..87101.38 rows=100 width=48) InitPlan -> Aggregate (cost=12412.82..12412.82 rows=1 width=4) -> Index Scan using idxpost_timestamp on post (cost=0.00..12282.42 rows=52160 width=4) Index Cond: (posttimestamp > (('now'::text)::timestamp(6) without time zone - '00:10'::interval)) -> Index Scan using pkpost on post (cost=0.00..1010992.25 rows=1161 width=48) Index Cond: (postidnumber > $0) Filter: ((realmname = 'Amalgam'::character varying) AND ((postto = 'all'::character varying) OR (postto = 'root'::character varying)) AND ((NOT (subplan)) OR (puppeteerlogin IS NULL)) AND ((NOT (subplan)) OR (puppetname IS NULL))) SubPlan -> Index Scan using pkpuppetignore on puppetignore (cost=0.00..13.31 rows=1 width=10) Index Cond: (puppeteerlogin = 'root'::character varying) Filter: ((puppetignore = 'global'::character varying) AND (puppetlogin = $1)) -> Index Scan using pkpuppetignore on puppetignore (cost=0.00..5.84 rows=1 width=15) Index Cond: ((puppeteerlogin = 'root'::character varying) AND (puppetname = $2)) Filter: (puppetignore = 'single'::character varying) Result : 18 rows fetched ( 0.04 sec) Both PostIDNumber and PostTimestamp are indexed, so that should not be a bottleneck in itself. However, as you can see in the third example the use of a sub-query actually accelerates the process considerably, meaning that integer based searching is much much faster. Under MySQL timestamps where in Unix time, which is why I may have never noticed such an extreme slowdown when doing similar on that script. Of course to boggle the mind, here is a view that works very well: CREATE VIEW ethereal.Who AS SELECT Po.PuppetName AS PuppetName, Po.PuppeteerLogin AS PuppeteerLogin, Po.RealmName AS RealmName, Re.RealmPublic AS RealmPublic, Re.RealmVerified AS RealmVerified FROM ethereal.Post Po, ethereal.Puppet Ch, ethereal.Realm Re WHERE Po.PuppeteerLogin = Ch.PuppeteerLogin AND Po.RealmName = Re.RealmName AND Po.PostTimestamp > (LOCALTIMESTAMP - INTERVAL '10 minutes') AND Po.PuppetName IS NOT NULL GROUP BY Po.PuppeteerLogin, Po.PuppetName, Po.RealmName, Re.RealmPublic, Re.RealmVerified ORDER BY Po.RealmName, Po.PuppetName; Sort (cost=309259.89..309629.34 rows=147780 width=79) Sort Key: po.realmname, po.puppetname -> Group (cost=270648.27..292815.19 rows=147780 width=79) -> Sort (cost=270648.27..274342.75 rows=1477795 width=79) Sort Key: po.puppeteerlogin, po.puppetname, po.realmname, re.realmpublic, re.realmverified -> Merge Join (cost=22181.60..41087.65 rows=1477795 width=79) Merge Cond: ("outer".puppeteerlogin = "inner".puppeteerlogin) -> Sort (cost=17172.82..17300.26 rows=50978 width=69) Sort Key: po.puppeteerlogin -> Hash Join (cost=12.41..13186.95 rows=50978 width=69) Hash Cond: ("outer".realmname = "inner".realmname) -> Index Scan using idxpost_timestamp on post po (cost=0.00..12282.42 rows=50978 width=42) Index Cond: (posttimestamp > (('now'::text)::timestamp(6) without time zone - '00:10'::interval)) Filter: (puppetname IS NOT NULL) -> Hash (cost=11.93..11.93 rows=193 width=27) -> Seq Scan on realm re (cost=0.00..11.93 rows=193 width=27) -> Sort (cost=5008.78..5100.22 rows=36574 width=10) Sort Key: ch.puppeteerlogin -> Seq Scan on puppet ch (cost=0.00..2236.74 rows=36574 width=10) Result : 48 rows fetched ( 0.55 sec) It uses the exact same time restraint as the first three examples, looks through the same table, does a tipple join and still gets off at higher speeds. This seems to indicate that timestamps are actually efficient, which contradicts above examples. Any ideas? Code for the table creation is below signature: Martin Foster Creator/Designer Ethereal Realms martin@ethereal-realms.org -- -- -- NAME : Post -- REFERENCES : Realm* -- Puppet* -- PuppeteerLogin* -- -- DESCRIPTION : Post is the hive of activity for all realms. Associated with all three -- major tables, it is not actually linked because of the nature of storing -- posts for statistics and auditing. CREATE TABLE ethereal.Post ( PostIDNumber INT NOT NULL DEFAULT NEXTVAL('ethereal.seqPost'), RealmName VARCHAR(30) NOT NULL, PuppetName VARCHAR(30), PuppeteerLogin VARCHAR(10), PostTo VARCHAR(30), PostTimestamp TIMESTAMP NOT NULL DEFAULT LOCALTIMESTAMP, PostClass VARCHAR(10) NOT NULL DEFAULT 'general', PostCmd VARCHAR(10) NOT NULL DEFAULT 'none', PostFullFormat TEXT, PostImagelessFormat TEXT, PostPartialFormat TEXT, CONSTRAINT pkPost PRIMARY KEY (PostIDNumber), CONSTRAINT enumPostClass CHECK (PostCLass IN ('banner','dice','duplicate','general','play','private','special','system')), CONSTRAINT enumPostCmd CHECK (PostCmd IN ('general','none','play','stream')) ) WITHOUT OIDS; -- STANDARD INDEX CREATE INDEX idxPost_Class ON ethereal.Post ( PostClass ); CREATE INDEX idxPost_Login ON ethereal.Post ( PuppeteerLogin ); CREATE INDEX idxPost_Puppet ON ethereal.Post ( PuppetName ); CREATE INDEX idxPost_Realm ON ethereal.Post ( RealmName ); CREATE INDEX idxPost_Timestamp ON ethereal.Post ( PostTimestamp );
On Tue, 8 Jul 2003, Martin Foster wrote: > As I keep looking through code to see where I can make things more > efficient, I noticed that in some cases timestamps seem horribly > inefficient. This leads to very long run times for certain queries. > > Here is an example: > > -- USING TIMESTAMPS TO NARROW DOWN -- > > SELECT > Post.PostIDNumber, > Post.PuppeteerLogin, > Post.PuppetName, > Post.PostCmd, > Post.PostClass > FROM ethereal.Post > WHERE Post.PostTimeStamp > (LOCALTIMESTAMP - INTERVAL '10 Minutes') > AND Post.RealmName='Amalgam' > AND (Post.PostTo='all' OR Post.PostTo='root') > AND (NOT EXISTS (SELECT PuppetIgnore.PuppetLogin > FROM ethereal.PuppetIgnore > WHERE PuppetIgnore.PuppetIgnore='global' > AND PuppetIgnore.PuppeteerLogin='root' > AND PuppetIgnore.PuppetLogin=Post.PuppeteerLogin) > OR Post.PuppeteerLogin IS NULL) > AND (NOT EXISTS (SELECT PuppetIgnore.PuppetName > FROM ethereal.PuppetIgnore > WHERE PuppetIgnore='single' > AND PuppetIgnore.PuppeteerLogin='root' > AND PuppetIgnore.PuppetName=Post.PuppetName) > OR Post.PuppetName IS NULL) > ORDER BY Post.PostIDNumber LIMIT 100 > > -- Explain of Above-- > Limit (cost=0.00..260237.32 rows=100 width=48) > -> Index Scan using pkpost on post (cost=0.00..3020594.00 rows=1161 > width=48) > Filter: ((posttimestamp > (('now'::text)::timestamp(6) without > time zone - '00:10'::interval)) AND (realmname = 'Amalgam'::character > varying) AND ((postto = 'all'::character varying) OR (postto = > 'root'::character varying)) AND ((NOT (subplan)) OR (puppeteerlogin IS > NULL)) AND ((NOT (subplan)) OR (puppetname IS NULL))) I think you might get better results with some kind of multi-column index. It's using the index to avoid a sort it looks like, but it's not helping to find the conditions. I can't remember the correct ordering, but maybe (posttimestamp, realmname, postidnumber). Having separate indexes on the fields won't help currently since only one index will get chosen for the scan. Also, what does explain analyze show? > -- NAME : Post > -- REFERENCES : Realm* > -- Puppet* > -- PuppeteerLogin* > -- > -- DESCRIPTION : Post is the hive of activity for all realms. > Associated with all three > -- major tables, it is not actually linked because of the > nature of storing > -- posts for statistics and auditing. > > CREATE TABLE ethereal.Post ( > PostIDNumber INT NOT NULL DEFAULT > NEXTVAL('ethereal.seqPost'), > RealmName VARCHAR(30) NOT NULL, > PuppetName VARCHAR(30), > PuppeteerLogin VARCHAR(10), > PostTo VARCHAR(30), > PostTimestamp TIMESTAMP NOT NULL DEFAULT LOCALTIMESTAMP, > PostClass VARCHAR(10) NOT NULL DEFAULT 'general', > PostCmd VARCHAR(10) NOT NULL DEFAULT 'none', > PostFullFormat TEXT, > PostImagelessFormat TEXT, > PostPartialFormat TEXT, > CONSTRAINT pkPost PRIMARY KEY (PostIDNumber), > CONSTRAINT enumPostClass CHECK (PostCLass IN > ('banner','dice','duplicate','general','play','private','special','system')), > CONSTRAINT enumPostCmd CHECK (PostCmd IN > ('general','none','play','stream')) > ) WITHOUT OIDS; > > -- STANDARD INDEX > CREATE INDEX idxPost_Class ON ethereal.Post > ( > PostClass > ); > > CREATE INDEX idxPost_Login ON ethereal.Post > ( > PuppeteerLogin > ); > > CREATE INDEX idxPost_Puppet ON ethereal.Post > ( > PuppetName > ); > > CREATE INDEX idxPost_Realm ON ethereal.Post > ( > RealmName > ); > > CREATE INDEX idxPost_Timestamp ON ethereal.Post > ( > PostTimestamp > ); > > > > ---------------------------(end of broadcast)--------------------------- > TIP 7: don't forget to increase your free space map settings >
Stephan Szabo wrote: > > > I think you might get better results with some kind of multi-column index. > It's using the index to avoid a sort it looks like, but it's not helping > to find the conditions. I can't remember the correct ordering, but maybe > (posttimestamp, realmname, postidnumber). Having separate indexes on the > fields won't help currently since only one index will get chosen for the > scan. Also, what does explain analyze show? > Hope that shed's light on the matter. Limit (cost=0.00..260237.32 rows=100 width=48) (actual time=68810.26..68820.83 rows=55 loops=1) -> Index Scan using pkpost on post (cost=0.00..3020594.00 rows=1161 width=48) (actual time=68810.25..68820.72 rows=55 loops=1) Filter: ((posttimestamp > (('now'::text)::timestamp(6) without time zone - '00:10'::interval)) AND (realmname = 'Amalgam'::character varying) AND ((postto = 'all'::character varying) OR (postto = 'root'::character varying)) AND ((NOT (subplan)) OR (puppeteerlogin IS NULL)) AND ((NOT (subplan)) OR (puppetname IS NULL))) SubPlan -> Index Scan using pkpuppetignore on puppetignore (cost=0.00..13.31 rows=1 width=10) (actual time=0.02..0.02 rows=0 loops=55) Index Cond: (puppeteerlogin = 'root'::character varying) Filter: ((puppetignore = 'global'::character varying) AND (puppetlogin = $0)) -> Index Scan using pkpuppetignore on puppetignore (cost=0.00..5.84 rows=1 width=15) (actual time=0.01..0.01 rows=0 loops=55) Index Cond: ((puppeteerlogin = 'root'::character varying) AND (puppetname = $1)) Filter: (puppetignore = 'single'::character varying) Total runtime: 68821.11 msec -- Martin Foster Creator/Designer Ethereal Realms martin@ethereal-realms.org
On Tue, 8 Jul 2003, Martin Foster wrote: > Stephan Szabo wrote: > > > > > > I think you might get better results with some kind of multi-column index. > > It's using the index to avoid a sort it looks like, but it's not helping > > to find the conditions. I can't remember the correct ordering, but maybe > > (posttimestamp, realmname, postidnumber). Having separate indexes on the > > fields won't help currently since only one index will get chosen for the > > scan. Also, what does explain analyze show? > > > > Hope that shed's light on the matter. > > Limit (cost=0.00..260237.32 rows=100 width=48) (actual > time=68810.26..68820.83 rows=55 loops=1) > -> Index Scan using pkpost on post (cost=0.00..3020594.00 > rows=1161 width=48) (actual time=68810.25..68820.72 rows=55 loops=1) > Filter: ((posttimestamp > (('now'::text)::timestamp(6) without > time zone - '00:10'::interval)) AND (realmname = 'Amalgam'::character > varying) AND ((postto = 'all'::character varying) OR (postto = > 'root'::character varying)) AND ((NOT (subplan)) OR (puppeteerlogin IS > NULL)) AND ((NOT (subplan)) OR (puppetname IS NULL))) > SubPlan > -> Index Scan using pkpuppetignore on puppetignore > (cost=0.00..13.31 rows=1 width=10) (actual time=0.02..0.02 rows=0 loops=55) > Index Cond: (puppeteerlogin = 'root'::character varying) > Filter: ((puppetignore = 'global'::character varying) > AND (puppetlogin = $0)) > -> Index Scan using pkpuppetignore on puppetignore > (cost=0.00..5.84 rows=1 width=15) (actual time=0.01..0.01 rows=0 loops=55) > Index Cond: ((puppeteerlogin = 'root'::character > varying) AND (puppetname = $1)) > Filter: (puppetignore = 'single'::character varying) > Total runtime: 68821.11 msec The row estimate is high. How many rows meet the various conditions and some of the combinations? And how many rows does it estimate if you do a simpler query on those with explain? I still think some variety of multi-column index to make the above index conditions would help, but you'd probably need to play with which ones help, and with the cost cut for the limit, I don't know if it'd actually get a better plan, but it may be worth trying a bunch and seeing which ones are useful and then dropping the rest.
Stephan Szabo wrote: > > > The row estimate is high. How many rows meet the various conditions and > some of the combinations? And how many rows does it estimate if you do a > simpler query on those with explain? > > I still think some variety of multi-column index to make the above index > conditions would help, but you'd probably need to play with which ones > help, and with the cost cut for the limit, I don't know if it'd actually > get a better plan, but it may be worth trying a bunch and seeing which > ones are useful and then dropping the rest. > > At any given point in time you would not expect to see much more then 30 posts applying for a time based search. That is primarily a result of having more then one room for which posts are attached to, and then some posts exist just to show people are there et cetera. Simpler queries seem to do quiet well. That view makes use of the same table and seems to have no performance impact from doing as such, and the position based search is considerably faster. I can show EXPLAIN ANALYSE for all of those if you wish. Martin Foster Creator/Designer Ethereal Realms martin@ethereal-realms.org
On Tue, 8 Jul 2003, Martin Foster wrote: > Stephan Szabo wrote: > > > The row estimate is high. How many rows meet the various conditions and > > some of the combinations? And how many rows does it estimate if you do a > > simpler query on those with explain? > > > > I still think some variety of multi-column index to make the above index > > conditions would help, but you'd probably need to play with which ones > > help, and with the cost cut for the limit, I don't know if it'd actually > > get a better plan, but it may be worth trying a bunch and seeing which > > ones are useful and then dropping the rest. > > > At any given point in time you would not expect to see much more then 30 > posts applying for a time based search. That is primarily a result of > having more then one room for which posts are attached to, and then some > posts exist just to show people are there et cetera. > > Simpler queries seem to do quiet well. That view makes use of the same > table and seems to have no performance impact from doing as such, and > the position based search is considerably faster. Well, the reason I asked is to see both whether the estimates for the various columns were somewhere near reality (if not, then you may need to raise the statistics target for the column) which might affect whether it'd consider using a multi-column index for the conditions and sort rather than the index scan it was using.
Stephan Szabo wrote: > > Well, the reason I asked is to see both whether the estimates for the > various columns were somewhere near reality (if not, then you may need to > raise the statistics target for the column) which might affect whether > it'd consider using a multi-column index for the conditions and sort > rather than the index scan it was using. > > > ---------------------------(end of broadcast)--------------------------- > TIP 4: Don't 'kill -9' the postmaster I'm going to have to pull out the 'Practical PostgreSQL' book and brush up on optimizing. This level of optimization is not something I have had to deal with in the past. Also to make this interesting. The sub-query method is faster at times and slower in others. But doing two separate queries and working on the PostIDNumber field exclusively is always blazingly fast... Martin Foster Creator/Designer Ethereal Realms martin@ethereal-realms.org