Efficiency of timestamps

Поиск
Список
Период
Сортировка
От Martin Foster
Тема Efficiency of timestamps
Дата
Msg-id 3F0B6178.2020805@ethereal-realms.org
обсуждение исходный текст
Ответы Re: Efficiency of timestamps  (Stephan Szabo <sszabo@megazone23.bigpanda.com>)
Список pgsql-performance
As I keep looking through code to see where I can make things more
efficient, I noticed that in some cases timestamps seem horribly
inefficient.   This leads to very long run times for certain queries.

Here is an example:

-- USING TIMESTAMPS TO NARROW DOWN --

SELECT
  Post.PostIDNumber,
  Post.PuppeteerLogin,
  Post.PuppetName,
  Post.PostCmd,
  Post.PostClass
  FROM ethereal.Post
  WHERE Post.PostTimeStamp > (LOCALTIMESTAMP - INTERVAL '10 Minutes')
  AND Post.RealmName='Amalgam'
  AND (Post.PostTo='all' OR Post.PostTo='root')
  AND (NOT EXISTS (SELECT PuppetIgnore.PuppetLogin
  FROM ethereal.PuppetIgnore
  WHERE PuppetIgnore.PuppetIgnore='global'
  AND PuppetIgnore.PuppeteerLogin='root'
  AND PuppetIgnore.PuppetLogin=Post.PuppeteerLogin)
  OR Post.PuppeteerLogin IS NULL)
  AND (NOT EXISTS (SELECT PuppetIgnore.PuppetName
  FROM ethereal.PuppetIgnore
  WHERE PuppetIgnore='single'
  AND PuppetIgnore.PuppeteerLogin='root'
  AND PuppetIgnore.PuppetName=Post.PuppetName)
  OR Post.PuppetName IS NULL)
  ORDER BY Post.PostIDNumber LIMIT 100

-- Explain of Above--
Limit  (cost=0.00..260237.32 rows=100 width=48)
   ->  Index Scan using pkpost on post  (cost=0.00..3020594.00 rows=1161
width=48)
         Filter: ((posttimestamp > (('now'::text)::timestamp(6) without
time zone - '00:10'::interval)) AND (realmname = 'Amalgam'::character
varying) AND ((postto = 'all'::character varying) OR (postto =
'root'::character varying)) AND ((NOT (subplan)) OR (puppeteerlogin IS
NULL)) AND ((NOT (subplan)) OR (puppetname IS NULL)))
         SubPlan
           ->  Index Scan using pkpuppetignore on puppetignore
(cost=0.00..13.31 rows=1 width=10)
                 Index Cond: (puppeteerlogin = 'root'::character varying)
                 Filter: ((puppetignore = 'global'::character varying)
AND (puppetlogin = $0))
           ->  Index Scan using pkpuppetignore on puppetignore
(cost=0.00..5.84 rows=1 width=15)
                 Index Cond: ((puppeteerlogin = 'root'::character
varying) AND (puppetname = $1))
                 Filter: (puppetignore = 'single'::character varying)


Result : 22 rows fetched (17.21 sec)


-- USING A GENERATED ID NUMBER --

SELECT
  Post.PostIDNumber,
  Post.PuppeteerLogin,
  Post.PuppetName,
  Post.PostCmd,
  Post.PostClass
  FROM ethereal.Post
  WHERE Post.PostIDNumber > 1
  AND Post.RealmName='Amalgam'
  AND (Post.PostTo='all' OR Post.PostTo='root')
  AND (NOT EXISTS (SELECT PuppetIgnore.PuppetLogin
  FROM ethereal.PuppetIgnore
  WHERE PuppetIgnore.PuppetIgnore='global'
  AND PuppetIgnore.PuppeteerLogin='root'
  AND PuppetIgnore.PuppetLogin=Post.PuppeteerLogin)
  OR Post.PuppeteerLogin IS NULL)
  AND (NOT EXISTS (SELECT PuppetIgnore.PuppetName
  FROM ethereal.PuppetIgnore
  WHERE PuppetIgnore='single'
  AND PuppetIgnore.PuppeteerLogin='root'
  AND PuppetIgnore.PuppetName=Post.PuppetName)
  OR Post.PuppetName IS NULL)
  ORDER BY Post.PostIDNumber LIMIT 100

-- Explain of Above--
Limit  (cost=0.00..86712.10 rows=100 width=48)
   ->  Index Scan using pkpost on post  (cost=0.00..3019119.56 rows=3482
width=48)
         Index Cond: (postidnumber > 1)
         Filter: ((realmname = 'Amalgam'::character varying) AND
((postto = 'all'::character varying) OR (postto = 'root'::character
varying)) AND ((NOT (subplan)) OR (puppeteerlogin IS NULL)) AND ((NOT
(subplan)) OR (puppetname IS NULL)))
         SubPlan
           ->  Index Scan using pkpuppetignore on puppetignore
(cost=0.00..13.31 rows=1 width=10)
                 Index Cond: (puppeteerlogin = 'root'::character varying)
                 Filter: ((puppetignore = 'global'::character varying)
AND (puppetlogin = $0))
           ->  Index Scan using pkpuppetignore on puppetignore
(cost=0.00..5.84 rows=1 width=15)
                 Index Cond: ((puppeteerlogin = 'root'::character
varying) AND (puppetname = $1))
                 Filter: (puppetignore = 'single'::character varying)


Result : 100 rows fetched ( 0.19 sec)


-- USING A MIXTURE OF BOTH --

SELECT
  Post.PostIDNumber,
  Post.PuppeteerLogin,
  Post.PuppetName,
  Post.PostCmd,
  Post.PostClass
  FROM ethereal.Post
  WHERE Post.PostIDNumber > (SELECT MIN(PostIDNumber)
  FROM ethereal.Post
  WHERE Post.PostTimeStamp > (LOCALTIMESTAMP - INTERVAL '10 minutes'))::INT
  AND Post.RealmName='Amalgam'
  AND (Post.PostTo='all' OR Post.PostTo='root')
  AND (NOT EXISTS (SELECT PuppetIgnore.PuppetLogin
  FROM ethereal.PuppetIgnore
  WHERE PuppetIgnore.PuppetIgnore='global'
  AND PuppetIgnore.PuppeteerLogin='root'
  AND PuppetIgnore.PuppetLogin=Post.PuppeteerLogin)
  OR Post.PuppeteerLogin IS NULL)
  AND (NOT EXISTS (SELECT PuppetIgnore.PuppetName
  FROM ethereal.PuppetIgnore
  WHERE PuppetIgnore='single'
  AND PuppetIgnore.PuppeteerLogin='root'
  AND PuppetIgnore.PuppetName=Post.PuppetName)
  OR Post.PuppetName IS NULL)
  ORDER BY Post.PostIDNumber LIMIT 100

-- Explain of Above--
Limit  (cost=0.00..87101.38 rows=100 width=48)
   InitPlan
     ->  Aggregate  (cost=12412.82..12412.82 rows=1 width=4)
           ->  Index Scan using idxpost_timestamp on post
(cost=0.00..12282.42 rows=52160 width=4)
                 Index Cond: (posttimestamp >
(('now'::text)::timestamp(6) without time zone - '00:10'::interval))
   ->  Index Scan using pkpost on post  (cost=0.00..1010992.25 rows=1161
width=48)
         Index Cond: (postidnumber > $0)
         Filter: ((realmname = 'Amalgam'::character varying) AND
((postto = 'all'::character varying) OR (postto = 'root'::character
varying)) AND ((NOT (subplan)) OR (puppeteerlogin IS NULL)) AND ((NOT
(subplan)) OR (puppetname IS NULL)))
         SubPlan
           ->  Index Scan using pkpuppetignore on puppetignore
(cost=0.00..13.31 rows=1 width=10)
                 Index Cond: (puppeteerlogin = 'root'::character varying)
                 Filter: ((puppetignore = 'global'::character varying)
AND (puppetlogin = $1))
           ->  Index Scan using pkpuppetignore on puppetignore
(cost=0.00..5.84 rows=1 width=15)
                 Index Cond: ((puppeteerlogin = 'root'::character
varying) AND (puppetname = $2))
                 Filter: (puppetignore = 'single'::character varying)


Result : 18 rows fetched ( 0.04 sec)


Both PostIDNumber and PostTimestamp are indexed, so that should not be a
bottleneck in itself.    However, as you can see in the third example
the use of a sub-query actually accelerates the process considerably,
meaning that integer based searching is much much faster.

Under MySQL timestamps where in Unix time, which is why I may have never
noticed such an extreme slowdown when doing similar on that script.   Of
course to boggle the mind, here is a view that works very well:

CREATE VIEW ethereal.Who AS
   SELECT
     Po.PuppetName     AS PuppetName,
     Po.PuppeteerLogin AS PuppeteerLogin,
     Po.RealmName      AS RealmName,
     Re.RealmPublic    AS RealmPublic,
     Re.RealmVerified  AS RealmVerified
   FROM ethereal.Post Po, ethereal.Puppet Ch, ethereal.Realm Re
   WHERE Po.PuppeteerLogin = Ch.PuppeteerLogin
     AND Po.RealmName = Re.RealmName
     AND Po.PostTimestamp > (LOCALTIMESTAMP - INTERVAL '10 minutes')
     AND Po.PuppetName IS NOT NULL
   GROUP BY Po.PuppeteerLogin, Po.PuppetName, Po.RealmName,
Re.RealmPublic, Re.RealmVerified
   ORDER BY Po.RealmName, Po.PuppetName;

Sort  (cost=309259.89..309629.34 rows=147780 width=79)
   Sort Key: po.realmname, po.puppetname
   ->  Group  (cost=270648.27..292815.19 rows=147780 width=79)
         ->  Sort  (cost=270648.27..274342.75 rows=1477795 width=79)
               Sort Key: po.puppeteerlogin, po.puppetname, po.realmname,
re.realmpublic, re.realmverified
               ->  Merge Join  (cost=22181.60..41087.65 rows=1477795
width=79)
                     Merge Cond: ("outer".puppeteerlogin =
"inner".puppeteerlogin)
                     ->  Sort  (cost=17172.82..17300.26 rows=50978 width=69)
                           Sort Key: po.puppeteerlogin
                           ->  Hash Join  (cost=12.41..13186.95
rows=50978 width=69)
                                 Hash Cond: ("outer".realmname =
"inner".realmname)
                                 ->  Index Scan using idxpost_timestamp
on post po  (cost=0.00..12282.42 rows=50978 width=42)
                                       Index Cond: (posttimestamp >
(('now'::text)::timestamp(6) without time zone - '00:10'::interval))
                                       Filter: (puppetname IS NOT NULL)
                                 ->  Hash  (cost=11.93..11.93 rows=193
width=27)
                                       ->  Seq Scan on realm re
(cost=0.00..11.93 rows=193 width=27)
                     ->  Sort  (cost=5008.78..5100.22 rows=36574 width=10)
                           Sort Key: ch.puppeteerlogin
                           ->  Seq Scan on puppet ch
(cost=0.00..2236.74 rows=36574 width=10)


Result : 48 rows fetched ( 0.55 sec)


It uses the exact same time restraint as the first three examples, looks
through the same table, does a tipple join and still gets off at higher
speeds.   This seems to indicate that timestamps are actually efficient,
which contradicts above examples.

Any ideas?   Code for the table creation is below signature:

    Martin Foster
    Creator/Designer Ethereal Realms
    martin@ethereal-realms.org


--
--
-- NAME        : Post
-- REFERENCES  : Realm*
--               Puppet*
--               PuppeteerLogin*
--
-- DESCRIPTION : Post is the hive of activity for all realms.
Associated with all three
--               major tables, it is not actually linked because of the
nature of storing
--               posts for statistics and auditing.

CREATE TABLE ethereal.Post (
   PostIDNumber         INT           NOT NULL DEFAULT
NEXTVAL('ethereal.seqPost'),
   RealmName            VARCHAR(30)   NOT NULL,
   PuppetName           VARCHAR(30),
   PuppeteerLogin       VARCHAR(10),
   PostTo               VARCHAR(30),
   PostTimestamp        TIMESTAMP     NOT NULL DEFAULT LOCALTIMESTAMP,
   PostClass            VARCHAR(10)   NOT NULL DEFAULT 'general',
   PostCmd              VARCHAR(10)   NOT NULL DEFAULT 'none',
   PostFullFormat       TEXT,
   PostImagelessFormat  TEXT,
   PostPartialFormat    TEXT,
   CONSTRAINT pkPost                  PRIMARY KEY (PostIDNumber),
   CONSTRAINT enumPostClass           CHECK (PostCLass IN
('banner','dice','duplicate','general','play','private','special','system')),
   CONSTRAINT enumPostCmd             CHECK (PostCmd IN
('general','none','play','stream'))
) WITHOUT OIDS;

-- STANDARD INDEX
CREATE INDEX idxPost_Class ON ethereal.Post
(
   PostClass
);

CREATE INDEX idxPost_Login ON ethereal.Post
(
   PuppeteerLogin
);

CREATE INDEX idxPost_Puppet ON ethereal.Post
(
   PuppetName
);

CREATE INDEX idxPost_Realm ON ethereal.Post
(
   RealmName
);

CREATE INDEX idxPost_Timestamp ON ethereal.Post
(
   PostTimestamp
);



В списке pgsql-performance по дате отправления:

Предыдущее
От: Andrew Sullivan
Дата:
Сообщение: Re: Strange result: UNIX vs. TCP/IP sockets
Следующее
От: Stephan Szabo
Дата:
Сообщение: Re: Efficiency of timestamps