Обсуждение: Interesting query plan change linked to the LIMIT parameter
Greetings,
I'm experiencing a strange query plan change on a "simple" request, based on the LIMIT parameter. I present here two tables, named _article and _comment. they are part of a much larger database.
It's a tree-like database (making use of the ltree to keep the coherence), and in our case, an article can have any number of comments (linked by _article.id = _comment.parent_id), and a _comment can itself have ONE comment (at most).
The _article table contains 12 millions t-uples, and the _comment table around 26 millions. The server runs a postgresql 8.3.5 in its 64bits version.
Here are the tables definition :
Now I created a function to get simply the comment reply to a given comment :
Ok, now, all is set. I'd like to get with a simple query the comments of a given article, ordered by publishing date, as well as their replies if they exists. So I write this request :
The second query scans the whole comment table which is very dangerous for production servers.
So did I do something wrong ? Is there a way to handle this issue smoothly ?
Thanks in advance
Yannick
I'm experiencing a strange query plan change on a "simple" request, based on the LIMIT parameter. I present here two tables, named _article and _comment. they are part of a much larger database.
It's a tree-like database (making use of the ltree to keep the coherence), and in our case, an article can have any number of comments (linked by _article.id = _comment.parent_id), and a _comment can itself have ONE comment (at most).
The _article table contains 12 millions t-uples, and the _comment table around 26 millions. The server runs a postgresql 8.3.5 in its 64bits version.
Here are the tables definition :
-- _article table
CREATE TABLE "ob2"."_article" (
"id" bigint NOT NULL DEFAULT nextval('element_id_sequence'::regclass),
"parent_id" bigint,
"path" ltree,
"data" text,
"date_creation" timestamp without time zone NOT NULL DEFAULT now(),
"date_publishing" timestamp without time zone NOT NULL DEFAULT now(),
"date_modification" timestamp without time zone NOT NULL DEFAULT now(),
"counters" hstore,
"reference" integer NOT NULL DEFAULT nextval('_article_reference_seq'::regclass),
"title" character varying NOT NULL,
"text" text,
"blog_id" bigint,
"user_id" bigint,
"site_id" bigint,
"topic_id" bigint,
"community_id" bigint,
CONSTRAINT "_article_pkey" PRIMARY KEY (id)
) WITHOUT OIDS;
ALTER TABLE ONLY "ob2"."_article" ALTER COLUMN "path" SET STORAGE PLAIN;
ALTER TABLE ONLY "ob2"."_article" ALTER COLUMN "title" SET STORAGE PLAIN;
-- Indexes
CREATE UNIQUE INDEX _article_pkey ON _article USING btree (id);
CREATE INDEX gist_idx_article_path ON _article USING gist (path);
CREATE INDEX idx_article_blog_id ON _article USING btree (blog_id);
CREATE INDEX idx_article_community_id ON _article USING btree (community_id);
CREATE INDEX idx_article_date_creation ON _article USING btree (date_creation);
CREATE INDEX idx_article_date_modification ON _article USING btree (date_modification);
CREATE INDEX idx_article_date_publishing ON _article USING btree (date_publishing);
CREATE INDEX idx_article_parent_id ON _article USING btree (parent_id);
CREATE UNIQUE INDEX idx_article_reference_unique ON _article USING btree (reference);
CREATE INDEX idx_article_site_id ON _article USING btree (site_id);
CREATE INDEX idx_article_topic_id ON _article USING btree (topic_id);
CREATE INDEX idx_article_user_id ON _article USING btree (user_id);
-- _comment table
CREATE TABLE "ob2"."_comment" (
"id" bigint NOT NULL DEFAULT nextval('element_id_sequence'::regclass),
"parent_id" bigint,
"path" ltree,
"data" text,
"date_creation" timestamp without time zone NOT NULL DEFAULT now(),
"date_publishing" timestamp without time zone NOT NULL DEFAULT now(),
"date_modification" timestamp without time zone NOT NULL DEFAULT now(),
"counters" hstore,
"reference" integer NOT NULL DEFAULT nextval('_comment_reference_seq'::regclass),
"text" text,
"article_id" bigint,
"blog_id" bigint,
"user_id" bigint,
"site_id" bigint,
CONSTRAINT "_comment_pkey" PRIMARY KEY (id)
) WITHOUT OIDS;
ALTER TABLE ONLY "ob2"."_comment" ALTER COLUMN "path" SET STORAGE PLAIN;
-- Indexes
CREATE UNIQUE INDEX _comment_pkey ON _comment USING btree (id);
CREATE INDEX gist_idx_comment_path ON _comment USING gist (path);
CREATE INDEX idx_comment_date_creation ON _comment USING btree (date_creation);
CREATE INDEX idx_comment_date_publishing ON _comment USING btree (date_publishing);
CREATE INDEX idx_comment_parent_id ON _comment USING btree (parent_id);
CREATE INDEX idx_comment_reference ON _comment USING btree (reference);
Now I created a function to get simply the comment reply to a given comment :
CREATE OR REPLACE FUNCTION get_comment_response (BIGINT) RETURNS _comment AS
$$
SELECT * FROM _comment WHERE parent_id = $1;
$$
STABLE
COST 1
LANGUAGE SQL;
Ok, now, all is set. I'd like to get with a simple query the comments of a given article, ordered by publishing date, as well as their replies if they exists. So I write this request :
SELECTResults are good, quite fast, BUT, when executing tests I discovered something very strange. The query was fast for 3+ comments, but very slow with a limit of 1 or 2 ! Just because the query plan change :
_comment.id,
(get_comment_response(_comment.id)).id AS r_id
FROM _comment
INNER JOIN _article
ON _article.id = _comment.parent_id
WHERE _comment.parent_id = '17355952'
ORDER BY _comment.date_publishing ASC
OFFSET 0
LIMIT 10;
EXPLAIN
SELECT _comment.id,
(get_comment_response(_comment.id)).id AS r_id
FROM _comment
INNER JOIN _article
ON _article.id = _comment.parent_id
WHERE _comment.parent_id = '17355952'
ORDER BY _comment.id ASC
OFFSET 0
LIMIT 1000;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------
Limit (cost=10261.19..10263.69 rows=1000 width=8)
-> Sort (cost=10261.19..10281.06 rows=7949 width=8)
Sort Key: _comment.id
-> Nested Loop (cost=0.00..9825.35 rows=7949 width=8)
-> Index Scan using _article_pkey on _article (cost=0.00..9.55 rows=1 width=8)
Index Cond: (id = 17355952::bigint)
-> Index Scan using idx_comment_parent_id on _comment (cost=0.00..9716.44 rows=7949 width=16)
Index Cond: (_comment.parent_id = 17355952::bigint)
(8 rows)
EXPLAIN
SELECT _comment.id,
(get_comment_response(_comment.id)).id AS r_id
FROM _comment
INNER JOIN _article
ON _article.id = _comment.parent_id
WHERE _comment.parent_id = '17355952'
ORDER BY _comment.id ASC
OFFSET 0
LIMIT 1;
QUERY PLAN
-----------------------------------------------------------------------------------------------------
Limit (cost=0.00..3588.42 rows=1 width=8)
-> Nested Loop (cost=0.00..28524312.40 rows=7949 width=8)
-> Index Scan using _comment_pkey on _comment (cost=0.00..28448324.73 rows=7949 width=16)
Filter: (parent_id = 17355952::bigint)
-> Index Scan using _article_pkey on _article (cost=0.00..9.55 rows=1 width=8)
Index Cond: (_article.id = 17355952::bigint)
(6 rows)
The second query scans the whole comment table which is very dangerous for production servers.
So did I do something wrong ? Is there a way to handle this issue smoothly ?
Thanks in advance
Yannick
On Tue, Jan 20, 2009 at 10:45 AM, Yannick Le Guédart <yannick@over-blog.com> wrote: > > The second query scans the whole comment table which is very dangerous for > production servers. That's not quite true. The second does an index scan- the planner seems to be guessing that it'll fulfill the required limit early in the index scan; only with a pathologically bad case would it actually have to scan the entire thing. Basically, the second query is optimized to spit out the first few rows quickly, since that's all you asked for with the limit. Note that your first query has a final cost estimate of "Limit (cost=10261.19..10263.69 rows=1000 width=8)", indicating an estimated 10261.19 to emit the first row; the second has "Limit (cost=0.00..3588.42 rows=1 width=8)" estimating 0.00 (basically, instant) to emit the first - and only desired - row. That all said, an explain analyze would give us a better idea of what's going on- we can't tell if the planner is making bad estimates without the knowledge of what the real timing and row count results of plan stages were. -- - David T. Wilson david.t.wilson@gmail.com
Thanks for the rapid response.
I can understand the way the planner makes its guess, but as a matter of fact, he'll be nearly always wrong, just becausethe most commented articles have only around 5000 or so comments. I ran the explain analyze tonight and got this results :
As you can see, the time is dramaticaly longuer with the LIMIT 1 (or in our case, LIMIT 2).
Yannick.
I can understand the way the planner makes its guess, but as a matter of fact, he'll be nearly always wrong, just becausethe most commented articles have only around 5000 or so comments. I ran the explain analyze tonight and got this results :
EXPLAIN ANALYZE SELECT _comment.id,
(get_comment_response(_comment.id)).id AS r_id
FROM _comment
INNER JOIN _article
ON _article.id = _comment.parent_id
WHERE _comment.parent_id = '17355952'
ORDER BY _comment.id ASC
OFFSET 0
LIMIT 1;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.00..3588.42 rows=1 width=8) (actual time=498597.115..498597.116 rows=1 loops=1)
-> Nested Loop (cost=0.00..28524312.40 rows=7949 width=8) (actual time=498597.114..498597.114 rows=1 loops=1)
-> Index Scan using _comment_pkey on _comment (cost=0.00..28448324.73 rows=7949 width=16) (actual time=498473.360..498473.360 rows=1 loops=1)
Filter: (parent_id = 17355952::bigint)
-> Index Scan using _article_pkey on _article (cost=0.00..9.55 rows=1 width=8) (actual time=63.465..63.465 rows=1 loops=1)
Index Cond: (_article.id = 17355952::bigint)
Total runtime: 498615.230 ms
(7 rows)
EXPLAIN ANALYZE SELECT _comment.id,
(get_comment_response(_comment.id)).id AS r_id
FROM _comment
INNER JOIN _article
ON _article.id = _comment.parent_id
WHERE _comment.parent_id = '17355952'
ORDER BY _comment.id ASC
OFFSET 0
LIMIT 1000;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=10261.19..10263.69 rows=1000 width=8) (actual time=127.037..127.267 rows=1000 loops=1)
-> Sort (cost=10261.19..10281.06 rows=7949 width=8) (actual time=127.036..127.128 rows=1000 loops=1)
Sort Key: _comment.id
Sort Method: top-N heapsort Memory: 95kB
-> Nested Loop (cost=0.00..9825.35 rows=7949 width=8) (actual time=0.472..122.986 rows=4674 loops=1)
-> Index Scan using _article_pkey on _article (cost=0.00..9.55 rows=1 width=8) (actual time=0.011..0.013 rows=1 loops=1)
Index Cond: (id = 17355952::bigint)
-> Index Scan using idx_comment_parent_id on _comment (cost=0.00..9716.44 rows=7949 width=16) (actual time=0.235..32.869 rows=4674 loops=1)
Index Cond: (_comment.parent_id = 17355952::bigint)
Total runtime: 127.410 ms
(10 rows)
As you can see, the time is dramaticaly longuer with the LIMIT 1 (or in our case, LIMIT 2).
Yannick.
2009/1/20 David Wilson <david.t.wilson@gmail.com>
On Tue, Jan 20, 2009 at 10:45 AM, Yannick Le GuédartThat's not quite true. The second does an index scan- the planner
<yannick@over-blog.com> wrote:
>
> The second query scans the whole comment table which is very dangerous for
> production servers.
seems to be guessing that it'll fulfill the required limit early in
the index scan; only with a pathologically bad case would it actually
have to scan the entire thing. Basically, the second query is
optimized to spit out the first few rows quickly, since that's all you
asked for with the limit.
Note that your first query has a final cost estimate of "Limit
(cost=10261.19..10263.69 rows=1000 width=8)", indicating an estimated
10261.19 to emit the first row; the second has "Limit
(cost=0.00..3588.42 rows=1 width=8)" estimating 0.00 (basically,
instant) to emit the first - and only desired - row.
That all said, an explain analyze would give us a better idea of
what's going on- we can't tell if the planner is making bad estimates
without the knowledge of what the real timing and row count results of
plan stages were.
--
- David T. Wilson
david.t.wilson@gmail.com