Обсуждение: Searching union views not using indices
Hello everyone. We are facing a performance problem with views consisting of several unioned tables. The simplified schema is as follows: CREATE TABLE foo ( foo_object_id bigint, link_id bigint, somedata text, PRIMARY KEY (foo_object_id) ); CREATE TABLE bar ( bar_object_id bigint, link_id bigint, otherdata real, PRIMARY KEY (bar_object_id) ); There are actually five of such tables, all having two common attributes *_object_id and link_id. All tables have indices on link_id, which is very selective, close to unique. The *_object_id is unique within this scope across all tables, but that's not important. Then we have a view: CREATE VIEW commonview AS SELECT foo_object_id as object_id, link_id, 'It is in foo' as loc FROM foo UNION SELECT bar_object_id as object_id, link_id, 'It is in bar' as loc FROM bar We commonly do this: SELECT object_id FROM commonview WHERE link_id=1234567 The result is sequential scan on all tables, append, sort and then filter scan on this whole thing. Which of course is slow as hell. We use version 8.0.2. And now the question: Is there a way to force the planner to push the condition lower, so it will use the index? Or do you use some tricks in this scenario? Thanks for your suggestions. Bye. -- Michal Táborský CTO, Internet Mall, a.s. Internet Mall - obchody, které si oblíbíte <http://www.MALL.cz>
Michal Taborsky wrote: ... > UNION ... > The result is sequential scan on all tables, append, sort and then > filter scan on this whole thing. Which of course is slow as hell. We use > version 8.0.2. > > And now the question: Is there a way to force the planner to push the > condition lower, so it will use the index? Or do you use some tricks in > this scenario? Thanks for your suggestions. Try "UNION ALL", since UNION is defined as removing duplicates, which probably accounts for the sort. -- Richard Huxton Archonet Ltd
> Hello everyone. > > We are facing a performance problem with views consisting of several > unioned tables. The simplified schema is as follows: > > CREATE TABLE foo ( > foo_object_id bigint, > link_id bigint, > somedata text, > PRIMARY KEY (foo_object_id) ); point 1: well, you may want to consider: create table foobar ( prefix text, -- foo/bar/etc object_id bigint, link_id bigint, primary key(prefix, object_id) ); -- add indexes as appropriate and push foo/bar specific information to satellite table which refer back via pkey-key link. Now you get very quick and easy link id query and no view is necessary. You also may want to look at table inheritance but make sure you read all the disclaimers first. point 2: watch out for union, it is implied sort and duplicate filter. union all is faster although you may get duplicates. Merlin
On Fri, Nov 04, 2005 at 12:38:30PM +0100, Michal Taborsky wrote: > SELECT object_id FROM commonview WHERE link_id=1234567 > > The result is sequential scan on all tables, append, sort and then > filter scan on this whole thing. Which of course is slow as hell. We use > version 8.0.2. I couldn't duplicate this in 8.0.4; I don't know if anything's changed since 8.0.2 that would affect the query plan. Could you post the EXPLAIN ANALYZE output? It might also be useful to see the output with enable_seqscan disabled. Have the tables been vacuumed and analyzed recently? -- Michael Fuhr
Michal Taborsky <michal.taborsky@mall.cz> writes: > We are facing a performance problem with views consisting of several > unioned tables. The simplified schema is as follows: Perhaps you should show us the real schema, because I cannot duplicate your complaint on the toy case you show. regression=# explain SELECT object_id FROM commonview WHERE link_id=1234567; QUERY PLAN ------------------------------------------------------------------------------------------------ Subquery Scan commonview (cost=41.40..41.66 rows=13 width=8) -> Unique (cost=41.40..41.53 rows=13 width=16) -> Sort (cost=41.40..41.43 rows=13 width=16) Sort Key: object_id, link_id, loc -> Append (cost=0.00..41.16 rows=13 width=16) -> Subquery Scan "*SELECT* 1" (cost=0.00..17.12 rows=5 width=16) -> Index Scan using fooi on foo (cost=0.00..17.07 rows=5 width=16) Index Cond: (link_id = 1234567) -> Subquery Scan "*SELECT* 2" (cost=0.00..24.04 rows=8 width=16) -> Index Scan using bari on bar (cost=0.00..23.96 rows=8 width=16) Index Cond: (link_id = 1234567) (11 rows) (I had to add indexes on link_id to the example, of course.) As noted by others, you probably want to be using UNION ALL not UNION, but that's not the crux of the issue. regards, tom lane
Tom Lane napsal(a): > Michal Taborsky <michal.taborsky@mall.cz> writes: > >>We are facing a performance problem with views consisting of several >>unioned tables. The simplified schema is as follows: > > > Perhaps you should show us the real schema, because I cannot duplicate > your complaint on the toy case you show. > As noted by others, you probably want to be using UNION ALL not UNION, > but that's not the crux of the issue. OK. Mystery (sort of) solved. After you told me it works for you I had to assume the problem was somewhere else. And, indeed, it was, though it's not too obvious. The two attributes are actually not of tybe bigint, but of type "crm_object_id", which is created as follows: CREATE DOMAIN "public"."crm_object_id" AS bigint NULL; Everything started working perfectly after I modified the view like this: CREATE VIEW commonview AS SELECT foo_object_id::bigint as object_id, link_id::bigint, 'It is in foo' as loc FROM foo UNION SELECT bar_object_id::bigint as object_id, link_id::bigint, 'It is in bar' as loc FROM bar Not even modifying the select as this did not help: explain SELECT object_id FROM commonview WHERE link_id=1234567::crm_object_id; Is this a bug or feature? -- Michal Táborský CTO, Internet Mall, a.s. Internet Mall - obchody, které si oblíbíte <http://www.MALL.cz>
Michal Taborsky <michal.taborsky@mall.cz> writes: > OK. Mystery (sort of) solved. After you told me it works for you I had > to assume the problem was somewhere else. And, indeed, it was, though > it's not too obvious. > The two attributes are actually not of tybe bigint, but of type > "crm_object_id", which is created as follows: > CREATE DOMAIN "public"."crm_object_id" AS > bigint NULL; Ah. The problem is that the UNION's output column is bigint, and the type discrepancy (bigint above, domain below) discourages the planner from pushing down the WHERE condition. There's a related complaint here: http://archives.postgresql.org/pgsql-bugs/2005-10/msg00227.php If we were to change things so that the result of the UNION were still the domain, not plain bigint, then your example would be optimized the way you want. I'm unsure about what other side-effects that would have though. regards, tom lane