Обсуждение: WHERE condition not being pushed down to union parts

Поиск
Список
Период
Сортировка

WHERE condition not being pushed down to union parts

От
"John L. Clark"
Дата:
I have a database with two tables that relate similar data, and a view
which projects and combines the data from these two tables in order to
access them both in a consistent manner.  With enough information, the
application can specifically choose to query from one table or the
other, but in the more general case the data could come from either
table, so I need to query the view.  When I join against the view (or
an equivalent subselect), however, it looks like the joining condition
is not pushed down into the individual components of the union that
defines the view.  This leads to a significant performance degradation
when using the view; I ask the list for help in resolving this
problem.  The remainder of this email digs into this problem in
detail.

(If you were interested in background on this database, it implements
a backing store for a higher level RDF database, specifically for the
RDFLib project.  I would be happy to talk more about this application,
or the corresponding database design issues, with anyone who might be
interested, in whatever forum would be appropriate.)

I begin with the poorly performing query, which follows this
paragraph.  This query joins one of the tables to the view, and using
'explain' on this query gives the query plan listed below the query.
Note that in this query plan, the join filter happens after (above)
the collection of matching rows from each of the parts of the UNION.

<query>
select * from
  relations as component_0_statements
cross join
  URI_or_literal_object as component_1_statements
where
component_0_statements.predicate = -2875059751320018987 and
component_0_statements.object = -2827607394936393903 and
component_1_statements.subject = component_0_statements.subject and
component_1_statements.predicate = -2875059751320018987
</query>

<query-plan>

QUERY PLAN

-----------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=96.31..36201.57 rows=1 width=128)
  Join Filter: (component_0_statements.subject = literalproperties.subject)
  ->  Index Scan using relations_poscindex on relations
component_0_statements  (cost=0.00..9.96 rows=1 width=40)
        Index Cond: ((predicate = (-2875059751320018987)::bigint) AND
(object = (-2827607394936393903)::bigint))
  ->  Append  (cost=96.31..36044.62 rows=11759 width=88)
        ->  Bitmap Heap Scan on literalproperties
(cost=96.31..16190.72 rows=5052 width=49)
              Recheck Cond: (literalproperties.predicate =
(-2875059751320018987)::bigint)
              ->  Bitmap Index Scan on
literalproperties_predicateindex  (cost=0.00..95.04 rows=5052 width=0)
                    Index Cond: (literalproperties.predicate =
(-2875059751320018987)::bigint)
        ->  Bitmap Heap Scan on relations  (cost=128.99..19736.31
rows=6707 width=40)
              Recheck Cond: (relations.predicate =
(-2875059751320018987)::bigint)
              ->  Bitmap Index Scan on relations_predicateindex
(cost=0.00..127.32 rows=6707 width=0)
                    Index Cond: (relations.predicate =
(-2875059751320018987)::bigint)
(13 rows)
</query-plan>

As it turns out, all of the results are in fact from the 'relations'
table, so we get the same results if we query that table instead of
the more general view.  The corresponding query follows this
paragraph, and its query plan immediately follows it.  Note that in
this query plan, the join condition is pushed down to the leaf node as
an Index Condition, which seems to be the main source of the dramatic
performance difference.

<query>
select * from
  relations as component_0_statements
cross join
  relations as component_1_statements
where
component_0_statements.predicate = -2875059751320018987 and
component_0_statements.object = -2827607394936393903 and
component_1_statements.subject = component_0_statements.subject and
component_1_statements.predicate = -2875059751320018987
</query>

<query-plan>

QUERY PLAN

---------------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=0.00..26.11 rows=1 width=80)
  ->  Index Scan using relations_poscindex on relations
component_0_statements  (cost=0.00..9.96 rows=1 width=40)
        Index Cond: ((predicate = (-2875059751320018987)::bigint) AND
(object = (-2827607394936393903)::bigint))
  ->  Index Scan using relations_subjectindex on relations
component_1_statements  (cost=0.00..16.13 rows=1 width=40)
        Index Cond: (component_1_statements.subject =
component_0_statements.subject)
        Filter: (component_1_statements.predicate =
(-2875059751320018987)::bigint)
(6 rows)
</query-plan>

My research led me to a post by Tom Lane describing the conditions in
which the WHERE conditions cannot be pushed down to the UNION parts:
<http://archives.postgresql.org/pgsql-performance/2007-11/msg00041.php>.
I refactored the UNION definition slightly to attempt to bring all
the column types into alignment, as that seemed like it might be a
blocker, but the problem persists.  It didn't look like the other
conditions would hold in my case, but I certainly could be wrong.  For
reference, the definitions of the two tables and the view are listed
below.  The 'literalproperties' tables has 8229098 rows, and the
'relations' table has 6960820 rows.

# \d literalproperties
 Table "public.literalproperties"
    Column     |         Type         | Modifiers
----------------+----------------------+-----------
 subject        | bigint               | not null
 subject_term   | character(1)         | not null
 predicate      | bigint               | not null
 predicate_term | character(1)         | not null
 object         | bigint               | not null
 context        | bigint               | not null
 context_term   | character(1)         | not null
 data_type      | bigint               |
 language       | character varying(3) |
Indexes:
   "literalproperties_poscindex" UNIQUE, btree (predicate, object,
subject, context, data_type, language)
   "literalproperties_context_termindex" btree (context_term)
   "literalproperties_contextindex" btree (context)
   "literalproperties_data_typeindex" btree (data_type)
   "literalproperties_languageindex" btree (language)
   "literalproperties_objectindex" btree (object)
   "literalproperties_predicate_termindex" btree (predicate_term)
   "literalproperties_predicateindex" btree (predicate)
   "literalproperties_subject_termindex" btree (subject_term)
   "literalproperties_subjectindex" btree (subject)

# \d relations;
 Table "public.relations"
    Column     |     Type     | Modifiers
----------------+--------------+-----------
 subject        | bigint       | not null
 subject_term   | character(1) | not null
 predicate      | bigint       | not null
 predicate_term | character(1) | not null
 object         | bigint       | not null
 object_term    | character(1) | not null
 context        | bigint       | not null
 context_term   | character(1) | not null
Indexes:
   "relations_poscindex" UNIQUE, btree (predicate, object, subject, context)
   "relations_context_termindex" btree (context_term)
   "relations_contextindex" btree (context)
   "relations_object_termindex" btree (object_term)
   "relations_objectindex" btree (object)
   "relations_predicate_termindex" btree (predicate_term)
   "relations_predicateindex" btree (predicate)
   "relations_subject_termindex" btree (subject_term)
   "relations_subjectindex" btree (subject)

# \d uri_or_literal_object
 View "public.uri_or_literal_object"
    Column     |         Type         | Modifiers
----------------+----------------------+-----------
 subject        | bigint               |
 subject_term   | character(1)         |
 predicate      | bigint               |
 predicate_term | character(1)         |
 object         | bigint               |
 object_term    | character(1)         |
 context        | bigint               |
 context_term   | character(1)         |
 data_type      | bigint               |
 language       | character varying(3) |
View definition:
 SELECT literalproperties.subject, literalproperties.subject_term,
literalproperties.predicate, literalproperties.predicate_term,
literalproperties.object, 'L'::character(1) AS object_term,
literalproperties.context, literalproperties.context_term,
literalproperties.data_type, literalproperties.language
  FROM literalproperties
UNION ALL
 SELECT relations.subject, relations.subject_term,
relations.predicate, relations.predicate_term, relations.object,
relations.object_term, relations.context, relations.context_term,
NULL::bigint AS data_type, NULL::character varying(3) AS language
  FROM relations;

Does anyone have any ideas about how I could better optimize joins
against a union (either with a view or a subquery) like this?

Thanks, and take care,

   John L. Clark

Re: WHERE condition not being pushed down to union parts

От
Tom Lane
Дата:
"John L. Clark" <jlc6@po.cwru.edu> writes:
> I have a database with two tables that relate similar data, and a view
> which projects and combines the data from these two tables in order to
> access them both in a consistent manner.  With enough information, the
> application can specifically choose to query from one table or the
> other, but in the more general case the data could come from either
> table, so I need to query the view.  When I join against the view (or
> an equivalent subselect), however, it looks like the joining condition
> is not pushed down into the individual components of the union that
> defines the view.

You never mentioned what PG version you are using, but I'm betting
it's 8.1.x.  This should work the way you are expecting in 8.2 and up.

            regards, tom lane

Re: WHERE condition not being pushed down to union parts

От
"John L. Clark"
Дата:
On Tue, Apr 21, 2009 at 10:35 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> You never mentioned what PG version you are using, but I'm betting
> it's 8.1.x.  This should work the way you are expecting in 8.2 and up.

Naturally, I would forget (at least) one critical piece of information:

$ pg_config --version
PostgreSQL 8.3.7

Other ideas?

Take care,

    John L. Clark

Re: WHERE condition not being pushed down to union parts

От
Tom Lane
Дата:
"John L. Clark" <jlc6@po.cwru.edu> writes:
> On Tue, Apr 21, 2009 at 10:35 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> You never mentioned what PG version you are using, but I'm betting
>> it's 8.1.x.  This should work the way you are expecting in 8.2 and up.

> Naturally, I would forget (at least) one critical piece of information:

> $ pg_config --version
> PostgreSQL 8.3.7

In that case you're going to need to provide a reproducible test case,
'cause it worksforme.

            regards, tom lane

Re: WHERE condition not being pushed down to union parts

От
"John L. Clark"
Дата:
On Tue, Apr 21, 2009 at 12:05 PM, John L. Clark <jlc6@po.cwru.edu> wrote:
> On Tue, Apr 21, 2009 at 10:50 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> In that case you're going to need to provide a reproducible test case,
>> 'cause it worksforme.
>
> Ok.  I scaled back my example by just selecting 1000 "random" rows
> from each of the component tables.  The resulting database dump should
> be attached to this email.  I tried a very small subset (just 10
> rows), but the resulting tables were small enough that the query plans
> were changing to use scans.  Note that I haven't actually run sample
> queries with this smaller dataset.  I have only been inspecting the
> query plans of the two queries that I listed in my original message,
> and the results are the same, except that the magnitude of the costs
> are scaled down.  This scaling leads to a smaller performance penalty,
> but the query plan still shows that the join filter is still not being
> pushed down in the case of the view (built from a union).

I posted this earlier, but I haven't seen it come through the mailing
list, perhaps because of the attachment.  I have also posted the
attachment at <http://infinitesque.net/temp/union_performance_2009-04-21.postgresql.dump.gz>.
 The MD5 checksum is "3942fee39318aa5d9f18ac2ef3c298cf".  If the
original does end up coming through, I'm sorry about the redundant
post.

Take care,

    John L. Clark

Re: WHERE condition not being pushed down to union parts

От
Tom Lane
Дата:
"John L. Clark" <jlc6@po.cwru.edu> writes:
> I posted this earlier, but I haven't seen it come through the mailing
> list, perhaps because of the attachment.  I have also posted the
> attachment at <http://infinitesque.net/temp/union_performance_2009-04-21.postgresql.dump.gz>.

Ah.  The problem is that your view contains constants in the UNION arms:

CREATE VIEW uri_or_literal_object AS
    SELECT literalproperties.subject, literalproperties.subject_term, literalproperties.predicate,
literalproperties.predicate_term,literalproperties.object, 'L'::character(1) AS object_term, literalproperties.context,
literalproperties.context_term,literalproperties.data_type, literalproperties.language FROM literalproperties 
UNION ALL
    SELECT relations.subject, relations.subject_term, relations.predicate, relations.predicate_term, relations.object,
relations.object_term,relations.context, relations.context_term, NULL::bigint AS data_type, NULL::character varying(3)
ASlanguage FROM relations; 

In 8.2 and 8.3, the planner is only smart enough to generate
inner-indexscan nestloop plans on UNIONs if all the elements of the
SELECT lists are simple variables (that is, table columns).
8.4 will be smarter about this.

            regards, tom lane

Re: WHERE condition not being pushed down to union parts

От
"John L. Clark"
Дата:
On Tue, Apr 21, 2009 at 3:58 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Ah.  The problem is that your view contains constants in the UNION arms:

> In 8.2 and 8.3, the planner is only smart enough to generate
> inner-indexscan nestloop plans on UNIONs if all the elements of the
> SELECT lists are simple variables (that is, table columns).
> 8.4 will be smarter about this.

Ah, and so it is!  I installed 8.4beta1 and have loaded it with the
big database; it is pushing the index condition down to the parts of
the UNION, and my queries are now running MUCH faster.  Here's the new
query plan for the query involving the UNION-constructed view:

<query-plan>

QUERY PLAN

--------------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=0.00..53.32 rows=1083 width=80)
   Join Filter: (component_0_statements.subject = literalproperties.subject)
   ->  Index Scan using relations_poscindex on relations
component_0_statements  (cost=0.00..13.97 rows=2 width=40)
         Index Cond: ((predicate = (-2875059751320018987)::bigint) AND
(object = (-2827607394936393903)::bigint))
   ->  Append  (cost=0.00..19.65 rows=2 width=60)
         ->  Index Scan using literalproperties_subjectindex on
literalproperties  (cost=0.00..10.05 rows=1 width=57)
               Index Cond: (literalproperties.subject =
component_0_statements.subject)
               Filter: (literalproperties.predicate =
(-2875059751320018987)::bigint)
         ->  Index Scan using relations_subjectindex on relations
(cost=0.00..9.59 rows=1 width=64)
               Index Cond: (relations.subject = component_0_statements.subject)
               Filter: (relations.predicate = (-2875059751320018987)::bigint)
(11 rows)
</query-plan>

Thanks for your help, Tom.  I am certainly amused and pleased that my
exact use case is handled in the very next PostgreSQL release.

Take care,

    John L. Clark

Re: WHERE condition not being pushed down to union parts

От
Tom Lane
Дата:
"John L. Clark" <jlc6@po.cwru.edu> writes:
> Thanks for your help, Tom.  I am certainly amused and pleased that my
> exact use case is handled in the very next PostgreSQL release.

Well, sir, your timing is excellent ;-).  That's been a known problem
for quite some time, and it was only in this release cycle that it got
addressed.

            regards, tom lane