Обсуждение: Constraint exclusion won't exclude parent table

Поиск
Список
Период
Сортировка

Constraint exclusion won't exclude parent table

От
Tim Kane
Дата:
Hi all,

First some background.
I have inherited a system that appears to have a lot of logic built into views upon views upon views (and then some more views for good measure).
It struck me that the CASE conditions built into those views are causing poorer performance than expected – so I thought I would run a few tests against the base tables see where the difference lies.

Anyway, that’s all by the by.. Because what I found on my travels is that the parent table of the relevant partitions is being included and appended in the query plan.
This is all documented and I understand why, fine.  But the impact of this is greater than I was expecting.

For instance, if I query the partition directly (for all tuples it contains) versus a query that targets the same partition via exclusion rules - I find the direct query runs in less than half the time.


Direct query:
 Seq Scan on partitioned.ts_201405 track_streams  (cost=0.00..4167467.56 rows=65067252 width=253) (actual time=0.010..96796.053 rows=65328073 loops=1)
   Output: 
   Filter: 
   Buffers: shared hit=354 read=2215096
 Total runtime: 137437.675 ms
(5 rows)

Indirect query:
 Result  (cost=0.00..4167467.56 rows=65067253 width=253) (actual time=0.011..250057.941 rows=65328073 loops=1)
   Output:
   Buffers: shared hit=322 read=2215128
   ->  Append  (cost=0.00..4167467.56 rows=65067253 width=253) (actual time=0.010..163452.326 rows=65328073 loops=1)
         Buffers: shared hit=322 read=2215128
         ->  Seq Scan on archive.ts  (cost=0.00..0.00 rows=1 width=199) (actual time=0.001..0.001 rows=0 loops=1)
               Output: 
               Filter:
         ->  Seq Scan on partitioned.ts_201405  (cost=0.00..4167467.56 rows=65067252 width=253) (actual time=0.006..85883.925 rows=65328073 loops=1)
               Output: 
               Filter:
               Buffers: shared hit=322 read=2215128
 Total runtime: 289238.187 ms
(13 rows)


So what is the append node actually doing, and why is it necessary?  
I expect that it simply does what it says, and appends the results of those two seq-scans.  But in reality, there isn’t a lot to do there. While I expect a little bit of overhead, surely it just passes the tuples straight through to the result node and that will be that.. No?

(yeah, I’ve made a few assumptions/guesses here, but I’m not sure I’m ready to look at the code just yet)


Cheers,

Tim









Re: Constraint exclusion won't exclude parent table

От
Tom Lane
Дата:
Tim Kane <tim.kane@gmail.com> writes:
> So what is the append node actually doing, and why is it necessary?
> I expect that it simply does what it says, and appends the results of those
> two seq-scans.  But in reality, there isn’t a lot to do there. While I
> expect a little bit of overhead, surely it just passes the tuples straight
> through to the result node and that will be that.. No?

Yeah, it's not expected that that's going to cost much.  I am suspicious
that what you are looking at is mostly measurement overhead: during
EXPLAIN ANALYZE, each plan node has to do two gettimeofday() calls per
call, and there are lots of platforms where that is significant relative
to the actual work done per node.

You might try comparing the overall times for select count(*) from ...
rather than EXPLAIN ANALYZE for these two cases.  If those times are
much closer together than what you're getting from EXPLAIN ANALYZE,
then you've got a machine with expensive gettimeofday() and you have
to take your measurements with an appropriate quantum of salt.

            regards, tom lane


Re: Constraint exclusion won't exclude parent table

От
Tim Kane
Дата:


Yeah, it's not expected that that's going to cost much.  I am suspicious
that what you are looking at is mostly measurement overhead: during
EXPLAIN ANALYZE, each plan node has to do two gettimeofday() calls per
call, and there are lots of platforms where that is significant relative
to the actual work done per node.

You might try comparing the overall times for select count(*) from ...
rather than EXPLAIN ANALYZE for these two cases.  If those times are
much closer together than what you're getting from EXPLAIN ANALYZE,
then you've got a machine with expensive gettimeofday() and you have
to take your measurements with an appropriate quantum of salt.

regards, tom lane

Interesting.. 

Direct query:
Time: 374336.514 ms

Indirect query:
Time: 387114.059 ms

Mystery solved. Thanks again Tom.

For what it’s worth: Linux 3.2.0-4-amd64  Debian 3.2.46-1+deb7u1 x86_64