Re: Avoiding cartesian product
От | Szűcs Gábor |
---|---|
Тема | Re: Avoiding cartesian product |
Дата | |
Msg-id | 43C28854.9000605@gmail.com обсуждение исходный текст |
Ответ на | Avoiding cartesian product ("Virag Saksena" <virag@auptyma.com>) |
Список | pgsql-performance |
Dear Virag, AFAIK aggregates aren't indexed in postgres (at least not before 8.1, which indexes min and max, iirc). Also, I don't think you need to exactly determine the trace_id. Try this one (OTOH; might be wrong): select DISTINCT ON (a.trace_id, a.seq_no) -- See below b.gc_minor - a.gc_minor, b.gc_major - a.gc_major from jam_trace_sys a, jam_trace_sys b where a.trace_id = 22 and b.trace_id = a.trace_id and b.seq_no > a.seq_no -- Simply ">" is enough order by a.trace_id, a.seq_no, b.seq_no; -- DISTINCT, see below The trick is that DISTINCT takes the first one in each group (IIRC it is granted, at least someone told me on one of these lists :) ) so if you order by the DISTINCT attributes and then by b.seq_no, you'll get the smallest of appropriate b.seq_no values for each DISTINCT values. The idea of DISTINCTing by both columns is to make sure the planner finds the index. (lately I had a similar problem: WHERE a=1 ORDER BY b LIMIT 1 used an index on b, instead of an (a,b) index. Using ORDER BY a,b solved it) HTH, -- G. On 2006.01.04. 5:12, Virag Saksena wrote: > > I have a table which stores cumulative values > I would like to display/chart the deltas between successive data > collections > > If my primary key only increments by 1, I could write a simple query > > select b.gc_minor - a.gc_minor, b.gc_major - a.gc_major > from jam_trace_sys a, jam_trace_sys b > where a.trace_id = 22 > and b.trace_id = a.trace_id > and b.seq_no = a.seq_no + 1 > order by a.seq_no; > > However the difference in sequence number is variable. > So (in Oracle) I used to extract the next seq_no using a correlated > sub-query > > select b.gc_minor - a.gc_minor, b.gc_major - a.gc_major > from jam_trace_sys a, jam_trace_sys b > where a.trace_id = 22 > and (b.trace_id, b.seq_no) = > (select a.trace_id, min(c.seq_no) from jam_trace_sys c > where c.trace_id = a.trace_id and c.seq_no > a.seq_no) > order by a.seq_no; > > For every row in A, The correlated sub-query from C will execute > With an appropriate index, it will just descend the index Btree > go one row to the right and return that row (min > :value) > and join to table B > > SELECT STATEMENT > SORT ORDER BY > TABLE ACCESS BY INDEX ROWID JAM_TRACE_SYS B > NESTED LOOPS > TABLE ACCESS BY INDEX ROWID JAM_TRACE_SYS A > INDEX RANGE SCAN JAM_TRACE_SYS_N1 A > INDEX RANGE SCAN JAM_TRACE_SYS_N1 B > SORT AGGREGATE > INDEX RANGE SCAN JAM_TRACE_SYS_N1 C > > In postgreSQL A and B are doing a cartesian product > then C gets executed for every row in this cartesian product > and most of the extra rows get thrown out. > Is there any way to force an execution plan like above where the > correlated subquery runs before going to B. > The table is small right now, but it will grow to have millions of rows > QUERY PLAN > ----------------------------------------------------------------------------------------------------------------------------------- > Sort (cost=124911.81..124944.84 rows=13213 width=20) (actual > time=13096.754..13097.053 rows=149 loops=1) > Sort Key: a.seq_no > -> Nested Loop (cost=4.34..124007.40 rows=13213 width=20) (actual > time=1948.300..13096.329 rows=149 loops=1) > Join Filter: (subplan) > -> Seq Scan on jam_trace_sys b (cost=0.00..3.75 rows=175 > width=16) (actual time=0.005..0.534 rows=175 loops=1) > -> Materialize (cost=4.34..5.85 rows=151 width=16) (actual > time=0.002..0.324 rows=150 loops=175) > -> Seq Scan on jam_trace_sys a (cost=0.00..4.19 > rows=151 width=16) (actual time=0.022..0.687 rows=150 loops=1) > Filter: (trace_id = 22) > SubPlan > -> Aggregate (cost=4.67..4.67 rows=1 width=4) (actual > time=0.486..0.488 rows=1 loops=26250) > -> Seq Scan on jam_trace_sys c (cost=0.00..4.62 > rows=15 width=4) (actual time=0.058..0.311 rows=74 loops=26250) > Filter: ((trace_id = $0) AND (seq_no > $1)) > Total runtime: 13097.557 ms > (13 rows) > > pglnx01=> \d jam_trace_sys > Table "public.jam_trace_sys" > Column | Type | Modifiers > -----------------+---------+----------- > trace_id | integer | > seq_no | integer | > cpu_utilization | integer | > gc_minor | integer | > gc_major | integer | > heap_used | integer | > Indexes: > "jam_trace_sys_n1" btree (trace_id, seq_no) > > pglnx01=> select count(*) from jam_trace_Sys ; > count > ------- > 175 > (1 row) > > pglnx01=> select trace_id, count(*) from jam_trace_sys group by trace_id ; > trace_id | count > ----------+------- > 15 | 2 > 18 | 21 > 22 | 150 > 16 | 2 > (4 rows)
В списке pgsql-performance по дате отправления: