Re: 8.4 optimization regression?

Поиск
Список
Период
Сортировка
От Mark Kirkwood
Тема Re: 8.4 optimization regression?
Дата
Msg-id 4E5DCFA4.5090806@catalyst.net.nz
обсуждение исходный текст
Ответ на Re: 8.4 optimization regression?  (Tom Lane <tgl@sss.pgh.pa.us>)
Ответы Re: 8.4 optimization regression?  (Tom Lane <tgl@sss.pgh.pa.us>)
Список pgsql-performance
On 24/08/11 15:15, Tom Lane wrote:
> Mark Kirkwood<mark.kirkwood@catalyst.net.nz>  writes:
>> I am in the progress of an 8.3 to 8.4 upgrade for a customer. I seem to
>> have stumbled upon what looks like a regression. The two databases
>> (8.3.14 and 8.4.8) have identical tuning parameters (where that makes
>> sense) and run on identical hardware. Both databases are regularly
>> vacuumed and analyzed (not by autovacuum), and performing an ANALYZE
>> does not change the plans shown below.
> Hmmm ... this is structurally a pretty simple query, so I'm surprised
> that 8.3 and 8.4 see it very much differently.  The relation-level
> estimates and plan choices are very nearly the same; the only thing
> that's changed much is the estimates of the join sizes, and there were
> not that many changes in the join selectivity estimation for simple
> inner joins.  I wonder whether you are seeing a bad side-effect of this
> patch:
>
> http://git.postgresql.org/gitweb/?p=postgresql.git&a=commitdiff&h=7f3eba30
>
> That code would only be reached when one or both join columns lack MCV
> lists in pg_stats; if you had analyzed, the only reason for that to be
> the case is if the column is unique (or nearly so, in ANALYZE's opinion).
>
>

I've come up with (hopefully) a good set of semi, anti and regular joins
to demonstrate the effect of this commit. I've attached them, and the
schema generator (I believe I've used this before for optimization
examples...).

Also I've tried out an experimental patch to make joins like the one I'm
having trouble with *and* also the anti joins the commit was for - get
better row estimates.

So firstly consider an anti join (these are run against git HEAD rather
than 8.4):

EXPLAIN ANALYZE SELECT 1 FROM nodekeyword nk WHERE nk.keywordid < 100000
AND NOT EXISTS (SELECT 1 FROM node n WHERE n.nodeid = nk.nodeid AND
n.updated > '2011-12-01'::timestamp );

With commit:
                                                           QUERY PLAN

-----------------------------------------------------------------------------------------------------------------------------------
  Hash Anti Join  (cost=426079.34..765699.66 rows=1599293 width=0)
(actual time=29907.716..47255.825 rows=1839193 loops=1)
    Hash Cond: (nk.nodeid = n.nodeid)
    ->  Seq Scan on nodekeyword nk  (cost=0.00..297414.03 rows=1991560
width=4) (actual time=5.373..11838.738 rows=1999780 loops=1)
          Filter: (keywordid < 100000)
    ->  Hash  (cost=419643.00..419643.00 rows=392267 width=4) (actual
time=29883.980..29883.980 rows=401678 loops=1)
          Buckets: 4096  Batches: 16  Memory Usage: 891kB
          ->  Seq Scan on node n  (cost=0.00..419643.00 rows=392267
width=4) (actual time=0.339..29295.764 rows=401678 loops=1)
                Filter: (updated > '2011-12-01 00:00:00'::timestamp
without time zone)


Without commit:
                                                            QUERY PLAN

-----------------------------------------------------------------------------------------------------------------------------------
  Hash Anti Join  (cost=426079.34..760501.96 rows=1 width=0) (actual
time=30409.336..47919.613 rows=1839193 loops=1)
    Hash Cond: (nk.nodeid = n.nodeid)
    ->  Seq Scan on nodekeyword nk  (cost=0.00..297414.03 rows=1991560
width=4) (actual time=5.359..12081.372 rows=1999780 loops=1)
          Filter: (keywordid < 100000)
    ->  Hash  (cost=419643.00..419643.00 rows=392267 width=4) (actual
time=30392.235..30392.235 rows=401678 loops=1)
          Buckets: 4096  Batches: 16  Memory Usage: 891kB
          ->  Seq Scan on node n  (cost=0.00..419643.00 rows=392267
width=4) (actual time=0.384..29806.407 rows=401678 loops=1)
                Filter: (updated > '2011-12-01 00:00:00'::timestamp
without time zone)


Note the rows estimate for the anti join is hopelessly wrong, so clearly
the commitdoes the job here (I think this models the test case for said
commit)!

Now some joins:

EXPLAIN ANALYZE SELECT 1 FROM NODE n JOIN nodekeyword nk ON (n.nodeid =
nk.nodeid) WHERE n.updated > '2011-01-01'::timestamp AND nk.keywordid <
100000;

With commit:
                                                           QUERY PLAN

-----------------------------------------------------------------------------------------------------------------------------------
  Hash Join  (cost=501666.88..871512.65 rows=1991560 width=0) (actual
time=30032.836..53073.731 rows=1993866 loops=1)
    Hash Cond: (nk.nodeid = n.nodeid)
    ->  Seq Scan on nodekeyword nk  (cost=0.00..297414.03 rows=1991560
width=4) (actual time=5.327..14393.629 rows=1999780 loops=1)
          Filter: (keywordid < 100000)
    ->  Hash  (cost=419643.00..419643.00 rows=4999510 width=4) (actual
time=30017.777..30017.777 rows=4985269 loops=1)
          Buckets: 4096  Batches: 256  Memory Usage: 699kB
          ->  Seq Scan on node n  (cost=0.00..419643.00 rows=4999510
width=4) (actual time=0.005..23272.287 rows=4985269 loops=1)
                Filter: (updated > '2011-01-01 00:00:00'::timestamp
without time zone)


Without commit:
                                                             QUERY PLAN

-----------------------------------------------------------------------------------------------------------------------------------
  Hash Join  (cost=501666.88..871510.70 rows=1991365 width=0) (actual
time=30549.498..54852.399 rows=1993866 loops=1)
    Hash Cond: (nk.nodeid = n.nodeid)
    ->  Seq Scan on nodekeyword nk  (cost=0.00..297414.03 rows=1991560
width=4) (actual time=5.331..13760.417 rows=1999780 loops=1)
          Filter: (keywordid < 100000)
    ->  Hash  (cost=419643.00..419643.00 rows=4999510 width=4) (actual
time=30534.464..30534.464 rows=4985269 loops=1)
          Buckets: 4096  Batches: 256  Memory Usage: 699kB
          ->  Seq Scan on node n  (cost=0.00..419643.00 rows=4999510
width=4) (actual time=0.005..23696.167 rows=4985269 loops=1)
                Filter: (updated > '2011-01-01 00:00:00'::timestamp
without time zone)


Another join:

EXPLAIN ANALYZE SELECT 1 FROM NODE n JOIN nodekeyword nk ON (n.nodeid =
nk.nodeid) WHERE n.updated > '2011-12-01'::timestamp AND nk.keywordid <
100000;

With commit:
                                                             QUERY PLAN

------------------------------------------------------------------------------------------------------------------------------------
  Hash Join  (cost=426079.34..764424.63 rows=392267 width=0) (actual
time=29295.966..45578.876 rows=160587 loops=1)
    Hash Cond: (nk.nodeid = n.nodeid)
    ->  Seq Scan on nodekeyword nk  (cost=0.00..297414.03 rows=1991560
width=4) (actual time=12.452..12367.760 rows=1999780 loops=1)
          Filter: (keywordid < 100000)
    ->  Hash  (cost=419643.00..419643.00 rows=392267 width=4) (actual
time=29273.571..29273.571 rows=401678 loops=1)
          Buckets: 4096  Batches: 16  Memory Usage: 891kB
          ->  Seq Scan on node n  (cost=0.00..419643.00 rows=392267
width=4) (actual time=10.899..28678.818 rows=401678 loops=1)
                Filter: (updated > '2011-12-01 00:00:00'::timestamp
without time zone)


Without commit:
                                                             QUERY PLAN

------------------------------------------------------------------------------------------------------------------------------------
  Hash Join  (cost=426079.34..762064.41 rows=156245 width=0) (actual
time=29179.313..44605.243 rows=160587 loops=1)
    Hash Cond: (nk.nodeid = n.nodeid)
    ->  Seq Scan on nodekeyword nk  (cost=0.00..297414.03 rows=1991560
width=4) (actual time=12.486..11546.469 rows=1999780 loops=1)
          Filter: (keywordid < 100000)
    ->  Hash  (cost=419643.00..419643.00 rows=392267 width=4) (actual
time=29156.889..29156.889 rows=401678 loops=1)
          Buckets: 4096  Batches: 16  Memory Usage: 891kB
          ->  Seq Scan on node n  (cost=0.00..419643.00 rows=392267
width=4) (actual time=10.915..28545.553 rows=401678 loops=1)
                Filter: (updated > '2011-12-01 00:00:00'::timestamp
without time zone)


So in the case where we filer out a large percentage of the rows the
commit inflates the estimates...consider a more extreme example:


EXPLAIN ANALYZE SELECT 1 FROM NODE n JOIN nodekeyword nk ON (n.nodeid =
nk.nodeid) WHERE n.updated > '2011-12-27'::timestamp AND nk.keywordid <
10000;

With commit:
                                                               QUERY PLAN

-----------------------------------------------------------------------------------------------------------------------------------------
  Nested Loop  (cost=0.00..569488.45 rows=16344 width=0) (actual
time=55.452..65341.135 rows=604 loops=1)
    ->  Seq Scan on node n  (cost=0.00..419643.00 rows=16344 width=4)
(actual time=13.537..46138.214 rows=14952 loops=1)
          Filter: (updated > '2011-12-27 00:00:00'::timestamp without
time zone)
    ->  Index Scan using nodekeyword_pk on nodekeyword nk
(cost=0.00..9.16 rows=1 width=4) (actual time=1.277..1.279 rows=0
loops=14952)
          Index Cond: ((nodeid = n.nodeid) AND (keywordid < 10000))


Without commit:
                                                                QUERY PLAN

-----------------------------------------------------------------------------------------------------------------------------------------
  Nested Loop  (cost=0.00..569488.45 rows=631 width=0) (actual
time=43.969..64988.036 rows=604 loops=1)
    ->  Seq Scan on node n  (cost=0.00..419643.00 rows=16344 width=4)
(actual time=2.060..46065.879 rows=14952 loops=1)
          Filter: (updated > '2011-12-27 00:00:00'::timestamp without
time zone)
    ->  Index Scan using nodekeyword_pk on nodekeyword nk
(cost=0.00..9.16 rows=1 width=4) (actual time=1.259..1.260 rows=0
loops=14952)
          Index Cond: ((nodeid = n.nodeid) AND (keywordid < 10000))


So clearly this commit is not so good for this type of join (this models
the case I posted initially).

Now four semi joins:

EXPLAIN ANALYZE SELECT 1 FROM nodekeyword nk WHERE nk.keywordid < 100000
AND EXISTS (SELECT 1 FROM node n WHERE n.nodeid = nk.nodeid AND
n.updated > '2011-12-01'::timestamp );

With commit:
                                                             QUERY PLAN

-----------------------------------------------------------------------------------------------------------------------------------
  Hash Semi Join  (cost=426079.34..753629.40 rows=392267 width=0)
(actual time=28405.965..43724.471 rows=160587 loops=1)
    Hash Cond: (nk.nodeid = n.nodeid)
    ->  Seq Scan on nodekeyword nk  (cost=0.00..297414.03 rows=1991560
width=4) (actual time=5.767..11561.340 rows=1999780 loops=1)
          Filter: (keywordid < 100000)
    ->  Hash  (cost=419643.00..419643.00 rows=392267 width=4) (actual
time=28391.293..28391.293 rows=401678 loops=1)
          Buckets: 4096  Batches: 16  Memory Usage: 891kB
          ->  Seq Scan on node n  (cost=0.00..419643.00 rows=392267
width=4) (actual time=0.038..27820.097 rows=401678 loops=1)
                Filter: (updated > '2011-12-01 00:00:00'::timestamp
without time zone)

Without commit:
                                                            QUERY PLAN

-----------------------------------------------------------------------------------------------------------------------------------
  Hash Semi Join  (cost=426079.34..780417.56 rows=1991560 width=0)
(actual time=29447.638..44738.280 rows=160587 loops=1)
    Hash Cond: (nk.nodeid = n.nodeid)
    ->  Seq Scan on nodekeyword nk  (cost=0.00..297414.03 rows=1991560
width=4) (actual time=5.771..11501.350 rows=1999780 loops=1)
          Filter: (keywordid < 100000)
    ->  Hash  (cost=419643.00..419643.00 rows=392267 width=4) (actual
time=29433.952..29433.952 rows=401678 loops=1)
          Buckets: 4096  Batches: 16  Memory Usage: 891kB
          ->  Seq Scan on node n  (cost=0.00..419643.00 rows=392267
width=4) (actual time=0.040..28850.800 rows=401678 loops=1)
                Filter: (updated > '2011-12-01 00:00:00'::timestamp
without time zone)

Clearly row estimation is hopelessly broken *without* this commit here.

Another semi join:

EXPLAIN ANALYZE SELECT 1 FROM nodekeyword nk WHERE nk.keywordid < 100000
AND EXISTS (SELECT 1 FROM node n WHERE n.nodeid = nk.nodeid AND
n.updated > '2011-01-01'::timestamp );

With commit:
                                                             QUERY PLAN

------------------------------------------------------------------------------------------------------------------------------------
  Hash Semi Join  (cost=501666.88..871512.65 rows=1991560 width=0)
(actual time=29048.154..51230.453 rows=1993866 loops=1)
    Hash Cond: (nk.nodeid = n.nodeid)
    ->  Seq Scan on nodekeyword nk  (cost=0.00..297414.03 rows=1991560
width=4) (actual time=12.423..13430.618 rows=1999780 loops=1)
          Filter: (keywordid < 100000)
    ->  Hash  (cost=419643.00..419643.00 rows=4999510 width=4) (actual
time=29024.442..29024.442 rows=4985269 loops=1)
          Buckets: 4096  Batches: 256  Memory Usage: 699kB
          ->  Seq Scan on node n  (cost=0.00..419643.00 rows=4999510
width=4) (actual time=0.010..22384.904 rows=4985269 loops=1)
                Filter: (updated > '2011-01-01 00:00:00'::timestamp
without time zone)


Without commit:
                                                             QUERY PLAN

------------------------------------------------------------------------------------------------------------------------------------
  Hash Semi Join  (cost=501666.88..871512.65 rows=1991560 width=0)
(actual time=28914.970..51162.918 rows=1993866 loops=1)
    Hash Cond: (nk.nodeid = n.nodeid)
    ->  Seq Scan on nodekeyword nk  (cost=0.00..297414.03 rows=1991560
width=4) (actual time=12.504..13780.506 rows=1999780 loops=1)
          Filter: (keywordid < 100000)
    ->  Hash  (cost=419643.00..419643.00 rows=4999510 width=4) (actual
time=28891.705..28891.705 rows=4985269 loops=1)
          Buckets: 4096  Batches: 256  Memory Usage: 699kB
          ->  Seq Scan on node n  (cost=0.00..419643.00 rows=4999510
width=4) (actual time=0.008..22082.459 rows=4985269 loops=1)
                Filter: (updated > '2011-01-01 00:00:00'::timestamp
without time zone)


Another semi join:

EXPLAIN ANALYZE SELECT 1 FROM nodekeyword nk WHERE nk.keywordid < 10000
AND EXISTS (SELECT 1 FROM node n WHERE n.nodeid = nk.nodeid AND
n.updated > '2011-01-01'::timestamp );

With commit:
                                                             QUERY PLAN

----------------------------------------------------------------------------------------------------------------------------------
  Hash Semi Join  (cost=501666.88..823736.17 rows=192921 width=0)
(actual time=30120.347..49646.175 rows=199050 loops=1)
    Hash Cond: (nk.nodeid = n.nodeid)
    ->  Seq Scan on nodekeyword nk  (cost=0.00..297414.03 rows=192921
width=4) (actual time=12.359..16335.889 rows=199616 loops=1)
          Filter: (keywordid < 10000)
    ->  Hash  (cost=419643.00..419643.00 rows=4999510 width=4) (actual
time=30072.444..30072.444 rows=4985269 loops=1)
          Buckets: 4096  Batches: 256  Memory Usage: 699kB
          ->  Seq Scan on node n  (cost=0.00..419643.00 rows=4999510
width=4) (actual time=0.009..23409.799 rows=4985269 loops=1)
                Filter: (updated > '2011-01-01 00:00:00'::timestamp
without time zone)

Without commit:
                                                           QUERY PLAN

----------------------------------------------------------------------------------------------------------------------------------
  Hash Semi Join  (cost=501666.88..823736.17 rows=192921 width=0)
(actual time=29395.513..48857.600 rows=199050 loops=1)
    Hash Cond: (nk.nodeid = n.nodeid)
    ->  Seq Scan on nodekeyword nk  (cost=0.00..297414.03 rows=192921
width=4) (actual time=12.528..16261.983 rows=199616 loops=1)
          Filter: (keywordid < 10000)
    ->  Hash  (cost=419643.00..419643.00 rows=4999510 width=4) (actual
time=29348.826..29348.826 rows=4985269 loops=1)
          Buckets: 4096  Batches: 256  Memory Usage: 699kB
          ->  Seq Scan on node n  (cost=0.00..419643.00 rows=4999510
width=4) (actual time=0.009..22505.930 rows=4985269 loops=1)
                Filter: (updated > '2011-01-01 00:00:00'::timestamp
without time zone)


Final semi join:

EXPLAIN ANALYZE SELECT 1 FROM nodekeyword nk WHERE nk.keywordid < 10000
AND EXISTS (SELECT 1 FROM node n WHERE n.nodeid = nk.nodeid AND
n.updated > '2011-12-01'::timestamp );

With commit:
                                                            QUERY PLAN

----------------------------------------------------------------------------------------------------------------------------------
  Hash Semi Join  (cost=426079.34..730392.78 rows=192921 width=0)
(actual time=29060.665..44713.615 rows=16003 loops=1)
    Hash Cond: (nk.nodeid = n.nodeid)
    ->  Seq Scan on nodekeyword nk  (cost=0.00..297414.03 rows=192921
width=4) (actual time=12.366..15064.457 rows=199616 loops=1)
          Filter: (keywordid < 10000)
    ->  Hash  (cost=419643.00..419643.00 rows=392267 width=4) (actual
time=29026.017..29026.017 rows=401678 loops=1)
          Buckets: 4096  Batches: 16  Memory Usage: 891kB
          ->  Seq Scan on node n  (cost=0.00..419643.00 rows=392267
width=4) (actual time=0.039..28441.039 rows=401678 loops=1)
                Filter: (updated > '2011-12-01 00:00:00'::timestamp
without time zone)

Without commit:
                                                            QUERY PLAN

----------------------------------------------------------------------------------------------------------------------------------
  Hash Semi Join  (cost=426079.34..730392.78 rows=192921 width=0)
(actual time=28969.107..43725.339 rows=16003 loops=1)
    Hash Cond: (nk.nodeid = n.nodeid)
    ->  Seq Scan on nodekeyword nk  (cost=0.00..297414.03 rows=192921
width=4) (actual time=12.486..14198.613 rows=199616 loops=1)
          Filter: (keywordid < 10000)
    ->  Hash  (cost=419643.00..419643.00 rows=392267 width=4) (actual
time=28935.248..28935.248 rows=401678 loops=1)
          Buckets: 4096  Batches: 16  Memory Usage: 891kB
          ->  Seq Scan on node n  (cost=0.00..419643.00 rows=392267
width=4) (actual time=0.047..28343.005 rows=401678 loops=1)
                Filter: (updated > '2011-12-01 00:00:00'::timestamp
without time zone)

Well this guy is too sneaky for either case :-(

We seem to need a patch variant that *only* clamps the estimates in the
anti or semi join case, e.g (note against git HEAD):

diff --git a/src/backend/utils/adt/selfuncs.c
b/src/backend/utils/adt/selfuncs.c
index e065826..bf5002f 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -2257,11 +2257,6 @@ eqjoinsel_inner(Oid operator,
         double      nullfrac1 = stats1 ? stats1->stanullfrac : 0.0;
         double      nullfrac2 = stats2 ? stats2->stanullfrac : 0.0;

-       if (vardata1->rel)
-           nd1 = Min(nd1, vardata1->rel->rows);
-       if (vardata2->rel)
-           nd2 = Min(nd2, vardata2->rel->rows);
-
         selec = (1.0 - nullfrac1) * (1.0 - nullfrac2);
         if (nd1 > nd2)
             selec /= nd1;


Now run all the queries, 1st the anti join:
                                                              QUERY PLAN

------------------------------------------------------------------------------------------------------------------------------------
  Hash Anti Join  (cost=426079.34..765699.66 rows=1599293 width=0)
(actual time=30121.008..48503.453 rows=1839193 loops=1)
    Hash Cond: (nk.nodeid = n.nodeid)
    ->  Seq Scan on nodekeyword nk  (cost=0.00..297414.03 rows=1991560
width=4) (actual time=12.623..12853.522 rows=1999780 loops=1)
          Filter: (keywordid < 100000)
    ->  Hash  (cost=419643.00..419643.00 rows=392267 width=4) (actual
time=30108.058..30108.058 rows=401678 loops=1)
          Buckets: 4096  Batches: 16  Memory Usage: 891kB
          ->  Seq Scan on node n  (cost=0.00..419643.00 rows=392267
width=4) (actual time=0.347..29508.393 rows=401678 loops=1)
                Filter: (updated > '2011-12-01 00:00:00'::timestamp
without time zone)


And the  3 joins:

                                                              QUERY PLAN

------------------------------------------------------------------------------------------------------------------------------------
  Hash Join  (cost=501666.88..871510.70 rows=1991365 width=0) (actual
time=30148.073..52370.308 rows=1993866 loops=1)
    Hash Cond: (nk.nodeid = n.nodeid)
    ->  Seq Scan on nodekeyword nk  (cost=0.00..297414.03 rows=1991560
width=4) (actual time=12.291..13300.233 rows=1999780 loops=1)
          Filter: (keywordid < 100000)
    ->  Hash  (cost=419643.00..419643.00 rows=4999510 width=4) (actual
time=30124.453..30124.453 rows=4985269 loops=1)
          Buckets: 4096  Batches: 256  Memory Usage: 699kB
          ->  Seq Scan on node n  (cost=0.00..419643.00 rows=4999510
width=4) (actual time=0.009..23334.774 rows=4985269 loops=1)
                Filter: (updated > '2011-01-01 00:00:00'::timestamp
without time zone)


                                                              QUERY PLAN

------------------------------------------------------------------------------------------------------------------------------------
  Hash Join  (cost=426079.34..762064.41 rows=156245 width=0) (actual
time=29954.251..46014.379 rows=160587 loops=1)
    Hash Cond: (nk.nodeid = n.nodeid)
    ->  Seq Scan on nodekeyword nk  (cost=0.00..297414.03 rows=1991560
width=4) (actual time=12.420..12126.142 rows=1999780 loops=1)
          Filter: (keywordid < 100000)
    ->  Hash  (cost=419643.00..419643.00 rows=392267 width=4) (actual
time=29936.578..29936.578 rows=401678 loops=1)
          Buckets: 4096  Batches: 16  Memory Usage: 891kB
          ->  Seq Scan on node n  (cost=0.00..419643.00 rows=392267
width=4) (actual time=10.934..29357.789 rows=401678 loops=1)
                Filter: (updated > '2011-12-01 00:00:00'::timestamp
without time zone)


                                                               QUERY PLAN

-----------------------------------------------------------------------------------------------------------------------------------------
  Nested Loop  (cost=0.00..569488.45 rows=631 width=0) (actual
time=44.065..67179.686 rows=604 loops=1)
    ->  Seq Scan on node n  (cost=0.00..419643.00 rows=16344 width=4)
(actual time=2.165..48523.075 rows=14952 loops=1)
          Filter: (updated > '2011-12-27 00:00:00'::timestamp without
time zone)
    ->  Index Scan using nodekeyword_pk on nodekeyword nk
(cost=0.00..9.16 rows=1 width=4) (actual time=1.241..1.242 rows=0
loops=14952)
          Index Cond: ((nodeid = n.nodeid) AND (keywordid < 10000))


And the 4 semi joins...

                                                             QUERY PLAN

-----------------------------------------------------------------------------------------------------------------------------------
  Hash Semi Join  (cost=426079.34..753629.40 rows=392267 width=0)
(actual time=29355.949..45220.958 rows=160587 loops=1)
    Hash Cond: (nk.nodeid = n.nodeid)
    ->  Seq Scan on nodekeyword nk  (cost=0.00..297414.03 rows=1991560
width=4) (actual time=5.731..11983.132 rows=1999780 loops=1)
          Filter: (keywordid < 100000)
    ->  Hash  (cost=419643.00..419643.00 rows=392267 width=4) (actual
time=29342.387..29342.387 rows=401678 loops=1)
          Buckets: 4096  Batches: 16  Memory Usage: 891kB
          ->  Seq Scan on node n  (cost=0.00..419643.00 rows=392267
width=4) (actual time=0.039..28763.514 rows=401678 loops=1)
                Filter: (updated > '2011-12-01 00:00:00'::timestamp
without time zone)

                                                            QUERY PLAN

------------------------------------------------------------------------------------------------------------------------------------
  Hash Semi Join  (cost=501666.88..871512.65 rows=1991560 width=0)
(actual time=30823.334..53136.910 rows=1993866 loops=1)
    Hash Cond: (nk.nodeid = n.nodeid)
    ->  Seq Scan on nodekeyword nk  (cost=0.00..297414.03 rows=1991560
width=4) (actual time=12.555..13881.366 rows=1999780 loops=1)
          Filter: (keywordid < 100000)
    ->  Hash  (cost=419643.00..419643.00 rows=4999510 width=4) (actual
time=30800.017..30800.017 rows=4985269 loops=1)
          Buckets: 4096  Batches: 256  Memory Usage: 699kB
          ->  Seq Scan on node n  (cost=0.00..419643.00 rows=4999510
width=4) (actual time=0.010..24028.932 rows=4985269 loops=1)
                Filter: (updated > '2011-01-01 00:00:00'::timestamp
without time zone)

                                                            QUERY PLAN

----------------------------------------------------------------------------------------------------------------------------------
  Hash Semi Join  (cost=501666.88..823736.17 rows=192921 width=0)
(actual time=29278.861..48346.647 rows=199050 loops=1)
    Hash Cond: (nk.nodeid = n.nodeid)
    ->  Seq Scan on nodekeyword nk  (cost=0.00..297414.03 rows=192921
width=4) (actual time=12.523..15809.390 rows=199616 loops=1)
          Filter: (keywordid < 10000)
    ->  Hash  (cost=419643.00..419643.00 rows=4999510 width=4) (actual
time=29232.161..29232.161 rows=4985269 loops=1)
          Buckets: 4096  Batches: 256  Memory Usage: 699kB
          ->  Seq Scan on node n  (cost=0.00..419643.00 rows=4999510
width=4) (actual time=0.009..22541.899 rows=4985269 loops=1)
                Filter: (updated > '2011-01-01 00:00:00'::timestamp
without time zone)

                                                            QUERY PLAN

----------------------------------------------------------------------------------------------------------------------------------
  Hash Semi Join  (cost=426079.34..730392.78 rows=192921 width=0)
(actual time=28594.210..42976.001 rows=16003 loops=1)
    Hash Cond: (nk.nodeid = n.nodeid)
    ->  Seq Scan on nodekeyword nk  (cost=0.00..297414.03 rows=192921
width=4) (actual time=12.581..13810.924 rows=199616 loops=1)
          Filter: (keywordid < 10000)
    ->  Hash  (cost=419643.00..419643.00 rows=392267 width=4) (actual
time=28560.258..28560.258 rows=401678 loops=1)
          Buckets: 4096  Batches: 16  Memory Usage: 891kB
          ->  Seq Scan on node n  (cost=0.00..419643.00 rows=392267
width=4) (actual time=0.048..27983.235 rows=401678 loops=1)
                Filter: (updated > '2011-12-01 00:00:00'::timestamp
without time zone)

(this last one was wildly inaccurate pre patching)

So this looks quite encouraging (unless I have overlooked a set of
queries that now perform worse - which could be the case), thoughts?

regards

Mark




Вложения

В списке pgsql-performance по дате отправления:

Предыдущее
От: Mark Kirkwood
Дата:
Сообщение: Re: 8.4 optimization regression?
Следующее
От: Craig Ringer
Дата:
Сообщение: Re: IN or EXISTS