Re: multivariate statistics / patch v6

From: Kyotaro HORIGUCHI
Subject: Re: multivariate statistics / patch v6
Msg-id: 20150513.173159.102421446.horiguchi.kyotaro@lab.ntt.co.jp
In reply to: Re: multivariate statistics / patch v6  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Responses: Re: multivariate statistics / patch v6  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
List: pgsql-hackers

Hello, this might be somewhat out of place, but it is strongly
related to this patch, so I'll propose it here.

This is a proposal for a new feature of this patch, or a request
for your approval of my pursuing it as a separate (but very
closely related) project.

===

> Attached is v6 of the multivariate stats, with a number of
> improvements:
...
> 2) fix of pg_proc issues (reported by Jeff)
> 
> 3) rebase to current master

Unfortunately, the v6 patch has some system OID conflicts with
recently added OIDs. And what's more unfortunate for me is that
the code for functional dependencies looks unfinished :)

I mention this because I recently ran into an issue caused by
strong correlation between two columns in the dbt3 benchmark.
Two columns in one table are strongly correlated but not
functionally dependent; they have too many values, and their
distribution is very uniform, so an MCV list is of no use for
the table (and a histogram has nothing to do with equality
conditions). As a result, the planner misestimates the number of
rows badly, as expected, especially for joins.

I then tried computing the ratio between the product of the
distinctness of each column and the distinctness of the set of
those columns, which I call the multivariate coefficient here,
and found it remarkably useful given its small storage
footprint, cheap calculation, and simple code.

The first attachment is a script that generates the problematic
tables, and the second is a patch that makes use of the mv
coefficient on current master. The patch is a very primitive
POC, so it involves no syntactic interface.

For the case of your first example,

> =# create table t (a int, b int, c int);
> =# insert into t (select a/10000, a/10000, a/10000
>                   from generate_series(0, 999999) a);
> =# analyze t;
> =# explain analyze select * from t where a = 1 and b = 1 and c = 1;
>  Seq Scan on t  (cost=0.00..22906.00 rows=1 width=12)
>                 (actual time=3.878..250.628 rows=10000 loops=1)

Now make use of the mv coefficient:

> =# insert into pg_mvcoefficient values ('t'::regclass, 1, 2, 3, 0);
> =# analyze t;
> =# explain analyze select * from t where a = 1 and b = 1 and c = 1;
>  Seq Scan on t  (cost=0.00..22906.00 rows=9221 width=12)
>                 (actual time=3.740..242.330 rows=10000 loops=1)

The row count estimate is largely improved.

And now my own example:

> $ perl gentbl.pl 10000 | psql postgres
> $ psql postgres
> =# explain analyze select * from t1 where a = 1 and b = 2501;
>  Seq Scan on t1  (cost=0.00..6216.00 rows=1 width=8)
>                  (actual time=0.030..66.005 rows=8 loops=1)
> 
> =# explain analyze select * from t1 join t2 on (t1.a = t2.a and t1.b = t2.b);
>  Hash Join  (cost=1177.00..11393.76 rows=76 width=16)
>             (actual time=29.811..322.271 rows=320000 loops=1)

The estimate for the join is far too low.

> =# insert into pg_mvcoefficient values ('t1'::regclass, 1, 2, 0, 0);
> =# analyze t1;
> =# explain analyze select * from t1 where a = 1 and b = 2501;
>  Seq Scan on t1  (cost=0.00..6216.00         rows=8 width=8)
>                  (actual time=0.032..104.144 rows=8 loops=1)
> 
> =# explain analyze select * from t1 join t2 on (t1.a = t2.a and t1.b = t2.b);
>  Hash Join  (cost=1177.00..11393.76      rows=305652 width=16)
>             (actual time=40.642..325.679 rows=320000 loops=1)

It gives almost correct estimates.

I think the results above show that the multivariate coefficient
significantly improves estimates when correlated columns are
involved.

Would you consider including this in your patch? Otherwise, if
you don't mind, I'll pursue it as a project separate from yours.
Apart from the user interface, I suppose it won't conflict with
yours, but in the end some consolidation work will be needed.

regards,

> 1) fix of the contrib compile-time errors (reported by Jeff)
> 
> 2) fix of pg_proc issues (reported by Jeff)
> 
> 3) rebase to current master
> 
> 4) fix a bunch of issues in the previous patches, due to referencing
>    some parts too early (e.g. histograms in the first patch, etc.)
> 
> 5) remove the explicit DELETEs from pg_mv_statistic (in the regression
>    tests), this is now handled automatically by DROP TABLE etc.
> 
> 6) number of performance optimizations in selectivity estimations:
> 
>    (a) minimize calls to get_oprrest, significantly reducing
>        syscache calls
> 
>    (b) significant reduction of palloc overhead in deserialization of
>        MCV lists and histograms
> 
>    (c) use more compact serialized representation of MCV lists and
>        histograms, often removing ~50% of the size
> 
>    (d) use histograms with limited deserialization, which also allows
>        caching function calls
> 
>    (e) modified histogram bucket partitioning, resulting in more even
>        bucket distribution (i.e. producing buckets with more equal
>        density and about equal size of each dimension)
> 
> 7) add functions for listing MCV list items and histogram buckets:
> 
>     - pg_mv_mcvlist_items(oid)
>     - pg_mv_histogram_buckets(oid, type)
> 
>    This is quite useful when analyzing the MCV lists / histograms.
> 
> 8) improved support for OR clauses
> 
> 9) allow calling pull_varnos() on expression trees containing
>    RestrictInfo nodes (not sure if this is the right fix, it's being
>    discussed in another thread)

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
#! /usr/bin/perl
# gentbl.pl -- generate two tables with strongly correlated columns.
# t1 holds each (a, b) pair 8 times; t2 holds each pair once.

$DBNAME = "postgres";
$scale = 1000;
$scale = $ARGV[0] if ($#ARGV >= 0);

print "create table t1 (a int, b int);\n";
print "create table t2 (a int, b int);\n";

print "insert into t1 values ";
$delim = "";
for $a (1..$scale) {
    for $x (1, 2500, 5000, 7500) {
        $b = $a + $x;
        print "\n";
        for $i (1..8) {
            print $delim;
            $delim = ", ";
            print "($a, $b)";
        }
    }
}
print ";\n";

print "insert into t2 values ";
$delim = "";
for $a (1..$scale) {
    print "\n";
    for $x (1, 2500, 5000, 7500) {
        $b = $a + $x;
        print $delim;
        $delim = ", ";
        print "($a, $b)";
    }
}
print ";\n";
print "analyze t1;\n";
print "analyze t2;\n";
diff --git a/src/backend/catalog/Makefile b/src/backend/catalog/Makefile
index 37d05d1..d00835e 100644
--- a/src/backend/catalog/Makefile
+++ b/src/backend/catalog/Makefile
@@ -33,7 +33,8 @@ POSTGRES_BKI_SRCS = $(addprefix $(top_srcdir)/src/include/catalog/,\
 	pg_opfamily.h pg_opclass.h pg_am.h pg_amop.h pg_amproc.h \
 	pg_language.h pg_largeobject_metadata.h pg_largeobject.h pg_aggregate.h \
 	pg_statistic.h pg_rewrite.h pg_trigger.h pg_event_trigger.h pg_description.h \
-	pg_cast.h pg_enum.h pg_namespace.h pg_conversion.h pg_depend.h \
+	pg_cast.h pg_enum.h pg_mvcoefficient.h pg_namespace.h pg_conversion.h \
+	pg_depend.h \
 	pg_database.h pg_db_role_setting.h pg_tablespace.h pg_pltemplate.h \
 	pg_authid.h pg_auth_members.h pg_shdepend.h pg_shdescription.h \
 	pg_ts_config.h pg_ts_config_map.h pg_ts_dict.h \
diff --git a/src/backend/commands/analyze.c b/src/backend/commands/analyze.c
index 15ec0ad..9edaa0f 100644
--- a/src/backend/commands/analyze.c
+++ b/src/backend/commands/analyze.c
@@ -27,6 +27,7 @@
 #include "catalog/indexing.h"
 #include "catalog/pg_collation.h"
 #include "catalog/pg_inherits_fn.h"
+#include "catalog/pg_mvcoefficient.h"
 #include "catalog/pg_namespace.h"
 #include "commands/dbcommands.h"
 #include "commands/tablecmds.h"
@@ -45,7 +46,9 @@
 #include "storage/procarray.h"
 #include "utils/acl.h"
 #include "utils/attoptcache.h"
+#include "utils/catcache.h"
 #include "utils/datum.h"
+#include "utils/fmgroids.h"
 #include "utils/guc.h"
 #include "utils/lsyscache.h"
 #include "utils/memutils.h"
@@ -110,6 +113,12 @@ static void update_attstats(Oid relid, bool inh,
 				int natts, VacAttrStats **vacattrstats);
 static Datum std_fetch_func(VacAttrStatsP stats, int rownum, bool *isNull);
 static Datum ind_fetch_func(VacAttrStatsP stats, int rownum, bool *isNull);
+static float4 compute_mv_distinct(int nattrs,
+                                  int *stacolnums,
+                                  VacAttrStats **stats,
+                                  AnalyzeAttrFetchFunc fetchfunc,
+                                  int samplerows,
+                                  double totalrows);
 
 /*
@@ -552,6 +561,92 @@ do_analyze_rel(Relation onerel, int options, VacuumParams *params,
 			MemoryContextResetAndDeleteChildren(col_context);
 		}
 
+        /* Compute multivariate distinctness if ordered */
+        {
+            ScanKeyData    scankey;
+            SysScanDesc    sysscan;
+            Relation    mvcrel;
+            HeapTuple    oldtup, newtup;
+            int            i;
+
+            mvcrel = heap_open(MvCoefficientRelationId, RowExclusiveLock);
+
+            ScanKeyInit(&scankey,
+                        Anum_pg_mvcoefficient_mvcreloid,
+                        BTEqualStrategyNumber, F_OIDEQ,
+                        ObjectIdGetDatum(onerel->rd_id));
+            sysscan = systable_beginscan(mvcrel, MvCoefficientIndexId, true,
+                                         NULL, 1, &scankey);
+            oldtup = systable_getnext(sysscan);
+
+            while (HeapTupleIsValid(oldtup))
+            {
+                int        colnums[3];
+                int        ncols = 0;
+                float4    nd;
+                Datum    values[Natts_pg_mvcoefficient];
+                bool    nulls[Natts_pg_mvcoefficient];
+                bool    replaces[Natts_pg_mvcoefficient];
+                float4        simple_mv_distinct;
+                
+                Form_pg_mvcoefficient mvc =
+                    (Form_pg_mvcoefficient) GETSTRUCT (oldtup);
+
+                if (mvc->mvcattr1 > 0)
+                    colnums[ncols++] = mvc->mvcattr1 - 1;
+                if (mvc->mvcattr2 > 0)
+                    colnums[ncols++] = mvc->mvcattr2 - 1;
+                if (mvc->mvcattr3 > 0)
+                    colnums[ncols++] = mvc->mvcattr3 - 1;
+
+                if (ncols > 0)
+                {
+                    int        j;
+                    float4    nd_coef;
+
+                    simple_mv_distinct = 
+                        vacattrstats[colnums[0]]->stadistinct;
+                    if (simple_mv_distinct < 0)
+                        simple_mv_distinct = -simple_mv_distinct * totalrows;
+                    for (j = 1 ; j < ncols ; j++)
+                    {
+                        float4 t = vacattrstats[colnums[j]]->stadistinct;
+
+                        if (t < 0)
+                            t = -t * totalrows;
+                        simple_mv_distinct *= t;
+                    }
+
+                    nd = compute_mv_distinct(j, colnums, vacattrstats,
+                                     std_fetch_func, numrows, totalrows);
+
+                    nd_coef = nd / simple_mv_distinct;
+                    
+                    for (i = 0; i < Natts_pg_mvcoefficient ; ++i)
+                    {
+                        nulls[i] = false;
+                        replaces[i] = false;
+                    }
+                    values[Anum_pg_mvcoefficient_mvccoefficient - 1] =
+                        Float4GetDatum(nd_coef);
+                    replaces[Anum_pg_mvcoefficient_mvccoefficient - 1] = true;
+                    newtup = heap_modify_tuple(oldtup,
+                                               RelationGetDescr(mvcrel),
+                                               values,
+                                               nulls,
+                                               replaces);
+                    simple_heap_update(mvcrel, &oldtup->t_self, newtup);
+
+                    CatalogUpdateIndexes(mvcrel, newtup);
+                }
+
+                /* advance even when ncols == 0, to avoid an infinite loop */
+                oldtup = systable_getnext(sysscan);
+            }
+
+            systable_endscan(sysscan);
+            heap_close(mvcrel, RowExclusiveLock);
+        }
+
 		if (hasindex)
 			compute_index_stats(onerel, totalrows,
 								indexdata, nindexes,
@@ -1911,6 +2006,7 @@ static void compute_scalar_stats(VacAttrStatsP stats,
 					 int samplerows,
 					 double totalrows);
 static int	compare_scalars(const void *a, const void *b, void *arg);
+static int	compare_mv_scalars(const void *a, const void *b, void *arg);
 static int	compare_mcvs(const void *a, const void *b);
 
@@ -2840,6 +2936,207 @@ compute_scalar_stats(VacAttrStatsP stats,
 }
 
 /*
+ *	compute_mv_distinct() -- compute multicolumn distinctness
+ */
+
+static float4
+compute_mv_distinct(int nattrs,
+                    int *stacolnums,
+                    VacAttrStats **stats,
+                    AnalyzeAttrFetchFunc fetchfunc,
+                    int samplerows,
+                    double totalrows)
+{
+    int            i, j;
+    int            null_cnt = 0;
+    int            nonnull_cnt = 0;
+    int            toowide_cnt = 0;
+    double        total_width = 0;
+    bool        is_varlena[3];
+    SortSupportData ssup[3];
+    ScalarItem **values, *values2;
+    int            values_cnt = 0;
+    int           *tupnoLink;
+    StdAnalyzeData *mystats[3];
+    float4        fndistinct;
+
+    Assert (nattrs <= 3);
+    for (i = 0 ; i < nattrs ; i++)
+    {
+        VacAttrStats *vas = stats[stacolnums[i]];
+        is_varlena[i] =
+            !vas->attrtype->typbyval && vas->attrtype->typlen == -1;
+        mystats[i] =
+            (StdAnalyzeData*) vas->extra_data;
+    }
+
+    values2 = (ScalarItem *) palloc(nattrs * samplerows * sizeof(ScalarItem));
+    values  = (ScalarItem **) palloc(samplerows * sizeof(ScalarItem*));
+    tupnoLink = (int *) palloc(samplerows * sizeof(int));
+
+    for (i = 0 ; i < samplerows ; i++)
+        values[i] = &values2[i * nattrs];
+
+    memset(ssup, 0, sizeof(ssup));
+    for (i = 0 ; i < nattrs ; i++)
+    {
+        ssup[i].ssup_cxt = CurrentMemoryContext;
+        /* We always use the default collation for statistics */
+        ssup[i].ssup_collation = DEFAULT_COLLATION_OID;
+        ssup[i].ssup_nulls_first = false;
+        ssup[i].abbreviate = true;
+        PrepareSortSupportFromOrderingOp(mystats[i]->ltopr, &ssup[i]);
+    }
+    ssup[nattrs].ssup_cxt = NULL;
+
+    /* Initial scan to find sortable values */
+    for (i = 0; i < samplerows; i++)
+    {
+        Datum        value[3];
+        bool        isnull = false;
+        bool        toowide = false;
+
+        vacuum_delay_point();
+
+        for (j = 0 ; j < nattrs ; j++)
+        {
+
+            value[j] = fetchfunc(stats[stacolnums[j]], i, &isnull);
+
+            /* Check for null/nonnull */
+            if (isnull)
+                break;
+
+            if (is_varlena[j])
+            {
+                total_width += VARSIZE_ANY(DatumGetPointer(value[j]));
+                if (toast_raw_datum_size(value[j]) > WIDTH_THRESHOLD)
+                {
+                    toowide = true;
+                    break;
+                }
+                value[j] = PointerGetDatum(PG_DETOAST_DATUM(value[j]));
+            }
+        }
+        if (isnull)
+        {
+            null_cnt++;
+            continue;
+        }
+        else if (toowide)
+        {
+            toowide_cnt++;
+            continue;
+        }
+        nonnull_cnt++;
+
+        /* Add it to the list to be sorted */
+        for (j = 0 ; j < nattrs ; j++)
+            values[values_cnt][j].value = value[j];
+
+        values[values_cnt][0].tupno = values_cnt;
+        tupnoLink[values_cnt] = values_cnt;
+        values_cnt++;
+    }
+
+    /* We can only compute real stats if we found some sortable values. */
+    if (values_cnt > 0)
+    {
+        int            ndistinct,    /* # distinct values in sample */
+                    nmultiple,    /* # that appear multiple times */
+                    dups_cnt;
+        CompareScalarsContext cxt;
+
+        /* Sort the collected values */
+        cxt.ssup = ssup;
+        cxt.tupnoLink = tupnoLink;
+        qsort_arg((void *) values, values_cnt, sizeof(ScalarItem*),
+                  compare_mv_scalars, (void *) &cxt);
+
+        ndistinct = 0;
+        nmultiple = 0;
+        dups_cnt = 0;
+        for (i = 0; i < values_cnt; i++)
+        {
+            int            tupno = values[i][0].tupno;
+
+            dups_cnt++;
+            if (tupnoLink[tupno] == tupno)
+            {
+                /* Reached end of duplicates of this value */
+                ndistinct++;
+                if (dups_cnt > 1)
+                    nmultiple++;
+
+                dups_cnt = 0;
+            }
+        }
+
+        if (nmultiple == 0)
+        {
+            /* If we found no repeated values, assume it's a unique column */
+            fndistinct = totalrows;
+        }
+        else if (toowide_cnt == 0 && nmultiple == ndistinct)
+        {
+            /*
+             * Every value in the sample appeared more than once.  Assume the
+             * column has just these values.
+             */
+            fndistinct = (float4)ndistinct;
+        }
+        else
+        {
+            /*----------
+             * Estimate the number of distinct values using the estimator
+             * proposed by Haas and Stokes in IBM Research Report RJ 10025:
+             *        n*d / (n - f1 + f1*n/N)
+             * where f1 is the number of distinct values that occurred
+             * exactly once in our sample of n rows (from a total of N),
+             * and d is the total number of distinct values in the sample.
+             * This is their Duj1 estimator; the other estimators they
+             * recommend are considerably more complex, and are numerically
+             * very unstable when n is much smaller than N.
+             *
+             * Overwidth values are assumed to have been distinct.
+             *----------
+             */
+            int            f1 = ndistinct - nmultiple + toowide_cnt;
+            int            d = f1 + nmultiple;
+            double        numer,
+                        denom,
+                        stadistinct;
+
+            numer = (double) samplerows *(double) d;
+
+            denom = (double) (samplerows - f1) +
+                (double) f1 *(double) samplerows / totalrows;
+
+            stadistinct = numer / denom;
+            /* Clamp to sane range in case of roundoff error */
+            if (stadistinct < (double) d)
+                stadistinct = (double) d;
+            if (stadistinct > totalrows)
+                stadistinct = totalrows;
+            fndistinct = floor(stadistinct + 0.5);
+        }
+    }
+    else if (nonnull_cnt > 0)
+    {
+        /* Assume all too-wide values are distinct, so it's a unique column */
+        fndistinct = totalrows;
+    }
+    else if (null_cnt > 0)
+    {
+        fndistinct =  0.0;        /* "unknown" */
+    }
+
+    /* We don't need to bother cleaning up any of our temporary palloc's */
+    return fndistinct;
+}
+
+
+/*
  * qsort_arg comparator for sorting ScalarItems
  *
  * Aside from sorting the items, we update the tupnoLink[] array
@@ -2876,6 +3173,43 @@ compare_scalars(const void *a, const void *b, void *arg)
 	return ta - tb;
 }
 
+static int
+compare_mv_scalars(const void *a, const void *b, void *arg)
+{
+    CompareScalarsContext *cxt = (CompareScalarsContext *) arg;
+    ScalarItem *va = *(ScalarItem**)a;
+    ScalarItem *vb = *(ScalarItem**)b;
+    Datum        da, db;
+    int            ta, tb;
+    int            compare;
+    int i;
+
+    for (i = 0 ; cxt->ssup[i].ssup_cxt ; i++)
+    {
+        da = va[i].value;
+        db = vb[i].value;
+
+        compare = ApplySortComparator(da, false, db, false, &cxt->ssup[i]);
+        if (compare != 0)
+            return compare;
+    }
+
+    /*
+     * The two datums are equal, so update cxt->tupnoLink[].
+     */
+    ta = va[0].tupno;
+    tb = vb[0].tupno;
+    if (cxt->tupnoLink[ta] < tb)
+        cxt->tupnoLink[ta] = tb;
+    if (cxt->tupnoLink[tb] < ta)
+        cxt->tupnoLink[tb] = ta;
+
+    /*
+     * For equal datums, sort by tupno
+     */
+    return ta - tb;
+}
+
 /*
  * qsort comparator for sorting ScalarMCVItems by position
  */
diff --git a/src/backend/optimizer/path/clausesel.c b/src/backend/optimizer/path/clausesel.c
index dcac1c1..43712ba 100644
--- a/src/backend/optimizer/path/clausesel.c
+++ b/src/backend/optimizer/path/clausesel.c
@@ -14,8 +14,14 @@
  */
 
 #include "postgres.h"
 
+#include "access/genam.h"
+#include "access/heapam.h"
+#include "access/htup_details.h"
+#include "catalog/indexing.h"
 #include "catalog/pg_operator.h"
+#include "catalog/pg_mvcoefficient.h"
 #include "nodes/makefuncs.h"
+#include "nodes/nodeFuncs.h"
 #include "optimizer/clauses.h"
 #include "optimizer/cost.h"
 #include "optimizer/pathnode.h"
@@ -43,6 +49,93 @@ static void addRangeClause(RangeQueryClause **rqlist, Node *clause,
 			   bool varonleft, bool isLTsel, Selectivity s2);
 
+static bool
+collect_collist_walker(Node *node, Bitmapset **colsetlist)
+{
+    if (node == NULL)
+        return false;
+    if (IsA(node, Var))
+    {
+        Var *var = (Var*)node;
+
+        if (AttrNumberIsForUserDefinedAttr(var->varattno))
+            colsetlist[var->varno] = 
+                bms_add_member(colsetlist[var->varno], var->varattno);
+    }
+    return expression_tree_walker(node, collect_collist_walker,
+                                  (void*)colsetlist);
+}
+
+/* Find multivariate distinctness coefficient for clauselist */
+static double
+find_mv_join_coeffeicient(PlannerInfo *root, List *clauses)
+{
+    int relid;
+    ListCell   *l;
+    Bitmapset **colsetlist = NULL;
+    double mv_coef = 1.0;
+
+    /* Collect columns this clauselist on */
+    colsetlist = (Bitmapset**)
+        palloc0(root->simple_rel_array_size * sizeof(Bitmapset*));
+
+    foreach(l, clauses)
+    {
+        RestrictInfo *rti = (RestrictInfo *) lfirst(l);
+
+        /* Consider only EC-derived clauses between the joinrels */
+        if (rti->left_ec && rti->left_ec == rti->right_ec)
+        {
+            if (IsA(rti, RestrictInfo))
+                collect_collist_walker((Node*)rti->clause, colsetlist);
+        }
+    }
+
+    /* Find pg_mv_coefficient entries match this columlist */
+    for (relid = 1 ; relid < root->simple_rel_array_size ; relid++)
+    {
+        Relation mvcrel;
+        SysScanDesc sscan;
+        ScanKeyData skeys[1];
+        HeapTuple tuple;
+        
+        if (bms_is_empty(colsetlist[relid])) continue;
+
+        if (root->simple_rte_array[relid]->rtekind != RTE_RELATION) continue;
+
+        ScanKeyInit(&skeys[0],
+                    Anum_pg_mvcoefficient_mvcreloid,
+                    BTEqualStrategyNumber, F_OIDEQ,
+                    ObjectIdGetDatum(root->simple_rte_array[relid]->relid));
+        
+        mvcrel = heap_open(MvCoefficientRelationId, AccessShareLock);
+        sscan = systable_beginscan(mvcrel, MvCoefficientIndexId, true,
+                                   NULL, 1, skeys);
+        while (HeapTupleIsValid(tuple = systable_getnext(sscan)))
+        {
+            Bitmapset *mvccols = NULL;
+            Form_pg_mvcoefficient mvc =
+                (Form_pg_mvcoefficient) GETSTRUCT (tuple);
+
+            mvccols = bms_add_member(mvccols, mvc->mvcattr1);
+            mvccols = bms_add_member(mvccols, mvc->mvcattr2);
+            if (mvc->mvcattr3 > 0)
+                mvccols = bms_add_member(mvccols, mvc->mvcattr3);
+
+            if (!bms_is_subset(mvccols, colsetlist[relid]))
+                continue;
+
+            /* Prefer smaller one */
+            if (mvc->mvccoefficient > 0 && mvc->mvccoefficient < mv_coef)
+                mv_coef = mvc->mvccoefficient;
+        }
+        systable_endscan(sscan);
+        heap_close(mvcrel, AccessShareLock);
+    }
+
+    return mv_coef;
+}
+
 /****************************************************************************
  *		ROUTINES TO COMPUTE SELECTIVITIES
  ****************************************************************************/
 
@@ -200,6 +293,9 @@ clauselist_selectivity(PlannerInfo *root,
 		s1 = s1 * s2;
 	}
 
+	/* Try multivariate distinctness correction for clauses */
+	s1 /= find_mv_join_coeffeicient(root, clauses);
+
 	/*
 	 * Now scan the rangequery pair list.
 	 */
diff --git a/src/backend/utils/cache/syscache.c b/src/backend/utils/cache/syscache.c
index f58e1ce..f4c1001 100644
--- a/src/backend/utils/cache/syscache.c
+++ b/src/backend/utils/cache/syscache.c
@@ -43,6 +43,7 @@
 #include "catalog/pg_foreign_server.h"
 #include "catalog/pg_foreign_table.h"
 #include "catalog/pg_language.h"
+#include "catalog/pg_mvcoefficient.h"
 #include "catalog/pg_namespace.h"
 #include "catalog/pg_opclass.h"
 #include "catalog/pg_operator.h"
@@ -501,6 +502,17 @@ static const struct cachedesc cacheinfo[] = {
 		},
 		4
 	},
+	{MvCoefficientRelationId,		/* MVCOEFFICIENT */
+		MvCoefficientIndexId,
+		4,
+		{
+			Anum_pg_mvcoefficient_mvcreloid,
+			Anum_pg_mvcoefficient_mvcattr1,
+			Anum_pg_mvcoefficient_mvcattr2,
+			Anum_pg_mvcoefficient_mvcattr3
+		},
+		4
+	},
 	{NamespaceRelationId,		/* NAMESPACENAME */
 		NamespaceNameIndexId,
 		1,
diff --git a/src/include/catalog/indexing.h b/src/include/catalog/indexing.h
index 71e0010..0c76f93 100644
--- a/src/include/catalog/indexing.h
+++ b/src/include/catalog/indexing.h
@@ -173,6 +173,9 @@ DECLARE_UNIQUE_INDEX(pg_largeobject_loid_pn_index, 2683, on pg_largeobject using
 DECLARE_UNIQUE_INDEX(pg_largeobject_metadata_oid_index, 2996, on pg_largeobject_metadata using btree(oid oid_ops));
 #define LargeObjectMetadataOidIndexId	2996
 
+DECLARE_UNIQUE_INDEX(pg_mvcoefficient_index, 3578, on pg_mvcoefficient using btree(mvcreloid oid_ops, mvcattr1 int2_ops, mvcattr2 int2_ops, mvcattr3 int2_ops));
+#define MvCoefficientIndexId  3578
+
 DECLARE_UNIQUE_INDEX(pg_namespace_nspname_index, 2684, on pg_namespace using btree(nspname name_ops));
 #define NamespaceNameIndexId  2684
 DECLARE_UNIQUE_INDEX(pg_namespace_oid_index, 2685, on pg_namespace using btree(oid oid_ops));
 
diff --git a/src/include/catalog/pg_mvcoefficient.h b/src/include/catalog/pg_mvcoefficient.h
new file mode 100644
index 0000000..56259fd
--- /dev/null
+++ b/src/include/catalog/pg_mvcoefficient.h
@@ -0,0 +1,68 @@
+/*-------------------------------------------------------------------------
+ *
+ * pg_mvcoefficient.h
+ *      definition of the system multivariate coefficient relation
+ *      (pg_mvcoefficient) along with the relation's initial contents.
+ *
+ * Copyright (c) 2015, PostgreSQL Global Development Group
+ *
+ * src/include/catalog/pg_mvcoefficient.h
+ *
+ * NOTES
+ *      the genbki.pl script reads this file and generates .bki
+ *      information from the DATA() statements.
+ *
+ *      XXX do NOT break up DATA() statements into multiple lines!
+ *          the scripts are not as smart as you might think...
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef PG_MVCOEFFICIENT_H
+#define PG_MVCOEFFICIENT_H
+
+#include "catalog/genbki.h"
+#include "nodes/pg_list.h"
+
+/* ----------------
+ *        pg_mvcoefficient definition.  cpp turns this into
+ *        typedef struct FormData_pg_mvcoefficient
+ * ----------------
+ */
+#define MvCoefficientRelationId    3577
+
+CATALOG(pg_mvcoefficient,3577) BKI_WITHOUT_OIDS
+{
+    Oid        mvcreloid;            /* OID of target relation */
+    int16    mvcattr1;            /* Column numbers */
+    int16    mvcattr2;
+    int16    mvcattr3;
+    float4    mvccoefficient;        /* multivariate distinctness coefficient */
+} FormData_pg_mvcoefficient;
+
+/* ----------------
+ *        Form_pg_mvcoefficient corresponds to a pointer to a tuple with the
+ *        format of pg_mvcoefficient relation.
+ * ----------------
+ */
+typedef FormData_pg_mvcoefficient *Form_pg_mvcoefficient;
+
+/* ----------------
+ *        compiler constants for pg_mvcoefficient
+ * ----------------
+ */
+#define Natts_pg_mvcoefficient                5
+#define Anum_pg_mvcoefficient_mvcreloid        1
+#define Anum_pg_mvcoefficient_mvcattr1            2
+#define Anum_pg_mvcoefficient_mvcattr2            3
+#define Anum_pg_mvcoefficient_mvcattr3            4
+#define Anum_pg_mvcoefficient_mvccoefficient    5
+
+/* ----------------
+ *        pg_mvcoefficient has no initial contents
+ * ----------------
+ */
+
+#endif   /* PG_MVCOEFFICIENT_H */
diff --git a/src/include/utils/syscache.h b/src/include/utils/syscache.h
index 6634099..db8454c 100644
--- a/src/include/utils/syscache.h
+++ b/src/include/utils/syscache.h
@@ -66,6 +66,7 @@ enum SysCacheIdentifier
 	INDEXRELID,
 	LANGNAME,
 	LANGOID,
+	MVDISTINCT,
 	NAMESPACENAME,
 	NAMESPACEOID,
 	OPERNAMENSP,
diff --git a/src/test/regress/expected/sanity_check.out b/src/test/regress/expected/sanity_check.out
index eb0bc88..7c77796 100644
--- a/src/test/regress/expected/sanity_check.out
+++ b/src/test/regress/expected/sanity_check.out
@@ -113,6 +113,7 @@ pg_inherits|t
 pg_language|t
 pg_largeobject|t
 pg_largeobject_metadata|t
+pg_mvcoefficient|t
 pg_namespace|t
 pg_opclass|t
 pg_operator|t
