Re: [HACKERS] multivariate statistics (v19)

Поиск
Список
Период
Сортировка
От Kyotaro HORIGUCHI
Тема Re: [HACKERS] multivariate statistics (v19)
Дата
Msg-id 20170126.200107.11536715.horiguchi.kyotaro@lab.ntt.co.jp
обсуждение исходный текст
Ответ на Re: [HACKERS] multivariate statistics (v19)  ("Ideriha, Takeshi" <ideriha.takeshi@jp.fujitsu.com>)
Ответы Re: [HACKERS] multivariate statistics (v19)  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Список pgsql-hackers
Hello, I'll return on this since this should welcome more eyeballs.

At Thu, 26 Jan 2017 09:03:10 +0000, "Ideriha, Takeshi" <ideriha.takeshi@jp.fujitsu.com> wrote in
<4E72940DA2BF16479384A86D54D0988A565822A9@G01JPEXMBKW04>
> Hi
> 
> When you have time, could you rebase the pathes? 
> Some patches cannot be applied to the current HEAD.

For those who are willing to look this,
352a24a1f9d6f7d4abb1175bfd22acc358f43140 breaks this. So just
before it can accept this patches cleanly.

> 0001 patch can be applied but the following 0002 patch cannot be.
> 
> I've just started reading your patch (mainly docs and README, not yet source code.)
> 
> Though these are minor things, I've found some typos or mistakes in the document and README.
> 
> >+   statistics on the table. The statistics will be created in the in the
> >+   current database. The statistics will be owned by the user issuing
> 
> Regarding line 629 at 0002-PATCH-shared-infrastructure-and-ndistinct-coeffi-v22.patch,
> there is a double "in the".
> 
> >+   knowledge of a value in the first column is sufficient for detemining the
> >+   value in the other column. Then functional dependencies are built on those
> 
> Regarding line 701 at 0002-PATCH,
> "determining" is mistakenly spelled "detemining".
> 
> 
> >@@ -0,0 +1,98 @@
> >+Multivariate statististics
> >+==========================
> 
> Regarding line 2415 at 0002-PATCH, "statististics" should be statistics
> 
> 
> >+ <refnamediv>
> >+  <refname>CREATE STATISTICS</refname>
> >+  <refpurpose>define a new statistics</refpurpose>
> >+ </refnamediv>
> 
> >+ <refnamediv>
> >+  <refname>DROP STATISTICS</refname>
> >+  <refpurpose>remove a statistics</refpurpose>
> >+ </refnamediv>
> 
> Regarding line 612 and 771 at 0002-PATCH,
> I assume saying "multiple statistics" explicitly is easier to understand to users
> since these commands don't for the statistics we already have in the pg_statistics in my understanding.
> 
> >+   [1] http://en.wikipedia.org/wiki/Database_normalization
> 
> Regarding line 386 at 0003-PATCH, is it better to change this link to this one:
> https://en.wikipedia.org/wiki/Functional_dependency ?
> README.dependencies cites directly above link.
> 
> Though I pointed out these typoes and so on, 
> I believe these feedback are less priority compared to the source code itself.
> 
> So please work on my feedback if you have time.


README.dependencies
 > dependencies, and for each one count the number of rows rows consistent it. "of rows rows consistent it" => "or rows
consistentwith it"?
 
 > are in fact consistent with the functinal dependency, i.e. that given the a
 "that given the a" => "that given a" ?


dependencies.c:
dependency_dgree():
 - The k is assumed larger than 1. I think assertion is required.
 - "/* end of the preceding group */" seems to be better if it   is just after the "if (multi_sort.." currently just
afterit.
 
 - The following comment seems mis-edited.   > * If there is a single are no contradicting rows, count the group   > *
assupporting, otherwise contradicting.    maybe this would be like the following? The varialbe counting   the first
"contradiction"is named "n_violations". This seems   somewhat confusing.    > * If there are no violating rows up to
here,count the group   > * as supporting, otherwise contradicting.   - "/* first columns match, but the last one does
not"   else if (multi_sort_compare_dims((k - 1), (k - 1), ...
 
    The above comparison should use multi_sort_compare_dim, not    dims
  - This function counts "n_contradicting_rows" but it is not    referenced. Anyway n_contradicting_rows = numrows -
n_supporing_rowsso it and n_contradicting seem    unncecessary.
 
build_mv_dependencies():
  - In the commnet,    "* covering jut 2 columns, to the largest ones, covering all columns"    "* included int the
statistics.We start from the smallest ones because we"
 
   l1: "jut" => "just", l2: "int" => "in"
mvstats.h:
  - struct MVDependencyData/ MVDependenciesData
    The varialbe length member at the last of the structs should    be defined using FLEXIBLE_ARRAY_MEMBER, from the
convention.
  - I'm not sure how much it impacts performance, but some    struct members seems to have a bit too wide types. For
example,MVDepedenciesData.type is of int32 but it can have    only '1' for now and it won't be two-digits. Also ndeps
cannot be so large.
 

common.c:
 multi_sort_compare_dims needs comment.

general: This patch uses int16 as the type of attrubute number but it might be better to use AttrNumber for the
purpose.(Specifically it seems defined as the type for an attribute  index but also used as the varialbe for number of
attributes)


Sorry for the random comment in advance. I'll learn this further.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center





В списке pgsql-hackers по дате отправления:

Предыдущее
От: Emre Hasegeli
Дата:
Сообщение: Re: [HACKERS] Floating point comparison inconsistencies of thegeometric types
Следующее
От: Stas Kelvich
Дата:
Сообщение: Re: [HACKERS] logical decoding of two-phase transactions