Re: PATCH: adaptive ndistinct estimator v4

Поиск
Список
Период
Сортировка
От Robert Haas
Тема Re: PATCH: adaptive ndistinct estimator v4
Дата
Msg-id CA+TgmoYkrRFrJ6xagrY-w=HUZKa4oFJQiNw05reW5jWxt7WXJw@mail.gmail.com
обсуждение исходный текст
Ответ на Re: PATCH: adaptive ndistinct estimator v4  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Список pgsql-hackers
On Thu, Apr 30, 2015 at 9:20 PM, Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:
> I agree that this is not ready for 9.5 - it was meant as an experiment
> (hence printing the estimate in a WARNING, to make it easier to compare
> the value to the current estimator). Without that it'd be much more
> complicated to compare the old/new estimates, but you're right this is
> not suitable for commit.
>
> So far it received only reviews from Jeff Janes (thanks!), but I think a
> change like this really requires a more thorough review, including the math
> part. I don't expect that to happen at the very end of the last CF before
> the freeze.

OK.

>> IMHO, the comments in this patch are pretty inscrutable.  I believe
>> this is because they presume more knowledge of what the patch is doing
>> than I myself possess.  For example:
>>
>> + * The AEE estimator is based on solving this equality (for "m")
>> + *
>> + *     m - f1 - f2 = f1 * (A + A(m)) / (B + B(m))
>> + *
>> + * where A, B are effectively constants (not depending on m), and A(m)
>> + * and B(m) are functions. This is equal to solving
>> + *
>> + *     0 = f1 * (A + A(m)) / (B + B(m)) - (m - f1 - f2)
>>
>> Perhaps I am just a dummy, but I have no idea what any of that means.
>> I think that needs to be fixed so that someone who knows what
>> n_distinct is but knows nothing about the details of these estimators
>> can get an idea of how they are doing their thing without too much
>> effort.  I think a lot of the comments share this problem.
>
> Well, I don't think you're dummy, but this requires reading the paper
> describing the estimator. Explaining that fully would essentially mean
> copying a large portion of the paper in the comment, and I suppose that's
> not a good idea. The explanation might be perhaps a bit more detailed,
> though - not sure what's the right balance.

Well, I think the problem in this case is that the comment describes
what the values are mathematically without explaining what they are
conceptually.  For example, in s=1/2at^2+v_0t+s_0, we could say that a
is meant to be the rate of change of an unseen variable v, while v_0
is the initial vale of v, and that s_0 is meant to be the starting
value of s, changing at a rate described by v.  That's basically the
kind of explanation you have right now.  It's all correct, but what
does it really mean?  It's more helpful to say that we're trying to
project the position of a body at a given time (t) given its initial
position (s_0), its initial velocity (v), and its rate of acceleration
(a).

>> 3. There should be some clear documentation in the comments indicating
>> why we believe that this is a whole lot better than what we do today.
>> Maybe this has been discussed adequately on the thread and maybe it
>> hasn't, but whoever goes to look at the committed code should not have
>> to go root through hackers threads to understand why we replaced the
>> existing estimator.  It should be right there in the code.  If,
>> hypothetically speaking, I were to commit this, and if, again strictly
>> hypothetically, another distinguished committer were to write back a
>> year or two later, "clearly Robert was an idiot to commit this because
>> it's no better than what we had before" then I want to be able to say
>> "clearly, you have not what got committed very carefully, because the
>> comment for function <blat> clearly explains that this new technology
>> is teh awesome".
>
> I certainly can add such comment to the patch ;-) Choose a function.

Well, at least the way things are organized right now,
adaptive_estimator seems like the place.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Etsuro Fujita
Дата:
Сообщение: Re: Missing importing option of postgres_fdw
Следующее
От: Robert Haas
Дата:
Сообщение: Re: procost for to_tsvector