Re: PATCH: adaptive ndistinct estimator v4

From Robert Haas
Subject Re: PATCH: adaptive ndistinct estimator v4
Date
Msg-id CA+TgmoZ6FgvwVTyzM7hLiHDifYinXKsTiRWxh062AETOt8Dw7Q@mail.gmail.com
In response to Re: PATCH: adaptive ndistinct estimator v4  (Jeff Janes <jeff.janes@gmail.com>)
List pgsql-hackers
On Wed, May 13, 2015 at 5:07 PM, Jeff Janes <jeff.janes@gmail.com> wrote:
> With the warning it is very hard to correlate the discrepancy you do see
> with which column is causing it, as the warnings don't include table or
> column names (Assuming of course that you run it on a substantial
> database--if you just run it on a few toy cases then the warning works
> well).

Presumably the warning is going to go away before we actually commit this thing.

> If we want to have an explicitly experimental patch which we want people
> with interesting real-world databases to report back on, what kind of patch
> would it have to be to encourage that to happen?  Or are we never going to
> get such feedback no matter how friendly we make it?  Another problem is
> that you really need to have the gold standard to compare them to, and
> getting that is expensive (which is why we resort to sampling in the first
> place).  I don't think there is much to be done on that front other than
> bite the bullet and just do it--perhaps only for the tables which have
> discrepancies.

If we stick with the idea of a GUC to control the behavior, then
somebody can run ANALYZE, save the ndistinct estimates, run ANALYZE
again, and compare.  They can also run SQL queries against the tables
themselves to check the real value.  We could even provide a script
for all of that.  I think that would be quite handy.
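
As a rough illustration of the comparison step, here is a minimal sketch of what
such a script might compute. It assumes the n_distinct values have already been
copied out of pg_stats after each ANALYZE run and that the exact counts came
from count(DISTINCT ...) queries; it does not connect to a database. The
function names and the dictionary layout are hypothetical and are not part of
the patch.

```python
# Sketch only: compare two sets of ndistinct estimates against exact counts.
# All inputs are plain dictionaries; nothing here talks to PostgreSQL.

def absolute_ndistinct(n_distinct, reltuples):
    """Convert a pg_stats.n_distinct value to an absolute count.

    pg_stats stores a negative value as minus the fraction of rows that
    are distinct, so it has to be scaled by the table's row count.
    """
    if n_distinct < 0:
        return -n_distinct * reltuples
    return float(n_distinct)


def ratio_error(estimate, actual):
    """Symmetric ratio error: 1.0 is exact, 10.0 is off by 10x either way."""
    estimate = max(estimate, 1.0)
    actual = max(actual, 1.0)
    return max(estimate / actual, actual / estimate)


def compare(old_stats, new_stats, actual, reltuples):
    """Return (column key, old error, new error) rows, worst new error first.

    old_stats, new_stats: {(table, column): n_distinct as stored in pg_stats}
    actual:               {(table, column): exact distinct count}
    reltuples:            {table: row count}
    """
    rows = []
    for key, true_count in actual.items():
        table_rows = reltuples[key[0]]
        old_err = ratio_error(absolute_ndistinct(old_stats[key], table_rows), true_count)
        new_err = ratio_error(absolute_ndistinct(new_stats[key], table_rows), true_count)
        rows.append((key, old_err, new_err))
    rows.sort(key=lambda row: row[2], reverse=True)
    return rows


if __name__ == "__main__":
    # Made-up numbers, for illustration only.
    key = ("orders", "customer_id")
    for row in compare({key: 100}, {key: -0.5}, {key: 1000}, {"orders": 1000}):
        print(row)
```

Because every row carries its table and column name, a report like this would
also make it easy to see which column a discrepancy belongs to.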

> It can't hurt, but how effective will it be?  Will developers know or care
> whether ndistinct happened to get better or worse while they are working on
> other things?  I would think that problems will be found by focused testing,
> or during beta, and probably not by accidental discovery during the
> development cycle.  It can't hurt, but I don't know how much it will help.

Once we enter beta (or even feature freeze), it's too late to whack
around the algorithm heavily.  We're pretty much committed to
releasing and supporting whatever we have got at that point.  I guess
we could revert it if it doesn't work out, but that's about the only
option at that point.  We have more flexibility during the main part
of the development cycle.  But your point is certainly valid and I
don't mean to dispute it.

> I agree with the "experimental GUC".  That way if hackers do happen to see
> something suspicious, they can just turn it off and see what difference it
> makes.  If they have to reverse out a patch from 6 months ago in an area of
> the code they aren't particularly interested in and then recompile their
> code and then juggle two different sets of binaries, they will likely just
> shrug it off without investigation.

Yep.  Users, too.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


