Re: [PERFORM] Bad n_distinct estimation; hacks suggested?

Список
Период
Сортировка
От Simon Riggs
Тема Re: [PERFORM] Bad n_distinct estimation; hacks suggested?
Дата
Msg-id 1114454941.21529.245.camel@localhost.localdomain
обсуждение исходный текст
Ответ на Re: [PERFORM] Bad n_distinct estimation; hacks suggested?  (Tom Lane)
Ответы Re: [PERFORM] Bad n_distinct estimation; hacks suggested?  (Josh Berkus)
Re: [PERFORM] Bad n_distinct estimation; hacks suggested?  (Tom Lane)
Список pgsql-hackers
Дерево обсуждения
Re: [PERFORM] Bad n_distinct estimation; hacks suggested?  (Josh Berkus, )
 Re: [PERFORM] Bad n_distinct estimation; hacks suggested?  (Josh Berkus, )
 Re: [PERFORM] Bad n_distinct estimation; hacks suggested?  ("Andrew Dunstan", )
 Re: [PERFORM] Bad n_distinct estimation; hacks suggested?  (Tom Lane, )
  Re: [PERFORM] Bad n_distinct estimation; hacks suggested?  (Andrew Dunstan, )
   Re: [PERFORM] Bad n_distinct estimation; hacks suggested?  (Josh Berkus, )
   Re: [PERFORM] Bad n_distinct estimation; hacks suggested?  (Josh Berkus, )
    Re: [PERFORM] Bad n_distinct estimation; hacks suggested?  (Tom Lane, )
  Re: [PERFORM] Bad n_distinct estimation; hacks suggested?  (Marko Ristola, )
  Re: [PERFORM] Bad n_distinct estimation; hacks suggested?  (Simon Riggs, )
   Re: [PERFORM] Bad n_distinct estimation; hacks suggested?  (Josh Berkus, )
   Re: [PERFORM] Bad n_distinct estimation; hacks suggested?  (Andrew Dunstan, )
 Re: [PERFORM] Bad n_distinct estimation; hacks suggested?  (Simon Riggs, )
  Re: [PERFORM] Bad n_distinct estimation; hacks suggested?  (Tom Lane, )
   Re: [PERFORM] Bad n_distinct estimation; hacks suggested?  (Simon Riggs, )
    Re: [PERFORM] Bad n_distinct estimation; hacks suggested?  (Josh Berkus, )
     Re: [PERFORM] Bad n_distinct estimation; hacks suggested?  (Josh Berkus, )
     Re: [PERFORM] Bad n_distinct estimation; hacks suggested?  (Andrew Dunstan, )
      Re: [PERFORM] Bad n_distinct estimation; hacks suggested?  (Mischa Sandberg, )
       Re: [PERFORM] Bad n_distinct estimation; hacks suggested?  (Andrew Dunstan, )
        Re: [PERFORM] Bad n_distinct estimation; hacks suggested?  (Josh Berkus, )
         Re: [PERFORM] Bad n_distinct estimation; hacks suggested?  (Mischa Sandberg, )
         Re: [PERFORM] Bad n_distinct estimation; hacks suggested?  (Markus Schaber, )
          Re: [PERFORM] Bad n_distinct estimation; hacks suggested?  (Mischa Sandberg, )
           Re: [PERFORM] Bad n_distinct estimation; hacks suggested?  (Josh Berkus, )
            Re: [PERFORM] Bad n_distinct estimation; hacks suggested?  (Josh Berkus, )
            Re: [PERFORM] Bad n_distinct estimation; hacks suggested?  (Mischa Sandberg, )
            Re: [PERFORM] Bad n_distinct estimation; hacks suggested?  (John A Meinel, )
        Re: [PERFORM] Distinct-Sampling (Gibbons paper) for Postgres  (Josh Berkus, )
        Re: Distinct-Sampling (Gibbons paper) for Postgres  (, )
    Re: [PERFORM] Bad n_distinct estimation; hacks suggested?  (Tom Lane, )
     Re: [PERFORM] Bad n_distinct estimation; hacks suggested?  (Simon Riggs, )
      Re: [PERFORM] Bad n_distinct estimation; hacks suggested?  (Gurmeet Manku, )
      Citation for "Bad n_distinct estimation; hacks suggested?"  (Gurmeet Manku, )
On Mon, 2005-04-25 at 11:23 -0400, Tom Lane wrote:
> Simon Riggs <> writes:
> > My suggested hack for PostgreSQL is to have an option to *not* sample,
> > just to scan the whole table and find n_distinct accurately.
> > ...
> > What price a single scan of a table, however large, when incorrect
> > statistics could force scans and sorts to occur when they aren't
> > actually needed ?
>
> It's not just the scan --- you also have to sort, or something like
> that, if you want to count distinct values.  I doubt anyone is really
> going to consider this a feasible answer for large tables.

Assuming you don't use the HashAgg plan, which seems very appropriate
for the task? (...but I understand the plan otherwise).

If that was the issue, then why not keep scanning until you've used up
maintenance_work_mem with hash buckets, then stop and report the result.

The problem is if you don't do the sort once for statistics collection
you might accidentally choose plans that force sorts on that table. I'd
rather do it once...

The other alternative is to allow an ALTER TABLE command to set
statistics manually, but I think I can guess what you'll say to that!

Best Regards, Simon Riggs


В списке pgsql-hackers по дате отправления:

Предыдущее
От: "Dave Held"
Дата:
Сообщение: Re: [PERFORM] Bad n_distinct estimation; hacks suggested?
Следующее
От: Christopher Kings-Lynne
Дата:
Сообщение: Re: [PATCHES] Continue transactions after errors in psql