Thread: analyze.c

analyze.c

From
Tiago Antão
Date:
Hi!
  About analyze.c: If taken out of vacuum, couldn't it be completely taken out of pg? Say,
to an external program? What's the big reason not to do that? I know that
there is some code in analyze.c (like comparing) that uses other parts of
pg, but that seems to be easily fixed.
  I'm leaning toward the implementation of end-biased histograms. There is
an introductory reference in the IEEE Data Engineering Bulletin, September
1995 (available on the Microsoft Research site).

Best Regards,
Tiago




Re: analyze.c

From
Tom Lane
Date:
Tiago Antão <tra@fct.unl.pt> writes:
>   About analyze.c:
>   If taken out of vacuum, couldn't it be completely taken out of pg? Say,
> to an external program?

Not if you want to do anything useful with it --- direct access to the
database is only possible within the context of a backend, because of
all the locking, buffering, etc behavior that you must adhere to.

> What's the big reason not to do that? I know that
> there is some code in analyze.c (like comparing) that uses other parts of
> pg, but that seems to be easily fixed.

Are you proposing not to do any comparisons?  It will be interesting to
see how you can compute a histogram without any idea of equality or
ordering.  But if you want that, then you still need the function-call
manager as well as the type-specific comparison routines for every
datatype that you might be asked to operate on (don't forget
user-defined types here).
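
A minimal sketch of why ordering has to come from the datatype's own comparison routine. This is plain Python, not PostgreSQL code; the helper name `equi_depth_bounds` and the sample values are assumptions made for illustration. It picks bucket boundaries from a sorted sample twice: once with integer ordering and once with plain text ordering.

```python
from typing import Callable, List, Sequence


def equi_depth_bounds(values: Sequence[str], buckets: int,
                      key: Callable[[str], object]) -> List[str]:
    """Return boundary values for an equi-depth histogram.

    `key` stands in for the type-specific ordering (hypothetical helper,
    not part of any PostgreSQL API).
    """
    if buckets < 1 or not values:
        raise ValueError("need at least one bucket and one value")
    ordered = sorted(values, key=key)
    n = len(ordered)
    # Lower boundary of each bucket, then the overall maximum.
    bounds = [ordered[(i * n) // buckets] for i in range(buckets)]
    bounds.append(ordered[-1])
    return bounds


# Integer column values as they would look in text form.
raw = ["1", "2", "9", "10", "11", "100"]

by_int = equi_depth_bounds(raw, 2, key=int)   # ordering of the real type
by_text = equi_depth_bounds(raw, 2, key=str)  # ordering of the text form
```

Here `by_int` is `["1", "10", "100"]` while `by_text` is `["1", "11", "9"]`: the text ordering gives boundaries that mean nothing for an integer column, so an order-based histogram needs the per-type comparison.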

In short, I doubt you can build a useful analyze-engine that's
significantly smaller than a full backend.  Besides, having ANALYZE
available as a regular SQL command is just too useful to want to see
it moved out to some outside program that would have to be run
separately.

>   I'm leaning toward the implementation of end-biased histograms. There is
> an introductory reference in the IEEE Data Engineering Bulletin, September
> 1995 (available on the Microsoft Research site).

Sounds interesting.  Can you give us an exact URL?
        regards, tom lane


Re: analyze.c

From
Tiago Antão
Date:

On Wed, 23 Aug 2000, Tom Lane wrote:

> > What's the big reason not to do that? I know that
> > there is some code in analyze.c (like comparing) that uses other parts of
> > pg, but that seems to be easily fixed.
> 
> Are you proposing not to do any comparisons?  It will be interesting to
> see how you can compute a histogram without any idea of equality or
> ordering.  But if you want that, then you still need the function-call
> manager as well as the type-specific comparison routines for every
> datatype that you might be asked to operate on (don't forget
> user-defined types here).
  I forgot user defined data types :-(, but regarding histograms I think
the code can be made external (at least for testing purposes):
  1. I was not suggesting not to do any comparisons, but I think the only
comparison I need is equality, I don't need order as I don't need to
calculate mins or maxes (I just need mins and maxes on frequencies, NOT on
data itself) to make a histogram.
  2. The mapping to text guarantees that I have (PQgetvalue returns
always char* and pg_statistics keeps a "text" anyway) a way of knowing
about equality regardless of type.
  But at least anything relating to order has to be in.
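
A minimal sketch of the equality-only idea above, in Python rather than C. It assumes the column values have already been fetched in text form (as PQgetvalue would return them); the function name `value_frequencies` and the sample data are hypothetical. It also assumes that equality of the text form matches equality of the underlying type, which may not hold for every datatype.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def value_frequencies(text_values: Iterable[str]) -> List[Tuple[str, int]]:
    """Count occurrences of each distinct text value.

    Only equality (via hashing) is used on the data; the sort is on
    the frequencies, never on the values themselves.
    """
    counts = Counter(text_values)
    # most_common() orders by count, highest first.
    return counts.most_common()


sample = ["a", "b", "a", "c", "a", "b"]
freqs = value_frequencies(sample)
```

With this sample, `freqs` is `[("a", 3), ("b", 2), ("c", 1)]`.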

> >   I'm leaning toward the implementation of end-biased histograms. There is
> > an introductory reference in the IEEE Data Engineering Bulletin, September
> > 1995 (available on the Microsoft Research site).
> 
> Sounds interesting.  Can you give us an exact URL?

http://www.research.microsoft.com/research/db/debull/default.htm

BTW, you can get access to SIGMOD CDs with lots of goodies for a very low
price (at least in 1999 it was a bargain), check out ACM membership for
sigmod.

I've been reading something about implementation of histograms, and,
AFAIK, in practice histograms is just a cool name for no more than:
  1. top ten with frequency for each
  2. the same for top ten worse
  3. average for the rest
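
The three items above can be sketched roughly as follows. This is an illustrative Python version, not the code being written for pg: the function name `end_biased_histogram`, the parameter `k`, and the returned dictionary layout are all assumptions, and "top ten worse" is read here as the ten least frequent values.

```python
from collections import Counter
from typing import Dict, Iterable


def end_biased_histogram(values: Iterable[str], k: int = 10) -> Dict[str, object]:
    """Keep exact frequencies for the k most and k least frequent values
    and a single average frequency for everything in between."""
    if k < 1:
        raise ValueError("k must be at least 1")
    counts = Counter(values)
    # Rank distinct values by frequency, highest first (ordering on
    # frequencies only, not on the data).
    ranked = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
    top = ranked[:k]
    rest = ranked[k:]
    bottom = rest[-k:]                           # least frequent values
    middle = rest[:-k] if len(rest) > k else []  # everything in between
    avg = sum(c for _, c in middle) / len(middle) if middle else 0.0
    return {"top": top, "bottom": bottom,
            "rest_avg": avg, "rest_distinct": len(middle)}


# Six distinct values with frequencies 6, 5, 4, 3, 2, 1.
sample = [v for v, n in zip("abcdef", (6, 5, 4, 3, 2, 1)) for _ in range(n)]
hist = end_biased_histogram(sample, k=2)
```

For this sample with `k=2`, the top entries are `a` and `b`, the bottom entries are `e` and `f`, and the two values in between (`c`, `d`) are summarised by an average frequency of 3.5.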
 

I'm writing code to get this info (outside pg for now - for testing
purposes).

Best Regards,
Tiago
PS - again: I'm starting, so some of my comments can be completely dumb.



Re: analyze.c

From
Bruce Momjian
Date:
> > >   I'm leaning toward the implementation of end-biased histograms. There is
> > > an introductory reference in the IEEE Data Engineering Bulletin, September
> > > 1995 (available on the Microsoft Research site).
> > 
> > Sounds interesting.  Can you give us an exact URL?
> 
> http://www.research.microsoft.com/research/db/debull/default.htm
> 
> BTW, you can get access to SIGMOD CDs with lots of goodies for a very low
> price (at least in 1999 it was a bargain), check out ACM membership for
> sigmod.

Thanks.  I will look into that.  SIGMOD has some real valuable stuff.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
 


Re: analyze.c

From
Bruce Momjian
Date:
> Hi!
> 
>   About analyze.c:
>   If taken out of vacuum, couldn't it be completely taken out of pg? Say,
> to an external program? What's the big reason not to do that? I know that
> there is some code in analyze.c (like comparing) that uses other parts of
> pg, but that seems to be easily fixed.
> 
>   I'm leaning toward the implementation of end-biased histograms. There is
> an introductory reference in the IEEE Data Engineering Bulletin, September
> 1995 (available on the Microsoft Research site).

Why take it out of the backend?  Seems like a real pain, especially when
you realize what functions it would have to call. 

Also, keep in mind that the current analyze generates perfect estimates for
columns containing only two unique values, and columns containing only
unique values.  All other cases generate imperfect statistics.


 


Re: analyze.c

From
Bruce Momjian
Date:
> BTW, you can get access to SIGMOD CDs with lots of goodies for a very low
> price (at least in 1999 it was a bargain), check out ACM membership for
> sigmod.
> 
> I've been reading something about implementation of histograms, and,
> AFAIK, in practice histograms is just a cool name for no more than:
>    1. top ten with frequency for each
>    2. the same for top ten worse
>    3. average for the rest

I wonder if just increasing the number of buckets in analyze.c would
help?
