Thread: "Big O" notation for postgres?

"Big O" notation for postgres?

From:
"H. Hall"
Date:
Does anyone know if there is a source that provides "Big O" notation for
postgres's aggregate functions and operations?  For example is count(*)
= O(1) or O(n)?

Do the developers for postgres use Big O when selecting algorithms? If
so, is the info easily available?

Thanks,
HH




--
H. Hall
ReedyRiver Group LLC
site: reedyriver.com


Re: "Big O" notation for postgres?

From:
"Jonah H. Harris"
Date:
On Wed, May 21, 2008 at 10:10 AM, H. Hall <hhall1001@reedyriver.com> wrote:
> Does anyone know if there is a source that provides "Big O" notation for
> postgres's aggregate functions and operations?  For example is count(*) =
> O(1) or O(n)?

I don't know of any document containing the complexity of each
aggregate, but it's sometimes left as a comment in the source code.

IIRC, COUNT (non-distinct) is currently O(n), where n also includes
evaluation of tuples not represented in the final count (due to
Postgres' MVCC design).
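
A quick way to see this in practice (just a sketch, assuming a hypothetical
table "t"):

EXPLAIN SELECT count(*) FROM t;
--  Aggregate
--    ->  Seq Scan on t
-- count(*) has no shortcut: the plan is an Aggregate over a full scan, and
-- the scan also has to visit dead row versions left behind by UPDATE/DELETE
-- (MVCC), so the n in O(n) is the physical table contents, not just the rows
-- that end up in the count, until VACUUM removes the dead versions.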

--
Jonah H. Harris, Sr. Software Architect | phone: 732.331.1324
EnterpriseDB Corporation | fax: 732.331.1301
499 Thornall Street, 2nd Floor | jonah.harris@enterprisedb.com
Edison, NJ 08837 | http://www.enterprisedb.com/

Re: "Big O" notation for postgres?

From:
Richard Huxton
Date:
Jonah H. Harris wrote:
> On Wed, May 21, 2008 at 10:10 AM, H. Hall <hhall1001@reedyriver.com> wrote:
>> Does anyone know if there is a source that provides "Big O" notation for
>> postgres's aggregate functions and operations?  For example is count(*) =
>> O(1) or O(n)?
>
> I don't know of any document containing the complexity of each
> aggregate, but it's sometimes left as a comment in the source code.

Recent max() and min() can be O(n) or O(1) depending on the where-clause
and presence of an index too, just to muddy the waters.

--
   Richard Huxton
   Archonet Ltd

Re: "Big O" notation for postgres?

From:
PFC
Date:
On Wed, 21 May 2008 16:10:53 +0200, H. Hall <hhall1001@reedyriver.com>
wrote:

> Does anyone know if there is a source that provides "Big O" notation for
> postgres's aggregate functions and operations?  For example is count(*)
> = O(1) or O(n)?
>
> Do the developers for postgres use Big O when selecting algorithms? If
> so, is the info easily available?

    You can't do any better than O( n rows examined by the aggregate ) except
for max() and min() on an indexed expression, which in this case aren't
really aggregates anymore since they are internally rewritten as an index
lookup to get the value you want... but stuff like sum() or avg() or
count() will always have to see all the rows selected (and some more)
unless you use clever hacks like materialized views etc., in which case the
thing inside the O() will change, or at least the O() constant will change...
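
A sketch of the contrast (assuming a hypothetical table "t" with a btree
index on "id"):

-- min()/max() on an indexed expression can be answered from the index,
-- roughly as if it were: SELECT id FROM t WHERE id IS NOT NULL
--                        ORDER BY id ASC LIMIT 1;
EXPLAIN SELECT min(id) FROM t;

-- sum(), avg() and count() have no such shortcut and must scan every row:
EXPLAIN SELECT avg(id) FROM t;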

Re: "Big O" notation for postgres?

From:
"H. Hall"
Date:
PFC wrote:
> On Wed, 21 May 2008 16:10:53 +0200, H. Hall wrote:
>
>> Does anyone know if there is a source that provides "Big O" notation
>> for postgres's aggregate functions and operations?  For example is
>> count(*) = O(1) or O(n)?
>>
>> Do the developers for postgres use Big O when selecting algorithms?
>> If so, is the info easily available?
>
>     You can't do any better than O( n rows examined by the aggregate )
> except for max() and min() on an indexed expression, which in this
> case aren't really aggregates anymore since they are internally
> rewritten as an index lookup to get the value you want... but stuff
> like sum() or avg() or count() will always have to see all the rows
> selected (and some more) unless you use clever hacks like materialized
> views etc, in which case the thing in the O() will change, or at least
> the O() constant will change...
>
Thank you PFC and also Jonah, and Richard for your replies.

It occurs to me that Big O might be a useful way to understand/explain
what is happening with situations like Albert's earlier today:

> I've got a query similar to this:
>
> select * from t1, t2 where t1.id > 158507 and t1.id = t2.id;
>
> That took > 84 minutes (the query was a bit longer but this is the part
> that made the difference). After a little change the query took ~1 second:
>
> select * from t1, t2 where t1.id > 158507 and t2.id > 158507 and t1.id =
> t2.id;
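
As a sketch (using Albert's table and column names, which I only know from
his post), the difference shows up right away if you compare the plans:

EXPLAIN ANALYZE
SELECT * FROM t1, t2
 WHERE t1.id > 158507 AND t1.id = t2.id;

EXPLAIN ANALYZE
SELECT * FROM t1, t2
 WHERE t1.id > 158507 AND t2.id > 158507 AND t1.id = t2.id;

The usual explanation is that the planner does not derive the inequality on
t2.id from the join condition by itself, so spelling it out lets it read far
fewer rows of t2 before the join.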

BTW, anyone reading this and not familiar with Big O notation might want
to check out these links. All are intro type articles:

   * An informal introduction to O(N) notation:
     http://www.perlmonks.org/?node_id=227909
   * Analysis of Algorithms and Selection of Algorithms:
     http://www.cs.utk.edu/~parker/Courses/CS302-Fall06/Notes/complexity.html
   * Complexity and Big-O Notation:
     http://pages.cs.wisc.edu/~hasti/cs367-common/notes/COMPLEXITY.html

--
H. Hall
ReedyRiver Group LLC
http://www.reedyriver.com


Re: "Big O" notation for postgres?

From:
Gregory Stark
Date:
"Richard Huxton" <dev@archonet.com> writes:

> Jonah H. Harris wrote:
>> On Wed, May 21, 2008 at 10:10 AM, H. Hall <hhall1001@reedyriver.com> wrote:
>>> Does anyone know if there is a source that provides "Big O" notation for
>>> postgres's aggregate functions and operations?  For example is count(*) =
>>> O(1) or O(n)?
>>
>> I don't know of any document containing the complexity of each
>> aggregate, but it's sometimes left as a comment in the source code.
>
> Recent max() and min() can be O(n) or O(1) depending on the where-clause and
> presence of an index too, just to muddy the waters.

Hm, true. But excluding those special cases all Postgres aggregate functions
will be O(n) unless they're doing something very odd. None of the built-in
functions (except min/max as noted) do anything odd like that.

The reason is the basic design of Postgres aggregate functions.
They are fed every tuple one by one and have to keep their state in a single
variable. Most of the aggregate functions like count(*) etc. just keep a
static, non-growing state, and the state transition function is a simple
arithmetic operation which is O(1). So the resulting operation is O(n).
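
A sketch of that model as a hand-rolled integer sum (the aggregate name is
made up; int4pl is the function behind the integer + operator):

CREATE AGGREGATE my_sum(int4) (
    SFUNC    = int4pl,  -- state transition: one O(1) addition per input row
    STYPE    = int4,
    INITCOND = '0'
);
-- n input rows * O(1) per transition => O(n) for the whole aggregate.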

Actually one exception would be something like

CREATE AGGREGATE array_agg(anyelement) (SFUNC = array_append, STYPE = anyarray, INITCOND='{}');

Since the state variable has to keep accumulating more and more stuff, the
array_append becomes more and more expensive (it has to generate a new array,
so it has to copy the existing stuff). So actually it would be O(n^2).
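
For illustration, exercising that definition (generate_series just provides
some input rows):

SELECT array_agg(i) FROM generate_series(1, 5) AS s(i);
-- => {1,2,3,4,5}
-- Each array_append copies the whole state array, so the total work over n
-- input rows is 1 + 2 + ... + n, i.e. O(n^2).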

The only builtin aggregate which looks like it falls in this category would be
xmlagg().

--
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com
  Ask me about EnterpriseDB's RemoteDBA services!

Re: "Big O" notation for postgres?

From:
"H. Hall"
Date:
Gregory Stark wrote:
> "Richard Huxton" <dev@archonet.com> writes:
>
>
>> Jonah H. Harris wrote:
>>
>>> On Wed, May 21, 2008 at 10:10 AM, H. Hall wrote:
>>>
>>>> Does anyone know if there is a source that provides "Big O" notation for
>>>> postgres's aggregate functions and operations?  For example is count(*) =
>>>> O(1) or O(n)?
>>>>
>>> I don't know of any document containing the complexity of each
>>> aggregate, but it's sometimes left as a comment in the source code.
>>>
>> Recent max() and min() can be O(n) or O(1) depending on the where-clause and
>> presence of an index too, just to muddy the waters.
>>
When I first read the replies above I just assumed some Postgres
magic was producing O(1). After seeing this again, I believe that
Postgres must be doing something like the following for max and min:

Max:  ORDER BY colName DESC LIMIT 1
Min:  ORDER BY colName ASC LIMIT 1

Hence Richard's caveat about the presence of an index. If Postgres is using an
index as above, then the max and min functions would both be O(log N),
that is, log base 2 of N, which is the time it takes to search a balanced
binary tree, not O(1), which implies constant time to perform regardless of
the size of the dataset N.
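
As a sketch, the equivalence is easy to check (hypothetical table "t" with a
btree index on colName):

EXPLAIN SELECT max(colName) FROM t;
EXPLAIN SELECT colName FROM t
 WHERE colName IS NOT NULL
 ORDER BY colName DESC LIMIT 1;
-- Both should show a backward index scan with a LIMIT: a single descent of
-- the tree, i.e. O(log N), rather than O(1) or a full O(N) scan.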

>
> Hm, true. But excluding those special cases all Postgres aggregate functions
> will be O(n) unless they're doing something very odd. None of the built-in
> functions (except min/max as noted) do anything odd like that.
>
> The reason is the basic design of Postgres aggregate functions.
> They are fed every tuple one by one and have to keep their state in a single
> variable. Most of the aggregate functions like count(*) etc just keep a static
> non-growing state and the state transition function is a simple arithmetic
> operation which is O(1). So the resulting operation is O(n).
>
> Actually one exception would be something like
>
> CREATE AGGREGATE array_agg(anyelement) (SFUNC = array_append, STYPE = anyarray, INITCOND='{}');
>
> Since the state variable has to keep accumulating more and more stuff the
> array_append becomes more and more expensive (it has to generate a new array
> so it has to copy the existing stuff). So actually it would be O(n^2).
>
> The only builtin aggregate which looks like it falls in this category would be
> xmlagg()
>
>
Functions that are O(N^2) scale very badly, of course.

It would be very handy if the planner could emit Big O along with its
estimates. This would allow one to quickly tell how a query scales with a
large number of rows.
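
In the meantime, EXPLAIN's cost and row estimates already give a rough feel
for scaling; a sketch (hypothetical table "t"):

EXPLAIN SELECT count(*) FROM t;
-- The estimated cost of the Seq Scan grows roughly linearly with the table's
-- row and page counts, which is the O(N) behaviour discussed above.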

Thanks,
HH

--
H. Hall
ReedyRiver Group LLC
http://www.reedyriver.com