Re: Select max(foo) and select count(*) optimization

From: Christopher Browne
Subject: Re: Select max(foo) and select count(*) optimization
Msg-id: m31xqef8go.fsf@wolfe.cbbrowne.com
In response to: Select max(foo) and select count(*) optimization  (John Siracusa <siracusa@mindspring.com>)
Responses: Re: Select max(foo) and select count(*) optimization  (Paul Tuckfield <paul@tuckfield.com>)
List: pgsql-performance
Oops! siracusa@mindspring.com (John Siracusa) was seen spray-painting on a wall:
> Speaking of special cases (well, I was on the admin list) there are two
> kinds that would really benefit from some attention.
>
> 1. The query "select max(foo) from bar" where the column foo has an
> index.  Aren't indexes ordered?  If not, an "ordered index" would be
> useful in this situation so that this query, rather than doing a
> sequential scan of the whole table, would just "ask the index" for
> the max value and return nearly instantly.
>
> 2. The query "select count(*) from bar" Surely the total number of
> rows in a table is kept somewhere convenient.  If not, it would be
> nice if it could be :) Again, rather than doing a sequential scan of
> the entire table, this type of query could return instantly.
>
> I believe MySQL does both of these optimizations (which are probably
> a lot easier in that product, given its data storage system).  These
> were the first areas where I noticed a big performance difference
> between MySQL and Postgres.
>
> Especially with very large tables, hearing the disks grind as
> Postgres scans every single row in order to determine the number of
> rows in a table or the max value of a column (even a primary key
> created from a sequence) is pretty painful.  If the implementation
> is not too horrendous, this is an area where an orders-of-magnitude
> performance increase can be had.

These are both VERY frequently asked questions.

In the case of question #1, the optimization you suggest could be
accomplished via some Small Matter Of Programming.  None of the people
that have wanted the optimization have, however, offered to actually
DO the programming.
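
In the meantime, a commonly cited workaround for #1 is to phrase the
query as "ORDER BY foo DESC LIMIT 1", a form that can be answered by
walking an index on foo from its high end.  The sketch below only
checks that the two forms return the same value.  It uses Python's
bundled sqlite3 module as a stand-in, which is an assumption made so
the example runs anywhere; it says nothing about which plan PostgreSQL
would pick.  The table bar and column foo come from the question; the
index name and the sample values are made up for illustration.

```python
import sqlite3

# In-memory stand-in database; SQLite is NOT PostgreSQL, so this only
# demonstrates that the two query forms agree on the result.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bar (foo INTEGER)")
conn.execute("CREATE INDEX bar_foo_idx ON bar (foo)")  # hypothetical index name
conn.executemany(
    "INSERT INTO bar (foo) VALUES (?)",
    [(v,) for v in (7, 42, None, 19)],  # made-up sample data, one NULL included
)

# The form from the question.
MAX_QUERY = "SELECT max(foo) FROM bar"

# The rewritten form. max() ignores NULLs, so the rewrite filters them
# out explicitly rather than relying on where a given database sorts NULLs.
REWRITTEN_QUERY = (
    "SELECT foo FROM bar WHERE foo IS NOT NULL ORDER BY foo DESC LIMIT 1"
)

max_result = conn.execute(MAX_QUERY).fetchone()[0]
rewritten_result = conn.execute(REWRITTEN_QUERY).fetchone()[0]
print(max_result, rewritten_result)
```

The "IS NOT NULL" filter matters: without it, a database that sorts
NULLs first in descending order would return NULL instead of the
maximum.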

In the case of #2, the answer is "surely NOT."  In MVCC databases,
that information CANNOT be stored anywhere convenient because queries
requested by transactions started at different points in time must get
different answers.
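
To make that concrete, here is a deliberately simplified toy model of
MVCC visibility.  It is a sketch under stated assumptions, not how
PostgreSQL is implemented: each row version records the transaction
that inserted it and, optionally, the one that deleted it, and a
snapshot is modelled as nothing more than the set of transaction ids
that had committed when the snapshot was taken.  All names in it
(RowVersion, count_rows, the snapshot variables) are hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional


@dataclass
class RowVersion:
    inserted_by: int                  # id of the inserting transaction
    deleted_by: Optional[int] = None  # id of the deleting transaction, if any


def count_rows(table: List[RowVersion], snapshot: FrozenSet[int]) -> int:
    """Count the row versions visible to a snapshot.

    A version is visible when its insert is in the snapshot and its
    delete (if there is one) is not.
    """
    return sum(
        1
        for row in table
        if row.inserted_by in snapshot
        and (row.deleted_by is None or row.deleted_by not in snapshot)
    )


# Transaction 1 inserted three rows, transaction 2 inserted two more,
# and transaction 3 deleted one of the rows inserted by transaction 1.
table = [
    RowVersion(inserted_by=1),
    RowVersion(inserted_by=1),
    RowVersion(inserted_by=2),
    RowVersion(inserted_by=2),
    RowVersion(inserted_by=1, deleted_by=3),
]

early_snapshot = frozenset({1})         # only transaction 1 had committed
middle_snapshot = frozenset({1, 2})     # transactions 1 and 2 had committed
late_snapshot = frozenset({1, 2, 3})    # all three had committed

counts = [
    count_rows(table, s)
    for s in (early_snapshot, middle_snapshot, late_snapshot)
]
print(counts)  # [3, 5, 4]
```

The same physical table yields three different correct counts at the
same moment, one per snapshot, so no single stored row count could
serve all three transactions.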

I think we need to add these questions and their answers to the FAQ so
that the answer can be "See FAQ Item #17" rather than people having to
gratuitously explain it over and over and over again.
--
(reverse (concatenate 'string "moc.enworbbc" "@" "enworbbc"))
http://www.ntlug.org/~cbbrowne/finances.html
Rules of  the Evil Overlord #127.  "Prison guards will  have their own
cantina featuring  a wide  variety of tasty  treats that  will deliver
snacks to the  guards while on duty. The guards  will also be informed
that  accepting food or  drink from  any other  source will  result in
execution." <http://www.eviloverlord.com/>
