Re: no index-usage on aggregate-functions?
От | Scott Marlowe |
---|---|
Тема | Re: no index-usage on aggregate-functions? |
Дата | |
Msg-id | 1088529718.12350.35.camel@localhost.localdomain обсуждение исходный текст |
Ответ на | Re: no index-usage on aggregate-functions? ("Harald Lau (Sector-X)" <harald@sector-x.de>) |
Список | pgsql-performance |
On Tue, 2004-06-29 at 02:46, Harald Lau (Sector-X) wrote: > @Chris: > > > > SELECT count(*) FROM the_table > > > => Seq scan -> takes about 12 sec > > This cannot be made O(1) in postgres due to MVCC. You just have to live > > with it. > > bad news > BTW: in this case you could workaround > select reltuples from pg_class where relname='the_table' > (yes, I know: presumes a regular vacuum analyse) Note that there ARE other options. While the inability to provide a speedy count is a "cost" of using an MVCC system, the ability to allow thousands of readers to run while updates are happening underneath them more than makes up for the slower aggregate performance. The other options to this problem involve maintaining another table that has a single (visible) row that is maintained by a trigger on the main table that fires and updates that single row to reflect the count of the table. This is costly on updates, but may be worth doing for certain situations. Personally, I haven't had a great need to do a count(*) on my tables that much. And on large tables, approximations are usually fine. > > Average and sum can never use an index AFAIK, in any db server. You > > need information from every row. > > Take a look at the SQLSrv-pendant: > create index x_1 on the_table (num_found) > select avg(num_found) from the_table > -> Index Scan(OBJECT:([midata].[dbo].[THE_TABLE].[x_1]) > > (I'm not sure what Oracle does - have to re-install it first ...) There's a good chance Oracle can use the index too. That's because both Oracle is still a row locked database at heart. It's MVCC system sits on top of it in roll back segments. So, the main store is serialized and can be indexed, while the updates live in the rollback segment. This, however, is not paradise. This limits Oracle's performance for things like long running transactions and makes it slower as the amount of information in the rollback segment grows. Meanwhile, PostgreSQL uses an in store MVCC mechanism. This system means that all index accesses must then hit the actual MVCC storage, since indexes aren't easily serialized. > @Scott: > > Yes, you're expecting an MVCC database to behave like a row locking > > database. > > hmmmm... > So, it seems that PG is not soooo well suited for a datawarehouse and/or performing extensive statistics/calculations/reportingson large tables, is it? On the contrary, it makes it GREAT for datawarehousing. Not because any one child process will be super fast, but because ALL the child processes will run reasonably fast, even under very heavy read and write load. Note that if you've got the memory for the hash agg algo to fire into shared memory, it's pretty darned fast now, so if the data (mostly) fit into kernel cache you're gold. And 12 gig Intel boxes aren't that expensive, compared to an Oracle license.
В списке pgsql-performance по дате отправления:
Предыдущее
От: Bill MontgomeryДата:
Сообщение: Re: High load average with PostgreSQL 7.4.2 on debian/ibm