Re: [HACKERS] Index/Function organized table layout (from Re:

Поиск
Список
Период
Сортировка
От Hannu Krosing
Тема Re: [HACKERS] Index/Function organized table layout (from Re:
Дата
Msg-id 1065258004.2746.30.camel@fuji.krosing.net
обсуждение исходный текст
Ответ на Re: count(*) slow on large tables  (Christopher Browne <cbbrowne@libertyrms.info>)
Ответы COUNT(*) again (was Re: [HACKERS] Index/Function organized table layout)
Uses for Index/Function organizing
Список pgsql-performance
Christopher Browne kirjutas R, 03.10.2003 kell 00:57:
> jllachan@nsd.ca (Jean-Luc Lachance) writes:
> > That's one of the draw back of MVCC.
> > I once suggested that the transaction number and other house keeping
> > info be included in the index, but was told to forget it...
> > It would solve once and for all the issue of seq_scan vs index_scan.
> > It would simplify the aggregate problem.
>
> It would only simplify _one_ case, namely the case where someone cares
> about the cardinality of a relation, and it would do that at
> _considerable_ cost.
>
> A while back I outlined how this would have to be done, and for it to
> be done efficiently, it would be anything BUT simple.

Could this be made a TODO item, perhaps with your attack plan.
Of course as strictly optional feature useful only for special situations
(see below)

I cross-post this to [HACKERS] as it seem relevant to a problem recently
discussed there.

> It would be very hairy to implement it correctly, and all this would
> cover is the single case of "SELECT COUNT(*) FROM SOME_TABLE;"

Not really. Just yesterday there was a discussion on [HACKERS] about
implementing btree-organized tables, which would be much less needed if
the visibility info were kept in indexes.

> If you had a single WHERE clause attached, you would have to revert to
> walking through the tuples looking for the ones that are live and
> committed, which is true for any DBMS.

If the WHERE clause could use the same index (or any index with
visibility info) then there would be no need for "walking through the
tuples" in data relation.

the typical usecase cited on [HACKERS] was time series data, where
inserts are roughly in (timestamp,id)order but queries in (id,timestamp)
order. Now if the index would include all relevant fields
(id,timestamp,data1,data2,...,dataN) then the query could run on index
only touching just a few pages and thus vastly improving performance. I
agree that this is not something everybody needs, but when it is needed
it is needed bad.

> And it still begs the same question, of why the result of this query
> would be particularly meaningful to anyone.  I don't see the
> usefulness; I don't see the value of going to the considerable effort
> of "fixing" this purported problem.

Being able to do fast count(*) is just a side benefit.

----------------
Hannu


В списке pgsql-performance по дате отправления:

Предыдущее
От: Tom Lane
Дата:
Сообщение: Re: Tuning/performance issue...
Следующее
От: Christopher Kings-Lynne
Дата:
Сообщение: Re: count(*) slow on large tables