Discussion: Re: Further open item (Was: Status of 7.2)
> > There is not much point in arguing a specific query case,
>
> It is no specific query case. It is the speed of an index scan which
> goes like N if you do it with PostgreSQL and it goes like log N if
> you do not have to look back into the table like MS SQL server does.

I cannot see why you keep saying that. It is simply not true.
MS SQL also shows O(N) behavior; it is simply that PostgreSQL, because
of its well-described methodology, takes longer per affected row. The
speed difference is a constant factor, no matter how many rows are
affected.

Andreas
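The point being made above can be put into a toy cost model. The cost constants below are purely illustrative assumptions, not measured PostgreSQL or MS SQL figures; the sketch only shows that both scan strategies grow linearly in the number of matching rows, so the heap lookup changes the per-row constant, not the growth rate:

```python
# Toy cost model: both scans are O(N) in matching rows; only the
# per-row constant differs. All cost units are made-up assumptions.

INDEX_ENTRY_COST = 1   # assumed cost per index entry visited
HEAP_LOOKUP_COST = 4   # assumed extra cost per heap visibility check

def index_only_cost(n_rows):
    # Covering-index style scan: index entries only.
    return n_rows * INDEX_ENTRY_COST

def index_plus_heap_cost(n_rows):
    # PostgreSQL-style scan: each index hit also checks the heap tuple.
    return n_rows * (INDEX_ENTRY_COST + HEAP_LOOKUP_COST)

# The ratio stays constant no matter how many rows are affected:
for n in (1_000, 10_000, 100_000):
    print(index_plus_heap_cost(n) / index_only_cost(n))  # -> 5.0 each time
```

With these assumed constants the ratio is a flat 5.0 at every N, which is the "linear, no matter how many rows" claim in the reply above.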
On Mon, 19 Nov 2001, Zeugswetter Andreas SB SD wrote:

> > It is no specific query case. It is the speed of an index scan which
> > goes like N if you do it with PostgreSQL and it goes like log N if
> > you do not have to look back into the table like MS SQL server does.
>
> I cannot see why you keep saying that. It is simply not true.
> MS SQL also shows O(N) behavior; it is simply that PostgreSQL, because
> of its well-described methodology, takes longer per affected row. The
> speed difference is a constant factor, no matter how many rows are
> affected.

I'm basing my assumption on the statement of my colleague. He told me
that consistent index usage results in O(log N) behaviour. I'm really
no expert in database theory, but if you like I can forward your
question.

Kind regards

Andreas.
Tille, Andreas wrote:

> On Mon, 19 Nov 2001, Zeugswetter Andreas SB SD wrote:
>
> > > It is no specific query case. It is the speed of an index scan which
> > > goes like N if you do it with PostgreSQL and it goes like log N if
> > > you do not have to look back into the table like MS SQL server does.
> >
> > I cannot see why you keep saying that. It is simply not true.
> > MS SQL also shows O(N) behavior; it is simply that PostgreSQL, because
> > of its well-described methodology, takes longer per affected row. The
> > speed difference is a constant factor, no matter how many rows are
> > affected.
>
> I'm basing my assumption on the statement of my colleague. He told me
> that consistent index usage results in O(log N) behaviour.

Searching through the index only, versus searching through the index
plus looking up each tuple in the main table, can be better than a
constant factor if the tuples are scattered throughout the main table.

Searching through the index only is probably faster by roughly a factor
of 2 * (size_of_heap_tuple / size_of_index_entry) in your case, where
you want to count about half of the rows in the table.

----------------
Hannu
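Hannu's rough factor can be made concrete with assumed sizes. The byte counts below are hypothetical examples, not actual PostgreSQL tuple or index-entry sizes:

```python
# Back-of-the-envelope estimate of the index-only-scan speedup factor
# described above: 2 * (size_of_heap_tuple / size_of_index_entry).
# The sizes used are illustrative assumptions, not measured values.

def index_only_speedup(size_of_heap_tuple, size_of_index_entry):
    """Rough speedup of scanning the index alone over scanning the
    index plus fetching each heap tuple, per the estimate above."""
    return 2 * (size_of_heap_tuple / size_of_index_entry)

# Example: a hypothetical 144-byte heap tuple vs. a 16-byte index entry.
print(index_only_speedup(144, 16))  # -> 18.0
```

The factor of 2 reflects that in this scenario roughly half the rows are counted, so the index scan still touches most index pages while the heap-lookup variant additionally drags in the much larger heap tuples.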
> > The underlying problem the user is seeing is how to _know_ an index
> > tuple is valid without checking the heap,
>
> I'd propose a memory-only (or heavily cached) structure of tuple death
> transaction ids for all transactions since the oldest live trx. And
> when that oldest finishes, the tombstone marks for all tuples deleted
> between that and the new oldest are moved to the relevant indexes (or
> the index keys are deleted) by a concurrent vacuum or similar process.

Andreas said that his data is only loaded/changed during the night, so
for his queries all tuples found in the index are actually live. Every
heap tuple lookup results in "tuple valid". In his case a per-table
global "highest xid" in the heap data, which could be compared against
the highest xid at the time of the last vacuum, would probably be
sufficient (or a flag for "modified after last vacuum").

Of course, per-table globals are a major headache regarding concurrency,
but there are other possible optimizations that could profit from such a
structure, like rowcount ...

Andreas
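The per-table shortcut suggested above can be sketched as follows. This is a minimal illustration with hypothetical names, not PostgreSQL internals: the idea is only that when a table's highest assigned xid is no newer than the highest xid at the last vacuum, every index entry is known-valid and the heap visibility check can be skipped entirely:

```python
# Minimal sketch (hypothetical structure, not PostgreSQL code) of the
# "modified after last vacuum" shortcut: skip per-tuple heap visibility
# checks when the table has not changed since the last vacuum.

class TableState:
    def __init__(self, highest_xid, highest_xid_at_last_vacuum):
        self.highest_xid = highest_xid
        self.highest_xid_at_last_vacuum = highest_xid_at_last_vacuum

    def modified_since_last_vacuum(self):
        # Any xid assigned after the last vacuum may have created or
        # deleted tuples whose visibility the index cannot vouch for.
        return self.highest_xid > self.highest_xid_at_last_vacuum

def index_entry_needs_heap_check(table):
    # Only consult the heap when visibility was not settled by vacuum.
    return table.modified_since_last_vacuum()

# Nightly-load scenario: nothing changed since the last vacuum, so
# every index entry can be trusted without touching the heap.
quiet = TableState(highest_xid=5000, highest_xid_at_last_vacuum=5000)
print(index_entry_needs_heap_check(quiet))  # -> False

# Daytime update arrives: heap checks become necessary again.
quiet.highest_xid = 5001
print(index_entry_needs_heap_check(quiet))  # -> True
```

As the message notes, maintaining such a per-table global safely under concurrent writers is the hard part; the sketch ignores locking entirely.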