Discussion: Re: Further open item (Was: Status of 7.2)


Re: Further open item (Was: Status of 7.2)

From:
"Zeugswetter Andreas SB SD"
Date:
 
> > There is not much point in arguing a specific query case,
> It is no specific query case.  It is the speed of an index scan which
> goes like N if you do it with PostgreSQL and it goes like log N if
> you do not have to look back into the table like MS SQL server does.

I cannot see why you keep saying that. It is simply not true.
MS SQL also shows O(N) behavior; it is simply that PostgreSQL,
because of its well-described methodology, takes longer per affected row.
The speed difference is a constant factor, no matter how many rows
are affected.
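
As a toy illustration of that point (a minimal sketch with made-up per-row
costs, not measured PostgreSQL or MS SQL numbers): both strategies do work
proportional to the number of matched rows, and the heap visit only changes
the constant.

#include <stdio.h>

/*
 * Illustrative cost model only (assumed, not measured numbers):
 * both an index-only scan and an index scan with heap lookups do
 * O(matched_rows) work; the heap visit just adds a constant cost
 * per matched row, so the ratio between the two stays fixed.
 */
#define INDEX_ENTRY_COST 1.0   /* assumed cost to read one index entry    */
#define HEAP_VISIT_COST  4.0   /* assumed extra cost to check a heap tuple */

static double cost_index_only(long matched)      { return matched * INDEX_ENTRY_COST; }
static double cost_index_plus_heap(long matched) { return matched * (INDEX_ENTRY_COST + HEAP_VISIT_COST); }

int main(void)
{
    long rows[] = { 1000, 10000, 100000, 1000000 };

    for (int i = 0; i < 4; i++)
    {
        double a = cost_index_only(rows[i]);
        double b = cost_index_plus_heap(rows[i]);
        /* The ratio b/a stays constant regardless of how many rows match. */
        printf("%8ld matched rows: index-only %.0f, with heap %.0f, ratio %.1f\n",
               rows[i], a, b, b / a);
    }
    return 0;
}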

Andreas 


Re: Further open item (Was: Status of 7.2)

From:
"Tille, Andreas"
Date:
On Mon, 19 Nov 2001, Zeugswetter Andreas SB SD wrote:

> > It is no specific query case.  It is the speed of an index scan which
> > goes like N if you do it with PostgreSQL and it goes like log N if
> > you do not have to look back into the table like MS SQL server does.
>
> I cannot see why you keep saying that. It is simply not true.
> MS SQL also shows O(N) behavior; it is simply that PostgreSQL,
> because of its well-described methodology, takes longer per affected row.
> The speed difference is a constant factor, no matter how many rows
> are affected.
I'm basing my assumption on a statement from my colleague.  He
told me that consistent index usage results in O(log N) behaviour.
I'm really no expert in database theory, but if you like I can forward
your question.

Kind regards
        Andreas.


Re: Further open item (Was: Status of 7.2)

From:
Hannu Krosing
Date:

Tille, Andreas wrote:

>On Mon, 19 Nov 2001, Zeugswetter Andreas SB SD wrote:
>
>>>It is no specific query case.  It is the speed of an index scan which
>>>goes like N if you do it with PostgreSQL and it goes like log N if
>>>you do not have to look back into the table like MS SQL server does.
>>>
>>I cannot see why you keep saying that. It is simply not true.
>>MS SQL also shows O(N) behavior; it is simply that PostgreSQL,
>>because of its well-described methodology, takes longer per affected row.
>>The speed difference is a constant factor, no matter how many rows
>>are affected.
>>
>I'm basing my assumption on a statement from my colleague.  He
>told me that consistent index usage results in O(log N) behaviour.
>
The gain from searching through the index only, versus searching through
the index and then looking up each tuple in the main table, can be better
than linear if the tuples are scattered throughout the main table.

Searching through the index only is probably faster by roughly a factor of
2 * (size_of_heap_tuple / size_of_index_entry) in your case, where you want
to count about half of the rows in the table.
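
A quick back-of-the-envelope for that factor (the heap tuple and index entry
sizes below are assumed for illustration, not taken from your actual table):

#include <stdio.h>

int main(void)
{
    /* Assumed, illustrative sizes -- not measured from a real table. */
    double size_of_heap_tuple  = 150.0;  /* bytes per heap tuple         */
    double size_of_index_entry = 16.0;   /* bytes per b-tree index entry */

    /* Rough speedup estimate from the mail above:
     * 2 * (size_of_heap_tuple / size_of_index_entry)                    */
    double factor = 2.0 * (size_of_heap_tuple / size_of_index_entry);

    printf("estimated index-only speedup: about %.1fx\n", factor);
    return 0;
}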



----------------
Hannu




Re: Further open item (Was: Status of 7.2)

From:
"Zeugswetter Andreas SB SD"
Date:
> >>The underlying problem the user is seeing is how to _know_ an index
> >>tuple is valid without checking the heap,
> >>
> I'd propose a memory-only (or heavily cached) structure of tuple death
> transaction ids for all transactions since the oldest live trx. And when
> that oldest finishes then the tombstone marks for all tuples deleted
> between that and the new oldest are moved to relevant indexes (or the
> index keys are deleted) by concurrent vacuum or similar process.

Andreas said that his data is only loaded/changed at night, thus for his
queries all tuples found in the index are actually live. Every heap tuple
lookup results in "tuple valid".

In his case a per-table global "highest xid" in the heap data, which could
be compared against the highest xid at the time of the last vacuum, would
probably be sufficient (or a flag for "modified after last vacuum").
Of course per-table globals are a major headache regarding concurrency,
but there would be other possible optimizations that could profit from
such a structure, like rowcount ...
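
A rough sketch of what such a check could look like, with entirely
hypothetical structures and field names (nothing here corresponds to
existing PostgreSQL code):

#include <stdbool.h>
#include <stdio.h>

typedef unsigned int TransactionId;   /* simplified stand-in for the real type */

/*
 * Hypothetical per-table bookkeeping -- these fields do not exist in
 * PostgreSQL; they only illustrate the idea from the mail above.
 */
typedef struct PerTableState
{
    TransactionId highest_xid_in_heap;   /* highest xid that touched the heap */
    TransactionId highest_xid_at_vacuum; /* highest xid when last vacuum ran  */
    bool          modified_since_vacuum; /* the simpler "dirty flag" variant  */
} PerTableState;

/*
 * If nothing has modified the table since the last vacuum, every index
 * entry points at a live tuple, so the per-tuple heap visibility check
 * could in principle be skipped for this table.
 */
static bool
index_entries_known_valid(const PerTableState *tab)
{
    if (tab->modified_since_vacuum)
        return false;
    return tab->highest_xid_in_heap <= tab->highest_xid_at_vacuum;
}

int main(void)
{
    PerTableState nightly_loaded = { 1000, 1000, false };  /* untouched since vacuum */
    PerTableState busy_table     = { 2000, 1000, true  };  /* modified after vacuum  */

    printf("nightly_loaded: %s\n",
           index_entries_known_valid(&nightly_loaded) ? "skip heap check" : "must check heap");
    printf("busy_table:     %s\n",
           index_entries_known_valid(&busy_table) ? "skip heap check" : "must check heap");
    return 0;
}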

Andreas