Re: [PATCHES] Including Snapshot Info with Indexes

Поиск
Список
Период
Сортировка
От Gokulakannan Somasundaram
Тема Re: [PATCHES] Including Snapshot Info with Indexes
Дата
Msg-id 9362e74e0710230435r4553454eg4258e0fea2a03198@mail.gmail.com
обсуждение исходный текст
Ответ на Re: [PATCHES] Including Snapshot Info with Indexes  ("Heikki Linnakangas" <heikki@enterprisedb.com>)
Ответы Re: [PATCHES] Including Snapshot Info with Indexes  ("Heikki Linnakangas" <heikki@enterprisedb.com>)
Список pgsql-hackers


On 10/23/07, Heikki Linnakangas <heikki@enterprisedb.com> wrote:
Please keep the list cc'd.

Gokulakannan Somasundaram wrote:
> On 10/23/07, Heikki Linnakangas <heikki@enterprisedb.com> wrote:
>> Gokulakannan Somasundaram wrote:
>> I have also enabled the display of Logical Reads. In order to see that,
>> set
>>> log_statement_stats on.
>> You should start benchmarking, to verify that you're really getting the
>> kind of speed up you're looking for, before you spend any more effort on
>> that. Reduction in logical reads alone isn't enough. Remember that for a
>> big change like that, the gain has to be big as well.
>
> I have done the benchmark. I have done the benchmark with Logical reads, as
> they turn out to be potential physical reads. Try turning on the
> log_statement_stats in postgresql.conf. try firing some queries, which can
> satisfied by the index. You would see the difference.

I would see a decrease in the number of logical reads, that's all. You
need to demonstrate a real increase in throughput and/or reduction in
response times.

Note that even though you reduce the number of logical reads, with a
thick index a logical read is *more* likely to be a physical read,
because the index is larger and therefore consumes more cache.

Say, with a normal index, you need to goto the table for checking the snapshot. So you would be loading both the index pages + table pages, in order to satisfy a certain operations. Whereas in thick index you occupy 16 bytes per tuple more in order to avoid going to the table. So memory management is again better. But i can run the load test, if that's required.  Even when all the tuples are in memory, index only scans are almost 40-60% faster than the index scans with thin indexes.

> As a first test, I'd like to see results from SELECTs on different sized
>> tables. On tables that fit in cache, and on tables that don't. Tables
>> large enough that the index doesn't fit in cache. And as a special case,
>> on a table just the right size that a normal index fits in cache, but a
>> thick one doesn't.
>
> I have not done a Load test. That's a good idea. Are you guys using Apache
> JMeter?

You can use whatever you want, as long as you can get the relevant
numbers out of it. contrib/pgbench is a good place to start.

DBT-2 is another test people often use for patches like this. It's quite
tedious to set up and operate, but it'll give you nice very graphs.

Make sure you control vacuums, checkpoints etc., so that you get
repeatable results.


Sure i will do that. Thanks for the advice.

> Also i think you might have noted that the thick indexes are not affected by
> updates, if the updated column is not in the index. I think that add on to
> one more advantage of thick indexes against DSM.

That cannot possibly work. Imagine that you have a table

ctid | id | data
-----+----+-----
(0,1)| 1  | foo
(0,2)| 1  | bar

where (0,2) is an updated version of (0,1). If you don't update the
index, there will be no index pointer to (0,2), so a regular index scan,
not an index-only scan, will not find the updated tuple.

Or did you mean that the index is not updated on HOT updates? That's an
interesting observation. We could do index-only scans with the DSM as
well, even if there's HOT updates, if we define the bit in the bitmap to
mean "all tuples in this page are visible to everyone, or there's only
HOT updates". That works, because an index-only-scan doesn't access any
of the updated columns. It probably isn't worth it, though. Seems like a
pretty narrow use case, and makes it more complicated.


I think i was not understood. An update transaction is not degraded by thick index. Update = Delete + insert. If you don't update the columns in index, then we would goto the same index page for both delete and insert. i have done a small optimization there to cache the BTStack. you do not need to do any more I/O. So effectively update performance in thick index = update performance in thin index (if indexed columns are not updated).
Hope i am clear..

What do you thick about not maintaining pins in case of thick indexes?

Thanks,
Gokul,
CertoSQL Project,
Allied Solution Groups.
( www.alliedgroups.com)

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Rafael Martinez
Дата:
Сообщение: Re: PostgreSQL performance issues
Следующее
От: "Heikki Linnakangas"
Дата:
Сообщение: Re: [PATCHES] Including Snapshot Info with Indexes