Re: [PATCHES] Including Snapshot Info with Indexes

Поиск

Список

Период

Сортировка

От	Gokulakannan Somasundaram
Тема	Re: [PATCHES] Including Snapshot Info with Indexes
Дата	26 октября 2007 г. 12:39:31
Msg-id	9362e74e0710260538g285a1a22s89e161ee5f8a2511@mail.gmail.com обсуждение исходный текст
Ответ на	Re: [PATCHES] Including Snapshot Info with Indexes (Heikki Linnakangas <heikki@enterprisedb.com>)
Список	pgsql-hackers

Дерево обсуждения

On 10/26/07, Heikki Linnakangas <heikki@enterprisedb.com> wrote:

Gokulakannan Somasundaram wrote:
> On 10/26/07, Heikki Linnakangas <heikki@enterprisedb.com> wrote:
>> Gokulakannan Somasundaram wrote:
>>> As far as Load Test is concerned, i have tried to provide all the
>> relevant
>>> details. Please inform me, if i have left any.
>> Thanks!
>>
>> How large were the tables?
>
> It is in the Performance test report. They contain 2 million records. 6
> columns wide, 3 text and 3 numeric. same set of tables used for both tests,
> after refresh from a file

I meant in megabytes. How wide is the data in the text and numeric fields?

I have observed the size of PGDATA\base folder for size details

Size of Tables : 367 MB
Size of Tables + thin indexes : 616 MB
Size of Tables + thick indexes : 720 MB

The numbers were simply running between 1 and 2 million in a serial fashion. I think i made a mistake here. this would have helped thin indexes in range scans, since the data is clustered at the table, the bitmap heap scan would have been more effective. So i hope thick indexes will be more effective, if uncluster the data, since the thin index has to goto more table buffers. The test columns are approx 10 characters in length.

> Did you run all the queries concurrently? At this point, I think it'd be
>> better to run them separately so that you can look at the impact on each
>> kind of operation in isolation.
>>
> Performance tests are run against a workload and i have taken the workload
> of a small scale partitioning setup. Running the queries individually has
> already been done and the count of logical reads have been verified. I have
> already suggested that. For some reason, i am not able to convince that for
> simple index scans, Logical reads are a good measure of performance.

I wouldn't expect any performance gain for simple, not index-only,
scans. They have to hit the heap anyway.

I just feel the above test didn't do much I/Os and yet the index only scans are faster with thick indexes. since the size of RAM is 1GB and the size of the data is only 616MB, i hope most of them might have been OS cached. May be i am missing something here.

> What does the numbers look like if the the tables are small enough to
>> fit in RAM?
>
> I don't know whether this is a valid production setup, against which we need
> to benchmark. But if you insist, i will do that and get back to you next
> time.

A lot of people run databases that fit in RAM. And a lot of people
don't. Both cases are interesting. I'm particularly curious about that
because you've argued that the number of logical reads is important,
even if they don't become physical reads. Hannu also suggested that
swapping pages in/out of shared_buffers is relatively expensive; if
that's the case, we should see index-only scans performing much better
regular index scans, even when there's no physical I/O.

So the above test has fit into the RAM. Now do we need a test with tables that won't fit into RAM. i feel if the thick indexes were effective with data that would fit into RAM, then it will definitely be more effective with data that wouldn't fit into RAM. There is one performance bug, with updates where the caching strategy for BTStack didn't go effective for the Varlena structures. i will fix that bug next time. Also calls to HOT related stuff can be avoided, if it happens to be a thick index, I think these two changes, if made would further improve the performance of thick indexes.

> You should do some tuning, the PostgreSQL default configuration is not
>> tuned for maximum performance. At least increase checkpoint_segments and
>> checkpoint_timeout and shared_buffers. Though I noticed that you're
>> running on Windows; I don't think anyone's done any serious performance
>> testing or tuning on Windows yet, so I'm not sure how you should tune
>> that.
>
> What we are trying to do here, is to try and compare the performance of two
> indexing structures. AFAIK, the performance test done to compare two
> software implementations should not have parameter settings, favorable to
> one. I have not done any settings change favorable to thick index.

The tuning I suggested is just basic tuning any knowledgeable Postgres
DBA will do. It's not particularly in favor of any indexing scheme. With
the default checkpoint settings, for example, the system is going to be
busy doing checkpoints all the time if you have a reasonable rate of
updates.

The inserts and updates were at the rate of 10 every 2 seconds (there in the performance report) and the update was affecting two rows. I i haven't got any warning to increase the checkpoint during the test.
But my doubt is if checkpoint has caused so much of overhead, as we think of, how can the performance of thick indexes exceed thin indexes in index only scans?
As you might have observed all the statistics (Even the 90 and 95th percentile/median) were in milliseconds. So that might give a hint about the stress on the system.

--
Thanks,
Gokul.
CertoSQL Project,
Allied Solution Groups.
(www.alliedgroups.com)

В списке pgsql-hackers по дате отправления:

Предыдущее

От: Sebastien FLAESCH
Дата: 26 октября 2007 г., 12:36:55
Сообщение: Re: PostgreSQL 8.3, libpq and WHERE CURRENT OF

Следующее

От: "Jonah H. Harris"
Дата: 26 октября 2007 г., 13:01:35
Сообщение: Re: PANIC caused by open_sync on Linux

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Re: [PATCHES] Including Snapshot Info with Indexes

Предыдущее

Следующее