Re: Commit fest status

Поиск

Список

Период

Сортировка

От	Decibel!
Тема	Re: Commit fest status
Дата	15 апреля 2008 г. 19:44:15
Msg-id	007B5B4F-DEFB-4743-A04A-F6C3CA6FEB13@decibel.org обсуждение исходный текст
Ответ на	Re: Commit fest status (Tom Lane <tgl@sss.pgh.pa.us>)
Список	pgsql-hackers

Дерево обсуждения

On Apr 11, 2008, at 2:33 PM, Tom Lane wrote:
> Bruce Momjian <bruce@momjian.us> writes:
>> Tom Lane wrote:
>>> In short, I think it's time to declare our first commit fest done.
>
>> OK, todo updated, but what about the "Maintaining cluster order on
>> insert" idea?
>>     http://momjian.us/cgi-bin/pgpatches
>
> The last item I see in the thread is some performance tests that
> make it look not worthwhile.  There's no discussion needed, unless
> someone refutes that test or improves the code.

What about Heikki's question to you about insert variations in your  
test in http://archives.postgresql.org/pgsql-patches/2007-07/ 
msg00123.php

Even looking at Heikki's test results I'm still questioning the  
validity of the test itself. I don't see any notable difference in  
performance in the SELECTs in Heikki's two tests, which makes me  
think that the data was being cached somewhere. If my math is  
correct, this test should be generating a table that's about 400MB,  
so unless you were running this on a 486 or something, it's going to  
be cached. I wouldn't expect this test to buy *anything* in the case  
of all the data being cached. In fact, it's not going to help at all  
if the pages we need to pull for the partial SELECTs are in memory,  
which means that for this test to me useful you either need a very  
large dataset, or you have to do something to flush the cache before  
the SELECT test. If even 50% of the table fits in memory, you could  
still very possibly find all the pages you needed already in memory,  
which spoils things.

Another issue is I think we need to consider the case of the  
usefulness of clustering (unless everyone agrees that it's a very  
useful tool that we need), and then consider the performance impact  
of this patch on inserts and ways to reduce that.

Towards the former, I've run some tests on some non-spectacular  
hardware. I created a table similar to Tom's and populated it via:

create table test (i int, d text);
insert into test SELECT 1000000*random(), repeat('x',350) FROM  
generate_series(1,1000000);
create index test_i on test(i);

I then ran test.sh
bin/pg_ctl -D data stop
clearmem 625
bin/pg_ctl -D data start
sleep 15
bin/psql -f test.sql

test.sql:
set enable_bitmapscan To off;
explain analyze select * from test where i between 2000 and 3000;
explain analyze select * from test;

clearmem is something that just allocates a bunch of memory to clear  
the cache. Unfortunately I wasn't able to completely clear the cache,  
but it was enough to show the benefit of clustering.

I ran that script several times with the table not clustered; the  
results were in the 18-20 second range for the between query. For  
grins I also tried with bitmapscan on, but results were inconclusive.  
I then clustered the table and re-ran the test; response times were  
sub-second. Granted, this is on pedestrian hardware, so a good SAN  
might not show as big a difference. I can try testing this at work if  
there's desire.

So clustering certainly offers a benefit. Is there some way we can  
improve the patch to reduce the impact to INSERT?
-- 
Decibel!, aka Jim C. Nasby, Database Architect  decibel@decibel.org
Give your computer some brain candy! www.distributed.net Team #1828

В списке pgsql-hackers по дате отправления:

Предыдущее

От: Zdenek Kotala
Дата: 15 апреля 2008 г., 19:38:29
Сообщение: Re: WIP: Pg_upgrade - page layout converter (PLC) hook

Следующее

От: Andrew Chernow
Дата: 15 апреля 2008 г., 19:50:59
Сообщение: Re: pulling libpqtypes from queue

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Re: Commit fest status

Предыдущее

Следующее