Re: Hash index todo list item

Поиск
Список
Период
Сортировка
От Kenneth Marshall
Тема Re: Hash index todo list item
Дата
Msg-id 20071012235042.GQ12467@it.is.rice.edu
обсуждение исходный текст
Ответ на Re: Hash index todo list item  (Kenneth Marshall <ktm@rice.edu>)
Список pgsql-hackers
On Wed, Sep 05, 2007 at 03:07:03PM -0500, Kenneth Marshall wrote:
> On Sun, Sep 02, 2007 at 01:04:04PM -0500, Kenneth Marshall wrote:
> > Dear PostgreSQL Hackers:
> > 
> > After following the hackers mailing list for quite a while,
> > I am going to start investigating what will need to be done
> > to improve hash index performance. Below are the pieces of
> > this project that I am currently considering:
> > 
> > 1. Characterize the current hash index implementation against
> >    the BTree index, with a focus on space utilization and
> >    lookup performance against a collection of test data. This
> >    will give a baseline performance test to evaluate the impact
> >    of changes. I initially do not plan to bench the hash creation
> >    process since my initial focus will be on lookup performance.
> > 
> 
> Here are very basic results for a table with 1.6m entries:
> 
> postgres=# CREATE TABLE dict (word varchar(100));
> CREATE TABLE
> 
> postgres=# COPY dict FROM '/tmp/words';
> COPY 1648379
> postgres=# select count(*) from dict;
>   count  
> ---------
>  1648379
> (1 row)
> 
> Time: 11187.418 ms
> postgres=# select count(*) from dict;
>   count  
> ---------
>  1648379
> (1 row)
> 
> Time: 6040.912 ms
> postgres=# CREATE INDEX wordhash ON dict USING hash (word);
> CREATE INDEX
> Time: 11108707.160 ms
> 
> postgres=# select * from dict where word = 'avatar';
>   word  
> --------
>  avatar
> (1 row)
> 
> Time: 79.823 ms
> postgres=# select * from dict where word = 'zebra';
>  word  
> -------
>  zebra
> (1 row)
> 
> Time: 9.864 ms
> postgres=# select * from dict where word = 'turkey'; 
>   word  
> --------
>  turkey
> (1 row)
> 
> Time: 18.418 ms
> Time: 1.045 ms
> Time: 1.257 ms
> Time: 1.080 ms
> 
> postgres=# CREATE INDEX wordbtree ON dict USING btree (word);
> CREATE INDEX
> 
> Time: 25438.884 ms
> 
> postgres=# select * from dict where word = 'avatar';
>   word  
> --------
>  avatar
> (1 row)
> 
> Time: 13.400 ms
> postgres=# select * from dict where word = 'zebra';
>  word  
> -------
>  zebra
> (1 row)
> 
> Time: 1.173 ms
> postgres=# select * from dict where word = 'turkey';
>   word  
> --------
>  turkey
> (1 row)
> 
> Time: 1.186 ms
> Time: 1.103 ms
> Time: 1.099 ms
> Time: 1.108 ms
> 
> ------------------------------
> Size of table =       87556096
> 
> Size of hash index = 268451840
> Size of btree index = 53510144
> 
> From my very small sample on an unloaded machine, a hash index lookup
> took the least amount of time. It had a much larger initial time which
> could be attributable to cache population effects. The size is 5X that
> of the Btree index. I will continue to improve the test suite as more
> granularity is needed. If anyone has a good data generator, please let
> me know. Otherwise I will just roll my own.
> 
> Regards,
> Ken
> 
I have just re-ran the previous benchmark against 8.3beta1 (versus 8.2)
including the hash index sorted build patch and the results are below:

test=# CREATE TABLE dict (word varchar(100));
CREATE TABLE
Time: 24.463 ms
test=# COPY dict FROM '/tmp/cracklib-words';
LOG:  checkpoints are occurring too frequently (21 seconds apart)
HINT:  Consider increasing the configuration parameter "checkpoint_segments".
COPY 1648379
Time: 44470.235 ms
test=# select count(*) from dict; count  
---------1648379
(1 row)

Time: 4553.924 ms
test=# CREATE INDEX wordhash ON dict USING hash (word);
CREATE INDEX
Time: 85905.960 ms
test=# select * from dict where word = 'avatar'; word  
--------avatar
(1 row)

Time: 592.906 ms
test=# select * from dict where word = 'zebra';word  
-------zebra
(1 row)

Time: 21.458 ms
test=# select * from dict where word = 'turkey'; word  
--------turkey
(1 row)

Time: 22.532 ms
Time: 1.224 ms
Time: 1.200 ms
Time: 1.264 ms
test=# CREATE INDEX wordbtree ON dict USING btree (word);
CREATE INDEX
Time: 33714.874 ms
test=# select * from dict where word = 'avatar'; word  
--------avatar
(1 row)

Time: 24.296 ms
test=# select * from dict where word = 'zebra';word  
-------zebra
(1 row)

Time: 1.400 ms
test=# select * from dict where word = 'turkey'; word  
--------turkey
(1 row)

Time: 28.302 ms
Time: 1.284 ms
Time: 1.313 ms

---------------------------------------
Size of table =         69877760

Size of hash index =   268451840
Size of btree index =   48570368

I just wanted to have this baseline for future patches since
8.3 is just around the bend. As expected, the sorted build
process is a huge improvement from the unsorted original.

Regards,
Ken Marshall


В списке pgsql-hackers по дате отправления:

Предыдущее
От: Devrim GÜNDÜZ
Дата:
Сообщение: Re: Seems we need a post-beta1 initdb already
Следующее
От: "Gokulakannan Somasundaram"
Дата:
Сообщение: Re: Another Idea: Try Including snapshot with TOAS (was: Including Snapshot Info with Indexes)