Re: Hash partitioning.

Поиск
Список
Период
Сортировка
От Heikki Linnakangas
Тема Re: Hash partitioning.
Дата
Msg-id 51CAF638.50009@vmware.com
обсуждение исходный текст
Ответ на Re: Hash partitioning.  ("Yuri Levinsky" <yuril@celltick.com>)
Ответы Re: Hash partitioning.  (Bruce Momjian <bruce@momjian.us>)
Re: Hash partitioning.  ("Yuri Levinsky" <yuril@celltick.com>)
Re: Hash partitioning.  (Nicolas Barbier <nicolas.barbier@gmail.com>)
Список pgsql-hackers
On 26.06.2013 16:41, Yuri Levinsky wrote:
> Heikki,
> As far as I understand the height of the btree will affect the number of
> I/Os necessary. The height of the tree does not increase linearly with
> the number of records.

The height of a b-tree is O(log n), where n is the number of records. 
Informally, if we assume that you have on average, say, 1000 keys on one 
b-tree page, a two-level b-tree can hold one million items, and a three 
level one billion items, and so on. The height of the tree affects the 
number of I/Os needed for searches, but keep in mind that the top levels 
of the tree are going to be very frequently accessed and in practice 
will stay permanently cached. You will only perform actual I/O on the 
1-2 bottom levels of the tree (or 0 if it all fits in cache)

Now let's compare that with a hash partitioned table, with 1000 
partitions and a b-tree index on every partition. When doing a search, 
you first hash the key to look up the right partition, then you search 
the index of that partition. This is almost equivalent to just having a 
b-tree that's one level taller - instead of looking up the right 
partition in the hash table, you look up the right child page at the 
root of the b-tree. From a very coarse theoretical point of view, the 
only difference is that you replaced the binary search on the b-tree 
root page with an equivalent hash lookup. A hash lookup can be somewhat 
cheaper than binary search, but in practice there is very little 
difference. There certainly isn't any difference in the number of actual 
I/O performed.

In practice, there might be a lot of quirks and inefficiencies and 
locking contention etc. involved in various DBMS's, that you might be 
able to work around with hash partitioning. But from a theoretical point 
of view, there is no reason to expect just partitioning a table on a 
hash to make key-value lookups any faster.

- Heikki



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Jamie Martin
Дата:
Сообщение: Re: patch submission: truncate trailing nulls from heap rows to reduce the size of the null bitmap [Review]
Следующее
От: "Yuri Levinsky"
Дата:
Сообщение: Re: Hash partitioning.