Re: Hash partitioning.

Поиск
Список
Период
Сортировка
От Nicolas Barbier
Тема Re: Hash partitioning.
Дата
Msg-id CAP-rdTZogd6dFc2AuXJ9HV1J8R6pbQoNZKP-Ovn6PcBRrpk7jw@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Hash partitioning.  (Markus Wanner <markus@bluegap.ch>)
Ответы Re: Hash partitioning.  (Markus Wanner <markus@bluegap.ch>)
Re: Hash partitioning.  (Jeff Janes <jeff.janes@gmail.com>)
Список pgsql-hackers
2013/6/27 Markus Wanner <markus@bluegap.ch>:

> On 06/27/2013 11:12 AM, Nicolas Barbier wrote:
>
>> Imagine that there are a lot of indexes, e.g., 50. Although a lookup
>> (walking one index) is equally fast, an insertion must update al 50
>> indexes. When each index requires one extra I/O (because each index is
>> one level taller), that is 50 extra I/Os. In the partitioned case,
>> each index would require the normal smaller amount of I/Os. Choosing
>> which partition to use must only be done once: The result “counts” for
>> all indexes that are to be updated.
>
> I think you're underestimating the cost of partitioning. After all, the
> lookup of what index to update for a given partition is a a lookup in
> pg_index via pg_index_indrelid_index - a btree index.

I am assuming that this (comparatively very small and super-hot) index
is cached all the time, while for the other indexes (that are
supposedly super-huge) only the top part stays cached.

I am mostly just trying to find out where Yuri’s “partitioning is
needed for super-huge tables” experience might come from, and noting
that Heikki’s argument might not be 100% valid. I think that the
“PostgreSQL-answer” to this problem is to somehow cluster the data on
the “hotness column” (so that all hot rows are close to one another,
thereby improving the efficiency of the caching of relation blocks) +
partial indexes for the hot rows (as first mentioned by Claudio; to
improve the efficiency of the caching of index blocks).

> Additionally, the depth of an index doesn't directly translate to the
> number of I/O writes per insert (or delete). I'd rather expect the avg.
> number of I/O writes per insert into a b-tree to be reasonably close to
> one - depending mostly on the number of keys per page, not depth.

My reasoning was: To determine which index block to update (typically
one in both the partitioned and non-partitioned cases), one needs to
walk the index first, which supposedly causes one additional (read)
I/O in the non-partitioned case on average, because there is one extra
level and the lower part of the index is not cached (because of the
size of the index). I think that pokes a hole in Heikki’s argument of
“it really doesn’t matter, partitioning == using one big table with
big non-partial indexes.”

Nicolas

--
A. Because it breaks the logical sequence of discussion.
Q. Why is top posting bad?



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Andrew Dunstan
Дата:
Сообщение: Re: Kudos for Reviewers -- straw poll
Следующее
От: Peter Geoghegan
Дата:
Сообщение: Re: Group Commits Vs WAL Writes