Re: Proposed Patch to Improve Performance of Multi-BatchHash Join for Skewed Data Sets

Поиск
Список
Период
Сортировка
От Robert Haas
Тема Re: Proposed Patch to Improve Performance of Multi-BatchHash Join for Skewed Data Sets
Дата
Msg-id 603c8f070812292055t32b850d9l5a9427eea91c2ed5@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Proposed Patch to Improve Performance of Multi-BatchHash Join for Skewed Data Sets  ("Lawrence, Ramon" <ramon.lawrence@ubc.ca>)
Ответы Re: Proposed Patch to Improve Performance of Multi-BatchHash Join for Skewed Data Sets  ("Bryce Cutt" <pandasuit@gmail.com>)
Список pgsql-hackers
> I think that setting aside a minimum percentage of work_mem may be a
> reasonable approach.  For instance, setting aside 1% at even 1 MB
> work_mem would be 10 KB which is enough to store about 40 MCV tuples of
> the TPC-H database.  Such a small percentage would be very unlikely (but
> still possible) to change the number of batches used.  Then, given the
> memory allocation and the known tuple size + overhead, only that number
> of MCVs are selected for the MCV table regardless how many there are.
> The MCV table size would then increase as work_mem is changed up to a
> maximum given by the number of MCVs.

Sounds fine.  Maybe 2-3% would be better.

> The code when building the MCV hash table keeps track of the order of
> insertion of the best MCVs.  It then flushes the MCV partitions in
> decreasing order of frequency of MCVs.  Thus, by the end of the build
> partitioning phase the MCV hash table should only store the most
> frequent MCV tuples.  Even with many-to-many joins as long as we keep
> all build tuples that have a given MCV in memory, then everything is
> fine.  You would get into problems if you only flushed some of the
> tuples of a certain MCV but that will not happen.

OK, I'll read it again - I must not have understood.

It would be good to post an updated patch soon, even if not everything
has been addressed.

...Robert


В списке pgsql-hackers по дате отправления:

Предыдущее
От: "Robert Haas"
Дата:
Сообщение: Re: Documenting serializable vs snapshot isolation levels
Следующее
От: "Nikhil Sontakke"
Дата:
Сообщение: Re: plpgsql: numeric assignment to an integer variable errors out