Re: Trouble with hashagg spill I/O pattern and costing

Поиск
Список
Период
Сортировка
От Tomas Vondra
Тема Re: Trouble with hashagg spill I/O pattern and costing
Дата
Msg-id 20200526191511.d7pwz4awrvdm4p6j@development
обсуждение исходный текст
Ответ на Re: Trouble with hashagg spill I/O pattern and costing  (Jeff Davis <pgsql@j-davis.com>)
Ответы Re: Trouble with hashagg spill I/O pattern and costing  (Jeff Davis <pgsql@j-davis.com>)
Список pgsql-hackers
On Tue, May 26, 2020 at 11:40:07AM -0700, Jeff Davis wrote:
>On Tue, 2020-05-26 at 16:15 +0200, Tomas Vondra wrote:
>> I'm not familiar with logtape internals but IIRC the blocks are
>> linked
>> by each block having a pointer to the prev/next block, which means we
>> can't prefetch more than one block ahead I think. But maybe I'm
>> wrong,
>> or maybe fetching even just one block ahead would help ...
>
>We'd have to get creative. Keeping a directory in the LogicalTape
>structure might work, but I'm worried the memory requirements would be
>too high.
>
>One idea is to add a "prefetch block" to the TapeBlockTrailer (perhaps
>only in the forward direction?). We could modify the prealloc list so
>that we always know the next K blocks that will be allocated to the
>tape. All for v14, of course, but I'd be happy to hack together a
>prototype to collect data.
>

Yeah. I agree prefetching is definitely out of v13 scope. It might be
interesting to try how useful would it be, if you're willing to spend
some time on a prototype.

>
>Do you have any other thoughts on the current prealloc patch for v13,
>or is it about ready for commit?
>

I think it's pretty much ready to go.

I have some some doubts about the maximum value (128 probably means
read-ahead values above 256 are probably pointless, although I have not
tested that). But it's still a huge improvement with 128, so let's get
that committed.

I've been thinking about actually computing the expected number of
blocks per tape, and tying the maximum to that, somehow. But that's
something we can look at in the future.

As for the tlist fix, I think that's mostly ready too - the one thing we
should do is probably only doing it for AGG_HASHED. For AGG_SORTED it's
not really necessary.


iregards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Jeff Davis
Дата:
Сообщение: Re: Trouble with hashagg spill I/O pattern and costing
Следующее
От: Alvaro Herrera
Дата:
Сообщение: Re: hash join error improvement (old)