Re: Mark/Restore and avoiding RandomAccess sorts

Поиск
Список
Период
Сортировка
От Bruce Momjian
Тема Re: Mark/Restore and avoiding RandomAccess sorts
Дата
Msg-id 200701062206.l06M6av04563@momjian.us
обсуждение исходный текст
Ответ на Mark/Restore and avoiding RandomAccess sorts  ("Simon Riggs" <simon@2ndquadrant.com>)
Ответы Re: Mark/Restore and avoiding RandomAccess sorts  ("Simon Riggs" <simon@2ndquadrant.com>)
Re: Mark/Restore and avoiding RandomAccess sorts  (Heikki Linnakangas <heikki@enterprisedb.com>)
Список pgsql-hackers
I saw no replies to this.

---------------------------------------------------------------------------

Simon Riggs wrote:
> Merge Joins require us to potentially Mark and Restore positions in the
> tuples arriving from executor sub-nodes.
> 
> This currently means that if the tuples arrive from a Sort node, as they
> often do in an MJ, the sort node will be instructed to prepare a random
> access version of the sort result. That requires a full final merge of
> the output, so as to allow rewinding the input when a Restore operation
> is called.
> 
> An MJ doesn't actually need random access, it just needs to be able to
> rewind. The question is: how far does it need to rewind? In many cases,
> the Restore operation moves back a small number of tuples, with a unique
> inner scan requiring a rewind of just one tuple. 
> 
> It would certainly be cheaper, in most cases, for the Sort node to
> maintain a variable size rewind buffer, where the history of prior
> tuples is truncated each time we perform a Mark operation. This could be
> implemented as a modified Tuplestore that could then be trimmed down
> each time a Mark operation took place. If the tuplestore has not yet
> spilled to disk this could be a trivial operation.
> 
> Doing that would almost completely remove the overhead of the final
> merge step in the sort. The final merge often doubles elapsed time in
> cases where the sort is larger than work_mem, which it often is.
> 
> Implementing the variable mark/restore buffer as a dumb Tuplestore would
> mean that the space usage of the Sort could in worst case go as high as
> x2 total space. The worst case is where the inner scan is all a single
> value. The best case is where the inner scan is sufficiently unique over
> all its values that it never writes back to disk at all. 
> 
> So a further refinement of this idea would be to simply defer the final
> merge operation for the sort until the history required for the Mark
> operation exceeded, say, 10% of the sort size. That would then be
> sufficient to improve performance for most common cases, without risking
> massive space overflow for large and highly non-unique data. There's no
> problem with running the final merge slightly later than before;
> everything's still there to allow it. Reusing space in the tuplestore is
> also straightforward since that's exactly what the final merge already
> does, so some rework of that code should be sufficient.
> 
> This is a separate, but related idea of being able to avoid
> mark/restores completely when the outer scan is provably unique. 
> 
> I'm not intending to implement this idea just yet, but it seemed worth
> recording since it occurred to me - and discussing it as a TODO item.
> 
> Comments?
> 
> -- 
>   Simon Riggs             
>   EnterpriseDB   http://www.enterprisedb.com
> 
> 
> 
> ---------------------------(end of broadcast)---------------------------
> TIP 5: don't forget to increase your free space map settings

--  Bruce Momjian   bruce@momjian.us EnterpriseDB    http://www.enterprisedb.com
 + If your life is a hard drive, Christ can be your backup. +


В списке pgsql-hackers по дате отправления:

Предыдущее
От: Bruce Momjian
Дата:
Сообщение: Re: InitPostgres and flatfiles question
Следующее
От: Bruce Momjian
Дата:
Сообщение: Re: pg_ctl options