Thread: Removing obsolete comment block at the top of nbtsort.c.

Removing obsolete comment block at the top of nbtsort.c.

From
Peter Geoghegan
Date:
nbtsort.c has a comment block from the Berkeley days that reads:

 * This code is moderately slow (~10% slower) compared to the regular
 * btree (insertion) build code on sorted or well-clustered data. On
 * random data, however, the insertion build code is unusable -- the
 * difference on a 60MB heap is a factor of 15 because the random
 * probes into the btree thrash the buffer pool. (NOTE: the above
 * "10%" estimate is probably obsolete, since it refers to an old and
 * not very good external sort implementation that used to exist in
 * this module. tuplesort.c is almost certainly faster.)

I propose removing this whole comment block (patch attached), because:

* The "NOTE" sentence in parentheses was actually written by Tom in
1999, as part of the original tuplesort commit. If tuplesort.c was
"almost certainly faster" in its first incarnation, what are the
chances of it actually still being slower than incremental insertions
with presorted input at this point? There have been numerous large
improvements to tuplesort in the years since 1999.

* Even if the original claim was still true all these years later, the
considerations for nbtsort.c have changed so much that it couldn't
possibly matter. The fact that we're not using shared_buffers to build
indexes anymore is a significant advantage for nbtsort.c, independent
of sort performance. These days, CREATE INDEX spends most of the time
bottlenecked on WAL-logging when building an index against presorted
SERIAL-like inputs, especially when parallelism is used. Back in 1999,
there was no WAL-logging.

-- 
Peter Geoghegan

Attachments

Re: Removing obsolete comment block at the top of nbtsort.c.

From
Alvaro Herrera
Date:
On 2018-Jun-24, Peter Geoghegan wrote:

> nbtsort.c has a comment block from the Berkeley days that reads:
> 
>  * This code is moderately slow (~10% slower) compared to the regular
>  * btree (insertion) build code on sorted or well-clustered data. On
>  * random data, however, the insertion build code is unusable -- the
>  * difference on a 60MB heap is a factor of 15 because the random
>  * probes into the btree thrash the buffer pool. (NOTE: the above
>  * "10%" estimate is probably obsolete, since it refers to an old and
>  * not very good external sort implementation that used to exist in
>  * this module. tuplesort.c is almost certainly faster.)
> 
> I propose removing this whole comment block (patch attached),

Makes sense to me, +1.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services