Re: [PERFORM] A Better External Sort?

Поиск
Список
Период
Сортировка
От Tom Lane
Тема Re: [PERFORM] A Better External Sort?
Дата
Msg-id 8643.1128181455@sss.pgh.pa.us
обсуждение исходный текст
Ответ на Re: [PERFORM] A Better External Sort?  (Josh Berkus <josh@agliodbs.com>)
Ответы Re: [PERFORM] A Better External Sort?  (Josh Berkus <josh@agliodbs.com>)
Список pgsql-hackers
Josh Berkus <josh@agliodbs.com> writes:
> The biggest single area where I see PostgreSQL external sort sucking is
>   on index creation on large tables.   For example, for free version of
> TPCH, it takes only 1.5 hours to load a 60GB Lineitem table on OSDL's
> hardware, but over 3 hours to create each index on that table.  This
> means that over all our load into TPCH takes 4 times as long to create
> the indexes as it did to bulk load the data.
> ...
> Following an index creation, we see that 95% of the time required is the
> external sort, which averages 2mb/s.  This is with seperate drives for
> the WAL, the pg_tmp, the table and the index.  I've confirmed that
> increasing work_mem beyond a small minimum (around 128mb) had no benefit
> on the overall index creation speed.

These numbers don't seem to add up.  You have not provided any details
about the index key datatypes or sizes, but I'll take a guess that the
raw data for each index is somewhere around 10GB.  The theory says that
the runs created during the first pass should on average be about twice
work_mem, so at 128mb work_mem there should be around 40 runs to be
merged, which would take probably three passes with six-way merging.
Raising work_mem to a gig should result in about five runs, needing only
one pass, which is really going to be as good as it gets.  If you could
not see any difference then I see little hope for the idea that reducing
the number of merge passes will help.

Umm ... you were raising maintenance_work_mem, I trust, not work_mem?

We really need to get some hard data about what's going on here.  The
sort code doesn't report any internal statistics at the moment, but it
would not be hard to whack together a patch that reports useful info
in the form of NOTICE messages or some such.

            regards, tom lane

В списке pgsql-hackers по дате отправления:

Предыдущее
От: "Jim C. Nasby"
Дата:
Сообщение: Re: On Logging
Следующее
От: "Jim C. Nasby"
Дата:
Сообщение: Re: effective SELECT from child tables