Re: Reducing buildfarm disk usage: remove temp installs when done

From Tom Lane
Subject Re: Reducing buildfarm disk usage: remove temp installs when done
Date
Msg-id 28310.1421645334@sss.pgh.pa.us
In reply to Re: Reducing buildfarm disk usage: remove temp installs when done  (Andrew Dunstan <andrew@dunslane.net>)
Responses Re: Reducing buildfarm disk usage: remove temp installs when done  (Andrew Dunstan <andrew@dunslane.net>)
List pgsql-hackers
Andrew Dunstan <andrew@dunslane.net> writes:
> On 01/18/2015 09:20 PM, Tom Lane wrote:
>> What I see on dromedary, which has been around a bit less than a year,
>> is that the at-rest space consumption for all 6 active branches is
>> 2.4G even though a single copy of the git repo is just over 400MB:
>> $ du -hsc pgmirror.git HEAD REL*
>> 416M    pgmirror.git
>> 363M    HEAD
>> 345M    REL9_0_STABLE
>> 351M    REL9_1_STABLE
>> 354M    REL9_2_STABLE
>> 358M    REL9_3_STABLE
>> 274M    REL9_4_STABLE
>> 2.4G    total

> This isn't happening for me. Here's crake:
>     [andrew@emma root]$ du -shc pgmirror.git/ [RH]*/pgsql
>     218M    pgmirror.git/
>     149M    HEAD/pgsql
>     134M    REL9_0_STABLE/pgsql
>     138M    REL9_1_STABLE/pgsql
>     140M    REL9_2_STABLE/pgsql
>     143M    REL9_3_STABLE/pgsql
>     146M    REL9_4_STABLE/pgsql
>     1.1G    total

> Maybe you need some git garbage collection?

Weird ... for me, dromedary and prairiedog are both showing very similar
numbers.  Shouldn't GC be automatic?  These machines are not running
latest and greatest git (looks like 1.7.3.1 and 1.7.9.6 respectively),
maybe that has something to do with it?

A fresh clone from git://git.postgresql.org/git/postgresql.git right
now is 167MB (using dromedary's git version), so we're both showing
some bloat over the minimum possible repo size, but it's curious that
mine is so much worse.
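
For reference, the arithmetic behind these comparisons can be sketched in a
few lines of Python.  The sizes are copied from the two du listings and the
167MB fresh-clone figure above; the names DROMEDARY, CRAKE, total_mb and
bloat_ratio are made up for this illustration.  Note that the two du
commands measure different paths (whole branch directories versus only the
pgsql subdirectories), so the totals are not directly comparable.

```python
# Sizes in MB, copied from the du listings quoted in this message.
FRESH_CLONE_MB = 167

DROMEDARY = {
    "pgmirror.git": 416, "HEAD": 363, "REL9_0_STABLE": 345,
    "REL9_1_STABLE": 351, "REL9_2_STABLE": 354, "REL9_3_STABLE": 358,
    "REL9_4_STABLE": 274,
}
CRAKE = {
    "pgmirror.git": 218, "HEAD": 149, "REL9_0_STABLE": 134,
    "REL9_1_STABLE": 138, "REL9_2_STABLE": 140, "REL9_3_STABLE": 143,
    "REL9_4_STABLE": 146,
}

def total_mb(sizes):
    """Sum of all listed entries, in MB (the line du labels 'total')."""
    return sum(sizes.values())

def bloat_ratio(mirror_mb, fresh_mb=FRESH_CLONE_MB):
    """How many times larger the mirror is than a fresh clone."""
    return mirror_mb / fresh_mb

for host, sizes in (("dromedary", DROMEDARY), ("crake", CRAKE)):
    mirror = sizes["pgmirror.git"]
    print(f"{host}: total {total_mb(sizes)} MB, "
          f"mirror {mirror} MB = {bloat_ratio(mirror):.1f}x a fresh clone")
```

On these figures the mirrors come out at roughly 2.5x (dromedary) and 1.3x
(crake) the size of a fresh clone.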

But the larger point is that git fetch does not, AFAICT, have the same
kind of optimization that git clone does to do hard-linking when copying
an object from a local source repo.  With or without GC, the resulting
duplicative storage is going to be the dominant effect after a while on a
machine tracking a full set of branches.
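
The effect of hard-linking on disk usage can be illustrated without git at
all.  The following is only a sketch: unique_bytes is a hypothetical helper
that sums file sizes while counting each inode once, which is roughly how du
treats hard links within one invocation; it does not model git's object
store or what git fetch actually does.

```python
import os
import shutil
import tempfile

def unique_bytes(root):
    """Sum file sizes under root, counting each (device, inode) pair once."""
    seen = set()
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            st = os.lstat(os.path.join(dirpath, name))
            key = (st.st_dev, st.st_ino)
            if key not in seen:
                seen.add(key)
                total += st.st_size
    return total

def demo(payload_size=1000):
    """Return (bytes counted with a hard link, bytes counted with a copy)."""
    with tempfile.TemporaryDirectory() as root:
        src = os.path.join(root, "mirror_object")
        with open(src, "wb") as f:
            f.write(b"x" * payload_size)
        # A hard link is a second name for the same inode: no extra storage.
        os.link(src, os.path.join(root, "linked_object"))
        linked = unique_bytes(root)
        # A real copy gets its own inode and its own storage.
        shutil.copy(src, os.path.join(root, "copied_object"))
        copied = unique_bytes(root)
    return linked, copied

if __name__ == "__main__":
    print(demo())
```

The hard-linked file adds nothing to the count, while the copy doubles it;
that is the difference between sharing objects with the mirror and storing
them again in every branch's repo.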

> An alternative would be to remove the pgsql directory at the end of the 
> run and thus do a complete fresh checkout each run. As you say it would 
> cost some time but save some space. At least it would be doable as an 
> option, not sure I'd want to make it non-optional.

What I was thinking is that a complete-fresh-checkout approach would
remove the need for the copy_source step that happens now, thus buying
back at least most of the I/O cost.  But that's only considering the
working tree.  The real issue here seems to be about having duplicative
git repos ... seems like we ought to be able to avoid that.
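
One way to avoid duplicate object storage, offered only as a sketch and not
as what the buildfarm client does, is to have each branch's clone borrow
objects from the local mirror using git clone's --shared option.  In the
example below, shared_clone_cmd, shared_clone and the paths are
hypothetical.  The git-clone documentation warns that a shared clone can
become corrupt if the source repository later removes objects the clone
still references, so this would need care around GC on the mirror.

```python
import subprocess

def shared_clone_cmd(mirror, dest, branch=None):
    """Build a git command that clones `mirror` into `dest`, borrowing
    objects from the mirror (via alternates) instead of copying them."""
    cmd = ["git", "clone", "--shared"]
    if branch is not None:
        cmd += ["--branch", branch]
    cmd += [mirror, dest]
    return cmd

def shared_clone(mirror, dest, branch=None):
    """Run the clone; raises CalledProcessError if git fails."""
    subprocess.run(shared_clone_cmd(mirror, dest, branch), check=True)

# Hypothetical usage, with made-up paths:
#   shared_clone("/buildroot/pgmirror.git", "/buildroot/HEAD/pgsql", "master")
```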
        regards, tom lane


