Re: robots.txt on git.postgresql.org

From: Magnus Hagander
Subject: Re: robots.txt on git.postgresql.org
Date:
Msg-id: CABUevEyUM-CEmmBcHmX6VrnkHj8O7xYk6ZvfdSfk-T8O4jd-Vw@mail.gmail.com
In reply to: Re: robots.txt on git.postgresql.org  (Craig Ringer <craig@2ndquadrant.com>)
Responses: Re: robots.txt on git.postgresql.org  (Greg Stark <stark@mit.edu>)
List: pgsql-hackers
On Wed, Jul 10, 2013 at 10:25 AM, Craig Ringer <craig@2ndquadrant.com> wrote:
> On 07/09/2013 11:30 PM, Andres Freund wrote:
>> On 2013-07-09 16:24:42 +0100, Greg Stark wrote:
>>> I note that git.postgresql.org's robot.txt refuses permission to crawl
>>> the git repository:
>>>
>>> http://git.postgresql.org/robots.txt
>>>
>>> User-agent: *
>>> Disallow: /
>>>
>>>
>>> I'm curious what motivates this. It's certainly useful to be able to
>>> search for commits.
>>
>> Gitweb is horribly slow. I don't think anybody with a bigger git repo
>> using gitweb can afford to let all the crawlers go through it.
>
> Wouldn't whacking a reverse proxy in front be a pretty reasonable
> option? There's a disk space cost, but using Apache's mod_proxy or
> similar would do quite nicely.

We already run this; that's what we did to make it survive at all. The
problem is that there are so many thousands of different URLs you can
reach on that site, and Google indexes them all by default.

It was before we had this that the site regularly died.
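The reverse-proxy-with-cache setup Craig suggests and Magnus says is already running could look roughly like the following sketch, assuming Apache httpd with mod_proxy and mod_cache_disk; the hostname, backend port, and cache settings are illustrative assumptions, not the actual postgresql.org configuration:

```apache
<VirtualHost *:80>
    ServerName git.example.org

    # Forward all requests to the gitweb backend
    ProxyPass        / http://127.0.0.1:8080/
    ProxyPassReverse / http://127.0.0.1:8080/

    # Cache rendered pages on disk so repeated hits on the same URL
    # (e.g. from crawlers) don't re-run the expensive gitweb CGI
    CacheEnable disk /
    CacheRoot  /var/cache/apache2/gitweb
    CacheDefaultExpire 3600
</VirtualHost>
```

Note that caching only helps with repeated requests for the same URL; as Magnus points out, a crawler walking thousands of distinct gitweb URLs still forces a backend hit for each one, which is why the cache alone doesn't solve the problem.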


--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/



In the pgsql-hackers list, by date sent:

Previous
From: Dave Page
Date:
Message: Re: robots.txt on git.postgresql.org
Next
From: Jeevan Chalke
Date:
Message: Regex pattern with shorter back reference does NOT work as expected