Re: robots.txt on git.postgresql.org

From: Andres Freund
Subject: Re: robots.txt on git.postgresql.org
Date: 2013-07-11
Msg-id: 20130711135058.GG27898@alap2.anarazel.de
In reply to: Re: robots.txt on git.postgresql.org  (Greg Stark <stark@mit.edu>)
List: pgsql-hackers
On 2013-07-11 14:43:21 +0100, Greg Stark wrote:
> On Wed, Jul 10, 2013 at 9:36 AM, Magnus Hagander <magnus@hagander.net> wrote:
> > We already run this, that's what we did to make it survive at all. The
> > problem is there are so many thousands of different URLs you can get
> > to on that site, and google indexes them all by default.
> 
> There's also https://support.google.com/webmasters/answer/48620?hl=en
> which lets us control how fast the Google crawler crawls. I think it's
> adaptive though so if the pages are slow it should be crawling slowly

The problem is that gitweb gives you access to more than a million
pages...
Revisions: git rev-list --all origin/master|wc -l => 77123
Branches: git branch --all|grep origin|wc -l
Views per commit: commit, commitdiff, tree
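
Roughly, as a back-of-the-envelope using just the counts above:

    77123 revisions * 3 views per commit ≈ 231,000 pages

and on top of that each revision exposes per-path tree, blob and history
views, which is presumably what pushes the total past a million.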

So, slow crawling isn't going to help very much.
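
(One blunt alternative is a robots.txt that keeps crawlers out of the
dynamic gitweb URLs entirely. A minimal sketch, assuming gitweb is served
from the root of git.postgresql.org rather than under some sub-path:

    User-agent: *
    Disallow: /

How coarse we can afford to be obviously depends on how much of the site
we actually want indexed.)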

Greetings,

Andres Freund

--
Andres Freund                       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


