Re: robots.txt sometimes disallowing all?

From: Josh Kupershmidt
Subject: Re: robots.txt sometimes disallowing all?
Date:
Message-ID: CAK3UJRHdUq-KP9JVL+bFm2EUa_OAZxOX1ASGU_E_5Q0F+zaWtg@mail.gmail.com
In reply to: robots.txt sometimes disallowing all? (Josh Kupershmidt <schmiddy@gmail.com>)
Responses: Re: robots.txt sometimes disallowing all? (Magnus Hagander <magnus@hagander.net>)
List: pgsql-www

This behavior seems to still be going on, but I think I have a clue. I
noticed while experimenting with:

wget -O robots.txt http://www.postgresql.org/robots.txt && cat robots.txt

that wget lists the addresses it found in DNS for www.postgresql.org:

Resolving www.postgresql.org... 87.238.57.232, 217.196.149.50, 174.143.35.230

When I land on 217.196.149.50 or 87.238.57.232, I get the normal
robots.txt. When I land on 174.143.35.230, I get the bad version
disallowing all access to the site. BTW, this behavior does not seem to
depend on the user-agent string, contrary to my earlier
speculation. Could someone please check what's going on with
robots.txt on 174.143.35.230? It seems to be seriously hurting
our Google search results.
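
The per-address check described above can be repeated without depending on which address wget happens to pick. What follows is only a minimal Python sketch of that idea, not a tool anyone on the list is known to use: it resolves every IPv4 address for the host, requests /robots.txt from each address directly while sending the real hostname in the Host header, and flags any copy whose "User-agent: *" group contains "Disallow: /". The function names (blocks_everything, resolve_ipv4, fetch_robots, report) are hypothetical, and the sketch assumes plain HTTP on port 80 as in the wget command above.

```python
import http.client
import socket


def blocks_everything(robots_body):
    """Return True if a group that applies to '*' contains 'Disallow: /'."""
    in_star_group = False
    prev_was_agent = False
    for raw in robots_body.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if not prev_was_agent:
                in_star_group = False  # a new group starts here
            in_star_group = in_star_group or value == "*"
            prev_was_agent = True
        else:
            prev_was_agent = False
            if field == "disallow" and in_star_group and value == "/":
                return True
    return False


def resolve_ipv4(host, port=80):
    """Return the sorted, de-duplicated IPv4 addresses DNS gives for host."""
    infos = socket.getaddrinfo(
        host, port, family=socket.AF_INET, type=socket.SOCK_STREAM
    )
    return sorted({info[4][0] for info in infos})


def fetch_robots(ip, host, timeout=10.0):
    """Fetch /robots.txt from one specific address, keeping the Host header."""
    conn = http.client.HTTPConnection(ip, 80, timeout=timeout)
    try:
        conn.request("GET", "/robots.txt", headers={"Host": host})
        return conn.getresponse().read().decode("utf-8", errors="replace")
    finally:
        conn.close()


def report(host):
    """Print one line per address saying which robots.txt it serves."""
    for ip in resolve_ipv4(host):
        verdict = "disallows all" if blocks_everything(fetch_robots(ip, host)) else "looks normal"
        print(ip, verdict)
```

Calling report("www.postgresql.org") would print one line per resolved address. The parsing is deliberately simple and only looks for the exact "Disallow: /" rule under "User-agent: *", which is the bad version quoted below.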

Josh

On Wed, Jun 18, 2014 at 9:26 AM, Josh Kupershmidt <schmiddy@gmail.com> wrote:
> I noticed an unusual search result shown as the top result by Google
> (search query "POSTGRESQL DROP TRIGGER", first result for me leads to
> www.postgresql.org/docs/8.3/static/sql-droptrigger.html ). The title
> of the result is somehow "英語 - PostgreSQL", and below that title
> reads: "A description for this result is not available because of this
> site's robots.txt – learn more."
>
> Sure enough, when I checked http://www.postgresql.org/robots.txt in
> Chrome on OS X, I see:
>
> User-agent: *
> Disallow: /
>
> though when I check in other browsers (Safari, wget), I see a more
> reasonable robots.txt:
>
> ===
> User-agent: *
> Disallow: /admin/
> Disallow: /account/
> Disallow: /docs/devel/
> Disallow: /list/
> Disallow: /search/
> Disallow: /message-id/raw/
> Disallow: /message-id/flat/
>
> Sitemap: http://www.postgresql.org/sitemap.xml
> ===
>
> Is it intentional that we're serving up that first robots.txt to
> (apparently) Googlebot and Chrome?
>
> Josh


