Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Allow and Disallow in Robots.txt

http://www.robotstxt.org/orig.html says:

Disallow: /help disallows both /help.html and /help/index.html
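
The prefix rule quoted above can be checked with Python 3's urllib.robotparser. This is only a minimal sketch: the "*" agent and the /about.html path are assumptions chosen for illustration, and the rules are passed straight to parse() instead of being fetched from a real site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical one-rule robots.txt, supplied as lines instead of being fetched.
rules = [
    "User-agent: *",
    "Disallow: /help",
]

parser = RobotFileParser()
parser.parse(rules)

# Disallow values are matched as path prefixes, so both of these are blocked.
help_page = parser.can_fetch("*", "/help.html")
help_index = parser.can_fetch("*", "/help/index.html")

# A path that does not start with /help is unaffected by the rule.
other_page = parser.can_fetch("*", "/about.html")

print(help_page, help_index, other_page)
```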

Now, google.com/robots.txt lists:

Disallow: /search  
Allow: /search/about  

When I run robotparser.py against Google's robots.txt, it returns False for both of the paths above.

Could somebody please explain what the Allow in Allow: /search/about is for, given that the check returns False because of the Disallow entry above it?



1 Answer

The module documentation for robotparser and for its Python 3 counterpart, urllib.robotparser, says that both follow the original specification (the orig.html document linked in the question). That specification has no Allow directive; Allow is a non-standard extension to it. Some major crawlers, Google's among them, support it, but a parser does not have to support it to claim compliance with the original specification.
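
To see how a given Python version treats the two rules from the question, a small experiment along these lines may help. It is a sketch, not a statement of documented behaviour: the helper is_allowed and the reordered rule list are hypothetical, the rules are passed directly to parse() so no network access is needed, and the result may differ between Python versions.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(rules, path, agent="*"):
    """Parse robots.txt lines given directly and test a single path."""
    parser = RobotFileParser()
    parser.parse(rules)
    return parser.can_fetch(agent, path)


# The two rules quoted in the question, in their original order.
original_order = [
    "User-agent: *",
    "Disallow: /search",
    "Allow: /search/about",
]

# Hypothetical reordering with the more specific Allow line first.
allow_first = [
    "User-agent: *",
    "Allow: /search/about",
    "Disallow: /search",
]

print(is_allowed(original_order, "/search/about"))
print(is_allowed(allow_first, "/search/about"))
print(is_allowed(allow_first, "/search"))
```

On the CPython 3 versions I am aware of, the first call reports False and the second True. That suggests the module applies rules in file order and stops at the first one whose path is a prefix of the URL, so the broader Disallow: /search shadows the Allow line that follows it. Crawlers that implement the extension differently, for example by preferring the most specific matching rule, would reach the opposite answer for /search/about.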

