Before version 1.04, when Rewrite Rules were in use, links to the special actions on a page (edit, rename, delete, etc.) were formed in a way that made them very hard to keep from being spidered.
For example, this page has the following URL:
http://www.waywood.co.uk/MonkeyWiki/SpiderProblem.html (or http://www.waywood.co.uk/MonkeyWiki/SpiderProblem.html?action=goto - the two were synonymous).
Previously, the link to edit this page would have been:
http://www.waywood.co.uk/MonkeyWiki/SpiderProblem.html?action=edit, and the other action links followed the same pattern. When the page was spidered, all of these action links would be visited and, in the case of search engines, indexed and displayed in search results. This was not only a waste of resources, but also unfortunate given the kind of links that were being advertised to the world!
Well-mannered robots can of course be controlled with robots.txt, but with the links formed in this way there was no general way to exclude all the action links. The protocol matches URLs by prefix only, so a rule that forbids a page's URLs with a query string also forbids the same URL without one, unless every page and action is listed individually.
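To make the limitation concrete, here is a minimal Python sketch of matching under the original robots exclusion standard, where a Disallow value matches any path (query string included) that begins with it. The function `is_disallowed` is a hypothetical helper written for illustration, not part of any real crawler; later extensions such as the `*` and `$` wildcards of RFC 9309 are deliberately left out, since the original standard did not define them.

```python
# Hypothetical sketch of robots.txt matching under the original standard:
# a Disallow value matches any URL path (including its query string) that
# simply *starts with* that value. Wildcards are not recognised here.

def is_disallowed(path_and_query: str, disallow_rules: list[str]) -> bool:
    """Return True if any non-empty Disallow prefix matches the request."""
    # An empty "Disallow:" line means "allow everything", so skip it.
    return any(rule and path_and_query.startswith(rule) for rule in disallow_rules)


view_url = "/MonkeyWiki/SpiderProblem.html"
edit_url = "/MonkeyWiki/SpiderProblem.html?action=edit"

# Forbidding the page path also forbids the plain view of the page.
rules = ["/MonkeyWiki/SpiderProblem.html"]
print(is_disallowed(view_url, rules), is_disallowed(edit_url, rules))  # True True

# Naming the query string works, but only for this one page and action.
rules = ["/MonkeyWiki/SpiderProblem.html?action=edit"]
print(is_disallowed(view_url, rules), is_disallowed(edit_url, rules))  # False True

# A wildcard-style rule is just a literal prefix, so it matches nothing here.
rules = ["/*?action="]
print(is_disallowed(edit_url, rules))  # False
```

The second case shows why per-URL rules do not scale: covering every action on every page would mean one Disallow line per combination.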
I have therefore reverted to forming all links except the 'goto' kind in the same way as when Rewrite Rules are not in use. The script now always appears in the URL (except in a normal request to view a page), so it is simple enough to exclude spiders: the cgi-bin directory (or wherever the script is located) can be forbidden. Badly-mannered spiders are obviously not kept out, but so far this has not been a noticeable problem.
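The revised scheme can be sketched in Python as follows. This is a hypothetical illustration, not the wiki's actual code: the script path `/cgi-bin/monkeywiki.cgi`, the `page` and `action` parameter names and both helper functions are assumptions. Only `goto` links use the rewritten page URL; every other action goes through the script, so a single `Disallow: /cgi-bin/` line covers them all under plain prefix matching.

```python
# Hypothetical sketch of the revised link-forming scheme. The script path
# and the "page"/"action" parameter names are assumptions for illustration.
from urllib.parse import quote, urlencode

SCRIPT_PATH = "/cgi-bin/monkeywiki.cgi"  # assumed script location
WIKI_BASE = "/MonkeyWiki/"               # rewritten page-view prefix

# Equivalent to a robots.txt line "Disallow: /cgi-bin/".
ROBOTS_DISALLOW = ["/cgi-bin/"]


def action_link(page: str, action: str = "goto") -> str:
    """Build a link; only 'goto' uses the rewritten URL form."""
    if action == "goto":
        return WIKI_BASE + quote(page) + ".html"
    # Every other action goes through the script itself, exactly as it
    # would without Rewrite Rules, so one prefix rule can exclude them all.
    return SCRIPT_PATH + "?" + urlencode({"page": page, "action": action})


def is_disallowed(path_and_query: str, rules: list[str]) -> bool:
    """Original robots.txt semantics: a plain prefix match."""
    return any(rule and path_and_query.startswith(rule) for rule in rules)


for action in ("goto", "edit", "rename", "delete"):
    link = action_link("SpiderProblem", action)
    print(f"{action:7} {link:55} blocked={is_disallowed(link, ROBOTS_DISALLOW)}")
```

Running it should report the `goto` link as crawlable and the `edit`, `rename` and `delete` links as blocked. The matcher is kept as a bare prefix test on purpose: the fix works because it no longer needs robots.txt to tell query strings apart.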
Last modified: Mon May 14 16:57:42 2007