Yahoo Now Supports Wildcards in Robots.txt
In an effort to accommodate webmasters, and following a call made by Danny Sullivan at the last Search Engine Strategies, Yahoo today added wildcard support in Robots.txt.
For those of you who don’t know, the wildcard “*” stands in for any string of characters and can be used to generalize filenames, folders and sub-domains. This upgrade can make your life much easier if you block Yahoo’s Slurp from parts of your site.
‘*’ - matches a sequence of characters
You can now use ‘*’ in robots directives for Yahoo! Slurp to wildcard match a sequence of characters in your URL. You can use this symbol in any part of the URL string you provide in the robots directive. For example,
User-Agent: Yahoo! Slurp
Allow: /public*/
Disallow: /*_print*.html
Disallow: /*?sessionid
The robots directives above will:
* allow all directories that begin with ‘public’, such as ‘/public_html/’ or ‘/public_graphs/’ to be crawled
* disallow any files or directories that contain ‘_print’, such as ‘/card_print.html’ or ‘/store_print/product.html’, from being crawled
* disallow any files with ‘?sessionid’ in their URL string, such as ‘/cart.php?sessionid=342bca31’, from being crawled
Note that a trailing ‘*’ is redundant, since Slurp already treats every directive as a prefix match. So, the following two directives are equivalent:
User-Agent: Yahoo! Slurp
Disallow: /private*
Disallow: /private
‘$’ – anchors at the end of the URL string
You can now also use ‘$’ in robots directives for Slurp to anchor the match to the end of the URL string. Without this symbol, Yahoo! Slurp would match all URLs against the directives, treating the directives as a prefix. For example:
User-Agent: Yahoo! Slurp
Disallow: /*.gif$
Allow: /*?$
The robots directives above will:
* Disallow all files ending in ‘.gif’ in your entire site. Note that without the ‘$’, this would disallow all files containing ‘.gif’ in their file path
* Allow all files ending in ‘?’ to be included. This would not automatically allow files that just contain ‘?’ somewhere in the URL string
As you can see, this symbol only makes sense at the end of the string. Hence, when we see it, we assume that your directive terminates there and any characters after that symbol are ignored.
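If you want to check how a pattern would behave before putting it in your Robots.txt, the rules described above can be sketched in a few lines of Python. This is only an illustration of the matching behavior as stated in this post, not Yahoo's actual implementation: the function name `slurp_pattern_matches` is made up for this example, and the sketch does not try to model how Slurp resolves conflicts between Allow and Disallow lines.

```python
import re


def slurp_pattern_matches(pattern: str, url_path: str) -> bool:
    """Return True if url_path matches a robots directive pattern.

    Hypothetical helper, based only on the rules described in the post:
      * '*' matches any sequence of characters
      * '$' anchors the match at the end of the URL string, and anything
        written after it in the directive is ignored
      * without '$', the directive is treated as a prefix
    """
    anchored = "$" in pattern
    if anchored:
        # Drop the '$' and everything that follows it.
        pattern = pattern.split("$", 1)[0]

    # Escape the literal pieces and join them with '.*' for each '*'.
    body = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    regex = re.compile(body)

    if anchored:
        # The whole URL string must match.
        return regex.fullmatch(url_path) is not None
    # Prefix behavior: the pattern only has to match from the start.
    return regex.match(url_path) is not None


if __name__ == "__main__":
    examples = [
        ("/public*/", "/public_html/"),
        ("/*_print*.html", "/store_print/product.html"),
        ("/*?sessionid", "/cart.php?sessionid=342bca31"),
        ("/*.gif$", "/images/logo.gif"),
        ("/*.gif$", "/images/logo.gif.html"),
        ("/*?$", "/page?x=1"),
    ]
    for pattern, path in examples:
        print(pattern, path, slurp_pattern_matches(pattern, path))
```

Running it against the sample URLs from the post is a quick way to sanity check a directive, for instance to confirm that ‘/private*’ and ‘/private’ give the same answer for every path you try.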