
Search Engine Herald

Yahoo Now Supports Wildcards in Robots.txt

by Gilad on November 2nd, 2006

In an effort to accommodate webmasters, and following a call made by Danny Sullivan at the last Search Engine Strategies conference, Yahoo today added wildcard support to robots.txt.

For those of you who don’t know, the wildcard “*” stands in for any sequence of characters and can be used to generalize file names and folders in URL paths. This upgrade can make your life much easier if you disallow Yahoo’s Slurp from parts of your site.

‘*’ - matches a sequence of characters

You can now use ‘*’ in robots directives for Yahoo! Slurp to wildcard match a sequence of characters in your URL. You can use this symbol in any part of the URL string you provide in the robots directive. For example:

User-Agent: Yahoo! Slurp
Allow: /public*/
Disallow: /*_print*.html
Disallow: /*?sessionid

The robots directives above will:

* allow all directories that begin with ‘public’, such as ‘/public_html/’ or ‘/public_graphs/’, to be crawled
* disallow crawling of any URLs that contain ‘_print’ followed later by ‘.html’, such as ‘/card_print.html’ or ‘/store_print/product.html’
* disallow crawling of any files with ‘?sessionid’ in their URL string, such as ‘/cart.php?sessionid=342bca31’
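To make the matching rules above concrete, here is a minimal Python sketch of how a ‘*’ pattern could be checked against a URL path. It is an illustration of the described behavior, not Yahoo’s implementation; the function names `pattern_to_regex` and `matches` are hypothetical, and ‘$’ is not handled here.

```python
import re

def pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern with '*' into a regex (illustrative only)."""
    # Escape each literal piece, then join the pieces with ".*" so that
    # every '*' in the pattern matches any sequence of characters.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body)

def matches(pattern: str, url_path: str) -> bool:
    # re.match anchors at the start only, which mirrors prefix matching:
    # the URL may continue past the end of the pattern.
    return pattern_to_regex(pattern).match(url_path) is not None

# Examples taken from the directives above.
print(matches("/public*/", "/public_html/"))
print(matches("/*_print*.html", "/store_print/product.html"))
print(matches("/*?sessionid", "/cart.php?sessionid=342bca31"))
print(matches("/*_print*.html", "/index.html"))
```

The first three calls correspond to URLs the directives are meant to cover; the last one is a URL that none of the ‘_print’ rules should touch.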

Note that a trailing ‘*’ is redundant, because Slurp already treats every directive as a prefix match. So, the following two directives are equivalent:

User-Agent: Yahoo! Slurp
Disallow: /private*
Disallow: /private
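A small sketch of why the two forms behave the same under prefix matching. The helper names are hypothetical, and the check covers only plain prefixes, with no inner ‘*’ or ‘$’.

```python
def strip_trailing_wildcards(pattern: str) -> str:
    """Drop redundant trailing '*' characters from a directive path."""
    return pattern.rstrip("*")

def prefix_blocked(pattern: str, url_path: str) -> bool:
    # Plain prefix matching: the URL is covered if it starts with the
    # pattern once any trailing '*' has been removed.
    return url_path.startswith(strip_trailing_wildcards(pattern))

# Both spellings cover the same URLs.
print(prefix_blocked("/private*", "/private/docs.html"))
print(prefix_blocked("/private", "/private/docs.html"))
```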

‘$’ - anchors at the end of the URL string

You can now also use ‘$’ in robots directives for Slurp to anchor the match to the end of the URL string. Without this symbol, Yahoo! Slurp treats each directive as a prefix and matches any URL that begins with it. For example:

User-Agent: Yahoo! Slurp
Disallow: /*.gif$
Allow: /*?$

The robots directives above will:

* disallow all files ending in ‘.gif’ in your entire site. Note that without the ‘$’, this would disallow all files containing ‘.gif’ in their file path
* allow all URLs ending in ‘?’ to be crawled. This would not automatically allow URLs that just contain ‘?’ somewhere in the URL string

As you can see, this symbol only makes sense at the end of the string. Hence, when we see it, we assume that your directive terminates there and any characters after that symbol are ignored.
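The anchoring behavior can be sketched in Python as follows. This is an illustrative approximation under the rules stated in this post, not Yahoo’s actual matcher; `compile_directive` and `matches` are hypothetical names. Anything after the first ‘$’ is dropped, as described above.

```python
import re

def compile_directive(pattern: str) -> re.Pattern:
    """Build a regex for a directive path that may contain '*' and '$'."""
    anchored = "$" in pattern
    if anchored:
        # Characters after '$' are ignored: the directive terminates there.
        pattern = pattern.split("$", 1)[0]
    # Each '*' becomes "any sequence of characters"; the rest is literal.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    # With '$', require the URL to end here; otherwise it is a prefix match.
    return re.compile(body + (r"\Z" if anchored else ""))

def matches(pattern: str, url_path: str) -> bool:
    return compile_directive(pattern).match(url_path) is not None

print(matches("/*.gif$", "/images/logo.gif"))       # ends in .gif
print(matches("/*.gif$", "/images/logo.gif.html"))  # only contains .gif
print(matches("/*.gif", "/images/logo.gif.html"))   # no anchor: prefix match
print(matches("/*?$", "/search?q=1"))               # '?' is not at the end
```

Comparing the second and third calls shows the difference the ‘$’ makes: the same URL is covered by the unanchored pattern but not by the anchored one.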

POSTED IN: Yahoo
