
Lowering bot access frequency



#1 Editor

    Moderately Sized Orange

  • Members
  • 94 posts

Posted 26 May 2008 - 02:14 PM

Hello everyone. In order to optimize my site's behaviour I've just rewritten the .htaccess file (I'm using WP). I also noticed from AWStats that I have rather heavy "not viewed" traffic: month to date, 2.3 GB viewed and 1.65 GB not viewed. The "not viewed" traffic originates mostly from Yahoo Slurp and Googlebot (1.1 GB between them). Does anyone know the exact syntax I can use in a robots.txt file to slow down these bot crawls without blocking my site's indexation?
Thanks in advance.

#2 MacManX

    Huge Orange

  • Members
  • 1,064 posts

Posted 27 May 2008 - 01:26 AM

I don't think there is any reliable way to decrease bot frequency besides repeatedly editing your robots.txt file to cycle between full access and no access. However, that is quite inconvenient and, depending on how often and how long you block them, could get your site dropped from most search engines' listings.

Instead, try installing the Google (XML) Sitemaps Generator plugin, which will generate an XML sitemap of your blog. The XML sitemap is a fairly new protocol supported by Google, Yahoo, and Microsoft that tells the search bot what to visit, how important each piece of content is, and how often it changes. Once you have installed the plugin, you can assign a change frequency to each type of content on your blog (posts, pages, archives, etc.). As this is a new protocol, I'm not entirely sure whether adjusting the change frequency has any impact on how often the bots visit, but it should in theory.
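
For reference, here's roughly what one entry in the generated sitemap looks like. This is just a sketch of the sitemaps.org format - the URL, date, and values are purely illustrative; the plugin fills them in for you:

QUOTE
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <!-- the plugin writes one <url> entry per post, page, archive, etc. -->
  <url>
    <loc>http://www.example.com/my-first-post/</loc>
    <lastmod>2008-05-20</lastmod>
    <!-- hint for crawlers: how often this page changes -->
    <changefreq>monthly</changefreq>
    <!-- relative importance within your own site, 0.0 to 1.0 -->
    <priority>0.6</priority>
  </url>
</urlset>

Keep in mind that <changefreq> and <priority> are only hints; the search engines are free to ignore them.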

Once you have generated your sitemap, you can submit it to Google Webmaster Tools, where you can change Googlebot's crawl rate for your site from "Normal" to "Slower". Unfortunately, I'm not aware of any equivalent setting for Yahoo's and Microsoft's bots, but setting the change frequency in the sitemap should theoretically take care of that.

#3 Editor

    Moderately Sized Orange

  • Members
  • 94 posts

Posted 27 May 2008 - 12:29 PM

Thank you MacManX, I will try the plugin.

#4 NyteOwl

    36 Bits forever!

  • Volunteer Moderators
  • 1,913 posts

Posted 27 May 2008 - 01:10 PM

I also noticed a disconcerting new behaviour from Slurp: it seems to be ignoring the robots.txt exclusion statements. I e-mailed Yahoo, and tech support just reiterated their robots info page word for word, but my logs clearly show Slurp in directories that robots.txt excludes.


Obsolescence is just a lack of imagination.


#5 MacManX

    Huge Orange

  • Members
  • 1,064 posts

Posted 28 May 2008 - 02:09 AM

Unfortunately, it is possible that a bad bot is spoofing its user-agent to mimic Slurp's, and that Yahoo simply isn't aware of it. Even if they were, there's nothing they could do about it.

Edited by MacManX, 28 May 2008 - 02:09 AM.


#6 NyteOwl

    36 Bits forever!

  • Volunteer Moderators
  • 1,913 posts

Posted 28 May 2008 - 07:28 PM

QUOTE
Unfortunately, it is possible that a bad bot is spoofing its user-agent to mimic Slurp's, and that Yahoo simply isn't aware of it. Even if they were, there's nothing they could do about it.


The IP of origin for the bot resolves back to Yahoo, so it's their bot all right. And if they weren't aware of it before, they should be now - unless, of course, the first-line tech just put the matter down to "ignorant user" and let it go at that. A scenario that happens far too often.
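
For anyone who wants to run the same check on their own logs: a genuine Slurp request should come from an IP whose reverse DNS points at a *.crawl.yahoo.net hostname, and that hostname should resolve back to the same IP. Here's a minimal Python sketch of the check (the IP address is purely illustrative):

QUOTE
import socket

ip = "74.6.8.1"  # illustrative address pulled from an access log

try:
    # Step 1: reverse DNS - does the IP map back to a Yahoo crawler hostname?
    hostname = socket.gethostbyaddr(ip)[0]
    # Step 2: forward-confirm - does that hostname resolve back to the same IP?
    forward_ips = socket.gethostbyname_ex(hostname)[2]
    genuine = hostname.endswith(".crawl.yahoo.net") and ip in forward_ips
    print(hostname, "- genuine Slurp" if genuine else "- not verified")
except socket.herror:
    print("No reverse DNS record - almost certainly not the real Slurp")

If the hostname isn't under crawl.yahoo.net, or the forward lookup doesn't return the original IP, the request probably came from an impostor rather than from Yahoo itself.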

Obsolescence is just a lack of imagination.


#7 billzo

    Very Large Orange

  • Members
  • 684 posts

Posted 29 May 2008 - 02:52 AM

You can use the robots.txt crawl-delay directive to limit bot access frequency.

http://help.yahoo.co...r/slurp-03.html

http://en.wikipedia....delay_directive

Set it high enough that only so many pages get crawled per day. A crawl delay of 600 seconds limits a bot to one page every 10 minutes - no more than 6 pages per hour, or 144 per day. Or set it higher. They'll still crawl, but they won't suck down your data transfer nearly as much.

QUOTE
User-agent: *
Crawl-delay: 10
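
If you only want to throttle the two bots eating your transfer rather than everything, you can also target them by name. A sketch - the delay values are just examples, and as far as I know Googlebot ignores Crawl-delay entirely, so its rate has to be set in Google Webmaster Tools as MacManX described above:

QUOTE
# Throttle Yahoo's crawler: at most one request every 60 seconds
User-agent: Slurp
Crawl-delay: 60

# Throttle MSN/Live Search's crawler the same way
User-agent: msnbot
Crawl-delay: 60

# Everyone else keeps normal, unthrottled access
User-agent: *
Disallow: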

Edited by billzo, 29 May 2008 - 02:54 AM.


#8 Editor

    Moderately Sized Orange

  • Members
  • 94 posts

Posted 29 May 2008 - 01:09 PM

QUOTE (billzo @ May 29 2008, 9:52 AM)
You can use the robots.txt crawl-delay directive to limit bot access frequency.

http://help.yahoo.co...r/slurp-03.html

http://en.wikipedia....delay_directive

Set it high enough that only so many pages get crawled per day. A crawl delay of 600 seconds limits a bot to one page every 10 minutes - no more than 6 pages per hour, or 144 per day. Or set it higher. They'll still crawl, but they won't suck down your data transfer nearly as much.

QUOTE
User-agent: *
Crawl-delay: 10



Thanks billzo. I've also found this robots.txt generator:

http://www.mcanerin..../robots-txt.asp



