
Friday, September 20, 2013

Recovering from an Unnatural Links Penalty - A Real Life Example

Many website articles and blog posts written as guides to recovering from an unnatural links penalty describe only generic actions. Unfortunately, as most people have found, these articles rarely deal with specific actions and leave the violator at a loss for resolving the underlying issue.

This post describes a real experience of how a fellow blogger resolved an unnatural links violation and restored his PageRank.

Background
In mid-August, I received a message from a blogging friend stating that he had received a notice from Google in Webmaster Tools indicating that his blog violated Google's quality guidelines. As a result, his blog might not appear in search results, or might not rank as highly as before. In addition, Google cited one specific post as an example.

This type of violation is commonly referred to as an:

"Unnatural Links Penalty".

His blog is hosted on Blogger, is 7 years old, has over 1,000 posts, and enjoyed a PageRank of 3. (Note that out of respect for his privacy, I will not list his name or blog URL).
Google gets tough on link sellers with PageRank penalty (from venturebeat.com)

At the time I was contacted, his PageRank had dropped to 0, but the blog was still appearing in search results. Prior to contacting me, the blogger had attempted to fix the violation by searching the web and asking for help in the Google forums. To that end, he had received a couple of good suggestions, which he followed. However, these actions did not resolve the problem.

Primarily, my friend's blog is a personal blog, consisting of a variety of stories, opinions, and observations. However, in an attempt to generate revenue, he enrolled in a few "Sponsored Pay per Review" programs and began writing sponsored posts. Thinking that these posts were causing the problem, he set the included links to "nofollow" and asked for reconsideration. That did not help. Further, people told him that it appeared his site was "just selling links" and did not have a consistent theme. These comments were frustrating and disappointing to the blogger.

My Review
The first course of action that I took was to understand what steps had already been taken. After learning this, I realized that the penalty was "blog" related and not "post" specific.

So, I began looking at all the widgets and links he had in his sidebar and footer. There, I recommended that he remove all the "Sponsored Post" links and images. Next, I found a link to a gambling site that needed to be removed. Still, the penalty remained.

Finally, I examined the remaining widgets he had placed on the blog. At the bottom was a link to a "Book Review" site. When I visited that site, I saw two reviews. Both were reviews of individual posts on his blog, with links back to those posts. I realized that this created a "circular" link from his blog, to that site, and back to his blog. Clearly, that was unnatural.

He then removed that advertising widget from his blog and asked for reconsideration. Within days, he received a message from Google saying that the violation was removed and the penalty would be lifted. At last, the problem was discovered! By the end of the following day, the blog's PageRank was restored to 3, and all is now well.

Lessons learned - Things to Avoid
Very often, unnatural links are created out of ignorance. Many times, a blogger or webmaster is simply trying to take advantage of the tools available. This is particularly true for seasoned and experienced authors. However, some blogs and websites are created with the malicious intent of boosting their PageRank and earning a quick buck. While Google can tell how old a site is, it cannot always tell what the motivation is. Thus, play it safe and try to maintain a clean environment.

From this experience and my previous encounters, my friend and I learned a few important lessons.
  1. Try to avoid penalties by refraining from violating the guidelines entirely. Always err on the side of caution.
  2. Refrain from writing sponsored posts. If you do write them, state up front that you are being paid for the content, and set each link in the post to "nofollow" (see the example after this list).
  3. Do not place links to your blog or website on other websites or blogs. This includes comments.
  4. Do not place links to gambling sites (i.e., those where you can actually gamble or practice gambling) or other prohibited sites.
  5. Enroll your blog or website in Google's Webmaster Tools. By doing this, you can be alerted to violations rather than operating in the dark.
  6. Do not link sub-domains together or to your site. Sub-domains are considered to be unique URLs. When you use and reference sub-domains, you are effectively creating fake circular links.
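
As noted in item 2, marking a sponsored link as "nofollow" is done in the post's HTML. A minimal example, using a placeholder URL, looks like this:

<a href="http://www.sponsor-example.com/product" rel="nofollow">Sponsored product page</a>

The rel="nofollow" attribute tells search engines not to pass PageRank through the link, which is what Google's guidelines ask for with paid links.
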
Removing a violation
If you do receive an unnatural links penalty notice, first read and understand Google's content guidelines. Then, follow the advice of this post. You have to think back over all of your actions and question everything you have done. For example: Did you leave links in comments just to get a backlink? Did you write sponsored posts without disclosing it? Are you using sub-domains?

An Exception
One exception to "circular" unnatural links is the set of links you would list in a "Links to My Other Sites" section. These should only be links to other sites that you own, and the heading should be labeled clearly.

Summary
Whenever you question one of your links or widgets, flag it as "nofollow" or remove it entirely. If you are hesitant to make these types of changes to restore your credibility, then perhaps the violation was intended rather than accidental, and thus a penalty is warranted.

JL..........

Friday, March 2, 2012

Google Webmaster Tools adds URL parameters section

In the continuing improvement of its Webmaster Tools, Google has added a new "URL parameters" section under the "Site configuration" drop down menu.

This is an important feature for helping webmasters control Google's crawl rate on sites where the displayed content can be modified or filtered by URL parameters. For example, this can be particularly important to online merchants and stores.

As best as we have determined, Google follows nearly every hit on your site with its own mirrored hit. It records all the parameters and then has Googlebot crawl your pages with those parameters. By doing this, Google can index your various pages based on your content. For example, let's assume that you own an online furniture store called onlinestoreexample.com.

When someone visits your store at that URL, they are shown a series of departments. Let's say Department 1 is bedding and Department 2 is carpets. If your visitor clicks on the bedding link, they will access the page using the URL onlinestoreexample.com?dept=1. If they want to visit the carpet section, they click on the URL onlinestoreexample.com?dept=2.

This is all well and good until someone tries to access your pages by typing the URL themselves. If they make a mistake entering the dept parameter and type in "dpt" or "..." instead, then these parameters will be passed along to the Googlebot crawler. This means that your site will be crawled much more often than necessary and that the error pages you display become indexed as well.
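
In addition to using the URL parameters settings, one way to keep such malformed URLs out of the crawl is a wildcard rule in robots.txt; Googlebot honors the * wildcard in Disallow patterns. The following is only a sketch, using the hypothetical "dpt" typo from the example above:

User-agent: Googlebot
Disallow: /*?dpt=
Disallow: /*&dpt=

Rules like these block any URL containing the misspelled parameter without affecting the valid dept= pages.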

With this enhancement to Webmaster Tools, webmasters can now see the parameters that Google uses when crawling and inform Google whether or not those parameters are valid. For each parameter listed, you have a choice of letting Google decide if it is important, or indicating that it does not affect your page content.

On our own website, we were shown 19 parameters. Of those, 7 were invalidly formed and resulted in errors. By indicating that these 7 parameters did not affect our content, approximately 400,000 URLs were eliminated from the crawl.

So, if you have a website and rely on Google Webmaster Tools, we recommend that you visit your own URL parameters section and learn how Google sees your site.

Friday, January 20, 2012

Use Bing's Webmaster Tools to reduce and slow down crawl rates

If you have a website and it is being crawled too often by Bing, Yahoo, or Live, this post describes how to reduce their crawl rate to acceptable levels. 

Last week, we began receiving 500 and 503 errors from one of our affiliate stores. This had the undesirable side effect of placing our local instance of the Apache web server in an error state and thus taking our site offline for several hours each day.

We realized that our site was down, but did not know why. After reading through our server's log files, we discovered the 5xx errors. After researching these HTTP error codes, we found that we could not fix the errors directly. Instead, we had to correct the root cause.

Searching through our affiliate's website, we found that they return these error codes when their server receives too many requests from a particular IP. So, we went back to our log files and found that the Bing, Yahoo, and Live crawlers were simultaneously requesting many of our pages.

In order to fix our problem, we had to slow down these crawlers. Our first action was to add a crawl delay to our robots.txt file. Initially, we set this to 60 seconds.
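
In robots.txt terms, that initial setting looked like the two lines shown in the note at the end of this post, just with the larger value:

User-Agent: *
Crawl-delay: 60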

Next, we discovered Bing Webmaster Tools.

In order to use this, we needed to sign in with a Windows Live ID. We did not have one, so we created a new one. That was very easy, and we were able to sign in to the site within minutes.

Next, we had to add our site to the tool. The Bing Webmaster home page has two sections: the first is for messages, and the second is for sites. We found the "Add Site" link and submitted our site's URL.

Unfortunately, it takes about 3 days before any statistics are displayed. So, we just waited.

Once we saw that Bing was crawling and indexing our pages, we were then able to reduce the crawl rate.

This was done by:
  • Signing in to Bing Webmaster Tools.
  • Clicking on our site's URL listed in the Sites section, which brought us to the Dashboard page.
  • Clicking on the "Crawl" link at the top of the Dashboard, which opened a sub-menu.
  • Clicking on the "Crawl Settings" link, which brought us to a graphical "Crawl Rate" page.
  • Lowering our crawl rate to the minimum (by highlighting the boxes for each hour of the day).
  • Pressing the "Save" link.
Within 2 days, the Bing, Yahoo, and Live crawlers were behaving properly, and all of our HTTP 5xx errors disappeared.

During this process, we learned five important things about crawlers:
  1. The Googlebot crawl rate is well behaved and does not overwhelm your server.
  2. The Google crawler ignores the "Crawl-delay" directive in robots.txt.
  3. Bing only allows a maximum crawl-delay of 4 seconds.
  4. Once your site becomes large enough, the crawling bots can harm your site.
  5. Crawlers are tamable.
Note: To set a crawl delay in your robots.txt file, enter the two lines:

User-Agent: *
Crawl-delay: 4


at the top of the file.

Even if you are not experiencing problems with your website, we suggest that you submit your site to Bing Webmaster Tools. Although the interface is slow, it provides a wide variety of information about your website and is a great complement to Google Webmaster Tools.