There are many search engines out there, but most of the time site owners care about getting their site indexed on the almighty Google. One way to control how your pages are treated by the search engine is to use robots meta tags.
The robots meta tag is a meta tag agreed upon by search engines like Google, Yahoo, and Bing. It gives web developers control over how search engine crawlers access their web pages. For instance, a value like noindex will prevent all search engine robots from putting your web page in their index.
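As a minimal sketch, the generic form of the tag looks like this. It sits inside the head element of the page; the title and body text are placeholders for illustration only.

```html
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="utf-8">
  <!-- Addressed to every crawler that honors the robots meta tag -->
  <meta name="robots" content="noindex">
  <title>Example page (placeholder)</title>
</head>
<body>
  <p>This page asks all compliant crawlers not to index it.</p>
</body>
</html>
```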
Google’s own robot is called Googlebot. In this post we will see how we address Googlebot exclusively through meta tags.
To address Googlebot, set the meta name to googlebot instead of just robots. The following example will prevent Googlebot from putting your web page in its index but still allow bots from Bing and Yahoo to crawl the page. Thus your web pages may still appear in Bing and Yahoo search results.
<meta name="googlebot" content="noindex">
Google has a number of special robots that crawl through different kinds of content such as Image, News, Video, Ads and Mobile. Google allows you to block these robots individually. If you do not want your website to appear in Google Mobile search results, for example, you can specify the meta robot tag this way:
<meta name="googlebot-mobile" content="noindex">
The full list of Google bot types can be found on Google’s Website Crawlers page.
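As another small illustration of addressing one specialized robot, the sketch below keeps a page out of Google News while leaving it eligible for regular web search. It assumes the googlebot-news token, which is not used elsewhere in this post, so check the crawlers page for the tokens that apply to your case.

```html
<!-- Addressed to Google's News robot only; other robots ignore this tag -->
<meta name="googlebot-news" content="noindex">
```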
Prevent Image Indexing
It’s really irritating when you find your copyrighted image used by someone else without your permission. If you want to reduce the chance of this happening, you can prevent Google from putting your images in its index.
Specify the robots meta tag with the value of noimageindex. This will prevent the robot from indexing all the images in the page, so your images will not appear in Google Image Search results, which is where people usually search for images.
<meta name="googlebot" content="noimageindex">
Alternatively, you can set the meta name to googlebot-image to specifically prevent Google’s image robot from indexing the images on the page.
<meta name="googlebot-image" content="noimageindex">
Prevent Page Translation
Google Chrome offers to translate a site written in a foreign language into the visitor’s preferred or local language, with the help of Google Translate. While Google Translate is getting better, it’s far from perfect for some languages. The translated output can sometimes be really quirky.
If you don’t want Google to translate your web pages, set the googlebot meta tag with the value of notranslate, like so.
<meta name="googlebot" content="notranslate">
If you want to prevent only a certain section of the page from being translated, you can add the notranslate class to the element wrapping the content:
<div class="notranslate"> <!-- the content --> </div>
Google will ignore this element when translating the page.
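The class is not limited to block-level wrappers. As a sketch, assuming you only want to protect a brand name inside a sentence, a span works the same way; the product name here is made up for illustration.

```html
<p>
  Thanks for trying
  <!-- Hypothetical brand name that should stay untranslated -->
  <span class="notranslate">Acme Fotobox</span>,
  we hope you enjoy it.
</p>
```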
Prevent Indexing After A Specified Time
You can also prevent Google from indexing your web pages after a certain period of time. This is particularly useful for web pages that are only relevant within a timeframe, such as an event registration page.
In this case, you probably want to tell the robot to not crawl and index this page after the event has ended, thereby preventing it from showing up in Google’s search results.
To do this, specify the meta tag with the value of unavailable_after followed by the date and time. The time format should comply with the RFC 850 format, for example: Friday, 26-Sep-14 10:00:00 UTC
<meta name="googlebot" content="unavailable_after: Monday, 29-Sep-14 10:00:00 UTC">
Given the above example, Google’s robot will treat the page as unavailable after 10:00 UTC on 29-Sep-14. The page will eventually disappear from the index, yet you can still keep the page on your website for archiving.