Control How Google Indexes Your Content with Meta Tags

There are many search engines out there, but most of the time site owners care about getting their site indexed by the almighty Google. One way to control how your pages appear there is to use meta robot tags.

The meta robot tag is widely supported by search engines such as Google, Yahoo, and Bing. It gives web developers control over how search engine crawlers treat a web page. For instance, noindex tells all compliant robots not to put your web page in their index.

Google’s own robot is called Googlebot. In this post we will see how to address Googlebot exclusively through meta tags.

Addressing Googlebot

To address Googlebot, specify the meta name as googlebot instead of just robots. This example will prevent Googlebot from putting your web page in Google’s index but still allow Bing and Yahoo to index the page. Thus your web pages may still appear in Bing and Yahoo search results.

<meta name="googlebot" content="noindex">
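
As a rough sketch of where this goes, the tag sits inside the page's <head> alongside the other meta tags. The charset and title below are placeholders, not part of the original example.

<head>
<meta charset="utf-8">
<title>Example page</title>
<!-- Only Googlebot is asked to skip indexing; other robots are not addressed -->
<meta name="googlebot" content="noindex">
</head>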

Google has a number of special robots that crawl through different kinds of content such as Image, News, Video, Ads and Mobile. Google allows you to block these robots individually. If you do not want your website to appear in Google Mobile search results, for example, you can specify the meta robot tag this way:

<meta name="googlebot-mobile" content="noindex">

The full list of Google bot types can be found on Google’s Website Crawlers page.

Prevent Image Indexing

It’s really irritating when you find your copyrighted image used by someone else without your permission. If you want to reduce the chance of this happening, you can prevent Google from putting your images in its index.

Specify the meta robot tag with the value of noimageindex. This will prevent the robot from indexing all the images in the page and your images will not appear in Google Image Search results, which is where people usually search for images.

<meta name="googlebot" content="noimageindex">

Alternatively, you can set the meta name as googlebot-image to address Google’s image crawler specifically.

<meta name="googlebot-image" content="noimageindex">

Prevent Translating

Google Chrome offers to translate pages written in a foreign language into the visitor’s preferred or local language, with the help of Google Translate. While Google Translate is getting better, it’s still far from perfect for some languages, and the output can sometimes be really quirky.

If you don’t want Google to translate your web pages, set the googlebot meta with the value of notranslate, like so:

<meta name="googlebot" content="notranslate">

If you want to prevent a certain section of the page from being translated, you can add the notranslate class within the element wrapping the content:

<div class="notranslate">
<!-- the content -->
</div>

Google Translate will leave the content of this <div> untranslated.
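
The same class should also work on inline elements, which is handy for a brand or product name inside an otherwise translatable sentence. A minimal sketch, where the product name is made up for illustration:

<p>Our new app, <span class="notranslate">Fotografika</span>, is available today.</p>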

Prevent Indexing After A Specified Time

You can also prevent Google from indexing your web pages after a certain period of time. This is particularly useful for web pages which are only relevant within a timeframe, such as an event registration page.

In this case, you probably want to tell the robot to not crawl and index this page after the event has ended, thereby preventing it from showing up in Google’s search results.

To do this, specify the meta tag with the value of unavailable_after, followed by the date and time. The time format should comply with the RFC 850 format, for example: Friday, 26-Sep-14 10:00:00 UTC

<meta name="googlebot" content="unavailable_after: Monday, 29-Sep-14 10:00:00 UTC">

Given the above example, Googlebot will treat the page as unavailable after 10:00 UTC on 29-Sep-14. The page will eventually disappear from the index, yet you can still keep it on your website for archiving.
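
Several directives can usually share one tag as a comma-separated list. As a sketch, assuming you want both the image and translation behaviour described earlier on the same page:

<meta name="googlebot" content="noimageindex, notranslate">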