What is a robots meta tag?
A robots meta tag is an HTML snippet that tells search engines how to crawl or index a certain page. It’s placed into the <head> section of a web page, and looks like this:
<meta name="robots" content="noindex" />
Why is the robots meta tag important for SEO?
The meta robots tag is commonly used to prevent pages from showing up in search results, although it does have other uses (more on those later).
There are various types of content that you might want to prevent search engines from indexing:
- Thin pages with little or no value for the user;
- Pages in the staging environment;
- Admin and thank-you pages;
- Internal search results;
- PPC landing pages;
- Pages about upcoming promotions, contests or product launches;
- Duplicate content (use canonical tags to suggest the best version for indexing).
Generally, the bigger your website is, the more you’ll deal with managing crawlability and indexation. You also want Google and other search engines to crawl and index your pages as efficiently as possible. Correctly combining page-level directives with robots.txt and sitemaps is crucial for SEO.
What are the values and attributes of a robots meta tag?
Robots meta tags consist of two attributes: name and content.
You must specify values for each of these attributes. Let’s explore what these are.
The name attribute and user-agent values
The name attribute specifies which crawlers should follow these instructions. This value is also known as a user-agent (UA) because crawlers need to identify themselves with their UA to request a page. A regular visitor’s UA reflects the browser in use, whereas Google’s crawlers identify themselves with user-agents such as Googlebot or Googlebot-Image.
The UA value “robots” applies to all crawlers. You can also add as many robots meta tags into the <head> section as you need. For example, if you want to prevent your images from showing up in a Google or Bing image search, add the following meta tags:
<meta name="googlebot-image" content="noindex" />
<meta name="MSNBot-Media" content="noindex" />
Sidenote. The values of both the name and content attributes are case-insensitive. “Googlebot-Image,” “msnbot-media” and “Noindex” also work for the examples above.
The content attribute and crawling/indexing directives
The content attribute provides instructions on how to crawl and index information on the page. If a page has no robots meta tag, crawlers treat it as “index, follow”. That gives them permission to show the page in search results and crawl all links on the page (unless a link carries the rel="nofollow" attribute).
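To make the name/content pairing and the default behavior concrete, here is a minimal Python sketch that reads robots meta tags out of a page. The class and function names are made up for illustration, and the lookup is deliberately simplified: it only checks tags addressed to one crawler name and falls back to “index, follow” when none are found.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the content directives of <meta name=... content=...> tags."""

    def __init__(self):
        super().__init__()
        # crawler name (lowercased) -> list of directives (lowercased)
        self.directives = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {key: (value or "") for key, value in attrs}
        name = attr.get("name", "").strip().lower()
        content = attr.get("content", "")
        if not name or not content:
            return
        values = [d.strip().lower() for d in content.split(",") if d.strip()]
        self.directives.setdefault(name, []).extend(values)


def robots_directives(html, crawler="robots"):
    """Return the directives addressed to `crawler`, or the default."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    found = parser.directives.get(crawler.lower())
    # No matching tag: crawlers behave as if "index, follow" were set.
    return found if found else ["index", "follow"]
```

Calling robots_directives(page_html, "googlebot-image") on a page with the image tag above would return ["noindex"], regardless of how the name or value is capitalized in the HTML.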
Google supports the following values for the content attribute:
all
Equivalent to the default “index, follow”, so there is never a need to use this directive.
<meta name="robots" content="all" />
noindex
Instructs search engines not to index the page. That prevents it from showing in search results.
<meta name="robots" content="noindex" />
nofollow
Stops robots from following any links on the page. Note that the linked URLs may still be indexed, especially if they have backlinks pointing to them.
<meta name="robots" content="nofollow" />
none
The combination of noindex, nofollow. Avoid using this as other search engines (e.g., Bing) don’t support this.
<meta name="robots" content="none" />
noarchive
Prevents Google from showing a cached copy of the page in the SERP.
<meta name="robots" content="noarchive" />
notranslate
Prevents Google from offering a translation of the page in the SERP.
<meta name="robots" content="notranslate" />
noimageindex
Prevents Google from indexing images embedded on the page.
<meta name="robots" content="noimageindex" />
unavailable_after:
Tells Google not to show a page in search results after a specified date/time. Basically a noindex directive with a timer. The date/time must be specified using the RFC 850 format.
<meta name="robots" content="unavailable_after: Sunday, 01-Sep-19 12:34:56 GMT" />
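If you generate this tag programmatically, the date string is the part that is easy to get wrong. Here is a minimal Python sketch that formats a timezone-aware datetime in the RFC 850 style shown above. The function names are hypothetical, and day and month names are hard-coded so the output does not depend on the system locale.

```python
from datetime import datetime, timezone

_DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
         "Friday", "Saturday", "Sunday"]
_MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
           "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def rfc850(dt):
    """Format a timezone-aware datetime as an RFC 850 style GMT string."""
    if dt.tzinfo is None:
        raise ValueError("expected a timezone-aware datetime")
    dt = dt.astimezone(timezone.utc)
    return "{}, {:02d}-{}-{:02d} {:02d}:{:02d}:{:02d} GMT".format(
        _DAYS[dt.weekday()], dt.day, _MONTHS[dt.month - 1],
        dt.year % 100, dt.hour, dt.minute, dt.second)


def unavailable_after_tag(dt):
    """Build the robots meta tag for the given expiry moment."""
    return '<meta name="robots" content="unavailable_after: {}" />'.format(rfc850(dt))
```

Passing datetime(2019, 9, 1, 12, 34, 56, tzinfo=timezone.utc) to unavailable_after_tag reproduces the tag above.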
nosnippet
Opts out of all text and video snippets within the SERP. It also works as noarchive at the same time.
<meta name="robots" content="nosnippet" />
IMPORTANT NOTE
Since October 2019, Google has offered more granular options to control if and how your snippets are displayed in the search results. This is in part due to the European Copyright Directive, which was first implemented by France with its new copyright law.
Crucially, this legislation already affects all website owners. How? Because Google no longer displays snippets (text, image or video) from your site to users in France unless you opt in using their new robots meta tags.
We discuss how each of these new tags works below. That said, if this concerns your business and you’re looking for a quick solution, add the following HTML snippet to every page on your site to tell Google that you want no restrictions on your snippets:
<meta name="robots" content="max-snippet:-1, max-image-preview:large, max-video-preview:-1" />
Note that if you use Yoast SEO, this piece of code is added automatically on every page unless you added noindex or nosnippet directives.
max-snippet:
Specifies the maximum number of characters Google can show in its text snippets. Using 0 opts out of text snippets, while -1 declares no limit on the text preview.
The following tag sets the limit to 160 characters (similar to standard meta description length):
<meta name="robots" content="max-snippet:160" />
max-image-preview:
Tells Google whether it can use an image for image snippets and, if so, how large it can be. This directive has three possible values:
- none: no image snippet will be shown
- standard: a default image preview may be shown
- large: the largest possible image preview may be shown
<meta name="robots" content="max-image-preview:large" />
max-video-preview:
Sets the maximum number of seconds for a video snippet. As with the text snippet, 0 opts out completely, while -1 places no limit.
The following tag would allow Google to show a maximum of 15 seconds:
<meta name="robots" content="max-video-preview:15" />
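The three snippet directives are usually combined in one tag, so it can help to build that tag from validated values. This Python sketch is one way to do it. The function name and its parameters are illustrative, not part of any plugin or Google tooling, and the allowed values are the ones described above.

```python
def snippet_controls_tag(max_snippet=-1, max_image_preview="large",
                         max_video_preview=-1):
    """Build a robots meta tag combining the three snippet directives.

    -1 means "no limit" and 0 means "opt out", as described above.
    """
    if max_image_preview not in ("none", "standard", "large"):
        raise ValueError("max_image_preview must be none, standard or large")
    for label, value in (("max_snippet", max_snippet),
                         ("max_video_preview", max_video_preview)):
        if not isinstance(value, int) or value < -1:
            raise ValueError("{} must be an integer >= -1".format(label))
    content = "max-snippet:{}, max-image-preview:{}, max-video-preview:{}".format(
        max_snippet, max_image_preview, max_video_preview)
    return '<meta name="robots" content="{}" />'.format(content)
```

With its defaults, the function returns the “no restrictions” tag from the note above. snippet_controls_tag(160, "standard", 15) would cap text at 160 characters and video previews at 15 seconds.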
A quick note on using the data-nosnippet HTML attribute
Alongside the new robots directives introduced in October 2019, Google also introduced the data-nosnippet HTML attribute. You can use this to tag parts of text that you don’t want Google to use as a snippet.
This can be done in HTML on div, span, and section elements. The data-nosnippet is considered a boolean attribute, meaning that it’s valid with or without a value.
<p>This is some text in a paragraph that can be shown as a snippet<span data-nosnippet>excluding this part</span></p>
<div data-nosnippet>This will not appear in a snippet</div><div data-nosnippet="true">And neither will this</div>
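To check which text on a page is still eligible for snippets, you can strip out everything inside data-nosnippet elements. The Python sketch below does that for div, span, and section elements only. It is a rough approximation under the assumption that those elements are properly nested, not a reproduction of how Google processes the attribute, and its names are hypothetical.

```python
from html.parser import HTMLParser


class SnippetTextParser(HTMLParser):
    """Keeps text that is not inside a data-nosnippet div, span or section."""

    TAGS = {"div", "span", "section"}

    def __init__(self):
        super().__init__()
        # One flag per open div/span/section: does it carry data-nosnippet?
        self.stack = []
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in self.TAGS:
            # Boolean attribute: present with or without a value.
            self.stack.append(any(name == "data-nosnippet" for name, _ in attrs))

    def handle_endtag(self, tag):
        if tag in self.TAGS and self.stack:
            self.stack.pop()

    def handle_data(self, data):
        if not any(self.stack):
            self.parts.append(data)


def snippet_eligible_text(html):
    """Return the page text with data-nosnippet sections removed."""
    parser = SnippetTextParser()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts)
```

Run against the first example above, this returns only the sentence outside the span. Run against the two div elements, it returns an empty string.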
Using these directives
Most SEOs don’t need to go beyond the noindex and nofollow directives, but it’s good to know that there are other options as well. Keep in mind that all directives listed above are supported by Google.
Let’s check the comparison with Bing:
Directive | Google | Bing |
---|---|---|
all | ✅ | ❌ |
noindex | ✅ | ✅ |
nofollow | ✅ | ✅ |
none | ✅ | ❌ |
noarchive | ✅ | ✅ |
nosnippet | ✅ | ✅ |
max-snippet: | ✅ | ❌ |
max-image-preview: | ✅ | ❌ |
max-video-preview: | ✅ | ❌ |
notranslate | ✅ | ❌ |
noimageindex | ✅ | ❌ |
unavailable_after: | ✅ | ❌ |
You can use multiple directives at once and combine them. But if they conflict (e.g., “noindex, index”) or one is a subset of another (e.g., “noindex, noarchive”), Google will use the most restrictive one. In these cases, it would be just “noindex”.
Sidenote. Snippet directives may be overridden by structured data that allows Google to use any information within the annotation. If you want to prevent Google from showing snippets, adjust the annotation accordingly and make sure that you don’t have any license agreement with Google.
A note on other directives
You might also come across directives that are specific to other search engines. An example would be “noyaca”, which prevents Yandex from using its own directory for generating search results snippets.
Others were useful in the past but are now deprecated. For example, the “noodp” directive was used to prevent search engines from using the Open Directory Project for generating snippets.
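Coming back to combined directives for a moment: the “most restrictive one wins” rule is easy to model when you audit tags in bulk. The Python sketch below is a simplified model of that rule built only from the two examples given earlier (“noindex, index” and “noindex, noarchive”). It is an assumption about how to approximate the behavior, not Google's actual logic, and the function name is hypothetical.

```python
def effective_directives(content):
    """Reduce a content value to the directives that still matter."""
    values = {d.strip().lower() for d in content.split(",") if d.strip()}

    # Expand the shorthand values described earlier.
    if "none" in values:
        values.discard("none")
        values |= {"noindex", "nofollow"}
    if "all" in values:
        values.discard("all")
        values |= {"index", "follow"}

    # On a direct conflict, the restrictive directive wins.
    for strict, loose in (("noindex", "index"), ("nofollow", "follow")):
        if strict in values:
            values.discard(loose)

    # Assumption: once a page is not indexed, directives about how it is
    # displayed in results (noarchive, nosnippet, ...) no longer matter.
    if "noindex" in values:
        values &= {"noindex", "nofollow", "follow"}

    return sorted(values)
```

For both “noindex, index” and “noindex, noarchive”, the function returns just ["noindex"], matching the examples.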
How to set up the robots meta tag
Now that you know what all these directives do and look like, it’s time to get to the actual implementation on your website.
Robots meta tags belong in the <head> section of a page. It’s pretty straightforward if you edit the code using HTML editors such as Notepad++ or Brackets. But what if you’re using a CMS with SEO plugins?
Let’s focus on the most popular option out there.
Implementing robots meta tags in WordPress using Yoast SEO
Go to the “Advanced” section below the editing block of each post or page. Set up the robots meta tag according to your needs. For example, you can configure the page so that it outputs “noindex, nofollow” directives.
The “Meta robots advanced” row gives you the option to implement directives other than noindex and nofollow, such as noimageindex.
You also have the option to apply these directives sitewide. Go to “Search Appearance” in the Yoast menu. There you can set up meta robots tags on all posts, pages, or just on specific taxonomies or archives.
Sidenote. Yoast isn’t the only way to control meta robots tags in WordPress. There are plenty of other WordPress SEO plugins with similar functionality.
What is an X‑Robots-Tag?
The robots meta tag is fine for implementing noindex directives on HTML pages here and there. But what if you want to prevent search engines from indexing files such as images or PDFs? This is when x‑robots-tags come into play.
The X-Robots-Tag is an HTTP header sent from a web server. Unlike the meta robots tag, it isn’t placed in the HTML of the page. In the response headers, it appears as a line such as X-Robots-Tag: noindex.
Checking HTTP headers is a bit more complicated. You can do it the old way in the Developer Tools or use a browser extension like ‘Live HTTP Headers.’
The Live HTTP Headers extension monitors all HTTP(S) traffic your browser sends (request headers) and receives (response headers). It’s captured live, so make sure the extension is activated. Then go to the page or file that you want to inspect and check the extension’s log.
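If you would rather check headers from a script than from the browser, a few lines of Python using only the standard library can do it. This is a minimal sketch with made-up function names. It sends a HEAD request, which assumes the server answers HEAD with the same headers it would send for a normal page load.

```python
from urllib.request import Request, urlopen


def x_robots_tag_values(headers):
    """Pick the X-Robots-Tag values out of (name, value) header pairs.

    Header names are case-insensitive, and a response may carry the
    header more than once, so every occurrence is returned.
    """
    return [value.strip() for name, value in headers
            if name.lower() == "x-robots-tag"]


def fetch_x_robots_tag(url):
    """Request only the headers of `url` and return its X-Robots-Tag values."""
    request = Request(url, method="HEAD")
    with urlopen(request) as response:
        return x_robots_tag_values(response.headers.items())
```

Calling fetch_x_robots_tag with the URL of a PDF or image you want to inspect returns an empty list when the header is not set.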
How to set up the X‑Robots-Tag
The configuration depends on the type of web server you’re using and which pages or files you want to keep out of the index.
The line of code looks like this:
Header set X-Robots-Tag "noindex"
This example takes into account the most widespread server type—Apache. The most practical way of adding the HTTP header is by modifying the main configuration file (usually httpd.conf) or .htaccess files. Sounds familiar? This is the place where redirects also happen.
You use the same values and directives for the X-Robots-Tag as for the meta robots tag. That said, implementing these changes should be left to the experienced. Backups are your friends because even a small syntax error can break the whole website.
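If your site runs on something other than Apache, the idea stays the same: attach the header to the responses you want kept out of the index. As an illustration only, here is how that could look in a Python WSGI application. The middleware name and the file-extension rule are assumptions made for this sketch, not a standard component.

```python
import re

# Hypothetical rule: keep PDFs and common image types out of the index.
NOINDEX_PATHS = re.compile(r"\.(pdf|png|jpe?g|gif)$", re.IGNORECASE)


def add_x_robots_tag(app, value="noindex"):
    """Wrap a WSGI app so matching responses carry an X-Robots-Tag header."""

    def wrapped(environ, start_response):
        def patched_start_response(status, headers, *args):
            if NOINDEX_PATHS.search(environ.get("PATH_INFO", "")):
                # Copy the list so the wrapped app's own headers stay untouched.
                headers = list(headers) + [("X-Robots-Tag", value)]
            return start_response(status, headers, *args)

        return app(environ, patched_start_response)

    return wrapped
```

Wrapping an existing application with add_x_robots_tag(app) leaves HTML pages alone and adds the header only to requests whose path ends in one of the listed extensions.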