A Look Into Proper HTML5 Semantics

If you carefully plan the structure of your HTML documents, you can help computers make sense of the meaning of your content. Proper syntax is certainly important, but on its own it only hands parsers, search engines, and assistive technologies a bunch of data with no meaning attached.

If you pay attention to semantics in your front-end workflow, you can create higher-quality content that attracts more visitors. Semantics is the study of meaning; in a broader context, it’s a branch of logic and linguistics.

In web development, we talk about semantic content when computers can understand the structure of a document and the roles of the elements inside it. If we want to create proper semantics, we need to deeply understand the structure of our content and the capabilities of front-end technologies.

So what are the tangible benefits? Proper semantics makes content more searchable, which can lead to better search engine rankings. It also improves accessibility, as assistive technologies such as screen readers can better interpret the meaning of our content.

There are many front-end development techniques that enable developers to achieve a semantic page structure. This post gives you a brief introduction to semantic HTML tags and the concept of the document outline.

Semantic and Non-Semantic HTML Tags

The concept of semantics is not as new as it seems; it existed well before the era of HTML5. The term semantic web was popularized by Sir Tim Berners-Lee as early as 2001. By “semantic web” he meant a web of data that can be processed by machines.

This primarily means that individual HTML elements need to have distinguishable structural roles. According to W3Schools’ definition, “a semantic element clearly describes its meaning to both the browser and the developer”.

Semantic Elements Before HTML5

Semantic elements existed before HTML5 too; in most cases, developers just weren’t aware that some of the tags they used were actually semantic. Just think about the <form></form> or the <img> tags.

Their roles are clear to both us and the user agent: <form> simply contains a form, just as <img> embeds an image. Nobody will ever place a table or a headline inside an <img> tag (or at least let’s hope so).

The <table></table> element and its related tags, such as table rows and table cells, are also semantic tags that existed before HTML5. However, because table-based layouts were heavily used for many years, most developers didn’t use them in a semantic way. This led to a web that sacrificed logical structure for layout.

Ordered and unordered lists, paragraphs, and the h1-h6 heading tags are all semantic elements that preceded HTML5.
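
As a rough illustration of the pre-HTML5 semantic elements mentioned above, the fragment below marks up a heading, a data table, a list, and an image by their roles. The text, figures, and the chart.png file name are made up for the example; only the choice of elements matters.

```html
<!-- Hypothetical content; only the element roles matter here. -->
<h2>Monthly Report</h2>
<p>Sales figures for the first two months:</p>

<!-- A table used for tabular data, not for page layout -->
<table>
	<caption>Sales by Month</caption>
	<thead>
		<tr>
			<th scope="col">Month</th>
			<th scope="col">Units Sold</th>
		</tr>
	</thead>
	<tbody>
		<tr>
			<td>January</td>
			<td>120</td>
		</tr>
		<tr>
			<td>February</td>
			<td>95</td>
		</tr>
	</tbody>
</table>

<!-- A list whose items have no inherent order -->
<ul>
	<li>Online store</li>
	<li>Retail partners</li>
</ul>

<!-- The alt text gives the image a text equivalent -->
<img src="chart.png" alt="Bar chart of sales by month">
```

Every element here was already available in HTML 4.01, which is the point: using them for what they describe, rather than for layout, is what makes the markup semantic.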

Non-Semantic Elements

Non-semantic elements don’t have any special meaning, and the hierarchical relationships between them are merely illusory. The most widely used examples of non-semantic HTML tags are the <div></div> and the <span></span> tags.

If your site ever caught the horrible disease of divitis, you know what I’m talking about. Yep. Divs are not necessarily wrong, but divitis needs to be fought if we want to write maintainable, modular, and meaningful HTML code.

Fight Against Divitis

Smashing Magazine beautifully explains what the real problem is with the excessive and unreasonable use of the <div> tag. The gist is that if we include a div inside a div, it appears as though the outer div were the meaningful parent of the inner one, while in reality the nesting says nothing about how the two pieces of content relate.

There’s no semantic relationship between the two, just as in the case of the <span> tag, which works the same way on the inline level.

Don’t panic if you still feel attached to <div>-s and <span>-s though, as you don’t have to completely ditch them. They are still the best choice for grouping content solely for styling purposes and in other last resort cases.
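
To make the contrast concrete, here is a small before-and-after sketch. The class names and text are hypothetical, and the second version is only one possible way to restructure the first.

```html
<!-- Before: generic containers; the (hypothetical) class names carry all the meaning -->
<div class="post">
	<div class="post-title">Title of Post</div>
	<div class="post-body">
		<div class="text">First paragraph of the post.</div>
	</div>
</div>

<!-- After: elements that describe their role; one div kept purely as a styling hook -->
<article>
	<h2>Title of Post</h2>
	<div class="columns">
		<p>First paragraph of the post.</p>
	</div>
</article>
```

The remaining div in the second version is the kind of styling-only grouping described above: it adds no meaning, but it doesn’t pretend to either.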

Text Semantics in HTML5

HTML5 introduced many new semantic elements that make it easier to organize content. They don’t only help you organize content on the level of the whole document (see the details in the next section), but also inside text blocks, as inline tags.

Probably the most popular text-level semantic tags are <strong></strong> and <em></em>, but they also existed before HTML5. Among the new inline semantic elements we can find, for example, the <time></time> tag, which pairs a human-readable date-time with a machine-readable value, and <mark></mark> for highlighted text.

See this list for all text-level semantic elements that are currently in use.
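
A minimal sketch of these inline tags in one paragraph might look like the following. The date and wording are invented for the example.

```html
<p>
	The workshop starts on
	<!-- The visible text is for people; the datetime attribute holds the machine-readable value -->
	<time datetime="2016-03-15T10:00">March 15 at 10 am</time>.
	<!-- mark highlights text that is relevant in the current context -->
	Please <mark>register in advance</mark>, as seats are
	<strong>strictly limited</strong> and we would <em>really</em> like to see you there.
</p>
```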

Document Outline in HTML5

The document outline is the structure of an HTML document. It shows how elements are related to each other. In earlier versions of HTML, such as HTML 4.01 and XHTML, the document outline was generated solely by mapping elements such as headings, table titles, form titles, and others.

In HTML5 the outlining algorithm has been enhanced by new sectioning elements, namely:

  • <section></section> for sections grouped around a specific theme
  • <article></article> for complete or self-contained compositions such as a blog post or a widget
  • <nav></nav> for navigation blocks
  • <aside></aside> for complementary content such as sidebars.

There’s a fifth sectioning element in HTML5, but it’s not new: it’s the <body></body> tag. The <header></header> and <footer></footer> tags are also new, but they don’t generate new sections in a document; they divide up the content inside sections. This means that every sectioning element (body, article, section, nav and aside) can have its own header and footer.

Tips For Semantically Structured Content

If we want to create a well-structured document outline we need to pay attention to the following rules:

1. The outermost sectioning element is always the <body></body> tag.

2. Sections in HTML5 can be nested.

3. Each section has its own heading hierarchy. Each of them (even the innermost nested section) can have an h1 tag.

4. While the document outline is primarily defined by the 5 sectioning elements, it also needs proper headings for each section.

5. It’s always the first heading element (whether it’s an h1 or a lower-rank heading tag) that defines the heading of the given section. The heading tags that follow inside the same section need to be ranked relative to it. (If the first heading inside a sectioning element is an h4, don’t put another h4 after it; use a lower-rank heading such as h5 for subheadings.)

6. The sections defined by the <nav></nav> and the <aside></aside> tags don’t belong to the main outline of the HTML document, and assistive technologies may skip them in the initial rendering.

7. Each section (body, section, article, aside, nav) can have its own <header></header> and <footer></footer> tags, which define the header (such as logo, author’s name, dates, meta info, etc.) and the footer (copyright, notes, links, etc.) of that section.

Example: A Semantic Outline

Let’s look at an example of a semantic document outline to see more clearly how it works. Our example code will result in the following document structure:

[Figure: Document Outline Example]

And here is the code with proper semantic sectioning:

<body>

	<header>
		<h1>Welcome to Our Website!</h1>
		<p>Here is our logo and slogan.</p>
	</header>
	
	<nav>
		<header>
			<h2>Choose Your Interest</h2>
		</header>
		<ul>
			<li><a href="#">Menu 1</a></li>
			<li><a href="#">Menu 2</a></li>
			<li><a href="#">Menu 3</a></li>
		</ul>
	</nav>
	
	<article>
		<header>
			<h1>Title of Article</h1>
			<h2>Subtitle of Article</h2>
		</header>
		
		<section>
			<h4>First Logical Part (e.g. "Theory")</h4>
			<p>Paragraph 1 in first section</p>
			
			<h5>Some Other Subheading in First Section</h5>
			<p>Paragraph 2 in first section</p>
		</section>
		
		<section>
			<h4>Second Logical Part (e.g. "Practice")</h4>
			<p>Paragraph 1 in second section</p>
			<p>Paragraph 2 in second section</p>
		</section>
	
		<footer>
			<h5>Author Bio</h5>
			<p>Paragraph in Article's Footer</p>
		</footer>
	
	</article>
	
	<aside>
		
		<h2>Get To Know Us Better</h2>
		
		<section>
			<h4>Popular Posts</h4>
			<ul>...</ul>
		</section>
		
		<section>
			<h4>Partners</h4>
			<ul>...</ul>
		</section>
		
		<section>
			<h4>Testimonials</h4>
			<ul>...</ul>
		</section>
	
	</aside>
	
	<footer>
		<ul>
			<li>Copyright</li>
			<li>Social Media Links</li>
		</ul>
	</footer>

</body>

If you take a look at the code snippet above, you’ll see that headers and footers are optional: we can freely choose whether to use them or not. However, it’s strongly recommended to always include a heading for each section, otherwise the section will be “Untitled” in the document outline.

Luckily, there are many great tools on the internet that allow us to check the document outline. This time we will use the Outliner tool of html5.org.

If we insert our code snippet into the form provided by the outliner, and click the “Outline this!” button, we will see the following result:

[Figure: Example Code Outline]

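This is the document outline that the outlining algorithm produces for our sample code. It’s semantic, well-structured, and there are no nasty “Untitled” sections in it. Keep in mind, though, that browsers and screen readers have generally not implemented this algorithm and rely on heading ranks and landmark elements instead, so it’s worth keeping the heading levels themselves sensible too.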

If you want to see what an Untitled section looks like in the Outliner, just delete some of the heading tags in the example code.
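
For instance, a stripped-down page like the sketch below (with made-up content) contains one section without a heading and one with a heading. An outliner would be expected to list the first one as untitled and the second one under its heading.

```html
<body>
	<h1>Page Title</h1>

	<!-- No heading inside this section: an outliner would list it as untitled -->
	<section>
		<p>Content without a heading of its own.</p>
	</section>

	<!-- This section has a heading, so it gets a proper title in the outline -->
	<section>
		<h2>Titled Section</h2>
		<p>Content introduced by a heading.</p>
	</section>
</body>
```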

Other Aspects of Web Semantics

Semantic HTML tags and document outlines are only a small part of web semantics. The content of a web page can be made even more meaningful with the help of the WAI-ARIA accessibility specification, and with structured data expressed in RDFa, microdata, or JSON-LD.
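
As a brief and deliberately simplified sketch of two of these techniques, the fragment below labels a navigation block with a WAI-ARIA attribute and describes an article with JSON-LD. The use of the schema.org vocabulary is an assumption made for this example, and the URLs, names, and dates are placeholders.

```html
<!-- aria-label gives the navigation landmark a name that assistive technologies can announce -->
<nav aria-label="Main">
	<ul>
		<li><a href="/">Home</a></li>
		<li><a href="/blog/">Blog</a></li>
	</ul>
</nav>

<!-- Structured data in JSON-LD using the schema.org vocabulary; all values are placeholders -->
<script type="application/ld+json">
{
	"@context": "https://schema.org",
	"@type": "Article",
	"headline": "Title of Article",
	"author": {
		"@type": "Person",
		"name": "Jane Doe"
	},
	"datePublished": "2016-03-15"
}
</script>
```

Neither technique changes how the page looks; both add information that machines can read alongside the visible content.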
