HTML has always been the skeleton of the web: fundamental to its function, usually hidden, missing in certain areas, and important not to break. This HTML and SEO guide is written focused semantic SEO and the technical alignment between HTML and content. Structured Data has been a hot topic issue in SEO this year (as it is many years), and while schema is a major element of structured data, proper HTML formatting is as well.
(If you’re interested in this topic, but not necessarily up on HTML, you can read a basic HTML guide here.)
The Philosophy of HTML and Friendly Browsers
Something that separates HTML from other languages is how forgiving it is. (We can have a fun argument about whether or not HTML is just a markup language, or a declarative programming language. For shorthand and simplicity we’ll go to “language” and you can argue with me on Twitter 😊) For example, It doesn’t necessarily throw errors if you forget to close markup—it will just assume you want to keep that markup open until the end of the page, where it will helpfully close the element for you.
An exercise for understanding how the browser interprets what you see is looking at the view-source and inspector side by side. (My favorite tool for this is John Hogg’s view-rendered-source comparing tool.) The view-source shows what the developer wrote, and the inspector shows what the browser interprets out of that, rendering thrown into the mix.
HTML design is kind of a Schrodinger’s art—you know what it looks like in your browser and the browsers you test on, but there are always use cases that you aren’t expecting (have you tested your site on smart fridges, Nintendo DS browsers, or smart watches? Have you tested screen readers, braille displays? Because those are all ways that people interface with the net nowadays. How does your website feel? How does it sound?)
Back to friendly browsers: because browsers want to help your HTML, this can lead to some weird behaviors that can be surprising and lead to SEO issues. For example, if you forget to close a tag, the browser will close it for you—albeit, perhaps, after you actually wanted the tag to end.
The actual page
The <body> and Semantic HTML
The <body> is where aligning your HTML and your content becomes extremely important.
Semantic elements in the body are a companion to the visual layout of the page. With HTML5, developers will be able to use semantic elements like <article> and <aside> to describe more about the content and how it fits onto the page. There are roughly 100 semantic elements available, and all of them help sculpt machine understanding of content.
While this doesn’t mean you have to go through and match the perfect semantic element to the perfectly crafted content that you’ve written, when structuring your HTML using semantic elements can provide simplicity to you, and will help the people and bots who navigate your site. Outlining your navigation, footer, and other details with semantic elements can be invaluable for screen reader users. On top of this, it can keep your written markup clean and legible – it’s a lot easier to sort through <section>s rather than a thousand <div> tags.
For another example of the importance of semantic HTML– links. Previous Google stated that it does not, cannot, and will not follow or look at links that exist outside proper link attributes. A proper link attribute involves an opening a tag and a href attribute. There’s a reason <span class=”link”>link</span> statements make SEOs shudder in fear. Here Martin Splitt shares a bit more broadly what types of link implementations you shouldn’t use:
H/T to Martin Splitt
Links are how the internet is structured; a link is a connection from one Web resource to another. It’s important to use proper link tags when you’re working with HTML; a link is made up of two anchors (<a> tags) and a direction. The link starts at the source and points to a destination.“
But recently, Gary Illyes said during a bay area search meetup; “I would stay away from <span> links. Not because we can’t understand them, but because it will take us longer to understand them. Whatever requires rendering takes time. ” This is a lot less, how shall we say, stringent than previous statements, and I’d consider it a pretty significant shift.
Another bold shift from Google came from a video John Mueller put out about <h1> tags and how to handle them for SEO and Accessibility. The part that made my eyes bulge slightly was this specific slide:
It makes sense that Google would want to move closer to understanding sites the way that viewers see them, instead of how they’re written—if someone has a title with <h1> tags that is different from something that might be a styled as a title (say, using CSS), so it’d be important for Google to see that discrepancy and read the more accurate version of the page.
HTML can be described as kind of a quantum layout language—You can write HTML as an input but you cannot, necessarily, be sure out what the output is. Users are everywhere, using all sorts of devices and aspect ratios and color-changing extensions and screen readers and more. You can control what you write, but not how your users will see the results.
By using the correct elements, you’re giving assistive device users and machines context for those anchors and the text around them, which helps spin more of a semantic web around your content. Even with Google’s ability to interpret your pages without Semantic HTML, it’s still important to understand and implement HTML properly.
The List and Table Advantage
Tables and lists are easy ways for Google and other robots to digest information. Be aware, though; machines are looking for expected behaviors—unexpected behaviors can easily trip them up.
Google SERP listing for “bob animal crossing”
Animal Crossing Wiki table entry.
Google likes to extract data from tables and display them in Knowledge Panels, Explore Panels, Featured Snippets or Rich Snippets. However, when the layout of the table deviates from Google’s assumptions one may end up with results that don’t align with the source information. A table element in HTML is usually made up of a defining <table> tag, and then <tr>, or table row, tags, defining each row; a table header is defined with a <th> tag. Each cell is defined by the <td>, or table data, tag.
In the example above, this table is nested inside another. The cell with “Gender” in it is labeled as a part of a table-row (“tr”) element, and also is as a cell labeled “td” or “table data.” So the snippet interprets Gender as Personality, and Male as Lazy. Google would probably better understand this table if the top row was marked with “th” or “table heading.”
Another interesting thing to note about tables is that browsers (and, presumably, bots) decide whether a table is a data table or a layout table. This is probably a holdover from the time where tables were more commonly used for layouts, rather than nowadays where developers use things like CSS-grid. People used to use tables to set up the layout of their pages, rather than using it for displaying data in a table format. This means machines often have to decide whether you’re looking at a table used for layout or for data. You can use the Accessibility Tree in the Chrome Inspector (right click, or ctrl + shift + I in Chrome) to see if a table is a Layout Table or Data Table.
The link above goes to a Codepen page where Paul Duncan runs through several scenarios, noting the kinds of things that Chrome seems to use to calculate whether a table is a data table or layout table. For example, zebra-striping (going back and forth between colors by row) seems to let the browser see a table is to do with data instead of layout. This is important for a few reasons- one is accessibility (screen readers and devices interact with layout tables and data tables differently). But it may also affect SEO—Januka Harshan on twitter noted that their competitors seemed to be picked up more often partially because those competitors used alternating colors. So, there’s a fun test for someone to try out.
Another way to help Google and robots understand data is lists. Google knows that for some types of information the easiest way to make information digestible is through a numbered list.
There are two different types of lists in HTML– <ul>, unordered list, and <ol>, ordered list. Google Featured Snippets tend to show up in four formats; paragraphs, lists, tables, and videos. Lists can be both a great way to convey information rapidly, but also often get cut off by the featured snippet, which can cause more click throughs. Google likes lists for items, steps of processes, tasks, etc.
John Mueller’s explanation shone a light on that. “The reason that kind of content can tend to rank for featured snippets is because it is well organized… They are ranking because the ideas contained in the content is coherent, it is organized and well structured.” Data from our Featured Snippet study backs this up; for some queries, Google prefers bulleted lists. In aggregate, we see <p> tags as the main source of Featured Snippets, but for some spaces the winner is <table> tags.
What’s up with Divs:
Debugging HTML can be as simple as looking at the page and checking if the appearance matches with your expectations. Comparing the source and the inspector and running your HTML through a HTML validator are other ways you can see if there’s anything wrong with what you’ve developed. Another recommendation I would make for HTML is to actually use a screen reader on your site—you can try JAWS or ZoomText for Windows, or if you’re on a Mac navigate to the Accessibility settings to enable VoiceOver. You can also try and navigate your site using only the keyboard. This can show you the shape of how your HTML affects your content, and since many people are only experiencing your site those ways it’s important to include them in your design.
Talking to the Robots: New Google Developments
HTML is an ever-evolving thing; recently, Google introduced new HTML attributes to help give Googlebot more information about your content. These are almost entirely specific to Search Engine Bots.
Semantic Link Attributes: On September 10th, Google posted about new link elements that could be used to describe nofollow links. A lot of webmasters were wondering how those kinds of attributes might affect their content, and why they should change from nofollow links to include new kinds of attributes like “ugc” or “sponsored.” I think this has some similar benefits to other semantic HTML elements—to semantically describe the links is to give Google more context, which can’t be a bad thing.
Snippet attributes: Google has also introduced more options for snippets, using the robots meta tags (in the head) and a new HTML attribute (data-nosnippet) in the body. This allows webmasters control over what search engines index and display in search results.
Even as Google gets better at “seeing” pages as users do, the truth of the matter is that GoogleBot is, itself, a biased user; it cannot read the page the way that every possible user does. It is still a machine, that makes mistakes, with some of the most advanced technology of the world powering it. Even testing and throttling for slower devices does not equal being a user working with those devices. One of the biggest advantages of semantic HTML is it can level the playing field for almost everyone. On top of this, Google is still a machine that makes mistakes; guiding it to your intent instead of letting it figure out what you mean is still more reliable.
Don’t make Google guess what you’re trying to say. Use the alignment of technical details with your content to help machines parse and comprehend what you have to say, and help your audience get to that content.