Drupal is known for being a very SEO friendly content management system (CMS). The way it assembles its pages is crawler friendly. This makes it a popular choice for people looking to build dynamic web sites. However, there are a number of potential SEO problems with Drupal as well. These need to be dealt with to ensure that you get optimal results.
The very fact that Drupal is such a dynamic system is a factor that leads to some of its SEO problems. The content is stored in a database and retrieved at runtime. Almost all information is stored as a “node”, a basic, unstructured unit of content. Often, each “node” is associated with groups of keywords, known as “taxonomies”, and Drupal makes it easy to retrieve and sort information by these taxonomies. Since all content can be retrieved dynamically, Drupal generates generic URLs for the content, such as www.example.com/?q=node/3 or www.example.com/node/3.
These “internal” URLs are always present in Drupal, even though Drupal provides features that allow you to hide them, and instead present much friendlier URLs, known as aliases, to website users. There are multiple optional modules that may affect the generation of pages and the naming of URLs, and there are many modules that remain aware of the internal naming conventions, even when user-friendly URLs are being used. As a result, Drupal may expose both the internal URLs and the user-friendly URLs to users and web crawlers.
As a result of these kinds of architectural issues, many Drupal sites end up exposing content to the web via multiple URLs. When this happens, the multiple URLs can be crawled by the search engines, creating duplicate content problems. Here are some examples of duplicate content issues, and some other problems that can arise in Drupal.
1.Problem: duplicate content from aliases
Example: www.example.com/node/5 and www.example.com/content/how-to-surf, both pointing at the same physical document.
Solution: use robots.txt to disallow URLs that include “/node/” For example, you can include the following lines in robots.txt:Disallow: /node
Disallow: /*/node/Considerations: Note that this assumes that all URLs are available via friendly aliases. This should be the case if you’re using the pathauto module.>[?
- Problem: Drupal’s default robots.txt has errors.
Example: the default robots.txt uses “Disallow: /search”. This disallows only a page ending with /search, but not all of the Drupal internal search results pages, which is desired.
Solution: update the robots.txt to read:Disallow: /search/ - Problem: Pathauto can create many extra pages on the site if configured incorrectly.
Example: If you turn on “Create index aliases”, and you have a hierarchical alias (e.g., a page with a path containing a slash, such as music/concert/beethoven) Drupal automatically generates index pages that contain all pages in each category — for example all music, and all concerts.
Solution: Do not check the “Create index alias” checkbox in the Pathauto module.
- Problem: Incorrect setting of the Pathauto “Update action”, in a production environment, can cause URLs of published pages, which may already be indexed by the search engines, to change.
Solution: In development mode (before exposing the site to the search engines), use “Create a new alias, replacing the old one” to regenerate URLs whenever necessary (for example, if your Pathauto rules change). In production, once the site is exposed, set this to “Do nothing, leaving the old alias intact”.
- Problem: Some modules, such as Forums and Views, create sortable lists that can generate multiple URLs with duplicate content.
Solution: If you use such a module, be sure to exclude the sorted variations using the following robots.txt rule:Disallow: /*sort=
- Problem: The Forward module creates a link to a URL, on each page, that allows the page to be forwarded to a friend. You can easily end up with hundreds or thousands of such low-quality pages that are essentially boilerplates.
Solution: If you use this module, be sure to exclude the forward pages using the following robots.txt rule:
Disallow: /forward/
These problems can crop up on many Drupal systems, and all Drupal users should review their sites for these issues. Drupal may also have other issues, depending on the site and the degree of customization. For example, on several sites, we’ve seen Drupal generate complex CSS hierarchies that end up building hidden text into the pages. While search engines try to detect hidden text scenarios that are not a result of bad intent, this is a risk you don’t need. As long as you recognize what the issues are, they can be dealt with, and Drupal can be a great choice as a content management system. Most content management systems present even greater challenges to SEO.