Searching Sitecore Page Datasources
When you first enable search in your Sitecore solution, you may find that page content is not searchable. This is especially true if you rely heavily on Experience Editor to assemble your pages, because most of the visible content then lives in component datasource items rather than on the page item itself. The usual fix is a very common pattern that I refer to as “Page Visualization.”
Page Visualization involves creating a custom computed field that crawls the data from every component on a page. You do this by inspecting the page's presentation details, and it typically works out great.
I recently had an issue on one of my projects, however, where non-essential text such as “Share on Facebook” or “An error occurred” caused pages to show up in search results. Lots of pages.
The Problem
Before I dig too deep into the solution, I want to share with you an actual example.
Imagine we have a form-style component: “Email this page to a friend.” This component includes some input fields with validators, an error message, and a success message. In this example, we even allowed content authors to place components inside the error panel, to be displayed when an error occurs.

Now, this component was great and met our site’s needs. But then we discovered that search matched every text field on this component, as well as any sub-components nested inside its panels. For example, if a user searched for “error,” every page using this component came back in the search results. It was really annoying.
Why was this happening?
The way we were “visualizing” our pages for search was essentially to loop through all renderings on the page and crawl every text field on each rendering's datasource. It is a very common pattern, but one that exposes complications with a molecular component architecture, where components nest other components inside their own placeholders.
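To make that failure mode concrete, here is a minimal, Sitecore-free sketch of the naive crawl. It assumes a hypothetical PageComponent class standing in for a rendering and its datasource fields; it illustrates the pattern only and is not the Sitecore API.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for a rendering and its datasource fields (not a Sitecore type)
public sealed class PageComponent
{
    public PageComponent(string name, IReadOnlyDictionary<string, string> fields)
    {
        Name = name;
        Fields = fields;
    }

    public string Name { get; }
    public IReadOnlyDictionary<string, string> Fields { get; }
}

public static class NaiveVisualization
{
    // The naive pattern: concatenate every text field of every component on the page
    public static string Visualize(IEnumerable<PageComponent> components)
    {
        return string.Join(" ", components.SelectMany(c => c.Fields.Values));
    }

    // Crude stand-in for a search query: case-insensitive substring match
    public static bool Matches(string visualization, string term)
    {
        return visualization.IndexOf(term, StringComparison.OrdinalIgnoreCase) >= 0;
    }
}
```

If a page holds a hero with the title “Quarterly results” plus the email form with “An error occurred” as its error message, a search for “error” matches that page only because of the form's error text, which is exactly the noise we were seeing.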
The Solution
To solve this problem, I came up with a mechanism that allows components to “opt out” of being crawled and searched. I also needed molecular components to be able to “opt out” their child components.
The trick? Create a base template that components can inherit from, and check for it at visualization time. It’s so simple, right?
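The check needs to follow inheritance all the way up the chain, so that a component template inheriting the exclusion base through an intermediate template still opts out. Here is a minimal sketch of that recursive check on a hypothetical in-memory TemplateNode model (it is not Sitecore's Template class):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical in-memory model of a template and its direct base templates
public sealed class TemplateNode
{
    public TemplateNode(Guid id, params TemplateNode[] baseTemplates)
    {
        Id = id;
        BaseTemplates = baseTemplates;
    }

    public Guid Id { get; }
    public IReadOnlyList<TemplateNode> BaseTemplates { get; }
}

public static class TemplateChecks
{
    // True if the template is the target itself or inherits it through any chain of base templates
    public static bool IsDerivedFrom(this TemplateNode template, Guid templateId)
    {
        return template.Id == templateId
            || template.BaseTemplates.Any(baseTemplate => baseTemplate.IsDerivedFrom(templateId));
    }
}
```

For example, an “Email Form” template that inherits a “Form Base” template, which in turn inherits the exclusion base, is still detected, while an unrelated “Hero” template is not.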

Now that I have a template, I create a custom item class to wrap it, as I always do. It’s best practice to do this for your templates because it just makes things easier in the long run:
```csharp
using Sitecore.Data;
using Sitecore.Data.Items;

public class VisualizationExclusionBase : CustomItem
{
    // ID of the "Visualization Exclusion Base" template
    public static readonly ID TemplateId = ID.Parse("{860CE90F-A13D-43E9-A52C-FC027ED8E822}");

    public VisualizationExclusionBase(Item item) : base(item) { }

    // Wraps the item only if its template derives from the exclusion base
    public static bool TryParse(Item item, out VisualizationExclusionBase parsedItem)
    {
        parsedItem = item == null || !item.IsDerivedFrom(TemplateId) ? null : new VisualizationExclusionBase(item);
        return parsedItem != null;
    }

    #region implicit casting
    public static implicit operator VisualizationExclusionBase(Item innerItem)
    {
        return innerItem != null && innerItem.IsDerivedFrom(TemplateId) ? new VisualizationExclusionBase(innerItem) : null;
    }

    public static implicit operator Item(VisualizationExclusionBase customItem)
    {
        return customItem != null ? customItem.InnerItem : null;
    }
    #endregion
}
```
Now, for the meat and potatoes: a custom computed field. In my particular case I’m using Solr, but this technique could apply to other search providers as well. To register the computed field, I add it to my search index configuration:
```xml
<configuration xmlns:patch="http://www.sitecore.net/xmlconfig/">
  <sitecore>
    <contentSearch>
      <indexConfigurations>
        <customIndexConfiguration ref="contentSearch/indexConfigurations/defaultSolrIndexConfiguration">
          <!-- ... -->
          <documentOptions type="Sitecore.ContentSearch.SolrProvider.SolrDocumentBuilderOptions, Sitecore.ContentSearch.SolrProvider">
            <fields hint="raw:AddComputedIndexField">
              <field fieldName="visualization" returnType="string">Playground.Base.Search.VisualizationField, Playground.Base</field>
            </fields>
          </documentOptions>
          <!-- ... -->
        </customIndexConfiguration>
      </indexConfigurations>
    </contentSearch>
  </sitecore>
</configuration>
```
Some people override the default Sitecore _content field instead, but I didn’t in this case. Either approach should work, depending on your search logic. The logic of the computed field itself is pretty simple. You’ll also see a ton of extension methods that keep the code readable. I love extensions for this reason. 🙂
```csharp
using System.Linq;
using System.Text;
using Sitecore.ContentSearch;
using Sitecore.ContentSearch.ComputedFields;
using Sitecore.Data.Fields;
using Sitecore.Data.Items;

namespace Playground.Base.Search
{
    public class VisualizationField : IComputedIndexField
    {
        public string FieldName { get; set; }
        public string ReturnType { get; set; }

        public object ComputeFieldValue(IIndexable indexable)
        {
            var item = (indexable as SitecoreIndexableItem)?.Item;
            if (item == null)
            {
                return null;
            }

            var result = new StringBuilder();
            var database = item.Database;

            // Resolve the page's renderings once, for the default device
            var renderings = item.Visualization.GetRenderings(DeviceItem.ResolveDevice(database), false);

            // Renderings whose datasource opts out; anything nested inside them is skipped too.
            // Guard against empty or broken datasources before checking the template.
            var exclusions = renderings
                .Where(x => x.HasDatasource())
                .Where(x =>
                {
                    var datasource = database.GetItem(x.Settings.DataSource);
                    return datasource != null && datasource.IsDerivedFrom(VisualizationExclusionBase.TemplateId);
                })
                .ToList();

            foreach (var reference in renderings)
            {
                // Skip renderings without a datasource and anything inside an excluded component
                if (!reference.HasDatasource() || reference.IsNestedWithinAny(exclusions))
                {
                    continue;
                }

                var source = database.GetItem(reference.Settings.DataSource, item.Language);
                if (source == null || source.ShouldNotBeVisualized())
                {
                    continue;
                }

                foreach (Field field in source.Fields)
                {
                    if (field.ShouldBeIndexed())
                    {
                        result.Append(field.Value.StripHtml()).Append(" ");
                    }
                }
            }

            // Finally, add the page item's own text fields
            foreach (Field field in item.Fields)
            {
                if (field.ShouldBeIndexed())
                {
                    result.Append(field.Value.StripHtml()).Append(" ");
                }
            }

            return result.ToString();
        }
    }
}
```
When it comes to extensions, you probably have plenty of similar ones yourself. I find that I carry mine around from project to project. Here are the ones I put to good use in this case:
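```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;
using Sitecore.Data;
using Sitecore.Data.Fields;
using Sitecore.Data.Items;
using Sitecore.Data.Managers;
using Sitecore.Data.Templates;
using Sitecore.Layouts;

public static class Extensions
{
    // True if the item's template is, or inherits from, the given template (at any depth)
    public static bool IsDerivedFrom(this Item item, ID templateId)
    {
        if (item == null)
        {
            return false;
        }

        var template = TemplateManager.GetTemplate(item);
        return template != null && template.IsDerivedFrom(templateId);
    }

    public static bool IsDerivedFrom(this Template template, ID templateId)
    {
        return template.ID == templateId || template.GetBaseTemplates()
            .Any(baseTemplate => IsDerivedFrom(baseTemplate, templateId));
    }

    // Only renderings that resolve to a rendering item and have a datasource are crawled
    public static bool HasDatasource(this RenderingReference reference)
    {
        return reference.RenderingItem != null && !string.IsNullOrEmpty(reference.Settings.DataSource);
    }

    // Datasource items whose template inherits Visualization Exclusion Base opt out
    public static bool ShouldNotBeVisualized(this Item item)
    {
        return item.IsDerivedFrom(VisualizationExclusionBase.TemplateId);
    }

    // True if the rendering sits in a placeholder below one of the given renderings' placeholders
    public static bool IsNestedWithinAny(this RenderingReference reference, IEnumerable<RenderingReference> renderings)
    {
        if (string.IsNullOrEmpty(reference.Placeholder))
        {
            return false;
        }

        foreach (var outerComponent in renderings)
        {
            var outerPlaceholder = outerComponent.Placeholder + "/";

            if (reference.Placeholder.StartsWith(outerPlaceholder, StringComparison.OrdinalIgnoreCase))
            {
                return true;
            }
        }

        return false;
    }

    // Field types treated as text (example entries; adjust the list for your solution)
    private static readonly HashSet<string> TextFieldTypes = new HashSet<string>(StringComparer.OrdinalIgnoreCase)
    {
        "Single-Line Text",
        "Multi-Line Text",
        "Rich Text"
    };

    public static bool ShouldBeIndexed(this Field field)
    {
        // Only text-like fields
        if (!TextFieldTypes.Contains(field.Type))
        {
            return false;
        }

        // Skip Sitecore standard fields such as __Renderings or __Created
        if (field.Name.StartsWith("__"))
        {
            return false;
        }

        // Skip fields defined on templates that directly inherit Visualization Exclusion Base
        var definition = field.Definition;
        if (definition != null && definition.Template.BaseIDs.Contains(VisualizationExclusionBase.TemplateId))
        {
            return false;
        }

        return true;
    }

    // Matches any HTML tag; built once and reused, Singleline lets a tag span line breaks
    private static readonly Regex HtmlTagRegex = new Regex("<.*?>", RegexOptions.Compiled | RegexOptions.Singleline);

    public static string StripHtml(this string source)
    {
        if (string.IsNullOrEmpty(source))
        {
            return string.Empty;
        }

        var removedTags = HtmlTagRegex.Replace(source, string.Empty);
        return System.Web.HttpUtility.HtmlDecode(removedTags);
    }
}
```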
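The nesting check depends only on placeholder paths. Here is a minimal, Sitecore-free sketch of the same rule applied to plain strings, assuming placeholder paths such as /main and /main/error-panel (hypothetical names). Appending the trailing slash is what keeps /main from also matching an unrelated /mainfooter placeholder.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PlaceholderNesting
{
    // Mirrors the IsNestedWithinAny rule on plain placeholder paths: a placeholder is nested
    // when it starts with an excluded component's placeholder path plus a trailing "/"
    public static bool IsNestedWithinAny(string placeholder, IEnumerable<string> excludedPlaceholders)
    {
        return excludedPlaceholders.Any(outer =>
            placeholder.StartsWith(outer + "/", StringComparison.OrdinalIgnoreCase));
    }
}
```

With /main excluded, /main/error-panel is treated as nested, while /main itself and /mainfooter/share are not. One consequence of keying on the placeholder the excluded component sits in is that components nested inside a sibling in that same placeholder are skipped as well. If that matters in your solution, dynamic placeholder keys typically embed the owning rendering's unique ID, which could allow a stricter match; verify that against your own placeholder naming before relying on it.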
At this point, I simply have to go back to any template that I want to exclude from search results and change it to inherit from Visualization Exclusion Base. We now have a very clean way to control what is and isn’t searchable in our solution.
Hope this helps!