Searching Sitecore Page Datasources in Atomic Architecture

Searching Sitecore Page Datasources

When first enabling search in your Sitecore solution, you may find that page content is not searchable. This is especially true if you rely heavily on Experience Editor for site assembly, because the content then lives in component datasource items rather than on the page item itself. Searching Sitecore page datasources is solved by a very common pattern that I refer to as “Page Visualization.”

Page Visualization involves creating a custom computed field that crawls the data from every component on a page. You can inspect the page's presentation details to do this, and it typically works out great.

I recently had an issue on one of my projects, however, where non-essential text such as “Share on Facebook” or “An error occurred” would pull pages into search results. Lots of pages.

The Problem

Before I dig too deep into the solution, I want to share with you an actual example.

Imagine we have a form-style component: “Email this page to a friend.” This component includes some input fields with validators, an error message and a success message. In this example, you can see that we even allowed for content authors to put components within the error panel that would be displayed.

Now, this component was great and met our site’s needs. But then we discovered that we could search through all the text fields on this component. We also found we could search through any sub-components that were nested inside of our panels. For example, if this component were on a page and the user searched for “error,” then any page using this component would come back in search results. It was really annoying.

Why was this happening?

The way we were “visualizing” our pages for search results was essentially to loop through all renderings on the page and crawl every text field on each rendering's datasource. It is a very common pattern, but one that exposes complications with molecular component architecture: a molecule's labels, messages, and nested child components all end up in the index alongside the real page content.
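
To make the failure mode concrete, here is a minimal sketch of that naive approach. It does not use the Sitecore API; `DemoDatasource`, `NaiveVisualization`, and `NaiveVisualizationDemo` are hypothetical stand-ins invented for illustration, and the sample text is made up.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for a rendering's datasource: just a bag of text fields.
public class DemoDatasource
{
    public string Name;
    public List<string> TextFields = new List<string>();
}

public static class NaiveVisualization
{
    // Concatenate every text field of every datasource on the page,
    // with no way for a component to opt out.
    public static string Build(IEnumerable<DemoDatasource> datasources)
    {
        return string.Join(" ", datasources.SelectMany(d => d.TextFields));
    }

    // A page "matches" when the term appears anywhere in the combined text.
    public static bool Matches(string visualization, string term)
    {
        return visualization.IndexOf(term, StringComparison.OrdinalIgnoreCase) >= 0;
    }
}

public static class NaiveVisualizationDemo
{
    // A page with one real content component and one form-style component.
    public static string BuildSamplePage()
    {
        var article = new DemoDatasource { Name = "Article Body" };
        article.TextFields.Add("Ten tips for planting tomatoes");

        var emailForm = new DemoDatasource { Name = "Email This Page" };
        emailForm.TextFields.Add("Email this page to a friend");
        emailForm.TextFields.Add("An error occurred");

        return NaiveVisualization.Build(new[] { article, emailForm });
    }
}
```

With this sample page, `NaiveVisualization.Matches(text, "error")` is true even though the page is about tomatoes, which is exactly the false positive described above.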

The Solution

To solve this problem, I came up with a mechanism to allow components to “opt out” of being crawled and searched. I also had the requirement to allow molecular components to “opt out” their child components.

The trick? Create a base template that components can inherit from, and check for it at visualization time. It’s so simple, right?

Now that I have a template, I always create a custom item to reflect this template. It’s best practice to do this with your templates because it just makes things easier in the long run:

    using Sitecore.Data;
    using Sitecore.Data.Items;

    // Strongly typed wrapper for items whose template derives from Visualization Exclusion Base.
    public class VisualizationExclusionBase : CustomItem
    {
        // ID of the Visualization Exclusion Base template
        public static readonly ID TemplateId = ID.Parse("{860CE90F-A13D-43E9-A52C-FC027ED8E822}");

        public VisualizationExclusionBase(Item item) : base(item) { }

        // wrap the item only when it derives from the exclusion template
        public static bool TryParse(Item item, out VisualizationExclusionBase parsedItem)
        {
            parsedItem = item != null && item.IsDerivedFrom(TemplateId)
                ? new VisualizationExclusionBase(item)
                : null;
            return parsedItem != null;
        }

        #region implicit casting
        public static implicit operator VisualizationExclusionBase(Item innerItem)
        {
            return innerItem != null && innerItem.IsDerivedFrom(TemplateId)
                ? new VisualizationExclusionBase(innerItem)
                : null;
        }

        public static implicit operator Item(VisualizationExclusionBase customItem)
        {
            return customItem != null ? customItem.InnerItem : null;
        }
        #endregion
    }

Now, for the meat and potatoes, we have a custom computed field. In my particular case I’m using Solr, but this technique could apply to other search providers as well. To make a custom computed field, I register it in the configuration of my search index:

    <configuration xmlns:patch="http://www.sitecore.net/xmlconfig/">
      <sitecore>
        <contentSearch>
          <indexConfigurations>
            <customIndexConfiguration ref="contentSearch/indexConfigurations/defaultSolrIndexConfiguration">
              <!-- ... -->
              <documentOptions type="Sitecore.ContentSearch.SolrProvider.SolrDocumentBuilderOptions, Sitecore.ContentSearch.SolrProvider">
                <fields hint="raw:AddComputedIndexField">
                  <field fieldName="visualization" returnType="string">Playground.Base.Search.VisualizationField, Playground.Base</field>
                </fields>
              </documentOptions>
              <!-- ... -->
            </customIndexConfiguration>
          </indexConfigurations>
        </contentSearch>
      </sitecore>
    </configuration>

Some people overwrite the default Sitecore _content field, but I didn’t in this case. It should work either way, though, depending on your search logic. The actual logic of the computed field is pretty simple. You’ll also see a ton of extensions to help make the code readable. I love extensions for this reason. 🙂

    using System.Linq;
    using System.Text;
    using Sitecore.ContentSearch;
    using Sitecore.ContentSearch.ComputedFields;
    using Sitecore.Data.Fields;
    using Sitecore.Data.Items;

    public class VisualizationField : IComputedIndexField
    {
        public string FieldName { get; set; }
        public string ReturnType { get; set; }

        // iterate through all renderings on the page and grab the content of each that we want to be searchable
        public object ComputeFieldValue(IIndexable indexable)
        {
            var result = new StringBuilder();

            // "as" returns null for indexables that are not Sitecore items, instead of throwing
            var item = indexable as SitecoreIndexableItem;
            if (item == null || item.Item == null)
            {
                return string.Empty;
            }

            var database = item.Item.Database;

            // Get all renderings on the page once and reuse them below
            var renderings = item.Item.Visualization.GetRenderings(
                DeviceItem.ResolveDevice(database), false);

            // Don't pull renderings that are nested within things inheriting from
            // Visualization Exclusion Base.  Build up a collection of these items
            // based on the presentation of the page.  We'll check these later to
            // manage any exclusions that we should respect.  Renderings with an empty
            // or unresolvable datasource are skipped so the template check never sees
            // a null item, and ToList() keeps the lookup from re-running on every loop pass.
            var exclusions = renderings
                .Where(x => !string.IsNullOrEmpty(x.Settings.DataSource))
                .Where(x =>
                {
                    var datasource = database.GetItem(x.Settings.DataSource);
                    return datasource != null && datasource.ShouldNotBeVisualized();
                })
                .ToList();

            foreach (var reference in renderings)
            {
                // make sure rendering has a valid datasource
                if (!reference.HasDatasource())
                    continue;

                // make sure component isn't nested within any non-visualized components
                // we want to kill crawling of any molecular components that opt out
                if (reference.IsNestedWithinAny(exclusions))
                    continue;

                // pull the datasource in the current language and skip it if it can't be resolved
                var source = database.GetItem(reference.Settings.DataSource, item.Item.Language);
                if (source == null)
                    continue;

                // make sure datasource should be visualized
                if (source.ShouldNotBeVisualized())
                    continue;

                // Go through all fields on datasource
                foreach (Field field in source.Fields)
                {
                    // make sure we're looking at a custom text field, and not a system field
                    // or something more complex like a treelist
                    if (field.ShouldBeIndexed())
                    {
                        result.Append(field.Value.StripHtml()).Append(" ");
                    }
                }
            }

            // don't forget to dig through all fields on the page, to pick up any page level content
            foreach (Field field in item.Item.Fields)
            {
                // again, make sure the field we're looking at should be crawled
                if (field.ShouldBeIndexed())
                {
                    result.Append(field.Value.StripHtml()).Append(" ");
                }
            }

            // result now has all searchable text from the page
            return result.ToString();
        }
    }

When it comes to extensions, you’ve probably found that you have lots of similar ones. I find that I carry my extensions around from project to project. Here are some of mine that I put to good use in this case:

    using System;
    using System.Collections.Generic;
    using System.Linq;
    using System.Text.RegularExpressions;
    using Sitecore.Data;
    using Sitecore.Data.Fields;
    using Sitecore.Data.Items;
    using Sitecore.Data.Managers;
    using Sitecore.Data.Templates;
    using Sitecore.Layouts;

    public static class Extensions
    {
        // ability to check if an item or template inherits from another template
        // check https://laubplusco.net/sitecore-extensions-does-a-sitecore-item-derive-from-a-template/ for more!
        public static bool IsDerivedFrom(this Item item, ID templateId)
        {
            // a missing item or template simply doesn't derive from anything
            var template = item == null ? null : TemplateManager.GetTemplate(item);
            return template != null && template.IsDerivedFrom(templateId);
        }
        public static bool IsDerivedFrom(this Template template, ID templateId)
        {
            return template.ID == templateId || template.GetBaseTemplates()
                .Any(baseTemplate => IsDerivedFrom(baseTemplate, templateId));
        }

        // check if rendering has a valid datasource
        public static bool HasDatasource(this RenderingReference reference)
        {
            return reference.RenderingItem != null && !string.IsNullOrEmpty(reference.Settings.DataSource);
        }

        // check if an item inherits from visualization exclusion base
        public static bool ShouldNotBeVisualized(this Item item)
        {
            return item.IsDerivedFrom(VisualizationExclusionBase.TemplateId);
        }

        // check if a specific rendering is nested within any of the renderings passed in as
        // the second argument. In this case, I have dynamic placeholders set up and allow for
        // unlimited nesting of components (structurals, wrappers, etc)
        public static bool IsNestedWithinAny(this RenderingReference reference,
            IEnumerable<RenderingReference> renderings)
        {
            foreach (var outerComponent in renderings)
            {
                // the trailing slash means only placeholders below the outer one match
                var outerPlaceholder = outerComponent.Placeholder + "/";

                // placeholders look like /main/left/placeholder
                // they can also be dynamic, like so: /main/left_9b5a4f2c/placeholder_8ae22dc1181
                if (reference.Placeholder.StartsWith(outerPlaceholder, StringComparison.Ordinal))
                    return true;
            }

            return false;
        }

        private static readonly List<string> textFieldTypes = new List<string>(new[]
        {
            "Single-Line Text",
            "Rich Text",
            "Multi-Line Text",
            "text",
            "rich text",
            "html",
            "memo",
            "Word Document",
            "Raw Text"
        });

        // built once and reused; matches anything that looks like an HTML tag
        private static readonly Regex htmlRegex = new Regex("<.*?>", RegexOptions.Compiled);

        // a field is indexed only if:
        //  1. it can be easily crawled (e.g., is of text type)
        //  2. it is not a system field (e.g., does not start with __)
        //  3. it is not defined on a template which inherits directly from visualization exclusion base
        public static bool ShouldBeIndexed(this Field field)
        {
            // only allow text types to be indexed
            if (!textFieldTypes.Contains(field.Type))
                return false;

            // exclude any sitecore system fields
            if (field.Name.StartsWith("__", StringComparison.Ordinal))
                return false;

            // exclude any fields from a Visualization Exclusion Base
            if (field.Definition != null &&
                field.Definition.Template.BaseIDs.Contains(VisualizationExclusionBase.TemplateId))
                return false;

            return true;
        }

        // regex utility to strip html from a string
        public static string StripHtml(this string source)
        {
            if (string.IsNullOrEmpty(source))
                return string.Empty;

            // remove tags and convert &amp; to &, etc
            var removedTags = htmlRegex.Replace(source, string.Empty);
            return System.Web.HttpUtility.HtmlDecode(removedTags);
        }
    }
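
Two of these helpers are plain string logic, so they are easy to try outside of Sitecore. The sketch below is a hedged, standalone approximation: `VisualizationTextDemo` is a hypothetical name, the nesting check takes raw placeholder keys instead of `RenderingReference` objects, and `System.Net.WebUtility.HtmlDecode` is swapped in for `System.Web.HttpUtility.HtmlDecode` so that it compiles without a reference to System.Web.

```csharp
using System;
using System.Collections.Generic;
using System.Net;
using System.Text.RegularExpressions;

public static class VisualizationTextDemo
{
    // Matches anything that looks like an HTML tag; built once and reused.
    private static readonly Regex HtmlTag = new Regex("<.*?>", RegexOptions.Compiled);

    // Same idea as StripHtml above: drop tags first, then decode entities.
    public static string StripHtml(string source)
    {
        if (string.IsNullOrEmpty(source)) return string.Empty;
        return WebUtility.HtmlDecode(HtmlTag.Replace(source, string.Empty));
    }

    // Same idea as IsNestedWithinAny above, on raw placeholder keys.
    public static bool IsNestedWithinAny(string placeholder, IEnumerable<string> outerPlaceholders)
    {
        foreach (var outer in outerPlaceholders)
        {
            // The trailing slash keeps "/main/left" from matching "/main/leftover".
            if (placeholder.StartsWith(outer + "/", StringComparison.Ordinal))
                return true;
        }
        return false;
    }
}
```

For example, `StripHtml("<p>Fish &amp; chips</p>")` yields `Fish & chips`, and with an outer placeholder of `/main/left`, the key `/main/left/panel_1` counts as nested while `/main/leftover/panel_1` and `/main/left` itself do not. Because the match is a simple prefix test, it is worth checking how your own dynamic placeholder keys are built before relying on it.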

At this point, I simply have to go back to any template that I want to exclude from search results and change it to inherit from Visualization Exclusion Base. We now have a very clean way to control what is and isn’t searchable in our solution.
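
The effect of opting out can be sketched without Sitecore as well. In the example below, `DemoRendering` and `OptOutDemo` are hypothetical stand-ins: `OptsOut` plays the role of "the datasource template inherits from Visualization Exclusion Base", and the placeholder keys and text are made up.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for a rendering plus its datasource.
public class DemoRendering
{
    public string Placeholder;   // where this rendering sits, e.g. "/main"
    public bool OptsOut;         // true when its template inherits the exclusion base
    public string Text;          // combined text of its datasource fields
}

public static class OptOutDemo
{
    public static string Visualize(IList<DemoRendering> renderings)
    {
        // Placeholder prefixes of every opted-out rendering;
        // anything nested below one of them is skipped too.
        var excluded = renderings
            .Where(r => r.OptsOut)
            .Select(r => r.Placeholder + "/")
            .ToList();

        var kept = renderings
            .Where(r => !r.OptsOut)
            .Where(r => !excluded.Any(prefix =>
                r.Placeholder.StartsWith(prefix, StringComparison.Ordinal)))
            .Select(r => r.Text);

        return string.Join(" ", kept);
    }

    // An article, an email form, and a component nested in the form's error panel.
    public static IList<DemoRendering> SamplePage(bool formOptsOut)
    {
        return new List<DemoRendering>
        {
            new DemoRendering { Placeholder = "/main", Text = "Ten tips for planting tomatoes" },
            new DemoRendering { Placeholder = "/main/sidebar", OptsOut = formOptsOut, Text = "Email this page to a friend" },
            new DemoRendering { Placeholder = "/main/sidebar/error-panel", Text = "An error occurred" }
        };
    }
}
```

`Visualize(SamplePage(false))` includes the form text and the nested error message, while `Visualize(SamplePage(true))` returns only the article text, so a search for “error” would no longer surface the page.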

Hope this helps!

Dylan McCurry, Solutions Architect

I am a certified Sitecore developer, code monkey, and general nerd. I hopped into the .NET space 10 years ago to work on enterprise-class applications and never looked back. I love building things, everything from Legos to software that solves real problems. Did I mention I love video games?
