Sitecore uses Solr as its default search engine because Solr is a highly scalable and efficient search platform that can handle large volumes of data and support complex search requirements. By default, when you create an item, Sitecore adds it to two indexes, sitecore_master_index and sitecore_web_index (the latter once the item is published), with a set of predefined fields. However, Sitecore offers no out-of-the-box option for keeping specific items, such as 404 pages or media files, out of search results.
Challenge
Excluding individual items from indexing is difficult because Sitecore has no predefined setting for it. We want control over what gets indexed on the website, so that content authors can decide for each item whether it should appear in search results.
Solution
To make this configurable in Sitecore, follow these steps.
- Create a Checkbox Field
Add a checkbox field to the page template. You can add it to the base template shared by all “Page” items, or to any other template whose items you need to exclude from indexing.
In this article, the field is named “Exclude from Index”.
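Sitecore stores a ticked checkbox as the raw value "1", which is what the crawler later in this article compares against. As a minimal sketch of reading the flag in one place, the helper below uses Sitecore's CheckboxField wrapper. The namespace Feature.Search.Helpers and the class name IndexExclusion are hypothetical names chosen for illustration; they are not part of Sitecore or of the original solution.

```csharp
using Sitecore.Data.Fields;
using Sitecore.Data.Items;

namespace Feature.Search.Helpers // hypothetical namespace
{
    // Hypothetical helper that keeps the field name and the check in one place.
    public static class IndexExclusion
    {
        // Must match the field name added to the template.
        public const string FieldName = "Exclude from Index";

        public static bool IsExcluded(Item item)
        {
            if (item == null)
            {
                return false;
            }

            // Fields[...] returns null when the item's template has no such field,
            // so items without the checkbox are never treated as excluded.
            CheckboxField field = item.Fields[FieldName];
            return field != null && field.Checked;
        }
    }
}
```

Reading the raw value with item["Exclude from Index"] == "1" gives the same result; the wrapper only makes the intent more explicit.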
- Create a Custom Crawler
Create a custom crawler that inherits from Sitecore.ContentSearch.SitecoreItemCrawler and overrides the following two methods:
```csharp
using Sitecore.ContentSearch;
using Sitecore.Data.Items;

namespace Feature.Search.Pipelines
{
    public class CustomItemCrawler : SitecoreItemCrawler
    {
        // Excludes an item when the base crawler excludes it or when the checkbox is ticked.
        protected override bool IsExcludedFromIndex(SitecoreIndexableItem indexable, bool checkLocation = false)
        {
            var isExcluded = base.IsExcludedFromIndex(indexable, checkLocation);
            if (isExcluded)
            {
                return true;
            }

            Item item = indexable;
            // A ticked checkbox is stored as the raw value "1".
            return item["Exclude from Index"] == "1";
        }

        // Tells the crawler to delete an already indexed item once the checkbox is ticked.
        protected override bool IndexUpdateNeedDelete(SitecoreIndexableItem indexable)
        {
            var needDelete = base.IndexUpdateNeedDelete(indexable);
            if (needDelete)
            {
                return true;
            }

            Item item = indexable;
            return item["Exclude from Index"] == "1";
        }
    }
}
```
The IndexUpdateNeedDelete override is necessary so that items already in the index are physically removed when they are updated. Without it, existing entries would only be removed by a full rebuild of the index.
- Create a Config Patch for the Custom Crawler
```xml
<?xml version="1.0" encoding="utf-8" ?>
<configuration xmlns:patch="http://www.sitecore.net/xmlconfig/">
  <sitecore>
    <contentSearch>
      <configuration>
        <indexes>
          <index id="sitecore_master_index">
            <locations>
              <crawler>
                <patch:attribute name="type">Feature.Search.Pipelines.CustomItemCrawler, Feature.Search</patch:attribute>
              </crawler>
            </locations>
          </index>
          <index id="sitecore_web_index">
            <locations>
              <crawler>
                <patch:attribute name="type">Feature.Search.Pipelines.CustomItemCrawler, Feature.Search</patch:attribute>
              </crawler>
            </locations>
          </index>
        </indexes>
      </configuration>
    </contentSearch>
  </sitecore>
</configuration>
```
- Build the solution and deploy it to the server. Then tick the “Exclude from Index” checkbox on each item that you want to keep out of search.
- After publishing the items, open the Control Panel and rebuild the target index.
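If you prefer to trigger the rebuild from code, or want to confirm that a flagged item is no longer in the index, the sketch below shows one possible way using the ContentSearch API. The namespace Feature.Search.Helpers and the class IndexMaintenance are hypothetical names introduced for illustration, and the code assumes it runs inside a Sitecore instance where the named index is configured.

```csharp
using System.Linq;
using Sitecore.ContentSearch;
using Sitecore.ContentSearch.SearchTypes;
using Sitecore.Data.Items;

namespace Feature.Search.Helpers // hypothetical namespace
{
    // Hypothetical helper; not part of Sitecore or of the original solution.
    public static class IndexMaintenance
    {
        // Runs a full rebuild of the given index, e.g. "sitecore_web_index".
        // The call is synchronous and can take a while on large content trees.
        public static void Rebuild(string indexName)
        {
            ISearchIndex index = ContentSearchManager.GetIndex(indexName);
            index.Rebuild();
        }

        // Returns true if at least one document for the item exists in the index.
        public static bool IsInIndex(string indexName, Item item)
        {
            ISearchIndex index = ContentSearchManager.GetIndex(indexName);
            using (IProviderSearchContext context = index.CreateSearchContext())
            {
                return context.GetQueryable<SearchResultItem>()
                              .Any(result => result.ItemId == item.ID);
            }
        }
    }
}
```

For example, after calling IndexMaintenance.Rebuild("sitecore_web_index"), IndexMaintenance.IsInIndex("sitecore_web_index", item) should return false for an item whose “Exclude from Index” checkbox is ticked and published.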
We hope this post helps you solve the problem. Check out our Sitecore blog for more helpful tips and tricks.