Proper search index implementation is an essential part of development for the Sitecore platform. When you begin to work with indexes, you have to decide which index to use in your project and how to store data in it. This article shows you the possible options and gives you the tools to make an educated decision.
The Basics: Sitecore Indexes
Let’s start from the basics. Sitecore provides three predefined indexes that most developers know about (eleven more were added in Sitecore 8+, but that is not important at this point):
- sitecore_core_index
- sitecore_master_index
- sitecore_web_index
These three indexes contain all the versions of all the items from the corresponding Sitecore database. Sitecore uses these indexes for Item Buckets, search-based fields like “Multilist with Search”, media item search, etc.
Now that we have seen how Sitecore uses its indexes, it’s time to decide how to store the custom data that we need. Let’s talk about typical scenarios, which could include site content search, Product Catalog, Blog, Locations, Employee Directory, etc. All of these scenarios have commonalities: they all use some subset of Sitecore items, those items share the same template (sometimes a few templates), and they represent one business area of the site. In software engineering we call that the “Business Domain.”
So, what are our options (patterns) to store the data?
God Index
A God Index is an approach where all the data is stored in the existing sitecore_DbName_index. That index is available out of the box and seems to be a good candidate for storing our data. Typically, a developer only needs to add computed fields for custom data to make the index work.
Despite its simplicity, the approach I’ve described has several disadvantages. If you are familiar with the “God Object” pattern you can probably guess what one of those could be: The index has too many responsibilities.
Below are issues we have seen in sites implementing that pattern:
- The index keeps too much data. Data for every version, in every language, for every Sitecore item is stored in that index. Moreover, every document includes all possible fields. As a result, index update/rebuild is slow and rebuild causes significant downtime because of the index’s size.
- Data for multiple tenants is stored in one index. You cannot maintain your sites independently because of shared storage. A configuration or data model update that you make for one site can break other tenants.
- Queries are overcomplicated. We need to add extra filters to exclude data that isn’t relevant for our business logic.
- Every document contains lots of extra fields. That affects performance and reduces data readability and the ability to debug.
- All the data shares the same update strategy. You cannot update specific entities like products, blogs, pages and articles on individual schedules.
- Multi-region implementations use the index from different time zones. This limits the time when index maintenance can be performed.
- Maintenance collapse. Very soon you can arrive at a state where nobody knows how exactly the index is used. In such situations, you can’t refactor the index. Instead, you have to freeze the existing data model and flow.
All of these issues significantly complicate index maintenance. That’s the price we have to pay for simplified configuration. But there is an alternative approach, which we’ll call “Domain Index”, that addresses the issues above.
Domain Index
With a Domain Index we are not trying to put all our business data into the existing sitecore_DbName_index. Instead, we identify our business domains and configure a new index for every domain. That includes building individual indexes per tenant for multi-tenant implementations, and creating separate indexes for the master and web databases.
The Upside
A Domain Index needs to include only the data (documents and fields) that is relevant for our current business area. Data should be stored in the format that works best for us. This is how index names can look in a product catalog:
- myTenant_products_master_index
- myTenant_products_web_index
If you want to learn the advantages of Domain Index, please scroll back up to the section outlining the issues presented by the God Index. The Domain Index addresses all of them!
The Downside
The one negative to a Domain Index is a more complicated setup. The key is to configure the indexes to exclude data “noise.” The configuration below is applicable to Sitecore 8.1. The same result can be achieved in previous Sitecore versions, but the paths may be different.
- Filter Items by Root Item – sitecore/contentSearch/configuration/indexes/index/locations/crawler/Root
- Filter items by Template Type – sitecore/contentSearch/indexConfigurations/yourIndexConfiguration/documentOptions/include
- Remove standard fields by setting indexAllFields to false in sitecore/contentSearch/indexConfigurations/yourIndexConfiguration/documentOptions, or exclude fields one by one in the sitecore/contentSearch/indexConfigurations/yourIndexConfiguration/documentOptions/exclude hint="list:AddExcludedField" section
- Add computed fields in the sitecore/contentSearch/indexConfigurations/yourIndexConfiguration/documentOptions/fields hint="raw:AddComputedIndexField" section
- Configure update strategy (you can read more about that in this John West post)
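To make the settings listed above more concrete, here is a rough sketch of how they might fit together in one patch file for a Lucene-based Sitecore 8.1 setup. Treat it as an illustration under assumptions, not a drop-in configuration: the tenant path, the template and field GUIDs, the configuration name myTenantProductsIndexConfiguration, and the computed field class are all hypothetical, and the element names follow the stock Lucene configuration as far as I recall it. Compare everything against the default index configuration files shipped with your Sitecore version before using it.

```xml
<!-- Sketch only: a patch file for one Domain Index (web database). -->
<configuration xmlns:patch="http://www.sitecore.net/xmlconfig/">
  <sitecore>
    <contentSearch>
      <configuration type="Sitecore.ContentSearch.ContentSearchConfiguration, Sitecore.ContentSearch">
        <indexes hint="list:AddIndex">
          <index id="myTenant_products_web_index"
                 type="Sitecore.ContentSearch.LuceneProvider.LuceneIndex, Sitecore.ContentSearch.LuceneProvider">
            <param desc="name">$(id)</param>
            <param desc="folder">$(id)</param>
            <param desc="propertyStore" ref="contentSearch/indexConfigurations/databasePropertyStore" param1="$(id)" />
            <!-- Points at the domain-specific configuration defined further down -->
            <configuration ref="contentSearch/indexConfigurations/myTenantProductsIndexConfiguration" />
            <!-- Update strategy: this index follows publishing, independently of other indexes -->
            <strategies hint="list:AddStrategy">
              <strategy ref="contentSearch/indexConfigurations/indexUpdateStrategies/onPublishEndAsync" />
            </strategies>
            <!-- Filter items by root item (hypothetical content path) -->
            <locations hint="list:AddCrawler">
              <crawler type="Sitecore.ContentSearch.SitecoreItemCrawler, Sitecore.ContentSearch">
                <Database>web</Database>
                <Root>/sitecore/content/MyTenant/Products</Root>
              </crawler>
            </locations>
          </index>
        </indexes>
      </configuration>
      <indexConfigurations>
        <myTenantProductsIndexConfiguration
            type="Sitecore.ContentSearch.LuceneProvider.LuceneIndexConfiguration, Sitecore.ContentSearch.LuceneProvider">
          <initializeOnAdd>true</initializeOnAdd>
          <!-- Reuse the stock pieces instead of copying them -->
          <analyzer ref="contentSearch/indexConfigurations/defaultLuceneIndexConfiguration/analyzer" />
          <fieldMap ref="contentSearch/indexConfigurations/defaultLuceneIndexConfiguration/fieldMap" />
          <fieldReaders ref="contentSearch/indexConfigurations/defaultLuceneIndexConfiguration/fieldReaders" />
          <indexFieldStorageValueFormatter ref="contentSearch/indexConfigurations/defaultLuceneIndexConfiguration/indexFieldStorageValueFormatter" />
          <indexDocumentPropertyMapper ref="contentSearch/indexConfigurations/defaultLuceneIndexConfiguration/indexDocumentPropertyMapper" />
          <documentBuilderType ref="contentSearch/indexConfigurations/defaultLuceneIndexConfiguration/documentBuilderType" />
          <documentOptions type="Sitecore.ContentSearch.LuceneProvider.LuceneDocumentBuilderOptions, Sitecore.ContentSearch.LuceneProvider">
            <!-- Remove standard fields: index only what is listed below -->
            <indexAllFields>false</indexAllFields>
            <!-- Filter items by template (placeholder GUID) -->
            <include hint="list:AddIncludedTemplate">
              <Product>{00000000-0000-0000-0000-000000000000}</Product>
            </include>
            <!-- Fields to keep in each document (placeholder GUID) -->
            <include hint="list:AddIncludedField">
              <ProductTitle>{00000000-0000-0000-0000-000000000000}</ProductTitle>
            </include>
            <!-- Computed fields (hypothetical class and assembly) -->
            <fields hint="raw:AddComputedIndexField">
              <field fieldName="product_category">MyTenant.Search.ComputedFields.ProductCategory, MyTenant.Search</field>
            </fields>
          </documentOptions>
        </myTenantProductsIndexConfiguration>
      </indexConfigurations>
    </contentSearch>
  </sitecore>
</configuration>
```

A matching myTenant_products_master_index would presumably repeat the index element with the master database and a strategy suited to authoring, while sharing the same index configuration.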
The Sitecore Perspective
Some of you may be asking yourselves: That’s all great, but doesn’t using a Domain Index conflict with Sitecore’s strategy? Does Sitecore want us to use that pattern?
The answer is yes! Sitecore actually uses that pattern itself.
Now it is time to look at those eleven new indexes added in Sitecore 8. Here are some of their names:
- sitecore_marketing_asset_index_master
- sitecore_marketing_asset_index_web
- sitecore_fxm_master_index
- sitecore_fxm_web_index
That index naming scheme should look familiar.
And now, if you take a look at the three default sitecore_DbName_index indexes from a domain perspective, you will immediately realize that they also implement the Domain Index pattern. Their domain is “Content Authoring,” and these indexes are well designed for that domain. So how can a Domain Index become a God Index? It happens because of you and me, the developers, who start using the indexes for the wrong purpose.
That reminds me of Schrödinger’s cat. Sitecore indexes implement a Content Authoring domain search out of the box. But the moment developers query them for the purposes of another business area, something strange happens. The indexes cease to be pure Domain Indexes. They take on something from the dark side of the God Index anti-pattern.
Happy Searching. May the Force be with you.
Hi,
May I know if we can use the Solr instance configured in Sitecore to index websites that are not published by Sitecore? We have multiple websites (under different subdomains) that are built on different technologies and platforms. The main site is built on Sitecore, and we would like to set up Solr in Sitecore so that it can index and provide search services for all the websites.
Thanks