Sitecore Search is a robust search solution designed to streamline the indexing and retrieval of content. Because it supports a wide range of source types, developers can integrate various content repositories without breaking a sweat. In this blog, we’ll take a deep dive into the different Sitecore Search source types, complete with implementation examples, to help you hit the ground running, and maybe even have a little fun along the way! Because let’s face it, even search solutions can be exciting when you know what you’re doing. Ready? Let’s search for success!
Sitecore Search supports multiple content sources, including web crawlers, API-based sources, Sitecore Content (XM/XP), database sources, and file-based sources.
Web Crawler & Web Crawler (Advanced)
Sitecore Search web crawlers index external websites such as marketing pages, blogs, or help documentation. They can extract content, metadata, titles, and links, so that search is unified across sources. The crawlers support pagination, respect robots.txt, and can follow links, including links to PDFs. They work with public-facing sites or gated content, depending on authentication support. The basic crawler is best for static HTML, while the advanced crawler adds support for dynamic content and API-based sources.
The basic web crawler is suitable for crawling simple blogs or marketing pages, extracting standard elements like title, body, and metadata, and handling basic pagination. It can also use sitemaps or simple URL filters and supports basic authentication for gated content. However, for more complex scenarios, an advanced crawler is required. It supports authenticated content using tokens or custom headers, can extract and process PDF links, and handles DOM-based or multi-template extraction. The advanced crawler also works well for indexing multilingual websites, crawling structured content like tables or schema.org metadata, and accessing dynamic or JavaScript-heavy sites by targeting API endpoints.
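To make the ideas of respecting robots.txt and using simple URL filters a bit more concrete, here is a minimal Python sketch. It is not how Sitecore Search implements its crawlers (those are configured in the product itself); it only illustrates the kind of decision a crawler makes before fetching a page. The robots.txt content, the `example-crawler` user agent, and the allowed path prefixes are all made up for the example.

```python
from urllib.parse import urlparse
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content for an example site (an assumption).
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Disallow: /cart/
"""


def build_robot_rules(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text into a rule set that can be queried offline."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def should_crawl(url, rules, allowed_prefixes, user_agent="example-crawler"):
    """Return True only if the URL passes the path filter AND robots.txt."""
    path = urlparse(url).path
    # Simple URL filter: only crawl sections we explicitly care about.
    if not any(path.startswith(prefix) for prefix in allowed_prefixes):
        return False
    # Respect the site's robots.txt rules for our user agent.
    return rules.can_fetch(user_agent, url)


rules = build_robot_rules(ROBOTS_TXT)
for candidate in (
    "https://example.com/blog/post-1",
    "https://example.com/admin/login",
    "https://example.com/pricing",
):
    print(candidate, should_crawl(candidate, rules, ["/blog/", "/admin/"]))
```

The path filter runs first because it is cheap, and the robots.txt check has the final say: a URL under `/admin/` is skipped even though it matches an allowed prefix.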
API Crawler
Consider a typical scenario: an organization has product data stored in a headless CMS or a custom e-commerce platform. The products are exposed through an API endpoint and can be retrieved with a GraphQL query like:
query { products { id name description price image { url altText } } }
This query retrieves structured product data and media information (image URL and alt text), which can be mapped to Sitecore Search index fields for display in search results or personalized experiences.
The goal is to make this content searchable in Sitecore Search with structured metadata (name, description, price, categories, images).
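As a rough illustration of that mapping step, the Python sketch below flattens a product from the API response into a flat document with one key per index field. The sample response, the `to_index_document` helper, and the target field names (`image_url`, `image_alt`) are assumptions made for this example; your actual index attributes depend on how your Sitecore Search domain is configured.

```python
from typing import Any

# Hypothetical response shaped like the query above (sample data, not real).
SAMPLE_RESPONSE = {
    "data": {
        "products": [
            {
                "id": "p-100",
                "name": "Trail Shoe",
                "description": "Lightweight trail running shoe.",
                "price": 89.5,
                "image": {
                    "url": "https://example.com/img/p-100.jpg",
                    "altText": "Trail shoe side view",
                },
            },
            # A product with missing optional data, to show defensive mapping.
            {"id": "p-101", "name": "Rain Jacket", "description": None,
             "price": 120, "image": None},
        ]
    }
}


def to_index_document(product: dict[str, Any]) -> dict[str, Any]:
    """Flatten one product into a flat document of index fields."""
    image = product.get("image") or {}  # tolerate a missing or null image
    return {
        "id": product["id"],
        "name": product["name"],
        "description": product.get("description") or "",
        "price": float(product["price"]),
        "image_url": image.get("url"),      # assumed field name
        "image_alt": image.get("altText"),  # assumed field name
    }


documents = [to_index_document(p) for p in SAMPLE_RESPONSE["data"]["products"]]
for doc in documents:
    print(doc)
```

Flattening nested objects like `image` up front keeps the index schema simple, and handling null values here means one incomplete product does not break the whole run.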
The API crawler is ideal when data isn’t available as public HTML pages or when there’s a need for complete control over indexing. It sends GET requests to the API, parses the JSON response, and maps the data to Sitecore Search index fields. It supports pagination, token-based authentication, and custom headers, making it perfect for secure or complex integrations. You can filter, transform, or enrich data before indexing, which is especially useful for frequently updated sources like product catalogs or content managed in headless CMS platforms.
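The request-and-paginate loop described above can be sketched in a few lines of Python. This is a generic illustration rather than Sitecore Search's own implementation: the `page` query parameter, the `items` and `hasNextPage` response keys, and the `X-Client-Name` header are all hypothetical and will differ for your API.

```python
import json
from typing import Callable, Iterator
from urllib.request import Request, urlopen


def build_request(url: str, token: str, page: int) -> Request:
    """Build a GET request with a bearer token and a custom header."""
    return Request(
        f"{url}?page={page}",  # assumed pagination scheme
        headers={
            "Authorization": f"Bearer {token}",
            "X-Client-Name": "search-indexer",  # hypothetical custom header
            "Accept": "application/json",
        },
        method="GET",
    )


def http_fetch(url: str, token: str, page: int) -> dict:
    """Fetch one page over HTTP and parse the JSON body."""
    with urlopen(build_request(url, token, page), timeout=10) as response:
        return json.load(response)


def iter_items(fetch_page: Callable[[int], dict]) -> Iterator[dict]:
    """Yield items page by page until the API reports no further pages."""
    page = 1
    while True:
        payload = fetch_page(page)
        yield from payload.get("items", [])
        if not payload.get("hasNextPage"):  # assumed response flag
            break
        page += 1


# Usage with canned pages, so the sketch runs without a network call.
fake_pages = {
    1: {"items": [{"id": 1}, {"id": 2}], "hasNextPage": True},
    2: {"items": [{"id": 3}], "hasNextPage": False},
}
print([item["id"] for item in iter_items(lambda page: fake_pages[page])])
```

Against a real endpoint you would pass something like `lambda page: http_fetch(base_url, token, page)` to `iter_items`. Keeping the fetch function separate from the paging loop makes the loop easy to test with canned data.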
What to Keep in Mind
When implementing Sitecore Search, it’s crucial to consider factors like content freshness (no one likes outdated results), indexing frequency (because a once-a-year refresh isn’t cutting it), and data structure (keep it clean or risk a search disaster). If you’re working with JavaScript-heavy websites, be prepared—web crawlers might get overwhelmed, so some extra configuration might be required. For API-based sources, make sure you handle rate limits and authentication properly, or you’ll be stuck waiting for permission to proceed. When indexing Sitecore CMS content, remember to factor in versioning and workflow states—after all, only the published content should make it to the index. With a little attention to detail, your search results will be top-notch, and everyone will think you’re a Sitecore Search wizard!
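On the rate-limit point, one common pattern is to retry with exponential backoff when the API says "slow down". The Python sketch below shows the idea under a few assumptions: `RateLimited` is a hypothetical exception that your own fetch code would raise on an HTTP 429 response, and the attempt count and delays are illustrative defaults, not recommended values.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class RateLimited(Exception):
    """Hypothetical error raised by your fetch code on an HTTP 429 response."""

    def __init__(self, retry_after: Optional[float] = None):
        super().__init__("rate limited")
        self.retry_after = retry_after  # seconds suggested by the server, if any


def call_with_backoff(
    call: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run call(), retrying on RateLimited with exponentially growing delays."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return call()
        except RateLimited as exc:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            # Prefer the server's hint; otherwise wait 1s, 2s, 4s, ...
            if exc.retry_after is not None:
                delay = exc.retry_after
            else:
                delay = base_delay * 2 ** attempt
            sleep(delay)
    raise RuntimeError("unreachable")


# Usage: a fake call that is rate limited twice before succeeding.
state = {"calls": 0}


def flaky_call() -> str:
    state["calls"] += 1
    if state["calls"] < 3:
        raise RateLimited()
    return "ok"


recorded_delays: list = []
print(call_with_backoff(flaky_call, sleep=recorded_delays.append), recorded_delays)
```

Passing `sleep` as a parameter lets the usage example record the delays instead of actually waiting, which also makes the retry logic easy to test.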
Sitecore Search provides a range of flexible source types to meet all your indexing needs, ensuring that businesses can deliver a seamless and efficient search experience. Whether it’s website content, structured data, or document-based information, Sitecore Search has the tools to make everything searchable and accessible, like a super-powered search engine, but without the superhero cape (though we’re sure it’d look good). In my next blog, we’ll explore more Sitecore Search source types and their unique use cases. It will be a journey, and no, you won’t need a compass, just a good internet connection and maybe a cup of coffee! Stay tuned for more! For a comprehensive overview of Sitecore Search, including crawlers, extractors, and widgets, please refer to my earlier blog post: Making Sense of Sitecore Search: Crawlers, Extractors, and Widgets.