Search as a Service
From FAST ESP to SharePoint Search and Bing/Google, search has become an integral way for users to reach their data. Microsoft has developed Azure Search to provide an integration point for a complete search experience. Developers can use the Azure portal and front-end APIs to tune their search index as well as increase and decrease their index capacity, query count, and number of documents. These features allow for individualized, cost-effective solutions for all of your search scenarios.
There are already established solutions such as Apache Solr and Elasticsearch (the platform that Azure Search is based on). However, ensuring cost-effective scalability and steady hosting for them can be a separate task entirely. Azure Search’s pricing scales with your query traffic as well as your document count, which allows you to pay for what you use. In the end, all are viable search solutions for your business’s needs.
Azure Search Under the Hood
Document – A document is simply a single entity in the search index. A document can be a web page, a file such as a PDF or Word document, or any custom-fed content that the user can format.
Search Index – The index stores a collection of documents and is optimized for fast document retrieval (a document is comparable to an item in a database). The index schema is defined by the user. You can have multiple indexes in your Azure Search environment.
Query Processing – Each search engine tunes its query processor differently. The query processor is responsible for translating the query syntax before sending the query to the index for document retrieval.
Indexing Component – The indexing component is responsible for processing the data before sending it to the search index. This is commonly known as a pipeline or enrichment step, and it massages or normalizes the data.
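To make the index schema and the indexing component more concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: the index name "articles", the three fields, and the helper names clean_text and to_document are hypothetical, and the cleanup rules (strip markup, decode HTML entities, collapse whitespace) are just one possible normalization step. The field attributes ("type", "key", "searchable") are modeled on the JSON that Azure Search index definitions use, but check the service documentation before relying on them.

```python
import html
import re

# Hypothetical index schema, shaped like the JSON body of an index definition.
ARTICLE_INDEX = {
    "name": "articles",
    "fields": [
        {"name": "id", "type": "Edm.String", "key": True, "searchable": False},
        {"name": "title", "type": "Edm.String", "key": False, "searchable": True},
        {"name": "body", "type": "Edm.String", "key": False, "searchable": True},
    ],
}

_TAG = re.compile(r"<[^>]+>")
_SPACE = re.compile(r"\s+")


def clean_text(raw):
    """Strip markup, decode HTML entities, and collapse whitespace."""
    no_tags = _TAG.sub(" ", raw)
    return _SPACE.sub(" ", html.unescape(no_tags)).strip()


def to_document(raw, schema=ARTICLE_INDEX):
    """Normalize one raw record into a document that matches the schema."""
    doc = {}
    for field in schema["fields"]:
        name = field["name"]
        value = str(raw.get(name, ""))
        # The key is kept verbatim; every other field is cleaned up.
        doc[name] = value if field["key"] else clean_text(value)
    return doc
```

Fields that are not in the schema are dropped, so the output of to_document always has exactly the shape the index expects.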
Azure Search (vs. SharePoint Search)
The core of Azure Search is built on Elasticsearch, but don’t be fooled: this is not just Elasticsearch hosted on Azure. Microsoft has provided their own API on top of it, which makes interfacing with the engine more familiar to Microsoft developers and front-end developers.
If you were a SharePoint developer who relied heavily on PowerShell scripting for configuration, then Azure Search’s APIs should feel familiar. However, Azure Search does not include a crawler, which SharePoint users have become accustomed to. Developers are responsible for formatting and feeding the data to the document processor/indexer. While this might seem like a huge oversight, content feeders or crawlers are generally not included in stand-alone search engines like Apache Solr.
Why Azure Search?
Search infrastructure has always been taxing to maintain and often not cost-efficient to scale. Azure Search infrastructure is fully managed in the cloud by Microsoft, leaving you bandwidth to build your application. As your search application grows and requires more capacity, you can comfortably move up a pricing tier, or move down tiers during the off-season.
The management of the data has all been moved to the front end, exposed through a JSON schema API. There are no predefined index schemas and no crawler to fetch data. This is a “push-based” indexing system, which is common for engines built on Lucene. This allows you to easily move your data pulling and pushing onto other servers and place less stress on your actual search servers. It also gives you full control over when content is pushed and what content is pushed.
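As a rough illustration of push-based indexing, the Python sketch below turns a list of documents into JSON request bodies that a feeder process could send to the service. The function name build_batches and its parameters are hypothetical. The "value" array and the "@search.action" property follow the shape of the Azure Search document-indexing request as I understand it, and the default batch size of 1000 is an assumption, so verify both against the current REST API reference. The HTTP call itself, with the service URL and API key, is deliberately left out.

```python
import json


def build_batches(documents, action="upload", batch_size=1000):
    """Yield JSON request bodies, each carrying at most batch_size documents."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(documents), batch_size):
        chunk = documents[start:start + batch_size]
        # Each document carries the action the service should apply to it.
        payload = {"value": [{"@search.action": action, **doc} for doc in chunk]}
        yield json.dumps(payload)
```

Because the feeder decides when to call build_batches and which documents to pass in, it controls both the timing and the content of every push, and it can run on a machine separate from the search service.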
So What’s Next?
Go try it out! http://azure.microsoft.com/en-us/services/search/
If you have feedback for Microsoft or would like to see what others are saying, please visit: http://feedback.azure.com/forums/263029-azure-search