Traditionally, enterprise search engines have been firmly planted on the ground. Now, along with almost everything else in enterprise IT, search engines are quickly moving into the sky. But why? What factors are prompting the growth of cloud-based search engines? Given the immense IT security (and mindset) challenges associated with indexing sensitive content in the cloud, why are vendors continuing to march full-speed in this direction?
1. Follow the Content
Almost all of our enterprise search clients are indexing at least one cloud-based content repository, such as Box, Salesforce, Google Drive, or Office 365. Historically, this involved writing connectors that download all of the information back to an on-premise search engine. As more content sources move to the cloud, it is becoming increasingly inefficient (and absurd?) to download all of that content back to the ground just so it can be indexed.
Cloud-based search engines introduce the idea of cloud-to-cloud indexing. Typically, some form of connector or adapter is still required (see my recent post for a suggestion to make this easier), but as long as the connectors are also deployed in the cloud, it is an architectural improvement. Some vendors are even considering internal peering arrangements that reduce the cost of intra-cloud data movement, making it cheaper to index first-party content since the bits are all flying around within the same cloud.
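The cloud-to-cloud connector pattern can be sketched in a few lines. In this sketch, `CloudSource` and `CloudIndex` are hypothetical stand-ins for a real repository client (say, a Box or Google Drive SDK) and a cloud search engine's indexing API; the point is that documents flow directly from one cloud service to another, with nothing downloaded to an on-premise machine.

```python
class CloudSource:
    """Stand-in for a cloud content repository's API client (hypothetical)."""
    def __init__(self, documents):
        self._documents = documents

    def list_changes(self, since=0):
        # A real connector would call the repository's change/delta API so
        # that only new or modified documents are fetched on each pass.
        return [d for d in self._documents if d["modified"] > since]


class CloudIndex:
    """Stand-in for a cloud search engine's indexing API client (hypothetical)."""
    def __init__(self):
        self.indexed = {}

    def upsert(self, doc):
        # Insert the document, or replace it if it was indexed before.
        self.indexed[doc["id"]] = doc


def sync(source, index, since=0):
    """One incremental pass: push changed documents from source to index."""
    changed = source.list_changes(since)
    for doc in changed:
        index.upsert(doc)
    return len(changed)
```

Incremental sync via a change/delta API is what makes the pattern cheap: after the initial crawl, each pass moves only the documents modified since the last checkpoint.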
2. We Need More Power!
Search applications increasingly demand advanced features like text analytics, language translation, optical character recognition (OCR), document previews, and machine learning ranking algorithms, not to mention increased indexing and serving capacity for an ever-increasing amount of content and number of users. On-premise search engines often struggle to keep up with the demand for these features, particularly on larger repositories. Physical limitations of CPU, RAM, and storage are a constant cat-and-mouse game with traditional on-premise search engines. The competition between indexing-time resources and serving-time resources is always a delicate balance.
Cloud-based search engines face fewer limits on computational power and resources. Yes, limits still exist, but they can be mitigated more easily in the cloud, and typically at a lower financial cost. OCR and language translation can be spun off to queues and processed by farms of worker nodes. Document previews can be stored in vast cloud storage buckets. Machine learning models can consume enormous amounts of RAM without anyone having to procure new hardware.
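The queue-and-worker-farm pattern described above can be sketched with Python's standard library, using an in-process queue and threads in place of a real cloud queue (such as SQS) and a fleet of worker nodes. The `ocr` function here is a hypothetical placeholder for any expensive per-document step, such as OCR or language translation.

```python
import queue
import threading


def ocr(doc):
    # Placeholder for the real, expensive OCR call; here we just tag the doc.
    return doc + ":text-extracted"


def worker(tasks, results):
    # Each worker pulls documents off the shared queue until it sees the
    # shutdown sentinel (None), mirroring a cloud worker polling a queue.
    while True:
        doc = tasks.get()
        if doc is None:
            tasks.task_done()
            break
        results.put(ocr(doc))
        tasks.task_done()


def process(docs, num_workers=4):
    tasks, results = queue.Queue(), queue.Queue()
    threads = [threading.Thread(target=worker, args=(tasks, results))
               for _ in range(num_workers)]
    for t in threads:
        t.start()
    for doc in docs:
        tasks.put(doc)
    for _ in threads:            # one shutdown sentinel per worker
        tasks.put(None)
    tasks.join()                 # wait until every queued item is handled
    for t in threads:
        t.join()
    out = []
    while not results.empty():
        out.append(results.get())
    return sorted(out)
```

The design point carries over directly to the cloud: because workers only compete for items on the queue, capacity scales by adding workers, without touching the indexing pipeline that enqueues the documents.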
3. Hide the Complexity
Deploying a load-balanced, fault-tolerant, performant search engine on-premise typically requires deploying multiple, replicated copies of the search index and all associated components. The Google Search Appliance, for example, was a fast machine, but it still required multiple nodes (at increased cost) to handle extreme indexing or serving loads. Autonomy IDOL and Microsoft FAST architectures for large environments often looked like the ironwork of the Eiffel Tower with all the interconnected layers and nodes.
Properly designed cloud-based search engines can abstract that complexity away from the customer. Search vendors can architect the indexing and serving layers to scale dynamically to meet variable demand. Products like SolrCloud, Elasticsearch, and SearchBlox make it easy to spread load across multiple nodes, but they still require the customer to physically manage their own instances and capacity. True cloud-based search engines, like Amazon CloudSearch, Azure Search, Coveo Cloud, and Swiftype, are taking this concept to the next level, offering search-as-a-service that eliminates the customer's need to worry about physical infrastructure or scale.
The advantages of moving search engines to the cloud are real. The challenges are real, too. Next week I plan to address some of the criticisms of cloud-based search engines and the possible mitigations.