I am excited to introduce the Application Indexing Protocol! AIP, for short, allows any application to proactively make itself indexable by a variety of third-party enterprise search engines. Unlike with traditional connectors (created by search engine vendors or systems integrators), application developers can now choose exactly what they want indexed and the best way to index it. Applications can pick the most appropriate indexing mechanism, either push (event-driven) or pull (polling), as both are supported by the Application Indexing Protocol. And because AIP is an open standard, an application only needs to develop a single, canonical implementation to get its content indexed by numerous enterprise search engines. Gone are the days of every search engine vendor having to write its own version of every connector for every application. Now, the work can be done once, by the people who know the application best.
[Chad wakes up from his dream…]
Drat! The Application Indexing Protocol does not actually exist [yet]. But I want it to so badly. I am tired of writing connectors for every new application and for every new search engine. I am tired of trying to figure out how to efficiently and completely extract content and security trimming information from both on-premises and cloud-based applications that may or may not provide appropriate APIs for this purpose.
I want to actually create this protocol, and I very well might do so if I can get some support. I would like to spend a few minutes describing the vision. AIP would be similar to the new app indexing protocols in the iOS and Android operating systems, but operating over the network rather than within a single device (iOS and Android provide this directly through runtime APIs and application manifests). iOS and Android now allow third-party apps to submit content to be included in the private, operating-system indexes (like Spotlight on iOS or App Indexing on Android). If you bookmark a page in a newsreader, pin a recipe on Pinterest, or watch an item on eBay, the applications can push this content into the operating system’s search index so you can find it later. The applications use special APIs to push content, metadata, and contextual information to the operating system, along with a way to navigate back to the app and jump straight to the specific item (see CSSearchableItem and NSUserActivity in iOS or Deep Links in Android).
AIP would allow enterprise applications to do the same thing, but over the network, using, for example, RESTful APIs over HTTPS. To bootstrap the process, the application would call an Indexing Information Service on each search engine to securely register itself and receive some basic parameters about the search engine. The Indexing Information Service (note to self: find a better acronym than IIS) would tell the application about features that it does or does not support (like hierarchical ACLs or field-level permissions) and details about capacity, limits, refresh rates, and throttling. The registration process would establish a two-way trust between the search engine and the application. As you will see in the next section, AIP integration could require that the search engine be able to remotely connect to the application, or the application might need to upload content to the search engine. Either way, this should only be done if the two ends securely trust each other.
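To make this concrete, here is a minimal sketch of what that registration handshake might look like from the application's side. Since AIP does not exist yet, every endpoint path, field name, and capability flag below is an assumption, invented purely for illustration:

```python
import requests

# Hypothetical AIP registration: the application introduces itself to the
# search engine's Indexing Information Service and learns its capabilities.
# All endpoint paths and field names below are invented for illustration.

SEARCH_ENGINE = "https://search.example.com/aip/v1"

registration = {
    "application": "wiki.example.com",
    "callback_url": "https://wiki.example.com/aip/changes",  # used in pull mode
    "modes_supported": ["push", "pull"],
}

resp = requests.post(f"{SEARCH_ENGINE}/register", json=registration, timeout=30)
resp.raise_for_status()
info = resp.json()

# The response might describe the engine's features and limits, for example:
# {
#   "token": "...",                     # shared secret establishing two-way trust
#   "supports_hierarchical_acls": true,
#   "supports_field_level_permissions": false,
#   "max_batch_size": 500,
#   "max_requests_per_minute": 60
# }
print("Registered; engine capabilities:", info)
```

The token returned here is what would let both sides verify each other on every subsequent call, whichever direction the content flows.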
Two Roads, One Destination
Having written countless connectors from scratch, I know that some applications provide a change log or an incremental feed that makes it easy to pull new content, such as adds, deletes, and updates. Other applications do not, and it would be much easier for those applications to push content change events as they occur. Therefore, the Application Indexing Protocol should allow both mechanisms, at the application developer's choice. In push mode, the application uploads content (one item at a time, or in batches) to an HTTP service on the search engine. In pull mode, the application exposes a RESTful service that the search engine polls periodically, passing in a checkpoint token or timestamp so that only content that has changed since the last request is returned. These two mechanisms would cover virtually every situation I have encountered. Any queuing, buffering, or event logging would be maintained in the application, using whatever design suits it best.
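As a sketch of the push half of that choice: the application catches its own change event and forwards it to the search engine's inbound endpoint. Again, the endpoint path, the token header, and the payload fields are all assumptions on my part, not a defined spec:

```python
import requests

SEARCH_ENGINE = "https://search.example.com/aip/v1"
TOKEN = "token-from-registration"  # hypothetical shared secret from registration

def on_document_changed(doc_id: str, action: str, title: str, body: str) -> None:
    """Push a single change event (update or delete) to the search engine.

    In a real application this would hang off the app's own event hooks;
    batching, retries, and queuing stay entirely on the application side.
    """
    event = {
        "id": doc_id,
        "action": action,          # "update" or "delete"
        "title": title,
        "content": body,
    }
    resp = requests.post(
        f"{SEARCH_ENGINE}/content",
        json={"items": [event]},   # could just as easily batch many items
        headers={"Authorization": f"Bearer {TOKEN}"},
        timeout=30,
    )
    resp.raise_for_status()

on_document_changed("doc-42", "update", "Quarterly plan", "Full text here...")
```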
The Payload
Most search engines accept a common set of information for indexing: content, metadata, ACLs, URLs, thumbnails, action (update vs. delete), etc. It should not be very difficult to establish an XML or JSON schema that universally represents a piece of content for indexing purposes. There might be a few optional or advanced features, such as hierarchical relationships or Boolean logic in ACLs, or advanced data types that not all search engines support equally, but this can be negotiated ahead of time with the Indexing Information Service. In pull mode, the application must provide a checkpoint token or timestamp with each response, allowing the search engine to resume on the next request. In push mode, the application can decide whether it wants to upload payloads one at a time or in batches, possibly using guidance provided by the Indexing Information Service.
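For illustration, a JSON payload along these lines might represent one item (built here as a Python dict). Every field name is my guess at what a universal schema could contain, not a settled schema:

```python
import json

# One hypothetical AIP item: none of these field names are standardized;
# they just cover the common set most search engines accept.
item = {
    "id": "wiki://example.com/pages/1234",
    "action": "update",                      # "update" or "delete"
    "url": "https://wiki.example.com/pages/1234",
    "title": "Team onboarding guide",
    "content": "Full extracted text of the page...",
    "metadata": {"author": "chad", "modified": "2016-03-01T12:00:00Z"},
    "acl": {
        "allow_users": ["chad"],
        "allow_groups": ["engineering"],
        "deny_users": [],
    },
    "thumbnail_url": "https://wiki.example.com/pages/1234/thumb.png",
}

# In pull mode, the response wraps the items with a checkpoint that the
# search engine hands back on its next poll.
pull_response = {
    "items": [item],
    "checkpoint": "2016-03-01T12:00:00Z",    # opaque token or timestamp
}
print(json.dumps(pull_response, indent=2))
```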
There Is No Step Three
I might be missing something, but at a high level this doesn’t seem onerous for either party or complex to implement. The search engine vendors would need to provide the Indexing Information Service with a secure registration process, as well as the inbound service for receiving content in push mode. They would also need to implement a polling mechanism for applications that want to use the pull mode.
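On the search engine side, the inbound push service might be as simple as this Flask sketch. The route, the token check, and the indexing call are all placeholders for whatever a vendor's real pipeline would look like:

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

def index_item(item: dict) -> None:
    # Placeholder: hand the item off to the engine's real indexing pipeline.
    print("indexing", item.get("id"), item.get("action"))

@app.route("/aip/v1/content", methods=["POST"])
def receive_content():
    # Hypothetical trust check using the token issued at registration.
    token = request.headers.get("Authorization", "")
    if token != "Bearer token-from-registration":
        return jsonify({"error": "unknown application"}), 401

    payload = request.get_json(force=True)
    items = payload.get("items", [])
    for item in items:
        index_item(item)
    return jsonify({"accepted": len(items)}), 202

if __name__ == "__main__":
    app.run(port=8080)
```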
Each application would need to call the Indexing Information Service to receive important parameters and establish a security token to ensure trust between the two parties, and then implement either the push or the pull protocol as it sees fit. If it chooses the pull method, it would need to expose a RESTful service, as specified by the protocol, for the search engine to call. The application would need to decide what content to index and when; this could be configurable by the customer, if applicable. The application would convert content into XML or JSON payloads, including all required and optional data needed for indexing.
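For an application choosing pull mode, the RESTful service it exposes might look like this sketch: the search engine polls with the last checkpoint it saw, and the application returns everything changed since then. The query parameter name and response shape are assumptions, and the in-memory change log stands in for whatever the application actually keeps:

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

# Stand-in for the application's own change log, keyed by modification time.
CHANGE_LOG = [
    {"modified": "2016-03-01T12:00:00Z", "id": "doc-1", "action": "update",
     "title": "Team onboarding guide", "content": "Full text..."},
    {"modified": "2016-03-02T09:30:00Z", "id": "doc-2", "action": "delete"},
]

@app.route("/aip/changes", methods=["GET"])
def changes():
    # The search engine passes back the checkpoint from its previous poll;
    # an empty checkpoint means "give me everything".
    since = request.args.get("checkpoint", "")
    items = [c for c in CHANGE_LOG if c["modified"] > since]
    new_checkpoint = max((c["modified"] for c in items), default=since)
    return jsonify({"items": items, "checkpoint": new_checkpoint})

if __name__ == "__main__":
    app.run(port=8081)
```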
I believe the next step would be to create high-level outlines of the required services, handshakes, and schemas. Having a more formal definition will make it easier to get buy-in from search engine vendors and application developers. A reference implementation would probably follow shortly thereafter. If you have any questions or comments, please contact me at chad.johnson@perficient.com. I would love to chat more about this idea, in hopes of it becoming a reality.