
Is your Google Search Appliance platform secure?

If you have read Google’s product literature, you know that the Google Search Appliance is a very secure device.  The bright yellow appliance runs a hardened version of CentOS, and the inner workings are safely hidden behind root login.
So, assuming we are dealing with an appliance with Fort Knox-level protection, what risks remain?  Below are several potential vulnerabilities that could jeopardize the security of your GSA platform.  Some of these risks can be mitigated easily, while others (particularly those involving human beings) may never be 100% avoidable.  I am not trying to cause panic.  I only hope to better educate the community so that simple risks can be avoided, and more complex risks can be appropriately understood and mitigated.
Administrator access
The GSA admin console offers two levels of accounts, Administrator and Manager.  In a perfect world, we would issue most accounts at the Manager level.  But in practice, Manager-level accounts are not typically powerful enough to satisfy the needs of most users.  Manager accounts cannot adjust crawl URL patterns, manage query expansion files, or even set up dynamic navigation facets.  Because of this, I find that most accounts in the GSA admin console are created at the Administrator level.
Administrator-level accounts expose your GSA to both intentional and unintentional risks.  I am going to discuss both, but understand that dealing with intentional malice is somewhat beyond the scope of this article.  In general, most of these risks can be mitigated with better awareness and education.  A little training and planning can go a long way toward avoiding accidental mishaps.
Accounts with Administrator access can compromise the GSA platform in many ways.  Here are several examples, grouped by either exposure of secure data or denial of service:

  • Exposure of secure data
    • Incorrectly setting the “Is Public” flag for a set of content on the Crawler Access or Forms Authentication pages, allowing any user to search for and view secure items
    • Incorrectly applying a Policy ACL that takes precedence over Per URL ACLs, allowing the wrong users to see secure items
    • Incorrectly setting up collections.  If collections are used as a way of segmenting or siloing search to different audiences, this could cause results for one audience to be delivered to a different audience.
    • Viewing sensitive metadata for secure content in the Index Diagnostics screen
    • Enabling and downloading packet capture files while indexing is underway.  XML Feeds or securely crawled web pages could be exposed.
    • Downloading recent XML Feeds and viewing the metadata or content files
    • Uploading a malicious SSL certificate or Certificate Authority and then being able to promiscuously observe secure search traffic
  • Denial of service
    • Powering off the appliance
    • Resetting the index
    • Removing allowed URL patterns
    • Deleting XML Feeds
    • Pausing the crawler or disabling connector traversal
    • Editing or removing Universal Login rules

Most of these items require a high level of skill and knowledge that prevents them from being casually exploited.  But others are dangerously easy.  Imagine setting up a Crawler Access rule for a site to re-mark any PDFs as Public instead of Secure.
Intended URL Pattern:

regexp:my.site.com/*.pdf

Accidental URL Pattern:

regexp:my.site.com|*.pdf

One wrong character has now marked every page on my.site.com and every PDF in the entire GSA as public, instead of just the PDFs on that one site.  Oops.
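A mistake like this is easy to catch before the rule is saved if the candidate pattern is first tried against a short list of URLs that must stay Secure.  The sketch below is only an illustration: it uses Python's `re` module as a stand-in for the GSA's own pattern matching, so the two patterns are rewritten in Python regex syntax, and the URLs and the `unintended_matches` helper are hypothetical.

```python
import re

# Hypothetical URLs that must stay Secure; neither is a PDF on my.site.com.
MUST_NOT_MATCH = [
    "http://my.site.com/hr/salaries.html",
    "http://other.site.com/finance/budget.pdf",
]

def unintended_matches(pattern, urls):
    """Return the URLs from `urls` that the pattern matches anywhere."""
    compiled = re.compile(pattern)
    return [url for url in urls if compiled.search(url)]

# Python-syntax approximations of the intended and accidental patterns
# (an assumption; the GSA's regexp: syntax is not identical to Python's).
intended = r"my\.site\.com/.*\.pdf"
accidental = r"my\.site\.com|.*\.pdf"

print("intended:  ", unintended_matches(intended, MUST_NOT_MATCH))
print("accidental:", unintended_matches(accidental, MUST_NOT_MATCH))
```

The intended pattern leaves both URLs alone, while the accidental one matches both, which is the kind of difference worth seeing before a rule goes live.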
Connector Manager server access
All documents and metadata travel through the Connector Manager in relatively clear form.  Metadata is often written to local log files, and file content can be intercepted by attaching a JVM debugger to the Connector Manager during operation.  Service accounts and passwords can also be intercepted with a JVM debugger if you know where to look.  Consider physical access to the Connector Manager just as dangerous as physical access to the content repositories themselves.
Network traffic
Network traffic between the Connector Manager and GSA can be protected with SSL, but in practice it often is not.  If HTTP traffic is used between the Connector Manager and GSA, all XML Feeds can be intercepted (the XML Feeds can contain metadata and content that can easily be decoded back into full files).  Likewise, service accounts and passwords could be intercepted when the connector settings are modified.  We recommend enabling and using SSL traffic from the GSA to the Connector Manager, and from the Connector Manager back to the GSA.
If SSL is not enabled on your user interface, it is also possible to eavesdrop on the traffic of users running secure searches.  While the snooper might not see entire documents, titles and snippets can contain sensitive information.  Cached views and document previews could expose even more.  We always advise using SSL with a valid certificate when serving secure search results.
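As a first, very rough check, the URLs configured for search, feeds, and the Connector Manager can be scanned for anything that is not HTTPS.  This is a minimal sketch: the host names, ports, and paths below are made up for illustration, and the check only looks at configured URLs.  It does not verify certificates or prove that no plain HTTP traffic exists on the wire.

```python
from urllib.parse import urlsplit

def non_ssl_endpoints(urls):
    """Return the URLs whose scheme is anything other than https."""
    return [u for u in urls if urlsplit(u).scheme.lower() != "https"]

# Hypothetical endpoints gathered from connector and front-end settings.
ENDPOINTS = [
    "https://gsa.example.com/search",
    "http://gsa.example.com:19900/xmlfeed",
    "http://connectors.example.com:8080/connector-manager",
]

for url in non_ssl_endpoints(ENDPOINTS):
    print("Not SSL protected:", url)
```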
Conclusion
While the GSA is a hardened, secure appliance, many accidental or intentional vulnerabilities are possible.  The examples above are by no means comprehensive.  I’m sure others exist.  Test your system regularly to ensure that unauthorized users cannot find secure content.  A simple monitoring script could periodically run public searches for known, secure content and trigger an alarm if any is found.  Network diagnostics can be run before a system is launched to ensure that all traffic is SSL protected and any unused ports or services are disabled, particularly on connector and application servers.  Physical access and accounts should be audited periodically.
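The monitoring idea can be sketched in a few lines of Python.  Treat this as a starting point rather than a finished tool.  It assumes the search front end returns XML in which each result carries its URL in a `<U>` element, and that the `q`, `site`, `client`, `output`, and `access` request parameters behave as named here.  The collection and front-end names, the URL prefixes, and the helper functions are all placeholders for illustration.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

def result_urls(xml_data):
    """Extract result URLs from a search response (assumes <R><U>...</U></R>)."""
    root = ET.fromstring(xml_data)
    return [u.text for u in root.iter("U") if u.text]

def leaked(xml_data, secure_prefixes):
    """Return result URLs that start with any prefix known to be secure."""
    return [url for url in result_urls(xml_data)
            if any(url.startswith(prefix) for prefix in secure_prefixes)]

def public_search(base_url, query, collection="default_collection",
                  frontend="default_frontend"):
    """Run a search with no credentials and return the raw XML bytes."""
    params = urllib.parse.urlencode({
        "q": query,
        "site": collection,      # placeholder collection name
        "client": frontend,      # placeholder front-end name
        "output": "xml_no_dtd",
        "access": "p",           # assumed to restrict results to public content
    })
    with urllib.request.urlopen(f"{base_url}/search?{params}", timeout=30) as resp:
        return resp.read()

def check(base_url, queries, secure_prefixes):
    """Return every secure URL that any of the public queries turned up."""
    found = []
    for query in queries:
        found.extend(leaked(public_search(base_url, query), secure_prefixes))
    return found
```

Run from a scheduler, `check` can be pointed at a handful of queries that are known to match secure documents, with an alert raised whenever the list it returns is not empty.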
Following a few simple precautions can lead to a much safer GSA platform.

About the Author

Chad is a Principal of Search and Knowledge Discovery at Perficient. He was previously the Director of Perficient's national Google for Work practice.
