Understanding Searchable Encryption in the Cloud

Many companies struggle with storing personally identifiable information (PII) in the cloud. When you store sensitive data in the cloud, it’s critical that it remains private. One approach is to encrypt the data before sending it to the cloud storage server, so that anyone without the key, including the provider, cannot read it. However, once data has been encrypted, searching for specific keywords typically involves decrypting, and therefore potentially exposing, the sensitive data. This can lead to compliance issues, especially with protected health information (PHI) and PII. Searchable encryption (SE) is a well-known cryptographic primitive designed to address this problem.

Searchable Encryption

To retrieve data safely and efficiently, the user must be able to search the encrypted data without revealing its contents or the search terms to the server. Searchable encryption (SE) is a technique that allows authorized users to search encrypted data without decrypting it. This is done by indexing the data in such a way that the index can be queried without exposing the underlying plaintext.

Searchable encryption is useful in scenarios where the data that needs to be stored encrypted is too unstructured for standard homomorphic encryption, but there are ways to index it so that it can then be searched. Keys used to encrypt the data must be shared with authorized parties who want to search the data.
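
The indexing idea can be illustrated with a minimal sketch. It assumes HMAC-SHA256 as the keyed function that turns a keyword into a search token. That is one simple choice for illustration, not a construction the article prescribes. The names `make_token`, `build_index`, and `search` are hypothetical, and the documents themselves are assumed to be encrypted separately with a vetted library.

```python
import hashlib
import hmac
import secrets
from collections import defaultdict


def make_token(key: bytes, keyword: str) -> str:
    """Derive a search token from a keyword (assumed choice: HMAC-SHA256)."""
    normalized = keyword.strip().lower().encode("utf-8")
    return hmac.new(key, normalized, hashlib.sha256).hexdigest()


def build_index(key: bytes, documents: dict) -> dict:
    """Map each keyword token to the IDs of the documents containing that keyword."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in set(text.split()):
            index[make_token(key, word)].add(doc_id)
    return dict(index)


def search(index: dict, token: str) -> set:
    """Server-side lookup: the server sees only the token, never the keyword."""
    return index.get(token, set())


# The key stays with the authorized parties; only the index is uploaded.
key = secrets.token_bytes(32)
documents = {
    "doc-1": "patient diagnosis asthma",
    "doc-2": "patient invoice overdue",
}
index = build_index(key, documents)
matches = search(index, make_token(key, "asthma"))
```

Only a party holding `key` can produce a token that matches an index entry, which is why the key has to be shared with every authorized searcher.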

SE Parties

The data owner is the entity, either a company or an individual, that creates and encrypts the data and uploads it to the cloud server. The data owner uses a data processing application to upload new material to the cloud. That application encrypts the data and its metadata with a scheme that still allows the encrypted content to be searched.

The data user sends encrypted queries to the cloud service provider to search for a specified piece of encrypted data. The system may have more than one data user, and in some cases, the data owner and data user might be the same person or entity.

The cloud service provider provides a data storage and retrieval service. This service is made up of the cloud data server and the cloud service manager. The cloud data server is responsible for storing the outsourced encrypted data, while the cloud service manager manages this data in the cloud. The cloud service provider should not learn any information from the search operations of either the data owner or the data user.
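
The division of responsibility between the three parties can be sketched roughly as follows. The class names, the HMAC-SHA256 token function, and the placeholder ciphertexts are all assumptions made for illustration. The point is only that the provider stores opaque blobs and tokens while the key stays with the data owner and data user.

```python
import hashlib
import hmac
import secrets
from dataclasses import dataclass, field


def keyword_token(key: bytes, keyword: str) -> str:
    # Assumed token construction: HMAC-SHA256 over the lowercased keyword.
    return hmac.new(key, keyword.lower().encode("utf-8"), hashlib.sha256).hexdigest()


@dataclass
class CloudProvider:
    """Stores opaque blobs and tokens; it never holds the secret key."""
    blobs: dict = field(default_factory=dict)   # doc_id -> ciphertext bytes
    index: dict = field(default_factory=dict)   # token -> set of doc_ids

    def upload(self, doc_id, ciphertext, tokens):
        self.blobs[doc_id] = ciphertext
        for token in tokens:
            self.index.setdefault(token, set()).add(doc_id)

    def query(self, token):
        return {doc_id: self.blobs[doc_id] for doc_id in self.index.get(token, set())}


@dataclass
class DataOwner:
    key: bytes

    def publish(self, provider, doc_id, ciphertext, keywords):
        tokens = [keyword_token(self.key, k) for k in keywords]
        provider.upload(doc_id, ciphertext, tokens)


@dataclass
class DataUser:
    key: bytes  # assumed to be shared by the data owner out of band

    def find(self, provider, keyword):
        return provider.query(keyword_token(self.key, keyword))


shared_key = secrets.token_bytes(32)
provider = CloudProvider()
owner = DataOwner(shared_key)
user = DataUser(shared_key)

# Ciphertexts are placeholders; a real data owner would encrypt with a vetted library.
owner.publish(provider, "doc-1", b"<ciphertext 1>", ["asthma", "patient"])
owner.publish(provider, "doc-2", b"<ciphertext 2>", ["invoice", "patient"])
results = user.find(provider, "patient")
```

The data user receives ciphertext and decrypts it locally, so plaintext never passes through the provider.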

Security Requirements

Searchable encryption methods should meet most or all of the following criteria regarding the information leakage to the cloud provider as a result of searching documents:

  • Provider must not be able to learn anything about the keyword being used
  • Provider must not be able to distinguish between documents based on search
  • Provider must not be able to determine search contents from a document
  • Provider must not learn anything about the contents of a search outcome
  • Provider must not learn about the sequences and frequencies of documents accessed by a user
  • Provider must not learn whether more than one token was intended for the same query
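
One way to reason about these criteria is to look at a scheme from the provider's point of view. The sketch below assumes a simple deterministic token (HMAC-SHA256 of the keyword, an illustrative choice). It shows that such a token hides the keyword itself but still lets the provider see that the same query was repeated, which falls short of the last two criteria.

```python
import hashlib
import hmac
import secrets
from collections import Counter


def deterministic_token(key: bytes, keyword: str) -> str:
    # Assumed construction: the same key and keyword always give the same token.
    return hmac.new(key, keyword.encode("utf-8"), hashlib.sha256).hexdigest()


key = secrets.token_bytes(32)

# What the provider observes: a stream of tokens, never the keywords themselves.
searched_keywords = ["asthma", "invoice", "asthma", "asthma"]
query_log = [deterministic_token(key, kw) for kw in searched_keywords]

# Search-pattern leakage: identical tokens reveal that a keyword was repeated.
token_counts = Counter(query_log)
most_common_token, repeats = token_counts.most_common(1)[0]
```

Here the provider cannot read "asthma" from the log, yet it can tell that one keyword was searched three times and another once.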

Encryption Schemes

SE can be modeled using either asymmetric (public) keys or symmetric (secret) keys.

With Asymmetric Searchable Encryption (ASE), the data owner encrypts the data using asymmetric/public key encryption schemes before outsourcing it to the cloud server. This setting is appropriate for a scenario where the user searching over the data is different from the user who generates the data. The main advantage of ASE is its functionality: it allows for multiple data users, which is a common scenario. The drawback is inefficiency, because ASE relies on relatively slow and costly public-key operations.

With Symmetric Searchable Encryption (SSE), the data owner encrypts the data using symmetric/secret key encryption schemes before outsourcing it to the cloud server. This setting is appropriate when the user that searches over the data is also the one who generates the data. The main advantage of SSE is the efficiency provided by its low computational overhead, but the restriction to a single-user scenario can be perceived as a limitation.

Design Approach

Searchable encryption schemes can be built using either a non-keyword-based approach or an index/keyword-based approach. The non-keyword approach scans each document word by word to find a specific term. This lets you search for any word in the document, but the search time grows with the size of the collection and becomes lengthy for a large number of documents. The index/keyword-based approach avoids searching every document: the query looks up the keyword in a prebuilt index and goes directly to the matching documents, which saves time when there are many documents. The trade-off is that the index can require a lot of effort to maintain.
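
The trade-off between the two approaches does not depend on the encryption layer, so the following sketch uses plaintext words for brevity. The function names and sample documents are hypothetical. It contrasts a word-by-word scan with an inverted-index lookup and shows the maintenance cost of the index when a document is added.

```python
from collections import defaultdict

documents = {
    "doc-1": "patient diagnosis asthma",
    "doc-2": "patient invoice overdue",
    "doc-3": "quarterly revenue report",
}


def scan_search(docs, term):
    """Non-keyword approach: read every word of every document on each query."""
    return {doc_id for doc_id, text in docs.items() if term in text.split()}


def build_inverted_index(docs):
    """Index approach: one pass up front, then each query is a dictionary lookup."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.split():
            index[word].add(doc_id)
    return index


index = build_inverted_index(documents)
scan_hits = scan_search(documents, "patient")
index_hits = index.get("patient", set())

# The maintenance cost: a new document is invisible to the index until it is re-indexed.
documents["doc-4"] = "patient discharge summary"
stale_hits = index.get("patient", set())          # does not include doc-4 yet
fresh_hits = scan_search(documents, "patient")    # a scan always sees current data
```

Both approaches return the same results while the index is current. The scan pays on every query, and the index pays whenever the data changes.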

Common Approach

Businesses typically store a lot of semi-structured and unstructured data in the cloud. Searching every document across Amazon S3 buckets is probably too time-consuming to be practical. An index can be created by using a commercial crawler or even a Python script, so the maintenance objection to index/keyword-based solutions can be overcome in exchange for fundamentally improved performance. Again with an eye towards performance, SSE outperforms ASE. Using a single service account to both publish and search sensitive data is one way to overcome the utility deficit SSE has relative to ASE.

With that in mind, an architecture that uses Symmetric Searchable Encryption, a single service account to publish and search sensitive data (possibly using a RESTful API), and an automated index-creation mechanism would seem to be the performant way to provide for searchable encryption in the cloud.
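
As a rough sketch of the automated index-creation piece, the script below walks a directory of text files and builds a tokenized keyword index under a single service key. Several things here are assumptions for illustration: a local temporary directory stands in for the real document store, HMAC-SHA256 is used as the token function, and the names `build_index_from_directory` and `search_index` are hypothetical. Encrypting the files themselves and exposing search over a RESTful API are left out.

```python
import hashlib
import hmac
import re
import secrets
import tempfile
from pathlib import Path

WORD_PATTERN = re.compile(r"[a-z0-9]+")


def token_for(key: bytes, word: str) -> str:
    # Assumed token construction: HMAC-SHA256 under the service account's key.
    return hmac.new(key, word.encode("utf-8"), hashlib.sha256).hexdigest()


def build_index_from_directory(root: Path, key: bytes) -> dict:
    """Walk `root`, tokenize each .txt file, and map keyword tokens to relative paths."""
    index = {}
    for path in sorted(root.rglob("*.txt")):
        text = path.read_text(encoding="utf-8", errors="ignore").lower()
        for word in set(WORD_PATTERN.findall(text)):
            index.setdefault(token_for(key, word), []).append(str(path.relative_to(root)))
    return index


def search_index(index: dict, key: bytes, keyword: str) -> list:
    """Look up a keyword by recomputing its token with the same service key."""
    return index.get(token_for(key, keyword.lower()), [])


# Demo on a temporary directory standing in for the real document store.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "notes.txt").write_text("Patient has asthma.", encoding="utf-8")
    (root / "billing").mkdir()
    (root / "billing" / "march.txt").write_text("Invoice for patient", encoding="utf-8")

    service_key = secrets.token_bytes(32)  # held only by the single service account
    index = build_index_from_directory(root, service_key)
    asthma_hits = search_index(index, service_key, "Asthma")
    patient_hits = search_index(index, service_key, "patient")
```

Because one service account both builds the index and issues the searches, the key never needs to be distributed to individual users, which is how this design sidesteps the single-user limitation of SSE.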

If you’re ready to move to the next level of your data-driven enterprise journey, contact Bill.Busch@perficient.com with Data Solutions.

David Callaghan, Solutions Architect

As a solutions architect with Perficient, I bring twenty years of development experience and I'm currently hands-on with Hadoop/Spark, blockchain and cloud, coding in Java, Scala and Go. I'm certified in and work extensively with Hadoop, Cassandra, Spark, AWS, MongoDB and Pentaho. Most recently, I've been bringing integrated blockchain (particularly Hyperledger and Ethereum) and big data solutions to the cloud with an emphasis on integrating modern data products such as HBase, Cassandra and Neo4J as the off-blockchain repository.
