In the Big Data Stack shown below, security is a vertical concern that touches every layer of a Big Data architecture. It must receive careful attention, especially in a healthcare environment where protected health information (PHI) may be involved.
Many Big Data platforms are built on NoSQL technologies such as MongoDB, HBase, and Cassandra. These technologies are still maturing, and their security features are generally less robust than those of a relational database. When designing a Big Data architecture, it is easy to get excited about the power and flexibility that Big Data enables, but the more mundane non-functional requirements must be carefully considered. These requirements include security, data governance, data management, and metadata management. The most advanced technology may not be the best fit if critical non-functional requirements can't be met without excessive effort.
To protect PHI, you might consider specialized encryption software such as IBM InfoSphere Guardium Encryption Expert or Gazzang, which encrypt and decrypt data at the operating-system level as I/O operations occur. Because these tools operate below the data layer, they can be used regardless of how or where you store the data. Using them can help you build a robust, highly secure Big Data architecture.
Figure 1 – Perficient’s Big Data Stack
I’m interested to hear your feedback and your experiences with security in Big Data platforms. You can reach me at pete.stiglich@perficient.com, on Twitter at @pstiglich, or in the comments section below.