According to the US Department of Health & Human Services, the Privacy Rule protects all “individually identifiable health information” held or transmitted by a covered entity or its business associate, in any form or media, whether electronic, paper, or oral. This information is called “protected health information (PHI).”
Healthcare data is valuable to researchers, government bodies, and other agencies, who mine it for correlations and trends, for marketing, and for other uses of public benefit. The same data is useful in test and development environments for building and testing applications. And if you outsource your software development, you may have to pass test data to your business partner.
How can we keep the data private and still put it to good use?
De-identification, or anonymization, is a technique for removing protected health information (PHI) from healthcare data so that the data can still be used. It protects the privacy of individual health information while keeping the data useful.
De-identification is allowed in two ways under the Privacy Rule:
1) Safe Harbor method: In this method, 18 specific identifiers are removed from the healthcare data so that it can’t be traced back to an individual.
- Names
- Geographic data (subdivisions smaller than a state)
- All elements of dates (except the year) directly related to an individual
- Telephone numbers
- FAX numbers
- Email addresses
- Social Security numbers
- Medical record numbers
- Health plan beneficiary numbers
- Account numbers
- Certificate/license numbers
- Vehicle identifiers and serial numbers including license plates
- Device identifiers and serial numbers
- Web URLs
- Internet protocol addresses
- Biometric identifiers (e.g., retinal scans, fingerprints)
- Full face photos and comparable images
- Any unique identifying number, characteristic or code
2) Statistical method (called Expert Determination in HHS guidance): In this method, a qualified statistical expert reviews the de-identified data and gives an opinion that the risk of identifying an individual is very small (statistically insignificant).
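As a rough illustration of the Safe Harbor idea, the sketch below drops identifier fields from a single record and cuts dates down to the year. The field names, the `safe_harbor_sketch` function, and the sample record are all hypothetical, and the sketch does not cover everything the method requires (free-text notes, for example, can also hide identifiers).

```python
from datetime import date

# Hypothetical field names for illustration; a real schema will differ.
IDENTIFIER_FIELDS = {
    "name", "street_address", "phone", "fax", "email", "ssn",
    "medical_record_number", "health_plan_id", "account_number",
    "license_number", "vehicle_id", "device_id", "url", "ip_address",
    "biometric_id", "photo",
}

# Hypothetical date fields that are reduced to the year only.
DATE_FIELDS = {"birth_date", "admission_date", "discharge_date"}


def safe_harbor_sketch(record: dict) -> dict:
    """Return a copy of the record without identifier fields and with dates cut to the year."""
    cleaned = {}
    for field, value in record.items():
        if field in IDENTIFIER_FIELDS:
            continue  # drop the identifier entirely
        if field in DATE_FIELDS and isinstance(value, date):
            # keep only the year, under a renamed key such as "birth_year"
            cleaned[field.replace("_date", "_year")] = value.year
            continue
        cleaned[field] = value
    return cleaned


patient = {
    "name": "Jane Doe",
    "ssn": "000-00-0000",
    "birth_date": date(1980, 5, 17),
    "diagnosis": "asthma",
}
print(safe_harbor_sketch(patient))
```

A deny-list like this only works when every identifier lives in a known, named field, which is one reason the approach gets harder as the number of applications grows.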
Various techniques and algorithms are used to mask individual fields of PHI, including:
- Substitution: replacing the original with a random or realistic value
- Aging: increasing or decreasing the original value
- Shuffling: moving the data across rows
- Custom: a method written for a specific field or application
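Substitution, aging, and shuffling can be sketched in a few lines of Python. This is a minimal sketch, not a production masking tool: the function names, the `FAKE_NAMES` list, and the column names are hypothetical, and a seeded `random.Random` is passed in only to make the results repeatable.

```python
from __future__ import annotations

import random
from datetime import date, timedelta

FAKE_NAMES = ["Alex Doe", "Sam Roe", "Pat Poe"]  # hypothetical stand-in values


def substitute(_original: str, rng: random.Random) -> str:
    """Substitution: replace the original with a realistic stand-in value."""
    return rng.choice(FAKE_NAMES)


def age_date(original: date, offset_days: int) -> date:
    """Aging: shift a date forward (positive) or backward (negative) by some days."""
    return original + timedelta(days=offset_days)


def shuffle_column(rows: list[dict], column: str, rng: random.Random) -> list[dict]:
    """Shuffling: move one column's values across rows, leaving other columns in place."""
    values = [row[column] for row in rows]
    rng.shuffle(values)
    return [{**row, column: value} for row, value in zip(rows, values)]


rng = random.Random(42)  # fixed seed so the sketch is repeatable
rows = [
    {"id": 1, "name": "Jane Doe", "visit": date(2020, 1, 1), "phone": "555-0101"},
    {"id": 2, "name": "John Roe", "visit": date(2020, 2, 1), "phone": "555-0102"},
    {"id": 3, "name": "Mary Poe", "visit": date(2020, 3, 1), "phone": "555-0103"},
]

masked = shuffle_column(rows, "phone", rng)
masked = [
    {**row, "name": substitute(row["name"], rng), "visit": age_date(row["visit"], 30)}
    for row in masked
]
print(masked)
```

Shuffling keeps the real values in the data set and only breaks their link to the original rows, so it suits columns where the overall distribution matters more than any single row.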
As the number of healthcare applications in an organization grows, de-identifying PHI becomes a big challenge. A corporate-wide de-identification vision not only protects PHI but also preserves the utility of valuable data.
In my next blog post, I will discuss the challenges of an enterprise-wide de-identification effort and approaches to handling them.