Most organizations understand that they need to protect their data for their own operations, but the proliferation of data protection laws such as the European Union’s General Data Protection Regulation (GDPR) and the California Consumer Privacy Act (CCPA) is forcing data keepers to think about the privacy of those whose data they hold. That’s why organizations, even those without much concern about bad actors, need to formulate a legitimate data protection strategy that protects privacy.
Data access controls must govern not only users but also take into account the rights of the creators of the data. The data has to be encrypted or tokenized, and only people with a business need to see the data should be given access to it.
Most organizations aren’t implementing that kind of strategy. Some organizations are doing dynamic data masking to mask information from non-privileged users, but that leaves the data in the clear at the back end. Others are resorting to whole-disk or file system level encryption. That approach has merit, but it’s woefully inadequate compared to the level of protection that can be provided by persistent, data-centric protection.
When formulating a data protection strategy that preserves privacy, it’s important to identify all your operational databases that could present serious problems to your organization if they were breached. You also need to identify the mechanisms used to protect your data wherever it resides. That’s typically done with data discovery tools that can work on both structured and unstructured data.
Get the job done with the right tools
Two keys to formulating a data protection strategy that protects privacy are tokenization and format-preserving encryption (FPE), which preserves the formats of structured data items after they’ve been encrypted. Without preserving the data’s formatting, application or schema changes might have to be made in a database to accommodate encryption, which can change the length of a data item.
With format-preserving encryption, analytics can be performed on a database without exposing sensitive data fields. For example, you could protect a customer’s first and last name, Social Security and credit card numbers with encryption, but expose data like products bought, total spend, and such. So you’re protecting the customer’s privacy, but can still perform valuable operations on your database and in your cloud analytics.
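Production deployments use standardized FPE modes such as NIST FF1 or FF3-1 with managed keys, but the core idea can be illustrated with a toy Feistel cipher over decimal digits. This is a minimal sketch, not any product's API; the key, round count, and function names are illustrative:

```python
import hashlib
import hmac

def _round_value(key: bytes, half: int, rnd: int, modulus: int) -> int:
    """Pseudorandom round function derived from HMAC-SHA256."""
    msg = f"{rnd}:{half}".encode()
    digest = hmac.new(key, msg, hashlib.sha256).digest()
    return int.from_bytes(digest[:8], "big") % modulus

def fpe_encrypt(key: bytes, digits: str, rounds: int = 8) -> str:
    """Encrypt an even-length digit string to another digit string of the same length."""
    assert digits.isdigit() and len(digits) % 2 == 0
    half = len(digits) // 2
    modulus = 10 ** half
    left, right = int(digits[:half]), int(digits[half:])
    for rnd in range(rounds):  # balanced Feistel network over two digit halves
        left, right = right, (left + _round_value(key, right, rnd, modulus)) % modulus
    return f"{left:0{half}d}{right:0{half}d}"

def fpe_decrypt(key: bytes, digits: str, rounds: int = 8) -> str:
    """Invert fpe_encrypt by running the rounds in reverse."""
    half = len(digits) // 2
    modulus = 10 ** half
    left, right = int(digits[:half]), int(digits[half:])
    for rnd in reversed(range(rounds)):
        left, right = (right - _round_value(key, left, rnd, modulus)) % modulus, left
    return f"{left:0{half}d}{right:0{half}d}"

card = "4242424242424242"
protected = fpe_encrypt(b"demo-key", card)
# The ciphertext is still 16 digits, so column widths and schemas are unchanged.
assert len(protected) == len(card) and protected.isdigit()
assert fpe_decrypt(b"demo-key", protected) == card
```

Because the ciphertext has the same length and character class as the plaintext, it can sit in the same database column with no application or schema changes, which is the whole point of format preservation.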
Format-preserving encryption can be used in any environment, such as Snowflake and BigQuery, and a similar approach can be applied to unstructured data. In that case, after the unstructured data is encrypted, a policy can be inserted into the file’s metadata that identifies who should have access to it. The policy remains with the file during its life span so if someone without proper permission tries to access a copy of the file, they won’t be able to do it.
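One way to picture the policy-in-metadata approach: wrap the ciphertext in an envelope whose header names the principals allowed to decrypt, and have the agent check that header before releasing plaintext. The envelope layout, the toy XOR keystream cipher, and the function names below are illustrative assumptions, not a real product's file format:

```python
import base64
import hashlib
import json

def _keystream(key: bytes, length: int) -> bytes:
    """Toy keystream for illustration only; real agents use vetted ciphers."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]

def seal(key: bytes, data: bytes, allowed_users: list) -> str:
    """Encrypt data and attach an access policy that travels with the file."""
    ct = bytes(a ^ b for a, b in zip(data, _keystream(key, len(data))))
    envelope = {
        "policy": {"allowed": allowed_users},  # stays in the file's metadata for life
        "ciphertext": base64.b64encode(ct).decode(),
    }
    return json.dumps(envelope)

def open_envelope(key: bytes, envelope_json: str, user: str) -> bytes:
    """Enforce the embedded policy before decrypting."""
    envelope = json.loads(envelope_json)
    if user not in envelope["policy"]["allowed"]:
        raise PermissionError(f"{user} is not authorized by the file's policy")
    ct = base64.b64decode(envelope["ciphertext"])
    return bytes(a ^ b for a, b in zip(ct, _keystream(key, len(ct))))

env = seal(b"k", b"quarterly salaries", ["hr-analyst"])
assert open_envelope(b"k", env, "hr-analyst") == b"quarterly salaries"
```

Because the policy rides inside the file itself, it applies equally to every copy: an unauthorized user who obtains the file still fails the check.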
For the policy approach to work, however, software agents need to be installed on any machines or devices that will be working with the files. An alternative is to include code within applications that can perform the functions of the agent. That approach has the advantage of enabling an organization to better understand where data exposure is happening.
Identifying anomalous behavior is key
When done properly, the code within the application can be used to identify anomalous software behavior in near real time. For example, an app that typically does 500 decryptions a day but suddenly starts doing 5,000 decryptions in a day would be a tip-off that an attack may be underway. Maybe those additional decryptions are caused by a spike in call center activity. But if call center activity isn’t unusual, then something is seriously wrong.
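The rate check described above can be sketched as a comparison against a rolling baseline. The class name, spike threshold, and window size are illustrative assumptions:

```python
from collections import deque

class DecryptionMonitor:
    """Flags days whose decryption count far exceeds the recent baseline."""

    def __init__(self, spike_factor: float = 10.0, window_days: int = 7):
        self.spike_factor = spike_factor
        self.daily_counts = deque(maxlen=window_days)  # rolling baseline window

    def record_day(self, count: int) -> bool:
        """Return True if `count` looks anomalous against the rolling average."""
        if self.daily_counts:
            baseline = sum(self.daily_counts) / len(self.daily_counts)
            anomalous = count > self.spike_factor * baseline
        else:
            anomalous = False  # no history yet, nothing to compare against
        self.daily_counts.append(count)
        return anomalous

monitor = DecryptionMonitor()
for day in [500, 480, 510, 495]:   # normal call-center load
    assert not monitor.record_day(day)
assert monitor.record_day(5000)    # 10x spike: check call-center volume, then investigate
```

A real deployment would feed these counts from the in-application decryption hooks and route alerts to the security team rather than asserting, but the logic is the same: compare today's activity to an established norm before deciding something is wrong.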
Tokenization allows dummy data to be substituted for real data. It, too, preserves formatting, because each token replaces a corresponding piece of data. So a tokenized credit card number, for example, will look and smell like a credit card number, but it can’t be used as one because it isn’t one. Tokenization is typically used only with structured data.
Real data can also be used in combination with tokens. For example, you could expose the first six digits of a credit card number (the bank identification number, which identifies the issuing bank) and tokenize the rest of it. That would be useful in determining how many transactions came from cards issued by a certain bank or authority.
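A partial-exposure scheme like this can be sketched with a token vault that keeps the leading digits in the clear (here the first six, the issuer BIN) and substitutes random digits for the rest. The vault design and names below are illustrative assumptions, not a real tokenization service:

```python
import secrets

class TokenVault:
    """Maps sensitive values to format-preserving dummy tokens and back."""

    def __init__(self, clear_prefix: int = 6):
        self.clear_prefix = clear_prefix
        self._token_to_real = {}  # in a real system: a hardened, access-controlled vault

    def tokenize(self, card_number: str) -> str:
        prefix = card_number[:self.clear_prefix]       # BIN stays usable for analytics
        secret_part = card_number[self.clear_prefix:]
        random_part = "".join(secrets.choice("0123456789") for _ in secret_part)
        token = prefix + random_part                   # same length, same format
        self._token_to_real[token] = card_number       # lookup table for detokenization
        return token

    def detokenize(self, token: str) -> str:
        return self._token_to_real[token]

vault = TokenVault()
token = vault.tokenize("4242424242424242")
assert len(token) == 16 and token.isdigit()   # looks like a card number
assert token.startswith("424242")             # issuing bank still identifiable
assert vault.detokenize(token) == "4242424242424242"
```

Unlike encryption, the token has no mathematical relationship to the real number; recovering the original requires access to the vault, which is why stolen tokens are worthless on their own.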
One advantage of having a data protection strategy that protects privacy is that it’s likely to remain relevant in the face of future changes in laws, regulations, and compliance standards. A common denominator of all those regimes is a requirement that up-to-date security mechanisms be used to protect data. Tokenization and encryption are such mechanisms. They ensure that if data is exfiltrated from an organization, it will be worthless to information thieves. That’s why many laws and regulations don’t treat the theft of properly tokenized or encrypted data as a reportable data breach.
Avoid exponential exposure
Without a data protection strategy that preserves privacy, an organization is engaging in risky behavior, and its privacy exposure grows rapidly. For example, unauthorized people might access your HR data, which would immediately create a privacy crisis in which any number of laws or regulations could be triggered and fines imposed. In recent times, such fines have been slapped on a variety of organizations, including Google ($43 million), Capital One ($80 million), Yahoo ($85 million), Uber ($148 million), and British Airways ($230 million).
In order to avoid the kinds of confrontations with regulators that can lead to those severe penalties, organizations should start implementing a data protection strategy that guards the information privacy of those whose data is in their care sooner rather than later.
Reiner Kappenberger is Director, Voltage Data Security at CyberRes.