In December 2019, popular document database MongoDB added a fairly radical new feature to the platform: field-level database encryption. At first glance, one might wonder whether this is a meaningful feature in a world that already has at-rest storage encryption and in-flight transport encryption—but after a little closer analysis, the answer is a resounding yes.
One of MongoDB’s first customers to use the new technology is Apervita, a vendor that handles confidential data for well over 2,000 hospitals and nearly 2 million individual patients. Apervita worked side by side with MongoDB during development and refinement of the technology.
Since reaching general availability in December, the technology has also been adopted by several government agencies and Fortune 50 companies, including some of the largest pharmacies and insurance providers.
Field-level encryption in a nutshell
MongoDB’s field-level encryption (FLE) lets an application store selected fields of a document in encrypted form while leaving the rest of the document in cleartext. The community (free) version of MongoDB allows for explicit encryption of fields in client-side applications.
Enterprise versions of MongoDB, along with Atlas, MongoDB’s cloud-based database-as-a-service, also support automatic encryption. MongoDB Enterprise and Atlas can also enforce encryption on protected fields on the server side, preventing a terminally clueless application developer from accidentally storing sensitive data in clear text. Encrypted fields can be automatically decrypted upon read, provided the application has the key, in either the free or enterprise versions.
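To make the idea of "protected fields" a little more concrete, here is a rough sketch of the kind of JSON Schema fragment that marks a field for encryption using MongoDB's `encrypt` keyword. The field names (`name`, `ssn`), the key ID, and the `encrypted_fields` helper are all made up for illustration; a real deployment would use the UUID of a data key created in its key vault, in the BSON binary form its driver expects.

```python
import uuid

# Hypothetical data-key ID. In practice this is the UUID returned when the
# data key is created in the key vault, not a hard-coded value.
DATA_KEY_ID = uuid.UUID("00000000-0000-0000-0000-000000000001")

# Schema fragment using the `encrypt` keyword. Field names are assumptions
# chosen for this example only.
patients_schema = {
    "bsonType": "object",
    "properties": {
        "name": {"bsonType": "string"},  # stays in cleartext
        "ssn": {
            "encrypt": {
                "keyId": [DATA_KEY_ID],
                "bsonType": "string",
                "algorithm": "AEAD_AES_256_CBC_HMAC_SHA_512-Deterministic",
            }
        },
    },
}


def encrypted_fields(schema):
    """Return the names of top-level fields marked with `encrypt`."""
    return sorted(
        name
        for name, spec in schema.get("properties", {}).items()
        if "encrypt" in spec
    )


print(encrypted_fields(patients_schema))
```

The same sort of schema can be handed to the client for automatic encryption or attached to a collection as a validator, so the set of protected fields is declared in one place rather than scattered through application code.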
Setting up an automatically encrypted database is a little too chewy to poke through in code here. But to understand how and when the encryption occurs, it may help to take a quick look at the Python code to do a single, explicitly encrypted MongoDB insertion:
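# Assumes client_encryption, data_key_id, and coll were set up earlier.
from pymongo.encryption import Algorithm

# Explicitly encrypt a field:
encrypted_field = client_encryption.encrypt(
    "123456789",
    Algorithm.AEAD_AES_256_CBC_HMAC_SHA_512_Deterministic,
    key_id=data_key_id)
# Store the document; only the ciphertext is sent to the server.
coll.insert_one({"encryptedField": encrypted_field})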
The explicit call here makes it pretty clear what’s going on: the data is encrypted on the client application side, then sent to and stored by the MongoDB server instance. This obviously gives us most of the benefit of both in-flight and at-rest encryption, but there’s another layer of defense offered here that might not be as immediately obvious.
A closer look at the sysadmin problem
System administrators—and database administrators—represent one of the thorniest problems of data confidentiality. A computer needs a human operator with all the privileges necessary to start, stop, maintain, and monitor services; this entails the sysadmin effectively having access to any data either stored on or processed by that system.
Similarly, databases (particularly large-scale ones) must have database administrators. The DBA may not have the low-level root access to a system that a sysadmin would, but they do have access to the inner workings of the database itself. In addition to designing the initial structure of the database, a DBA must be able to log and monitor the running database engine, to identify “hot spots” in the data.
Those hot spots might call for restructuring or indexing to alleviate performance problems as they arise. Troubleshooting them properly also frequently requires the DBA to replay troublesome queries, to see whether the changes have improved or degraded performance.
At-rest encryption does very little to solve either the sysadmin problem or the DBA problem. Although sysadmins can’t get meaningful data by cloning the raw disks of the system, they can easily copy the unencrypted data from the running system once its storage has been unlocked.
If the storage encryption key is present in hardware—for example, built into a Trusted Platform Module (TPM)—it does little or nothing to mitigate the sysadmin problem, since the sysadmin has access to the running system. As Apervita CTO Michael Oltman told us, “[we’re] not worried about someone walking out of an AWS data center with our server.”
An at-rest encryption system that requires a remote operator to unlock storage with a key provided at boot mitigates this problem somewhat. But a local system administrator will likely still have opportunities to compromise the running machine—and availability may be impacted, since unavailability of the remote key operator means services won’t come back up automatically after a maintenance window involving a reboot.
This inability to secure private data from system and database administrators makes it more difficult and expensive to scale a large operation without potentially breaching confidentiality.
Field-level encryption enables scale by segmenting access
Now that we understand the sysadmin problem, we can look at how field-level encryption mitigates it. With FLE, the application encrypts data before ever sending it to the database—and the database stores it exactly as-is. Similarly, when encrypted data is queried, it’s retrieved and sent back to the application still encrypted—decryption never happens at the server level, and in fact, the server doesn’t have access to the keys necessary to decrypt it.
With data securely encrypted before ever hitting the database—and never being decrypted until it comes back from the database—the sysadmin problem is largely solved, whether discussing sysadmins or DBAs. A system administrator with local root access can stop, start, and upgrade services without ever getting access to the data—and a database administrator can view and replay running queries without seeing the private contents, either.
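The flow described above can be sketched in a few lines of self-contained Python. Everything here is an assumption made for illustration: `FakeServer` stands in for the database, and `toy_encrypt` / `toy_decrypt` are a toy deterministic cipher built from HMAC-SHA256. It is not the AEAD_AES_256_CBC_HMAC_SHA_512 algorithm MongoDB uses and should not be used to protect real data. The point is only to show that the server can store and match ciphertext without ever holding the key.

```python
import hashlib
import hmac
import secrets


def _keystream(key: bytes, iv: bytes, length: int) -> bytes:
    """Derive `length` pseudo-random bytes from key and IV (toy construction)."""
    out = b""
    counter = 0
    while len(out) < length:
        block = b"ks:" + iv + counter.to_bytes(4, "big")
        out += hmac.new(key, block, hashlib.sha256).digest()
        counter += 1
    return out[:length]


def toy_encrypt(key: bytes, plaintext: str) -> bytes:
    """Deterministic toy encryption: same key and plaintext give the same bytes."""
    data = plaintext.encode("utf-8")
    # The IV is derived from the plaintext, which makes the output repeatable.
    iv = hmac.new(key, b"iv:" + data, hashlib.sha256).digest()[:16]
    stream = _keystream(key, iv, len(data))
    return iv + bytes(a ^ b for a, b in zip(data, stream))


def toy_decrypt(key: bytes, blob: bytes) -> str:
    """Reverse toy_encrypt, rejecting a wrong key or modified ciphertext."""
    iv, body = blob[:16], blob[16:]
    stream = _keystream(key, iv, len(body))
    data = bytes(a ^ b for a, b in zip(body, stream))
    expected_iv = hmac.new(key, b"iv:" + data, hashlib.sha256).digest()[:16]
    if not hmac.compare_digest(iv, expected_iv):
        raise ValueError("wrong key or tampered ciphertext")
    return data.decode("utf-8")


class FakeServer:
    """Stand-in for the database: stores and compares values, holds no key."""

    def __init__(self):
        self.docs = []

    def insert_one(self, doc):
        self.docs.append(dict(doc))

    def find(self, query):
        return [
            d for d in self.docs
            if all(d.get(k) == v for k, v in query.items())
        ]


key = secrets.token_bytes(32)  # lives only in the application process
server = FakeServer()

# Write path: the application encrypts before anything reaches the server.
server.insert_one({"name": "Alice", "ssn": toy_encrypt(key, "123456789")})

# Read path: the query itself carries ciphertext, so anyone inspecting or
# replaying it on the server sees opaque bytes rather than the number.
hits = server.find({"ssn": toy_encrypt(key, "123456789")})
print(toy_decrypt(key, hits[0]["ssn"]))
```

Equality matching works here only because the toy cipher is deterministic, which is the same trade-off signaled by the "Deterministic" suffix in the algorithm name used in the earlier insertion example.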
To be fair, we’ve only kicked this particular can a little further down the road. Sysadmins and developers with access to the production application server can still see data they shouldn’t—the application itself must handle the raw data, after all.
The segmentation is still meaningful, however, since it enables the use of automatically provisioned and third-party-monitored services like MongoDB’s Atlas. Without field-level encryption, HIPAA would have a field day with any vendor who tried to store protected health information in a third-party-managed cloud service.
With FLE, however, the database side of the application can be considered non-confidential. This in turn enables the vendor who is responsible for the data to leverage the concentrated, high-level expertise of a database-as-a-service provider. The vendor also reduces the scope of systems and equipment subject to the expensive physical and network security rules imposed by HIPAA or other regulations.
Design goals and methods
There are at least two questions that should always be asked about encryption: how much does it impact performance, and more importantly, is it really secure? MongoDB’s performance goal for FLE was a latency impact of 10 percent or less. In internal testing using standard industry database benchmarks, net impact on high-volume, read-intensive workloads was 5 to 10 percent.
Equally important, applications that didn’t use encrypted fields didn’t take a hit. Applications that only encrypt sensitive data—for example, encrypting Social Security numbers while leaving names in cleartext—in turn see less impact than those that encrypt entire documents as a whole.
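As a rough illustration of encrypting only the sensitive fields, the sketch below passes just the named fields through an encryption callable and leaves everything else alone. Both `encrypt_selected_fields` and `placeholder_encrypt` are hypothetical names invented for this example; `placeholder_encrypt` performs no encryption at all and merely marks where a real call (such as an explicit client-side encrypt) would go.

```python
def encrypt_selected_fields(doc, sensitive, encrypt):
    """Return a copy of `doc` with only the `sensitive` fields encrypted."""
    sensitive = set(sensitive)
    return {
        key: encrypt(value) if key in sensitive else value
        for key, value in doc.items()
    }


def placeholder_encrypt(value):
    # NOT real encryption: a marker standing in for an actual encrypt call.
    return "<encrypted:{} chars>".format(len(str(value)))


patient = {"name": "Alice Example", "ssn": "123456789"}
stored = encrypt_selected_fields(patient, ["ssn"], placeholder_encrypt)
print(stored)
```

Only the Social Security number incurs the cost of encryption here; the name is written as-is, which is why this pattern sees less overhead than encrypting the whole document.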
When we interviewed MongoDB’s Kenn White, he also stressed that the crypto itself wasn’t something just cooked up on the fly and in-house. The company hired several teams of well-respected cryptography experts, drawn from academic and industry backgrounds. It also commissioned a third-party audit of encryption and application security from the well-known security firm Teserakt, which received attention recently for its own ambitious E4 protocol, designed to provide in-flight encryption to embedded devices.
Beyond getting the crypto and the performance right, one of the most important goals MongoDB had for FLE was to make certain everyone could use it, with minimal barriers to adoption. This meant designing custom APIs for seven of the most-popular application-development platforms used with MongoDB—including Node.js, Python, Java, .NET, and Go.
Although we focused heavily on MongoDB here, it’s not the only, or even the first, database technology providing FLE. A competing NoSQL database platform named Couchbase implemented FLE a year earlier, and Amazon introduced field-level encryption in its CloudFront content delivery network in late 2017.
Just as salted one-way hashing rapidly became the mandatory standard for password storage, we expect that field-level encryption will become a mandatory feature for databases that handle sensitive or confidential information, and for the same reason: it protects that information not only from outside attackers but also from legitimate system and infrastructure administrators.