In the past, the pursuit of privacy was an absolute, all-or-nothing game. The best way to protect our data was to lock it up with an impregnable algorithm like AES behind rock-solid firewalls guarded with redundant n-factor authentication.
Lately, some are embracing the opposite approach: letting the data go free, but only after it’s been altered or “fuzzed” by adding a carefully calibrated amount of randomness. Algorithms that meet a standard known as “differential privacy” depend on adding enough confusion to make it impossible, or at least unlikely, that a snoop will be able to pluck an individual’s personal records from a noisy sea of data.
The strategy is motivated by the reality that data locked away in a mathematical safe can’t be used for scientific research, aggregated for statistical analysis or analyzed to train machine learning algorithms. A good differential privacy algorithm opens the possibility of all these tasks and more. It makes sharing simpler and safer (at least until good, efficient homomorphic encryption algorithms appear).
Protecting information by mixing in fake entries or fudging the data has a long tradition. Map makers, for instance, added “paper towns” and “trap streets” to catch plagiarists. The field now formally called “differential privacy” began in 2006 with a paper by Cynthia Dwork, Frank McSherry, Kobbi Nissim and Adam D. Smith that offered a much more rigorous approach to folding in the inaccuracies.
One of the simplest algorithms from differential privacy’s quiver can be used to figure out how many people might answer “yes” or “no” to a question without tracking any one person’s preference. Instead of blithely reporting the truth, each person flips two coins. If the first coin comes up heads, the person answers honestly. If the first coin is tails, though, the person looks at the second coin and answers “yes” if it’s heads or “no” if it’s tails. This approach is known as “randomized response.”
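The coin-flip scheme above is easy to simulate, and it also shows why the fuzzed answers are still useful in aggregate: with fair coins, the chance of a reported “yes” is half the true rate plus a quarter, so a pollster can invert that formula to recover the overall proportion without knowing who answered what. Here is a minimal sketch (the function names and the 30% example rate are illustrative, not from the original):

```python
import random

def randomized_response(true_answer: bool) -> bool:
    """One respondent's report under the two-coin scheme.

    First coin heads -> answer honestly.
    First coin tails -> report the second coin (heads = yes, tails = no).
    """
    if random.random() < 0.5:        # first coin: heads
        return true_answer
    return random.random() < 0.5     # first coin tails: second coin decides

def estimate_true_yes_rate(reports: list[bool]) -> float:
    """Invert the noise: P(report yes) = 0.5 * p + 0.25, so p = 2 * rate - 0.5."""
    rate = sum(reports) / len(reports)
    return 2 * rate - 0.5

# Simulate 100,000 respondents, about 30% of whom would truly answer "yes".
random.seed(0)
truth = [random.random() < 0.3 for _ in range(100_000)]
reports = [randomized_response(t) for t in truth]
print(round(estimate_true_yes_rate(reports), 2))  # close to 0.3
```

Any single report reveals little, since a “yes” may just be a coin toss, yet the aggregate estimate lands near the true rate.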