On Equifax, and the Futility of Authentication via Social Security Numbers

Equifax, one of the three major credit bureaus of the United States, recently announced their systems had been breached in a colossal way. The net result is that the Social Security numbers (SSNs) of 143 million U.S. citizens have been stolen.

Until we know the exact details of the breach, it is reasonable to assume that if you have an established credit history, anyone who wants your SSN and is willing to pay for it can now obtain it.

This breach was possible because Equifax failed to patch a well-known security vulnerability in the software stack used by some of their systems. You can assign well-deserved blame all day, but a breach of this scale was only a matter of time, not a question of if. If not Equifax, then someone else: there are so many copies of your SSN floating around in the various lists used by banks, credit card companies, loan providers, state and local governments, payroll processing companies, tax return services, and so on, that it is truly baffling we continue to use the SSN as a form of authentication.

In other words, to this day, having knowledge of one’s SSN (and a few other semi-permanent details, most of which were also stolen) is sufficient to prove to a credit provider that you are you!

A while back I wrote a post about implementing a Federal ID system that includes a randomly generated public key as part of the information needed to prove your identity. At the time, much of the focus was on creating a strong system to prevent voter fraud, but that ID system would also make a breach like Equifax’s no more than a minor hassle for most people, rather than the permanent threat of identity theft and ruined credit that it is now.

Such a system would not have to be implemented exactly as I described, but if we want to mitigate the effects of the Equifax breach, identity authentication systems in the future must require two types of information:

  1. Static information, such as your SSN and physical address
  2. Ephemeral information, such as a randomly generated number/key/PIN that the user can replace at any time, but only with physical proof of their identity

The ephemeral information should be centrally managed (whether public or private is another discussion) and implemented as part of a system that serves no other function than being a broker between you and another party attempting to verify your identity.
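To make that concrete, here is a minimal sketch (in Python, with invented names) of what such a broker’s core check might look like: it stores only a salted hash of each person’s current ephemeral PIN and answers yes/no when a third party asks whether a given ID + PIN pair matches. A real system would need far more (rate limiting, audit logs, in-person re-issuance), but the shape of the interface is the point.

```python
# Hypothetical sketch of a verification broker's core check. It stores only a
# salted hash of each person's current ephemeral PIN and answers yes/no to
# "does this static ID + PIN combination match?" All names here are invented.
import hashlib
import hmac
import os
import secrets

class VerificationBroker:
    def __init__(self):
        # federal_id -> (salt, hash of current ephemeral PIN)
        self._records = {}

    def issue_new_pin(self, federal_id: str) -> str:
        """Replace the ephemeral factor; in practice this would require
        in-person proof of identity at a card office."""
        pin = secrets.token_hex(8)            # random 16-character hex PIN
        salt = os.urandom(16)
        digest = hashlib.pbkdf2_hmac("sha256", pin.encode(), salt, 100_000)
        self._records[federal_id] = (salt, digest)
        return pin                            # handed to the user, never stored in the clear

    def verify(self, federal_id: str, pin: str) -> bool:
        """Answer a third party's question: does this ID + PIN pair match?"""
        record = self._records.get(federal_id)
        if record is None:
            return False
        salt, expected = record
        candidate = hashlib.pbkdf2_hmac("sha256", pin.encode(), salt, 100_000)
        return hmac.compare_digest(candidate, expected)

broker = VerificationBroker()
pin = broker.issue_new_pin("123-45-6789")
print(broker.verify("123-45-6789", pin))        # True
print(broker.verify("123-45-6789", "guess"))    # False
```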

We can’t continue to base important authentication on static information alone!

Why not a Federal ID?

We should have a federally funded, nationwide ID program. Here’s why and how, briefly.

The Case “for”

A federal ID could be used to help ensure the sanctity of the democratic process. With it, one could prove their identity when voting, preventing fraud and hopefully alleviating suspicion of it. Note that there are studies indicating ordinary voter ID does not increase one’s confidence in the system, even though 80% of our country is in favor of voter ID. However, I am not describing an ordinary voter ID.

If implemented as described below, a federal ID could reasonably prove one’s identity in person, by mail, or even online. It could also provide forensically sound evidence in the event that a vote is disputed in any region – each vote would be tied to a unique federal ID and digitally signed with the voter’s key. By verifying a small, random sample of votes, one could readily determine that no ballot stuffing, fake registrations, or hacking occurred. After a period of time, the link between ID and vote could be destroyed to preserve anonymity (something not guaranteed even now by absentee ballots).

If implemented as described, such a federal ID could also mitigate the risk and damage of identity theft. A simple trip to the local card office could permanently lock out the stolen credentials while allowing the victim to keep using their federal ID number. Any further attempt to use the stolen credentials would fail, yet would still be useful to law enforcement.

A federal ID could also be a convenience by consolidating common information. It could provide proof of eligibility for employment: employers could scan the card and get instant feedback in lieu of the existing I-9 forms. It could show that you are licensed to drive or to carry a concealed firearm in a given state, or that you hold a professional license. Each state could petition the federal government for certain features it would like readily shown on the ID.

The Case “against”

Cost. There is no way to get around the issue of cost; we simply must decide whether the program is worth it. However, I am advocating for this ID to be federally funded (including a certain number of re-issuances for loss or theft), so it would not be an instrument of systemic bias against lower-income individuals. To further ensure income neutrality, we could fund local employees who would meet lower-income individuals at work or at home for identity verification and issuance, so that acquiring or updating the ID is never a source of financial hardship.

Privacy. This is a valid concern with any national program that tracks citizens. However, we are already tracked by multiple federal agencies. There is no reason why a federal ID has to further reduce our privacy, especially if we apply the following stipulations:

  1. The photograph and any other identifying information appearing on the ID card may NOT be entered into any searchable database other than the federal ID program.
  2. Require a warrant to obtain a name from the database by submitting a photograph. All facial recognition in the service of a warrant must be done by the federal ID system ONLY; law enforcement would NOT have access to the entire database to perform their own searches.
  3. Similar to #2, if DNA or other biometric information is incorporated into the system to provide further verification of a person’s identity, searches against that information may only be performed with a warrant, or by card-issuing offices when printing a new card, if the individual’s identity cannot be verified by visual inspection.

Solution in Search of a Problem. This argument is frequently made by opponents of voter ID, such as the ACLU. For instance, they claim there is no evidence of in-person voter fraud being even a minor problem. Even if that is entirely true, voter fraud is a frequently discussed national concern. As mentioned above, a vast majority of our nation is in favor of some form of voter ID, even if they may not realize that ordinary voter ID does not solve many of the problems they are worried about.

I emphatically dispute, however, the claim by the ACLU and others that a federal voter ID could not prevent voter fraud by mail. Digital signatures (and a pseudo form of them for ballots by mail) are a proven technology that can provide a reasonable system of validation for every mailed or digitally submitted ballot, especially if implemented as described below.

Implementation Details

  1. The ID card should prominently display a recent photograph and the birth month and year of the individual for casual in-person verification of their identity (such as when voting at a polling center or being carded for age).
  2. The ID should have a unique, randomly generated 10-digit alphanumeric ID number, bound to the individual. That will be the individual’s Federal ID for life. It should be closely held like one’s Social Security number (and indeed could one day replace the SSN), but the cryptographic system described below should allow recovery of one’s identity after ID theft without having to issue a new number.
  3. In addition to the Federal ID, a randomly generated public/private key pair should be created for each individual. The public key should be converted to an alphanumeric form and printed ONLY on the back of the card. This area of the card should be laminated as described below. Upon printing the card, the public key should be digitally scrubbed from the system and available nowhere else.
  4. The private key should be stored in a centralized, air-gapped system with only two remote capabilities: 1) cryptographic checksum/signature verification (keyed by the Federal ID and a digital signature produced with the individual’s card-printed public key); and 2) facial recognition/biometric search, by warrant only.
  5. I’d suggest using elliptic curve cryptography simply to keep the keys of manageable length for a human to type out when doing remote identity verification.
  6. A new public/private key pair should be generated whenever the card is reprinted (e.g., due to loss, theft, or an information change, or even upon request for a fee), and also periodically regardless – perhaps every two to three years. Doing so gives the federal ID far greater resistance to identity theft than the simple Social Security number + birth date system of authentication we use now. Even if someone’s federal ID becomes public information, it still cannot be used to vote or open a line of credit, for instance.
  7. The ID should be laminated before being issued, and the lamination on the back should be lenticular over the area of the public key, preventing the key from being read or copied except at several sharp angles along its full length. Although this feature serves to prevent casual identity theft from a simple photo or copy of the card, it is not foolproof, nor is it intended to be. One should still be mindful of who handles their ID card.
  8. The back of the card should VERY prominently indicate that the number on the back should not be duplicated and only entered when performing a remote identity verification (such as when opening a new line of credit, submitting one’s tax returns or even when signing a mail-in ballot).
  9. I’d suggest collecting an individual’s DNA or other biometrics at birth (and when this system is first implemented) so that their identity can be proven in the future. This information would be searchable by law enforcement by warrant only. Its ordinary use would be in the case of issuing a new ID card when the individual’s identity cannot be determined from their physical appearance.
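To illustrate items 3 through 5, here is a minimal sketch using the third-party Python `cryptography` package. Conventionally the signing key is called “private” and the verifying key “public,” so the labels are flipped relative to the list above, but the flow is the same: one key lives only on the card, its counterpart lives only in the air-gapped system, and the central broker simply answers whether a signature checks out. This is an illustrative sketch, not the system’s actual protocol.

```python
# A minimal sketch (using the third-party "cryptography" package) of the kind
# of signature check the central verifier could perform. Here the card-held
# key signs and the centrally stored counterpart verifies.
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import ec
from cryptography.exceptions import InvalidSignature

# Generated at card-printing time; one half goes on the card, the other into
# the air-gapped central system.
card_key = ec.generate_private_key(ec.SECP256R1())   # printed on the card
central_key = card_key.public_key()                   # retained centrally

# The cardholder signs a challenge (e.g. a credit application or ballot hash).
challenge = b"open-credit-line:2017-09-15:lender-xyz"
signature = card_key.sign(challenge, ec.ECDSA(hashes.SHA256()))

# The central system verifies, keyed by the Federal ID, and answers yes/no.
try:
    central_key.verify(signature, challenge, ec.ECDSA(hashes.SHA256()))
    print("identity confirmed")
except InvalidSignature:
    print("verification failed")
```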

Conclusion

We can implement a federal ID program that prevents most forms of voter fraud (except coercion), gives consumers greater protection from identity theft and does not reduce privacy.

Apple vs. The FBI

A court order can only do so much…

By now I’m sure most of you have heard about the debacle between the FBI and Apple.

Here’s a brief summary before I explain the ramifications:

The FBI has the San Bernardino shooter’s iPhone 5C, but they can’t get into it because it’s locked behind a passcode. The passcode is used to key device-wide encryption, so they can’t just remove the flash storage and read it – what they read would be indecipherable random noise.

They also can’t brute-force the passcode, because Apple phones have a feature (which can be enabled by the user) that wipes the phone clean after a certain number of wrong guesses. That feature may be enabled on this phone, so they can’t risk it.

What the FBI has asked Apple to do is create a purposely compromised version of their operating system that allows unlimited guesses, does nothing to delay each guess, and allows guesses to be submitted over the Lightning port and possibly via Bluetooth or WiFi. They then want Apple to sign this compromised OS image with Apple’s private software distribution key (so that the phone accepts it) and boot the phone with it so a law enforcement agent can brute-force the passcode.
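To see why removing the delays and the retry limit matters so much, here is some back-of-the-envelope arithmetic. The ~80 ms per guess (the unavoidable cost of the key-derivation step) is an assumed figure for illustration, not something taken from the court filings:

```python
# Back-of-the-envelope arithmetic: how long a brute-force search takes once the
# retry limit and artificial delays are removed. The ~80 ms per guess is an
# assumed key-derivation cost, not a measured one.
PER_GUESS_SECONDS = 0.08   # assumed key-derivation cost per attempt

for digits in (4, 6):
    combinations = 10 ** digits
    worst_case = combinations * PER_GUESS_SECONDS
    print(f"{digits}-digit passcode: {combinations:,} combinations, "
          f"worst case ~{worst_case / 3600:.1f} hours")

# 4-digit passcode: 10,000 combinations, worst case ~0.2 hours
# 6-digit passcode: 1,000,000 combinations, worst case ~22.2 hours
```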

The legal issue: the FBI has a court order permitting them access to this specific iPhone and ordering Apple to provide assistance as described above. Apple has the technical ability to fulfill this request but is refusing to do so on principle. This is somewhat akin to the FBI having permission to attempt to crack a bank vault, but Apple won’t let them in the door.

Why?

Tim Cook’s open message misrepresents a few of the technical issues, but his legal argument is very sound: Apple will probably challenge this all the way to the Supreme Court, but if they lose and comply with this order, the legal precedent will be set and the government will have the (potential) ability to order this procedure for other phones.

Granted, that’s a slippery slope argument, but I’ve got another one for you: this newly acquired ability will quickly become pointless as terrorists begin using their own custom software to provide end-to-end encryption of communications. This type of software can be written by a single developer with a smidgen of crypto competency and decent computer science savvy. The necessary encryption algorithms are freely available in open source libraries.
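To illustrate just how low that bar is, here is a sketch using PyNaCl, the Python bindings to the open-source libsodium library. This is not anyone’s actual product, just a demonstration that the hard parts are already written:

```python
# End-to-end encryption in a handful of lines using an existing open-source
# library: PyNaCl (Python bindings to libsodium).
from nacl.public import PrivateKey, Box

# Each party generates a key pair and shares only the public half.
alice_private = PrivateKey.generate()
bob_private = PrivateKey.generate()

# Alice encrypts to Bob using her private key and his public key.
alice_box = Box(alice_private, bob_private.public_key)
ciphertext = alice_box.encrypt(b"meet at the usual place")

# Bob decrypts with his private key and Alice's public key.
bob_box = Box(bob_private, alice_private.public_key)
print(bob_box.decrypt(ciphertext))   # b'meet at the usual place'
```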

This loops us back to Tim Cook’s original point, but not in the way he was thinking: once Apple designs a backdooring process, it will increasingly be useful only to nefarious people who want access to a phone’s normal features – the ones you and I use (banking, paying with Apple Pay, emails, SMS messages, Facebook, etc.). Any decently sophisticated criminal or terrorist organization would have nothing to fear once word spread and alternative solutions were developed. If you’re the paranoid type, it would also enable the government to more easily spy on you.

There’s one more complication here: nothing is stopping Apple from building in hardware that makes it intractable for them to override the passcode behavior in the future. Indeed, the only reason a software solution is somewhat viable now is that the 5C uses the older A6 chip. The A7 and later (used in all subsequent iPhone models) already enforce an escalating delay after each wrong guess.

Bottom line: this is not something Apple can reasonably be asked to do going forward. Building in a backdoor at the factory is absolutely unacceptable for the obvious reasons. I really don’t know what law enforcement can do about this – good, well-implemented encryption is impervious regardless of what resources are applied to cracking it, barring specific mathematical discoveries and dramatic increases in computing power.

Micro Expressions are Getting the Machine Learning Treatment

It Knows How You Feel

A few months ago I started this site with an article on intent detection. In it I made the argument that protecting us from technology by banning it is intractable, so it’s probably more beneficial to focus on detecting humans who are about to carry out malicious acts.

One of the tools I proposed for achieving intent detection is the application of machine learning to detect micro expressions – those quick, involuntary displays of emotion that we make but have little to no control over.

It is fascinating watching a field develop before your very eyes. Recently, a paper was published to arXiv on the application of machine learning to the detection of micro expressions. MIT Technology Review has also written a summary of the paper.

So, have these researchers solved the problem? Not yet, but as it goes with most computer science, their work sets a new bar for “state of the art.” To paraphrase, the advances they have made are:

1) They developed a new method, requiring no prior training, to detect when a micro expression occurs during a video. This is important because in order to recognize something, most computer vision systems need to be told where it is in the first place. The authors use the terms “spotting” and “recognizing” to distinguish between the two. Since their method does not require training, it makes acquiring a large sample of micro expressions to work with much easier. This is probably the most important achievement.

2) They utilized a new method to amplify the distinguishing features of a micro expression to make it more recognizable. This is important because micro expressions are generally very subtle, which makes them harder to classify.

3) They investigated multiple methods of mathematically representing the features needed to distinguish micro expressions, and analyzed their relative performance. This is important because the feature representation that works best on, say, a color video does not work as well with near-infrared video. Also (somewhat obviously), using a high-speed camera makes it easier to recognize micro expressions.

4) They created an automatic recognition system that can detect and classify micro expressions from long videos with performance comparable to humans.

For those into machine learning: they performed the actual classification task with a linear Support Vector Machine. No fancy deep learning or neural networks, just good old large-margin classification with customized feature descriptors.
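For the curious, here is a minimal sketch of that classification step using scikit-learn’s linear SVM. The feature vectors below are random stand-ins, not the paper’s actual motion-amplified descriptors:

```python
# A minimal sketch of the classification step with scikit-learn's linear SVM.
# The feature vectors are random placeholders, not real micro-expression data.
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 64))       # 200 clips, 64-dimensional feature vectors
y = rng.integers(0, 3, size=200)     # 3 emotion classes (e.g. happy/disgust/surprise)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

clf = LinearSVC(C=1.0)               # plain large-margin linear classifier
clf.fit(X_train, y_train)
print("accuracy:", clf.score(X_test, y_test))  # ~chance here, since the features are random
```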

It will be interesting to watch the field evolve over the next few years as researchers start applying the advancements of deep learning to the problem. We’re getting ever closer to viable intent detection.

Too Much Doom Crying, Not Enough Optimism

A few weeks ago I got into a protracted Facebook “conversation” about the dangers of beaming signals into space. It started as a simple discussion on the latest Kepler discovery, segued a bit into discussions of Dyson spheres and finally erupted into a full argument when someone said:

It’s likely that if there are aliens there, and they do notice us trying to communicate, that we’ll be on the receiving end of a resource hungry civilization who’s weapons and technology are comically far ahead of ours.

These discussions follow a typical format:

  1. Pointing out how big a sphere of 1500ly radius is – specifically, the inverse-square law and the difficulty of hitting something at the edge that we’re not even sure is there (a rough calculation follows this list)
  2. Pointing out how large of an interferometer (using the technique of aperture synthesis) a Dyson sphere capable species could build. In other words, they could detect us outright anyways, regardless of what we do
  3. Asking what a civilization that can build Dyson spheres and travel 1500ly in a meaningful timeframe would possibly want from the little rock we call Earth. They could literally harvest any other object in the galaxy
  4. And so on
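To put a rough number on point 1, here is the inverse-square arithmetic for a hypothetical 1 MW transmitter radiating isotropically (both the power figure and the isotropic assumption are chosen purely for illustration):

```python
# Rough inverse-square calculation (assumed numbers): the flux from a 1 MW
# isotropic transmitter after travelling 1,500 light years.
import math

POWER_W = 1e6                 # assumed 1 MW isotropic transmitter
LY_M = 9.461e15               # metres per light year
distance_m = 1500 * LY_M

flux = POWER_W / (4 * math.pi * distance_m ** 2)
print(f"received flux at 1500 ly: {flux:.2e} W/m^2")
# roughly 4e-34 W/m^2 -- far below anything our own instruments could detect
```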

This conversation stands out from the usual back-and-forth because the premise of their argument was (paraphrasing) “Stephen Hawking said so.” Indeed, Professor Hawking did say something to that effect during his Discovery Channel show, where he tried to draw a parallel between aliens and Christopher Columbus.

I believe it is quite a leap of logic to compare the tendencies of a highly advanced alien species with those of pre-industrial humanity. His point is moot, however, because, as I mentioned, they could find us anyway. Thus beaming signals into space at this point shouldn’t make a difference.

So did you just finish reading one man’s lamentations over another’s use of argument from authority? Yes.

I have a point though, which is to share this article by Brad Allenby entitled “Emerging technologies and the future of humanity.” In it he talks about how many prominent individuals who have done well at recognizing the near-term effects of emerging technology go too far with their long-term, dystopian conjectures. The mistake they make is analyzing future capabilities through the lens of the present, which is probably too myopic to see the whole picture clearly.

Facebook Combining Machine Learning Techniques

Making Applications Truly Intelligent

Question: what do you get when you combine a learner capable of classifying objects and actions in a photo with a learner capable of understanding naturally phrased questions?

Answer: a system capable of answering questions about the contents of a photo. In other words, Facebook’s new toy.

Layering multiple specialized learners into a single system is the next great frontier of machine learning. Why? Because learners can make great interfaces between other learners and the human beings attempting to derive insights from them. It is somewhat analogous to the capability that SQL gives database developers to quickly gather insights from millions of rows of tabular data.

For example, a computer vision learner that excels at determining whether a certain picture contains a cat or a dog has no idea that the entities it is differentiating between are called “cat” and “dog” until we assign those labels to its output. Even after we tell it that one neuron firing strongly means “dog” and another neuron firing means “cat”, the learner has no clue what those words mean in the linguistic sense. The features learned by the neural net that enable it to differentiate cats and dogs so well are radically different from the set of features needed to understand that “dog” and “cat” are nouns in the English language and should be utilized a certain way.

So couldn’t we make the neural net capable of learning the features needed to do both? Yes, but that would require expanding it to a much greater size (which has computational resource costs) and would make its implementation more complicated. Right now, it’s easier (better) to stack a net capable of understanding natural language queries on top of a net capable of photo object identification. Facebook takes this one step further by giving their learners a contextual memory that allows them to understand basic cause and effect. Thus, a picture of a dog with a frisbee in its mouth allows the learner to answer “frisbee” when asked what game the dog is playing.
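Here is a toy sketch of that layering idea, with invented stand-ins for both learners – it is emphatically not Facebook’s system, just an illustration of how one learner’s output becomes another’s input:

```python
# A toy sketch of layering learners: one labels what is in the photo, another
# answers natural-language questions using those labels plus a small "memory"
# of simple cause-and-effect facts. Both models are invented stand-ins.
from typing import List

def vision_learner(photo_path: str) -> List[str]:
    # Stand-in for a trained image classifier; pretend it recognised these.
    return ["dog", "frisbee"]

CAUSAL_MEMORY = {
    ("dog", "frisbee"): {"what game is the dog playing?": "frisbee"},
}

def language_learner(question: str, labels: List[str]) -> str:
    # Stand-in for a question-answering model operating on the vision output.
    facts = CAUSAL_MEMORY.get(tuple(sorted(labels)), {})
    return facts.get(question.lower(), "unknown")

labels = vision_learner("dog_with_frisbee.jpg")
print(language_learner("What game is the dog playing?", labels))   # frisbee
```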

Some day, applications will be capable of self-determining what learners they need to apply to a particular problem, and in what order to apply them. Throw in a few orders of magnitude more processing power and storage and we’ll probably be very close to achieving artificial general intelligence.

Hundreds of Automatic License Plate Readers Found Wide Open by the EFF

Oops…

Hundreds of automatic license plate recognition (ALPR) systems were found wide open on the Internet by the Electronic Frontier Foundation (EFF). These systems potentially store years’ worth of archival data indicating which vehicles drove by a particular place at a particular time. Amazingly, this is not even that big a deal anymore, as building an ALPR system from scratch is no longer beyond the capability of a single decent developer.
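To back up that claim, here is a rough sketch of how a lone developer might bolt a plate reader together from open-source parts (OpenCV for finding plate-shaped regions, Tesseract via pytesseract for the OCR). Accuracy would be poor without serious tuning; the point is only that the building blocks are freely available:

```python
# A rough sketch of a from-scratch license plate reader built from open-source
# parts. Crude and untuned on purpose; it only demonstrates the pipeline.
import cv2
import pytesseract

def read_plates(image_path: str):
    img = cv2.imread(image_path)
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    edges = cv2.Canny(gray, 100, 200)
    # OpenCV 4 returns (contours, hierarchy)
    contours, _ = cv2.findContours(edges, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)

    plates = []
    for contour in contours:
        x, y, w, h = cv2.boundingRect(contour)
        # Crude filter: license plates are wide, short rectangles.
        if w > 100 and 2.0 < w / float(h) < 6.0:
            crop = gray[y:y + h, x:x + w]
            text = pytesseract.image_to_string(crop, config="--psm 7").strip()
            if text:
                plates.append(text)
    return plates

print(read_plates("traffic_cam_frame.jpg"))
```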

This is a good example of how one’s digital footprint grows over time by factors outside our direct control. It also demonstrates one component of the intractable nature of computer security – incompetence.

The Privacy Equation

Edward Snowden recently did an AMA (“Ask Me Anything”) on Reddit where he said:

Arguing that you don’t care about the right to privacy because you have nothing to hide is no different than saying you don’t care about free speech because you have nothing to say.

A pithy statement, is it not? Unfortunately, the situation is not so simple – privacy and free speech are at odds with each other both technologically and legally, because the ability to preserve each is pushed in opposite directions by the same fundamental processes. Simply stated, that which makes free speech more possible makes privacy less possible.

In this article I will show how the degradation of one’s privacy is inevitable and potentially accelerates over time due to factors outside one’s direct control. This is a recent phenomenon brought about by the digitization of information, always-on connectivity, and continuous advancements in machine learning. These technologies, and the infrastructures built from them, also facilitate the propagation of uncensored free speech.

Thus one can accept the futility of preserving their privacy yet still cherish their freedom of expression. One day we will truly have very little to hide, regardless of whether we have something to say.

The Intractability Problem

A recurring theme in sci-fi is the danger that new technology presents to mankind.

Perhaps the pinnacle of dystopic scenarios is the Singularity, that moment where artificial intelligence (AI) begins continuously self-improving to the point where we potentially lose control. This was the premise for the popular Terminator movies and others such as I, Robot and Transcendence, each featuring a race to shut the technology down before it grew out of control.

In this discussion, I will be making the argument that defending us from technology on a per-item basis is an intractable problem; thus, the best solution requires focusing on the human beings who would erroneously or maliciously use technology to cause harm. I’m going to suggest a far more radical measure than simple psychological profiling or background checks. In order to appreciate its necessity, the intractability problem must be fully understood.