What’s So Deep (and Powerful) About Deep Learning?
In Part I of our series on artificial intelligence (AI) and machine learning (ML), we covered some of the basics: what the terms mean, how the field evolved, and what these technologies can and can’t do. We ended with a promise to tell you more about one of today’s most exciting (and widely hyped) types of machine learning: deep learning.
As we mentioned last time, deep learning systems organize large numbers of artificial “neurons” into layers, an arrangement loosely modeled on the human brain. (There are important differences between these “artificial” neural networks and the one between your ears. But there are also some non-trivial similarities.)
Neural networks have been around a long time. But they’ve had serious limitations – and as with many things AI-related, their failures led to a backlash, in which many researchers and investors turned away from them.
Then, several years ago, some big things changed.
Even simple neural networks tend to use a great deal of computing power, and adding layers dramatically increases the load. So most neural networks were very “shallow” – and that limited their ability to learn. But the power of computers soared, and the cost of that power plummeted. One big reason: video gamers needed more powerful graphics hardware. Specialized graphics processing units (GPUs) were built to meet that demand – and because they perform huge numbers of simple calculations in parallel, they turned out to be brilliantly suited for neural networks.
Deep neural networks also need plenty of data to learn from – and as you keep giving them more data, they often keep improving, which hadn’t always been the case with previous methods. Nowadays, there’s tons of data. And give the researchers some well-deserved credit: they came up with significantly smarter ways to organize and build these networks.
When all this came together, the results were stunning. An early image recognition system categorized YouTube images 70% better than its predecessors. An early voice recognition system quickly cut errors by 25%. Then, in 2016, DeepMind’s AlphaGo system used deep learning to defeat Lee Sedol, one of the world’s top Go players. AlphaGo hadn’t “merely” mastered a game with more possible board positions than there are atoms in the observable universe. It was making creative moves never seen by Go experts before.
Mathematicians are still working to understand exactly how, when, and why deep networks are better than shallow networks. (As with all things AI, much depends on the application.) But the extra interconnected layers mean that different sets of layers can come to specialize in individual aspects of a problem. So, for example, one set might learn to find edges within an image. The output from each set gets fed forward to the next layers, which handle more complex tasks – such as finding a corner, or eyes, or a face. This wasn’t practical before, and it makes a big difference.
The system’s ultimate output might be a prediction about how likely it is that the picture contains a face. If the system’s trainers already know which pictures contain a face, they can send feedback into the system so it adjusts and gradually becomes more accurate.
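To make that feedback loop concrete, here is a minimal sketch in Python. It is not how a production deep network (or any Sophos system) is built: it trains a single artificial neuron rather than many layers, and the two input scores (a hypothetical "edge score" and "eye score" for each picture) and the toy data are invented purely for illustration. What it does show is the basic cycle described above: predict, compare with the known answer, and nudge the weights to reduce the error.

```python
import math

def sigmoid(z):
    """Squash any number into the range (0, 1) so it reads as a probability."""
    return 1.0 / (1.0 + math.exp(-z))

def predict(weights, bias, features):
    """Return the neuron's estimate that the picture contains a face."""
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return sigmoid(z)

def train(examples, learning_rate=0.5, epochs=2000):
    """Repeatedly predict, measure the error, and adjust the weights."""
    weights = [0.0] * len(examples[0][0])
    bias = 0.0
    for _ in range(epochs):
        for features, label in examples:
            error = predict(weights, bias, features) - label
            # Feedback step: move each weight slightly in the direction
            # that would have made this prediction less wrong.
            for i, x in enumerate(features):
                weights[i] -= learning_rate * error * x
            bias -= learning_rate * error
    return weights, bias

# Hypothetical toy data: ([edge_score, eye_score], 1 if the picture has a face).
EXAMPLES = [
    ([0.9, 0.8], 1),
    ([0.8, 0.9], 1),
    ([0.7, 0.7], 1),
    ([0.2, 0.1], 0),
    ([0.1, 0.3], 0),
    ([0.3, 0.2], 0),
]

weights, bias = train(EXAMPLES)
print(predict(weights, bias, [0.85, 0.85]))  # high-scoring picture
print(predict(weights, bias, [0.15, 0.15]))  # low-scoring picture
```

A deep network applies the same idea to many layers of neurons at once, passing the error backward through each layer in turn.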
Deep learning neural networks aren’t just good for recognizing cats and playing board games. At Sophos, we use them to recognize malware nobody’s ever seen before. That’s especially important nowadays, when a great deal of malware shows up only in one organization or on just a few computers. We need to identify it not just by its code, but by how it behaves.
As it turns out, that’s a near-perfect challenge for deep learning. Deep learning neural networks thrive on huge amounts of data, and Sophos analyzes over 2.8 million new malware samples every week. When we combine our new data with the hundreds of millions of samples we’ve already collected, our deep learning system can essentially “memorize” the entire observable threat landscape as it stands right now. Our models help us analyze complex relationships between different features, and we can continually adjust them to target real malware with fewer false positives.
Where’s the field of deep learning headed next? Two innovations seem especially exciting.
Generative adversarial networks (GANs) involve pitting deep learning systems against each other, thereby forcing them to improve. One system might act as the “generator” – perhaps to conjure up realistic-looking images of human faces from scratch. The second system acts as the “discriminator,” trying to tell the fakes from real photographs of actual people. With System #2 delivering feedback at lightning speed, System #1 improves to deliver truly freaky performance.
Deep reinforcement learning (DRL) systems place an automated agent in an environment where it earns rewards for making progress toward a goal (such as driving your car home safely for you). Using deep reinforcement learning, AlphaZero (a descendant of AlphaGo) learned chess, shogi, and Go with no previous knowledge except the official rules – and within 24 hours, it was playing them at superhuman level. Some people think DRL might be a step towards the holy grail of artificial intelligence: systems that can learn anything, not just one or a few things.
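The reward-driven loop can be sketched in a few lines of Python. One big caveat: this is plain tabular Q-learning, a much simpler relative of deep reinforcement learning, and it has nothing to do with how AlphaZero is actually built. A deep system would replace the small lookup table below with a neural network. The five-position "street" and all the names and parameter values are hypothetical, chosen only to show an agent learning from rewards alone that heading right gets it home.

```python
import random

N_STATES = 5            # positions 0..4 on a tiny street; position 4 is "home"
ACTIONS = (-1, +1)      # step left or step right
GOAL = N_STATES - 1

def step(state, action):
    """Move the agent and hand back (next_state, reward, reached_home)."""
    next_state = min(max(state + action, 0), GOAL)
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward, next_state == GOAL

def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Learn a table of action values by trial, error, and reward."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Explore at random sometimes (and whenever there is a tie);
            # otherwise take the action that currently looks best.
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                a = rng.randrange(2)
            else:
                a = 0 if q[state][0] > q[state][1] else 1
            next_state, reward, done = step(state, ACTIONS[a])
            # The learning target: the reward now, plus the discounted
            # value of the best action available afterwards.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][a] += alpha * (target - q[state][a])
            state = next_state
    return q

q_table = train()
# Best learned action for every position short of home.
policy = [ACTIONS[0 if left > right else 1] for left, right in q_table[:GOAL]]
print(policy)
```

Nobody tells the agent that "right" is the correct direction. It works that out because moves toward home eventually lead to a reward, and that value propagates back to earlier positions.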
How far can deep learning truly go, and how widely will it be used? Time will tell. Some experts are skeptical, and so far, every previous approach to artificial intelligence has ultimately fallen short of the hype. Still, there’s no denying what deep learning has already done – and how it’s already helping Sophos keep you safe.