During the past few years, deep learning has revolutionized nearly every field it has been applied to, resulting in the greatest leap in performance in the history of computer science.
For many problems where we were used to seeing small, gradual improvements each year, we are now witnessing improvements of 20% to 30% within months, thanks to the application of deep learning.
This success has also stirred up a great deal of media and PR buzz. As a result, the terms “artificial intelligence”, “machine learning”, and “deep learning” are now used very widely, and most often inaccurately and confusingly.
It’s important that we attempt to clarify, and demystify, the distinction between these technical terms, and then focus on their application to the field of cybersecurity.
Artificial Intelligence (AI), a phrase coined by the pioneering computer scientist John McCarthy in the 1950s, is an umbrella term for all the methods and disciplines that result in any form of intelligence exhibited by machines.
This includes anything from the expert systems of the 1980s (essentially databases of hard-coded knowledge) up to the most advanced forms of AI in the 2010s.
Nearly all software in all industries today uses at least some form of AI, even if it is limited to some basic manually coded procedures.
Machine Learning is currently the leading sub-field within AI. It allows computers to learn without being explicitly programmed.
Machine learning-based methods completely dominated AI in the 2000s, and have outperformed all non-machine-learning-based approaches.
Despite its success, one of the major limitations of traditional machine learning is its reliance on feature extraction, a process through which human experts dictate what the important features (i.e., properties) of each problem are.
For example, when applying machine learning to face recognition, the raw pixels in the image cannot be fed into the machine learning module, but instead they must first be converted into features such as distance between pupils, proportions of the face, texture, colour, etc.
This feature extraction phase basically results in most of the raw data being ignored, and the selected features, as good as they may be, miss the rich nonlinearities in the data.
Deep Learning, aka “deep neural networks”, is a sub-field of machine learning, and takes inspiration from how our brains work.
The big conceptual difference between deep learning and traditional machine learning is that deep learning is the first, and currently the only, learning method capable of training directly on the raw data (e.g., the pixels in our face recognition example), without any need for feature extraction.
Additionally, deep learning scales well to hundreds of millions of training samples, and continuously improves as the training dataset becomes larger and larger.
Over the past few years, deep learning has reached a 20-30% improvement in most benchmarks of computer vision, speech recognition, and text understanding – the greatest leap in performance in the history of AI and computer science.
Two major drivers contributed to the sudden mind-boggling success of deep learning.
The first was the improvement in algorithms. Until a few years ago, we could train only shallow neural networks; the training of deeper networks would not converge due to algorithmic limitations.
Improved training methods today allow for the successful training of very deep neural networks, with many tens of layers and billions of synapses (connectors between the neurons).
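To give a feel for how quickly the number of synapses grows with depth and width, here is a minimal Python sketch. It assumes a plain fully connected network, in which every neuron in one layer is connected to every neuron in the next; the layer sizes are hypothetical and chosen only for illustration, and real deep networks often use other layer types.

```python
def count_connections(layer_sizes):
    """Count the connections ("synapses") in a fully connected network.

    Assumption: every neuron in a layer connects to every neuron in the
    next layer, so each adjacent pair of layers contributes a * b links.
    """
    return sum(a * b for a, b in zip(layer_sizes, layer_sizes[1:]))


# Hypothetical architecture: 1,024 inputs, 30 hidden layers of 8,192
# neurons each, and 2 outputs (e.g. "malicious" vs "legitimate").
layers = [1024] + [8192] * 30 + [2]

total = count_connections(layers)
print(f"{len(layers) - 1} weight layers, {total:,} connections")
```

Under these assumptions, a few tens of layers are already enough to reach the order of billions of connections.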
The second, and more important, factor is the use of graphics processing units (GPUs).
Nowadays, nearly all deep learning training is conducted on Nvidia’s GPUs, which results in speeds roughly 100 times greater than those of alternative hardware (for comparison, a deep-learning brain that would take more than three months to train on CPUs can be trained in about a single day using GPUs).
Despite the success of deep learning in many tasks, the barrier to entry into deep learning remains high, mainly due to the scarcity of deep-learning researchers and scientists, who are critical for its successful application.
AI and cybersecurity
With more than one million new malware programmes created every day, and the continuously increasing sophistication of this malware, the task of detection remains very difficult.
Traditional signature-based solutions can only detect currently known malware, and any well-known malware can easily be mutated, allowing it to evade them. To cope with this increasing difficulty, many cybersecurity solutions today use some form of AI.
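The weakness of signature matching can be illustrated with a minimal Python sketch. It assumes the simplest possible kind of signature, a SHA-256 hash of the whole file; commercial products use richer signatures, and the sample bytes and function names below are hypothetical.

```python
import hashlib


def sha256_signature(data: bytes) -> str:
    """Return the SHA-256 hex digest, used here as a stand-in signature."""
    return hashlib.sha256(data).hexdigest()


# Hypothetical file contents standing in for a known malicious sample.
known_sample = b"MZ\x90\x00 example payload bytes"

# The "signature database" holds hashes of samples already seen.
signature_db = {sha256_signature(known_sample)}


def is_known_malware(data: bytes) -> bool:
    """Flag a file only if its hash exactly matches a stored signature."""
    return sha256_signature(data) in signature_db


# A trivial mutation: append a single padding byte to the same file.
mutated_sample = known_sample + b"\x00"

print(is_known_malware(known_sample))    # True
print(is_known_malware(mutated_sample))  # False
```

A one-byte change is enough to produce a completely different hash, so the mutated file no longer matches anything in the database.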
Heuristics-based approaches offer limited improvement in this field; hence, the most advanced cybersecurity solutions resort to machine learning.
By using machine learning, it is possible to train models on large datasets of files so that they automatically learn to separate malicious files from legitimate ones in a way that no manual or semi-manual method could.
Despite the substantial improvements due to machine learning in cybersecurity, sophisticated malware still manages to evade detection with relative ease.
When applying traditional machine learning, it is necessary to first convert the computer files from raw bytes to a list of features (e.g., important API calls), and only then is this list of features fed into the machine learning module.
Even if hundreds or thousands of features are considered by human experts, they still represent only a fraction of the raw data in the files; as a result, the files can easily be mutated to evade detection based on these linear high-level features.
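To illustrate how much of a file a hand-crafted feature list can leave out, here is a minimal Python sketch of feature extraction. The chosen features (file size, byte entropy, presence of a few API names) and the API name list are assumptions made for this example, not the feature set of any particular product.

```python
import math
from collections import Counter

# Hypothetical list of API names an analyst might consider important.
SUSPICIOUS_API_NAMES = [
    b"CreateRemoteThread",
    b"VirtualAllocEx",
    b"WriteProcessMemory",
]


def byte_entropy(data: bytes) -> float:
    """Shannon entropy of the byte distribution, in bits per byte."""
    if not data:
        return 0.0
    total = len(data)
    # Sort the counts so the sum does not depend on the order of the bytes.
    counts = sorted(Counter(data).values())
    return sum((c / total) * math.log2(total / c) for c in counts)


def extract_features(data: bytes) -> list:
    """Reduce a file of any size to a short, fixed-length feature list."""
    features = [float(len(data)), byte_entropy(data)]
    features += [1.0 if name in data else 0.0 for name in SUSPICIOUS_API_NAMES]
    return features


# Two different byte sequences (hypothetical file contents).
original = b"AB" * 32 + b"VirtualAllocEx"
mutated = b"BA" * 32 + b"VirtualAllocEx"

print(original != mutated)                                     # True
print(extract_features(original) == extract_features(mutated))  # True
```

The two files differ in most of their bytes, yet they map to exactly the same five features, so any model trained only on those features cannot tell them apart.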
Similar to the application of deep learning in other domains, its application to cybersecurity allows for direct processing of the contents of the files, without any feature extraction.
That is, the input to the deep-learning brain is the raw byte values, regardless of file format, file size, or even the operating system.
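By contrast, feeding raw bytes to a network requires almost no preprocessing. The sketch below shows one simple way to do it in Python; the fixed input length, the zero padding, and the scaling to the range 0 to 1 are assumptions made for illustration, since the article does not describe how files of different sizes are actually handled.

```python
def bytes_to_input(data: bytes, length: int = 1024) -> list:
    """Map raw file bytes to a fixed-length list of values in [0, 1].

    Assumptions for this sketch: files longer than `length` are
    truncated, shorter files are padded with zero bytes, and each byte
    value (0 to 255) is scaled by 1/255.
    """
    clipped = data[:length]
    padded = clipped + b"\x00" * (length - len(clipped))
    return [byte / 255.0 for byte in padded]


# The same function applies to any file content, whatever its format.
short_file = b"\xff\x00"
long_file = b"\x01" * 5000

print(bytes_to_input(short_file, length=4))  # [1.0, 0.0, 0.0, 0.0]
print(len(bytes_to_input(long_file)))        # 1024
```

No expert decides which parts of the file matter; every byte within the input window reaches the model unchanged apart from scaling.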
Additionally, unlike traditional machine learning, which reaches a performance ceiling as the number of files it is trained on increases, deep learning can effectively improve as the datasets grow, to the extent of hundreds of millions of malicious and legitimate files.
The results of benchmarks that compare the performance of deep learning vs traditional machine learning in cybersecurity show that deep learning results in a considerably higher detection rate and a lower false positive rate.
These results are consistent with the large performance boost obtained by the application of deep learning in other domains.
As malware developers use more advanced methods to create new malware, the gap between the detection rates of deep learning and traditional machine learning will grow wider, and in the coming years it will be critical to rely on deep learning in order to have a realistic chance of foiling the most sophisticated attacks.