August 27, 2018

Machine learning: good for security or a new threat?

Machine learning is no novelty anymore. On the contrary: every self-respecting startup feels compelled to apply machine learning in its offerings. The hunt for scarce developers has been superseded by a scramble for machine learning experts. Fortunately, many machine learning tasks are similar enough that it is possible to save time and money by using pre-trained models. Open-source models are also available free of charge. But does this all really work as well as it seems?

Machine learning covers methods for building algorithms that learn from prearranged data and act without being explicitly programmed. Here, data is anything that can be described by features or measured. If a feature of interest is unknown for part of the data, machine learning methods can predict its values from the data for which that feature is already known.

The figure below illustrates how any object is described by some features X, which can be measured, calculated, or discovered. There is also a target feature y, which can be unknown for some of the data. Using the data for which the target feature is known, we can train a model to predict the target feature for the remainder of the data.


Machine learning is used for solving several types of tasks, but this article will mostly consider the topic of classification.

The aim of training a classifier is to find a function that maps the features of an object to one of the known classes. More complicated cases require predicting the probability of each class.

In essence, we have a set of input values X = {x1, …, xn}, a set of possible classes Y = {y1, …, ym}, and a loss function l. Our task is to find, based on the available data D, a function f: X → Y that minimizes the loss l. The quadratic loss function is the one most commonly used. The function space F can be any set of mappings from X to Y.
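To make this concrete, here is a minimal sketch in Python (scikit-learn is our choice here; the article itself does not prescribe a library): we fit a function f on the data where the target feature is known and use it to predict the target for the rest. The dataset and variable names are assumptions for the sake of the example.

```python
# Fit f on labeled data D, then predict the unknown target feature for new objects.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic data standing in for D: feature vectors X and known classes y.
X, y = make_classification(n_samples=500, n_features=4, random_state=0)
X_known, X_unknown, y_known, _ = train_test_split(X, y, test_size=0.2, random_state=0)

# Training searches the model's function space for an f that minimizes the loss
# (logistic loss here, rather than the quadratic loss mentioned in the text).
f = LogisticRegression().fit(X_known, y_known)

# Predict the target feature for the data where it is unknown.
predicted = f.predict(X_unknown)
```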


Thus, the task of classification is to create a hyperplane that divides the feature space, whose dimension generally equals the length of the feature vector, into parts so that the objects of each class lie on different sides of this hyperplane.

The hyperplane for two-dimensional space is a line. Let us review a simple example:

The figure shows two classes: squares and triangles. It is impossible to identify the relation and accurately divide them with a linear function. Machine learning can approximate the non-linear function that would divide these two sets in the best way.
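A hypothetical code illustration of the same situation: two interleaved classes that no straight line separates well, while a non-linear model copes easily. The dataset and models below are assumptions chosen for the example, not anything from the figure itself.

```python
# Two classes with a curved boundary: a linear model struggles, a non-linear one does not.
from sklearn.datasets import make_moons
from sklearn.svm import SVC

X, y = make_moons(n_samples=300, noise=0.15, random_state=0)

linear_clf = SVC(kernel="linear").fit(X, y)      # straight-line boundary
nonlinear_clf = SVC(kernel="rbf").fit(X, y)      # non-linear boundary

print("linear boundary accuracy:    ", linear_clf.score(X, y))
print("non-linear boundary accuracy:", nonlinear_clf.score(X, y))
```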

Classification is a supervised learning task: training requires a dataset in which both the object features and the class labels are known.

Developers of such systems often face a critical question: who should tag these object classes? In some cases, historical data is available or object features can be measured, and sometimes there is an expert who can provide this information. But is this information always correct and objective?

Information security has been applying machine learning methods for quite some time now, in areas such as spam filtering, traffic analysis, and fraud and malware detection. It is a bit of a cat-and-mouse game in which one makes their move and waits for the opponent's response. And while playing this "game," you have to continuously train models using new data or replace them completely because of the latest breakthroughs.

An illustration of this case is antivirus software, which makes use of signature analysis, heuristics, and manually created rules. Maintaining all this is rather time-consuming! Information security experts debate the usefulness of antivirus solutions; many consider it a dead product category. All these rules applied in antivirus products can be bypassed, for example with obfuscation and polymorphism. Therefore, we would likely prefer tools that use smarter techniques such as machine learning for automatic identification of features (even those uninterpretable by a human), quick processing and generalizing of large quantities of data, and fast decision-making.

So as we see, on the one hand, machine learning can be used for protection. On the other hand, it also makes attacks smarter and more dangerous.

Let's check if this tool is vulnerable


Any algorithm requires not only carefully selected hyperparameters, but also training data. Ideally, training data should be sufficient, with balanced classes and a brief training period—which is nearly impossible in real life.

By the quality of a trained model, we usually mean its accuracy in classifying data that the model "sees" for the first time. Broadly speaking, quality is the ratio of correctly classified samples to the total number of samples provided to the model.
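In code, this quality measure boils down to a simple ratio; the labels below are made up purely for illustration.

```python
# Accuracy on held-out data the model "sees" for the first time.
from sklearn.metrics import accuracy_score

y_true = [0, 1, 1, 0, 1, 0]   # known classes of unseen samples
y_pred = [0, 1, 0, 0, 1, 1]   # model predictions for the same samples

# Ratio of correctly classified samples to all samples: 4 / 6 ≈ 0.67
print(accuracy_score(y_true, y_pred))
```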

All quality assessments make implicit assumptions about the expected distribution of input data and do not take adversarial settings into account, even though such settings frequently go beyond that distribution. Adversarial settings mean an environment in which it is possible to confront or interact with the system. Typical examples of such settings include environments that use spam filters, fraud detection algorithms, and malware analysis systems.

Thus accuracy can be seen as an average value of system performance in typical cases, while security assessment considers the worst performance cases.

Machine learning models are commonly tested in a more or less static environment, in which accuracy depends on the quantity of data for each specific class, but we cannot be sure that such a distribution will hold in reality. As security testers, however, we want the model to make mistakes. Therefore, our task is to find as many inputs as possible that yield misleading results.

When we speak of the security of a system or service, we generally mean that it is impossible to breach a hardware or software security policy within the framework of our threat model, as verified during the development and test stages.

Unfortunately, a large number of services now rely on data analysis algorithms, so the risk can come not from vulnerable functionality but from the data a system uses to make decisions.

Change is all around us, and hackers too are constantly learning something new. To protect machine learning algorithms from attackers, who may abuse their knowledge of how a model operates to compromise the system, adversarial machine learning methods are used.

This concept of information security in machine learning gives rise to a number of questions, some of which we will discuss here.

Is it possible to manipulate a machine learning model to perform a targeted attack?


Here is a simple example with search engine optimization (SEO). People already study the way the smart algorithms of search engines work and manipulate websites to get a higher ranking in search results. Security of such systems is not a critical issue, as long as no data is compromised or significant damage is caused.

It is possible to attack services based on online learning, in which data arrives in consecutive order and is used to update the current model parameters. With knowledge of the system's learning process, an attacker can change the result by feeding suitably arranged data to the system.

Biometric systems, for example, can be fooled in this way. Their parameters are gradually updated based on slight changes in appearance, such as aging, which is absolutely natural and essential to take into account. But an impostor can benefit by feeding certain data to the biometric system that subtly influences the learning process until, eventually, the model learns to accept the impostor's appearance.
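Below is a rough, purely illustrative sketch of such a drift attack, using scikit-learn's SGDClassifier with partial_fit as a stand-in for an online-learning biometric model. The data, labels, and outcome are assumptions, not a reproduction of any real system.

```python
# Poisoning an online learner: feed samples that drift from the genuine user's
# appearance toward the impostor's, always labeled "accept".
import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.RandomState(0)
genuine = rng.normal(loc=0.0, scale=0.3, size=(200, 8))    # enrolled user's features
impostor = rng.normal(loc=3.0, scale=0.3, size=(200, 8))   # impostor's features

X_train = np.vstack([genuine, impostor])
y_train = np.array([1] * 200 + [0] * 200)                  # 1 = accept, 0 = reject

model = SGDClassifier(random_state=0)
model.partial_fit(X_train, y_train, classes=[0, 1])
print("impostor accepted before attack:", model.predict(impostor[:1])[0])

# Gradual drift: each batch moves a bit further from "genuine" toward "impostor",
# and the system keeps adapting because the samples are labeled as the enrolled user.
for step in np.linspace(0.1, 1.0, 20):
    drifted = genuine[:20] + step * (impostor[:20] - genuine[:20])
    model.partial_fit(drifted, np.ones(20, dtype=int))

# If the drift succeeds, the decision boundary has shifted enough to accept the impostor.
print("impostor accepted after attack: ", model.predict(impostor[:1])[0])
```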

Can an impostor select valid data so that the data would always trigger a malfunction, degrading system performance to the point that the system must be disabled?


This issue is quite natural because machine learning models are tested in a static environment, and their quality is assessed based on the distribution of the data that has been used for learning. Nevertheless, data analysis experts face the following questions, which their models have to be able to answer:

  • Is the file malicious?
  • Is the transaction fraudulent?
  • Is the traffic legitimate?

Of course, an algorithm cannot be 100 percent accurate; it can only classify an object with some probability. Therefore, in case of type I and type II errors—when our algorithm cannot be completely sure of its choice and makes mistakes—a compromise has to be found.

Let's review a sample system with very frequent type I and type II errors. An antivirus product has blocked your file, falsely considering it to be malicious, or has failed to protect you from a malicious file. In this case, a user considers the product to be useless and simply disables it, although the error may be due to the dataset.

And the thing is that there always exists a dataset that will yield the worst results for a given model. So all an attacker needs to do is find such data in order to make the user disable the service. Such situations are rather troublesome and should be avoided by the model. Imagine the work involved in investigating all false incidents!

Type I errors are considered a waste of time, while type II errors are a missed opportunity. But in fact, the cost of these two types of errors may be different for each system. For antivirus software, type I errors may be less costly: it is better to be overcautious and err on the side of calling a file malicious. After all, if the user has disabled the software and the file actually was malicious, the antivirus product still "did its job" and the responsibility lies with the user. If we are talking about a system for medical diagnostics, both mistakes are rather expensive: in either case, the patient is at risk of incorrect treatment and risk to health.
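The compromise can be made concrete with a toy example: moving the decision threshold of a detector trades type I errors against type II errors. The scores and labels below are invented purely for illustration.

```python
# Shifting the threshold trades false positives (type I) for false negatives (type II).
import numpy as np

scores = np.array([0.1, 0.3, 0.45, 0.55, 0.6, 0.8, 0.9])   # model's "maliciousness" scores
labels = np.array([0,   0,   0,    1,    0,   1,   1])      # 1 = actually malicious

for threshold in (0.4, 0.5, 0.7):
    flagged = scores >= threshold
    false_positives = np.sum(flagged & (labels == 0))   # type I: benign file blocked
    false_negatives = np.sum(~flagged & (labels == 1))  # type II: malicious file missed
    print(f"threshold={threshold}: type I={false_positives}, type II={false_negatives}")
```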

Can an attacker who wants to disrupt a system take advantage of the properties of a machine learning method, without interfering with the training process? In other words, could an attacker identify limitations in the model that invariably produce false predictions?


The process of assigning features in deep learning systems seems to be basically safe from human interference, so in this sense decision-making by the model is safe from the human factor. The great thing about deep learning is that you only need to feed raw input data to the model; through a sequence of layered transformations, the model itself extracts the features it considers most important and makes a decision. But what are the limitations of this approach?

Research papers on deep learning have described adversarial examples: inputs deliberately crafted so that the system misclassifies them. One of the best-known articles is "Robust Physical-World Attacks on Deep Learning Models."

Exploiting these limitations of deep learning, the authors proposed a number of attack techniques that deceive vision systems. As an example, they performed experiments with traffic sign recognition. To fool the system, it is sufficient to identify the areas of the object that, when modified, confuse the classifier into making a mistake. The experiment was to modify a STOP sign so that the model would classify it as SPEED LIMIT 45. The researchers also tested their approach on other traffic signs, with similarly successful results.



In general, the article explains two ways of fooling a machine learning system: poster-printing attacks, which involve a number of small perturbations (camouflage) on the sign, and sticker attacks, with placement of stickers in specific areas.
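The physical attacks in the paper are quite elaborate, but the underlying idea can be illustrated with a simpler digital analogue, the fast gradient sign method (FGSM): nudge every pixel slightly in the direction that increases the classifier's loss. The model and image in this sketch are placeholders, not the setup used by the researchers.

```python
# FGSM: add a small perturbation along the sign of the loss gradient with respect to the input.
import torch
import torch.nn.functional as F

model = torch.nn.Sequential(               # stand-in for a trained image classifier
    torch.nn.Flatten(),
    torch.nn.Linear(3 * 32 * 32, 10),
)
model.eval()

image = torch.rand(1, 3, 32, 32, requires_grad=True)   # placeholder input image
true_label = torch.tensor([0])
epsilon = 0.03                                          # perturbation budget

loss = F.cross_entropy(model(image), true_label)
loss.backward()

# Adversarial example: the original image plus a small step along the gradient's sign.
adversarial = (image + epsilon * image.grad.sign()).clamp(0.0, 1.0).detach()
print("original prediction:   ", model(image).argmax(dim=1).item())
print("adversarial prediction:", model(adversarial).argmax(dim=1).item())
```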

These situations can easily occur in real life: a traffic sign is covered with dust or has undergone an artistic intervention. So it might seem that artificial intelligence and art are fated to exist apart.


Targeted attacks against automatic speech recognition systems have also lately become fodder for research. Voice messages are "cool" on social networks, but not always convenient to listen to. Hence the creation of speech-to-text services. The researchers took an original audio waveform and created a new one that is 99 percent similar to it, with only minor changes, yet whose transcription is a text chosen by the attacker.
The figure below illustrates the attack: the waveform is slightly modified, and the resulting transcription becomes a phrase chosen by the attacker.


What methods are there to prevent manipulation of machine learning models?


Currently it is easier to attack a machine learning model than to protect it from adversarial attacks. The reason is that no matter how long we train the model, there always exists a dataset that will be misclassified by the model.

Nobody has yet invented any ways to guarantee perfect accuracy by a model. However, there are several ways to make a model more robust to adversarial examples.

Our main tip is: do not use machine learning models in adversarial settings if possible. You're in the clear to use machine learning if your task is to classify pictures or generate memes. Even if a deliberate attack is successful, the societal or economic consequences are minimal. However, if your system performs important functions—say, diagnosing diseases, detecting attacks against industrial facilities, or controlling a self-driving car—the risks of compromise may be disastrous.

Recalling our simplified description of what classification is—creating a hyperplane that would divide space into classes—we can observe a contradiction. Let's review this situation in two-dimensional space.

On the one hand, we are trying to find the function that divides the two classes with maximum accuracy. On the other hand, we cannot draw an exact boundary because we generally do not have the entire population of data. Our task is to find the function that minimizes classification mistakes. To summarize, we want an accurate boundary while avoiding overfitting (hewing too closely to the known data), so that the model can still predict the behavior of unknown data.


1—underfitting; 2—overfitting; 3—optimal

Underfitting can be avoided by using a more expressive model or training it longer, while overfitting is combated by gathering more training data and applying effective regularization methods. These methods make a model more robust to small outliers, but not to adversarial examples.
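As a brief illustration of the regularization trade-off, here is the same linear model fitted with weaker and stronger L2 penalties (in scikit-learn, C is the inverse regularization strength; the dataset and split are hypothetical).

```python
# Stronger regularization (smaller C) usually lowers training accuracy
# but can improve accuracy on unseen data.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=200, n_features=30, n_informative=5, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

for C in (100.0, 1.0, 0.01):   # smaller C = stronger L2 penalty
    clf = LogisticRegression(C=C, max_iter=1000).fit(X_train, y_train)
    print(f"C={C}: train={clf.score(X_train, y_train):.2f}, test={clf.score(X_test, y_test):.2f}")
```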

Incorrect classification of adversarial examples is an obvious problem. If a model has not seen such examples among its training data, it will probably make errors on them. This issue can be mitigated by adding adversarial examples to the training dataset, at least to avoid those particular errors. Still, it seems improbable that we could generate all possible adversarial examples and reach 100 percent accuracy, given the need to strike a compromise between overfitting and underfitting.
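A hedged sketch of this augmentation idea: generate adversarial versions of the training samples (FGSM is used here as one possible choice) and train on the clean and perturbed data together. None of the names or numbers below come from the article.

```python
# Adversarial training: each step optimizes the loss on clean and FGSM-perturbed batches.
import torch
import torch.nn.functional as F

def fgsm(model, x, y, epsilon=0.03):
    """Return FGSM-perturbed copies of a batch x with labels y."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()
    return (x_adv + epsilon * x_adv.grad.sign()).clamp(0.0, 1.0).detach()

def adversarial_training_step(model, optimizer, x, y):
    """One training step on clean plus adversarial examples."""
    x_adv = fgsm(model, x, y)
    optimizer.zero_grad()
    loss = F.cross_entropy(model(x), y) + F.cross_entropy(model(x_adv), y)
    loss.backward()
    optimizer.step()
    return loss.item()

# Toy usage with random data, just to show the shape of the loop.
model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 32 * 32, 10))
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
x_batch = torch.rand(8, 3, 32, 32)
y_batch = torch.randint(0, 10, (8,))
print(adversarial_training_step(model, optimizer, x_batch, y_batch))
```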

One more tool is a generative adversarial network (GAN), which consists of two neural networks—generative and discriminative. The discriminative model aims to distinguish between fake and real data, and the generative model learns to generate data that can fool the discriminative model. A compromise between sufficient classification quality of the discriminator and the time spent on learning can produce a model that is robust to adversarial examples.
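For reference, here is a compact sketch of that two-network setup in PyTorch; the dimensions and data are toy values chosen only to show the training loop, not a production architecture.

```python
# Minimal GAN loop: the discriminator separates real from fake,
# the generator learns to produce data the discriminator accepts as real.
import torch
import torch.nn as nn

latent_dim, data_dim = 16, 64

generator = nn.Sequential(nn.Linear(latent_dim, 128), nn.ReLU(), nn.Linear(128, data_dim))
discriminator = nn.Sequential(nn.Linear(data_dim, 128), nn.ReLU(), nn.Linear(128, 1))

g_opt = torch.optim.Adam(generator.parameters(), lr=2e-4)
d_opt = torch.optim.Adam(discriminator.parameters(), lr=2e-4)
bce = nn.BCEWithLogitsLoss()

real_data = torch.randn(32, data_dim)    # stand-in for real samples

for step in range(100):
    # Discriminator step: tell real data apart from the generator's fakes.
    fake_data = generator(torch.randn(32, latent_dim)).detach()
    d_loss = bce(discriminator(real_data), torch.ones(32, 1)) + \
             bce(discriminator(fake_data), torch.zeros(32, 1))
    d_opt.zero_grad()
    d_loss.backward()
    d_opt.step()

    # Generator step: produce data the discriminator labels as real.
    fake_data = generator(torch.randn(32, latent_dim))
    g_loss = bce(discriminator(fake_data), torch.ones(32, 1))
    g_opt.zero_grad()
    g_loss.backward()
    g_opt.step()
```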

But despite these methods, it is still possible to create a dataset that will lead the model to a wrong solution.

What are the potential implications of machine learning for information security?


Debates about who should bear responsibility for errors made by machine learning models, as well as their social consequences, have gone on for a long time. Creation and use of such systems involves several stakeholders, including algorithm developers, data providers, and system users (that is to say, the owners).

At first glance, the developer would seem to have a great impact on the result—from selecting an algorithm to setting parameters and performing testing. But in reality, the developer makes a software product that is supposed to meet certain requirements. As soon as the model complies with these requirements, the developer's work is done and the model moves into the operational stage, probably revealing some bugs in the process.

On the one hand, this happens because developers cannot know the whole population of data at the training stage. But on the other hand, this can be an artifact of real-life data. A very vivid example is the Twitter chatbot created by Microsoft that learned from real data and then started to write racist tweets.

Was such behavior a bug or a feature? The algorithm used real data for learning and started to imitate it. That might seem to be a marvelous achievement by the developers, in a technical sense. But the data was what it was, so from an ethical point of view, this bot turned out to be unusable—because it learned so well to do what everyone wanted it to do.

Perhaps Elon Musk was right after all to claim that "artificial intelligence is our biggest existential threat"?
