So far, the prevailing view in the field has been that adversarial examples stem from "quirks" of the model that will disappear once training algorithms and data collection have progressed far enough. Other common views hold that adversarial examples are a consequence of the high dimensionality of the input space, or a finite-sample phenomenon.
Recently, several researchers from MIT completed a study that offers a new perspective on why adversarial examples arise, and the researchers, showing some literary flair, tell the story through a short parable.
Let's listen to this little story about adversarial examples.
A planet called Erm
The story begins on Erm, a distant planet inhabited by a race of ancient aliens known as the Nets.
The Nets are a strange species: each individual's position in the social hierarchy depends on its ability to classify bizarre 32×32-pixel images (meaningless to the Nets) into ten completely arbitrary categories.
These images come from a top-secret dataset called See-Far; apart from gazing at these magical pixelated images, the Nets' lives are, frankly, rather dull.
Slowly, as the Nets grow older and wiser, they begin to discover more and more patterns in See-Far. Each new pattern they find helps them classify the dataset more accurately. Because improved classification accuracy carries enormous social value, the aliens have given names to the most predictive image patterns, such as the one below:
TOOGIT, a pattern highly indicative of the class "1". The Nets are extremely sensitive to TOOGIT.
The most powerful aliens are the best at discovering these patterns, and are therefore also the most sensitive to their appearance in See-Far images.
Somehow (perhaps while hunting for See-Far classification tips), some aliens obtained machine-learning papers written by humans. One figure in particular caught their attention:
An adversarial example?
The figure seemed simple enough to them: on the left is a "2", and the GAB pattern in the middle, as everyone knows, signifies a "4". So, entirely as expected, adding a GAB to the image on the left produces a new image that (to the Nets) looks exactly like an image from the "4" category.
But the Nets could not understand why the paper claimed that the original and final images, which to them were completely different, should belong to the same category. Puzzled, they kept leafing through the paper, wondering what else the humans had forgotten...
What we learned from Erm
As the story suggests, it is not really about aliens and their magical social structure: the way the Nets develop is meant to remind us of how machine learning models are trained. In particular, we maximize accuracy without incorporating prior knowledge about the physical world or the other human-relevant concepts behind the classification. The punchline of the story is that the aliens come to realize that the adversarial perturbations humans consider meaningless are actually patterns that are crucial for See-Far classification. The story of the Nets should therefore make us ask:
Are adversarial examples really unnatural and meaningless?
A simple experiment
To answer this, let's start with a simple experiment:
Start with an image from the training set of a standard dataset (e.g. CIFAR-10). For each training image, compute a small adversarial perturbation toward some target class, and relabel the perturbed image with that target class. Doing this for every training image yields a new training set:
Now, the resulting training set differs from the original one only by tiny perturbations, but the labels have changed; to a human, every label is simply wrong. In fact, the relabeling is consistent with a fixed "shift" scheme (e.g., every dog is relabeled as a cat, every cat as a bird, and so on).
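The construction can be sketched as follows. This is a minimal, self-contained illustration using a tiny linear "model" and a projected-gradient (PGD-style) targeted attack; the model, dimensions, perturbation budget, and step sizes are all placeholders, not the ones used in the actual experiments.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for a trained "standard model": a linear softmax
# classifier on flattened 32x32x3 images (3072 dims), 10 classes. In the
# real experiment this would be a trained CIFAR-10 network.
D, C = 3072, 10
W = rng.normal(scale=0.01, size=(C, D))

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def targeted_pgd(x, target, eps=0.5, step=0.1, iters=50):
    """Perturb x within an L2 ball of radius eps so the model assigns
    higher probability to `target`: gradient ascent on the target-class
    log-probability, projected back onto the ball after each step."""
    x_adv = x.copy()
    for _ in range(iters):
        p = softmax(W @ x_adv)
        # Gradient of log p[target] w.r.t. the input, for a linear model.
        grad = W[target] - p @ W
        x_adv = x_adv + step * grad / (np.linalg.norm(grad) + 1e-12)
        delta = x_adv - x
        norm = np.linalg.norm(delta)
        if norm > eps:                       # project onto the L2 ball
            x_adv = x + delta * (eps / norm)
    return x_adv

def make_relabeled_set(X, y):
    """Relabel each perturbed image with its target class t = (y+1) mod C,
    the deterministic "shift" scheme described above."""
    targets = (y + 1) % C
    X_adv = np.stack([targeted_pgd(x, t) for x, t in zip(X, targets)])
    return X_adv, targets
```

The new classifier in the experiment is then trained on `(X_adv, targets)` instead of the original `(X, y)`.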
Next, we train a new classifier (not necessarily with the same architecture as the first) on this mislabeled dataset. How does that classifier fare on the original, unmodified test set (i.e., the standard CIFAR-10 test set)?
Surprisingly, the new classifier achieves fairly good accuracy on the test set (44% on CIFAR-10)! This is despite the fact that every training input is tied to its "true" label only through an imperceptible perturbation, while all visually salient features point to a different, now incorrect, label.
Why is this?
Our conceptual model of adversarial examples
In the experiment just described, the adversarial perturbations computed against the standard model must contain patterns that generalize for predicting the target class: the adversarial perturbations alone, as present in the training set, support moderately accurate predictions on the test set. With this in mind, one might wonder: maybe these patterns are not fundamentally different from the ones humans use to classify images (such as ears, whiskers, and noses)! This is precisely our hypothesis: many input features can be used to predict the label, but only some of them are perceptible to humans.
More precisely, we propose dividing the predictive features of the data into robust and non-robust features. Robust features correspond to patterns that predict the true label even under a human-defined set of perturbations (such as an ℓ2 ball). Conversely, non-robust features are predictive features whose predictions can be "flipped" toward a wrong class within the pre-specified perturbation set.
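One way to make these definitions precise (loosely following the spirit of the paper's formalism; the exact symbols and constants here are illustrative) is, for binary labels y ∈ {−1, +1}:

```latex
% A feature is any function f : \mathcal{X} \to \mathbb{R}.
% f is a useful feature if it correlates with the true label:
\mathbb{E}_{(x,y)\sim\mathcal{D}}\big[\, y \cdot f(x) \,\big] \;\ge\; \rho \;>\; 0
% f is a robust feature if the correlation survives every allowed
% perturbation \delta in the set \Delta(x) (e.g. an \ell_2 ball):
\mathbb{E}_{(x,y)\sim\mathcal{D}}\Big[\, \inf_{\delta \in \Delta(x)} y \cdot f(x+\delta) \,\Big] \;\ge\; \gamma \;>\; 0
% A useful feature admitting no such \gamma > 0 is a non-robust feature.
```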
Because we only ever consider perturbation sets that do not affect human classification, we expect humans to rely entirely on robust features. However, when the goal is to maximize (standard) test-set accuracy, non-robust features can be just as useful as robust ones; in fact, the two kinds of features are completely interchangeable, as illustrated below:
From this perspective, the experiment above is actually quite natural. In the original training set, both the robust and the non-robust features of each input are predictive of the label. When a subtle adversarial perturbation is applied, the robust features are (by definition) essentially unaffected, but the non-robust features can be flipped.
For example, each dog image retains the robust features of a dog (and therefore still looks like a dog to us), but acquires the non-robust features of a cat. After the training set is relabeled, the robust features actually point in the wrong direction (i.e., images with robust "dog" features are labeled "cat"), so only the non-robust features provide correct guidance for generalization.
In short, both robust and non-robust features are predictive on the relabeled training set, but only the non-robust features generalize to the original test set:
Therefore, the fact that a model trained on this dataset generalizes to the standard test set indicates that (a) non-robust features exist and suffice for good generalization, and (b) deep neural networks really do rely on these non-robust features, even when equally predictive robust features are available.
Can a robust model learn robust features?
These experiments show that adversarial perturbations are not meaningless artifacts: they are directly tied to features that matter for generalization. At the same time, our earlier blog posts on adversarial examples show that robust optimization yields robust models that are far less susceptible to adversarial examples.
It is therefore natural to ask: can we verify that robust models actually rely on robust features? To test this, we devised a way to restrict inputs to the features a given model is sensitive to (for deep neural networks, these correspond to the activations of the penultimate layer). Using this method, we created a new training set that contains only the features used by a previously trained robust model:
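The idea of the construction can be sketched as follows. The feature map `g` below is a hypothetical stand-in (a fixed random layer with a smooth nonlinearity) for the penultimate-layer representation of an actual adversarially trained network; only the structure of the optimization, gradient descent on the input to match the robust model's features, mirrors the method described above.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for a robust model's penultimate-layer feature
# map: a fixed random layer with a softplus nonlinearity. In the actual
# method this is the learned representation of an adversarially trained
# network, and the optimization below runs over real images.
D, K = 64, 32
A = rng.normal(scale=0.3, size=(K, D))

def g(x):
    return np.logaddexp(0.0, A @ x)          # softplus activations

def robustify(x, steps=300, lr=0.05):
    """Start from noise and run gradient descent *on the input* so that
    its features under the robust model match those of x:
        min_xr  || g(xr) - g(x) ||^2
    The result retains (roughly) only the features the robust model uses."""
    target = g(x)
    xr = rng.normal(size=x.shape)
    for _ in range(steps):
        h = A @ xr
        act = np.logaddexp(0.0, h)
        sig = 1.0 / (1.0 + np.exp(-h))       # d softplus / dh
        xr -= lr * (A.T @ ((act - target) * sig))
    return xr
```

Applying `robustify` to every training image, while keeping the original labels, gives the "robustified" dataset used in the experiment.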
We then train a model on the resulting dataset without any adversarial training. The resulting model turns out to be both accurate and robust! This stands in stark contrast to the standard training set, which yields accurate but brittle models.
Standard and robust accuracy on the CIFAR-10 test set, for models obtained by: (left) standard training on CIFAR-10; (middle) adversarial training on CIFAR-10; (right) standard training on the constructed dataset.
These results show that robustness (or the lack of it) can actually be a property of the dataset itself. In particular, once we remove the non-robust features from the original training set, standard (non-adversarial) training is enough to obtain a robust model. Adversarial examples thus arise from the non-robust features of the data and are not necessarily tied to the standard training framework.
An immediate consequence of this change of perspective is that adversarial transferability (the long-mysterious phenomenon that perturbations computed against one model are usually also adversarial for other models) no longer needs a separate explanation. Specifically, once adversarial vulnerability is viewed as a direct consequence of features present in the dataset (rather than of flaws in training individual models), we should expect that models with similar expressive capacity will find and exploit those same features to improve classification accuracy.
To explore this further, we examined how the tendency of different architectures to learn similar non-robust features correlates with the transferability of adversarial examples between them:
In the figure above, we generated the dataset from the first experiment (a training set of adversarial examples labeled with their target classes), constructing the adversarial examples against a ResNet-50. The resulting dataset can be thought of as having all of ResNet-50's non-robust features "flipped" toward the target class. We then trained the five architectures shown in the figure on this dataset and recorded their generalization to the real test set: that is, how well each model generalizes using only ResNet-50's non-robust features.
Analyzing the results, we find that the better a model learns the non-robust features introduced by ResNet-50, the better adversarial examples transfer between ResNet-50 and that model, exactly as the new perspective predicts.
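The flavor of this transfer phenomenon can be reproduced in a toy setting: two linear classifiers trained on disjoint halves of the same synthetic data end up relying on the same predictive directions, so an attack computed against one degrades the other. The models, data, and attack budget below are illustrative stand-ins for the deep networks in the figure, not the actual setup.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic linearly separable data, labels in {-1, +1}.
D, N, EPS = 20, 400, 0.5
w_true = rng.normal(size=D)
X = rng.normal(size=(N, D))
y = np.sign(X @ w_true)

def train_logreg(X, y, lr=0.1, epochs=300):
    """Plain full-batch gradient descent on the logistic loss."""
    w = np.zeros(D)
    for _ in range(epochs):
        s = 1.0 / (1.0 + np.exp(y * (X @ w)))       # sigmoid(-margin)
        w -= lr * (X * (-y * s)[:, None]).mean(axis=0)
    return w

# "Architecture A" and "architecture B": here the same model family,
# trained independently on disjoint halves of the data.
w_a = train_logreg(X[: N // 2], y[: N // 2])
w_b = train_logreg(X[N // 2 :], y[N // 2 :])

# FGSM-style l_inf attack computed ONLY against model A ...
X_adv = X - EPS * y[:, None] * np.sign(w_a)

# ... yet evaluated on model B: the attack transfers.
acc_clean = np.mean(np.sign(X @ w_b) == y)
acc_transfer = np.mean(np.sign(X_adv @ w_b) == y)
```

Because both classifiers recover (noisy versions of) the same predictive direction, `acc_transfer` drops far below `acc_clean`, mirroring the correlation seen in the figure.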
Our discussion and experiments establish that adversarial examples are a purely human-centric phenomenon: from the standpoint of classification performance, a model has no reason to prefer robust features over non-robust ones.
After all, the notion of robustness is defined with respect to humans. Therefore, if we want models to rely primarily on robust features, we need to say so explicitly, by building the corresponding priors into the architecture or the training process. From this perspective, adversarial training (and robust optimization more generally) can be viewed as a way of incorporating the desired invariances into the learned model. For example, robust optimization can be seen as constantly "flipping" non-robust features to destroy their predictiveness, thereby steering the trained model away from relying on them.
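As a sketch of how robust optimization does this in the simplest case: for a linear model under an ℓ∞ budget, the inner maximization has a closed form, so adversarial training reduces to taking gradient steps on the worst-case loss. Everything below (data, dimensions, budget, learning rate) is an illustrative toy setup, not the paper's.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: binary logistic regression, labels in {-1, +1}.
# Adversarial training: min_w  max_{||delta||_inf <= eps}  loss(w; x+delta, y)
D, N, EPS = 20, 200, 0.1
w_true = rng.normal(size=D)
X = rng.normal(size=(N, D))
y = np.sign(X @ w_true)

def loss(w, X, y):
    return np.mean(np.log1p(np.exp(-y * (X @ w))))

def adv_train(X, y, eps=EPS, lr=0.1, epochs=200):
    """Each step first solves the inner maximization in closed form (for
    a linear model the worst-case l_inf perturbation is
    x - eps * y * sign(w), i.e. an FGSM step), then takes a gradient
    step on the resulting worst-case loss."""
    w = np.zeros(D)
    for _ in range(epochs):
        X_adv = X - eps * y[:, None] * np.sign(w)   # inner max, closed form
        m = -y * (X_adv @ w)
        s = 1.0 / (1.0 + np.exp(-m))                # d loss / d margin
        w -= lr * (X_adv * (-y * s)[:, None]).mean(axis=0)
    return w
```

Each inner step perturbs every input against the current model, which is exactly the "flip the non-robust features, then learn not to rely on them" dynamic described above.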
At the same time, the standard model's reliance on non-robust (humanly unintelligible) features must also be taken into account when designing interpretability methods. In particular, any "explanation" of a standard model's prediction must either highlight these non-robust features (and thus not be fully meaningful to humans) or hide them (and thus not be fully faithful to the model's decision process). Therefore, if we want interpretability methods that are both human-understandable and faithful to the model, post-hoc approaches alone are not enough; intervention at training time is required.