FGSM and BIM¶
Although adversarial attacks can be used in many domains, in this workshop we will apply them to images, generating adversarial images that can mislead image classifiers.
In this notebook we will be using the adversarial attack FGSM (Fast Gradient Sign Method), introduced in the publication [GSS15].
This method is considered a white-box attack, meaning that we need access to the machine learning model (including its gradients) to be able to perform the attack.
We will be working with the open source machine learning framework PyTorch for Python.
import os
from typing import List, Tuple
import glob
import random
import numpy as np
import matplotlib.pyplot as plt
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision
from torchvision.transforms import Compose, Resize, CenterCrop, ToTensor, Normalize
from PIL import Image
np.random.seed(42)
# path to the data folder
data_folder = os.path.join('data')
Before we can use the model to make any predictions, we need to select the PyTorch device that we are going to use. Current versions of PyTorch support GPU acceleration, which has a major impact on the time needed, not only for training deep learning models but also for using them to make predictions.
Although the current version of PyTorch (1.9, as of August 2021) supports CUDA (for Nvidia’s GPUs) and ROCm (for AMD’s GPUs), the latter is still in a beta state.
As the computers in the cloud GPU service(s) have CUDA-enabled Nvidia GPUs, we will use CUDA to accelerate the computation.
# use compatible Nvidia GPU if available
if torch.cuda.is_available():
    device = torch.device('cuda')
    print('cuda found!')
else:
    device = torch.device('cpu')
    print('no cuda found, will use cpu')
no cuda found, will use cpu
Loading the model¶
To be able to use this attack, we first need to select the machine learning model that we will use as an image classifier. For that purpose, we will use the models available in the torchvision library, which is part of PyTorch.
These models have been pre-trained on the 1000 categories of the ImageNet image dataset.
We can find the models available on the torchvision website.
# we set the parameters pretrained and progress to True, to download the pretrained model with a progress bar
net = torchvision.models.alexnet(pretrained=True, progress=True)
# when we load a model with pytorch, by default it is in train mode
# as we are going to use the model to make predictions we set it with evaluation mode
# the ; hides the output that would otherwise be printed
net.eval();
If a GPU is available, we first need to pass the model to it. If it is not available, the line below will not change anything, as the model is already in the memory of the CPU.
# we pass the model to the device just with this one line of code
net.to(device);
We also need to load the labels associated with the categories used to train the model. In this case the categories are the 1000 used in the ImageNet Large Scale Visual Recognition Challenge competition.
In the text file synset_words.txt you can see all the categories of images that the model has been trained on. These categories, also known as synsets, were inherited from the WordNet project.
Each category is described by several keywords. In the file synset_words.txt, I abbreviated these keywords. You can find the original list at this link: https://gist.github.com/fnielsen/4a5c94eaa6dcdf29b7a62d886f540372
with open(os.path.join(data_folder, 'synset_words.txt'), 'r') as f:
    synset_words = [' '.join(s.replace('\n', '').split(' ')[1:]) for s in f.readlines()]
np.random.choice(synset_words, 5)
array(['echidna, spiny anteater, anteater',
'bathtub, bathing tub, bath, tub',
'tobacco shop, tobacconist shop, tobacconist',
'white wolf, Arctic wolf, Canis lupus tundrarum', 'wombat'],
dtype='<U121')
Loading and visualizing images¶
We will load some images for our attacks using the PIL library.
images = {}
for file_name in glob.glob(os.path.join(data_folder, 'images', 'dataset', '*')):
    # remove file extension and path
    short_file_name = os.path.splitext(os.path.split(file_name)[-1])[0]
    images[short_file_name] = {
        # Image is from the PIL library
        'image': Image.open(file_name)
    }
    print(f"loaded {file_name}")
loaded data/images/dataset/dog.jpg
loaded data/images/dataset/harmonicawood.jpg
loaded data/images/dataset/pineapple.jpg
loaded data/images/dataset/joys.jpg
loaded data/images/dataset/jellyfish.jpg
loaded data/images/dataset/teapot.jpg
loaded data/images/dataset/pizza.jpg
loaded data/images/dataset/bus.jpg
loaded data/images/dataset/pig.jpg
loaded data/images/dataset/bear.jpg
Once we load an image, we conduct a pre-processing step. This step is necessary because the network was trained on images pre-processed in the same way, so any image we feed to the network must go through the same steps.
We will also write a reverse function, plot_alexnet_image, which allows us to display the image that was fed to the network.
IMAGENET_MEAN = np.array([0.485, 0.456, 0.406])
IMAGENET_STD = np.array([0.229, 0.224, 0.225])
# classes are from torchvision.transforms
preprocess_alexnet = Compose([
    Resize(224),
    CenterCrop(224),
    ToTensor(),
    Normalize(IMAGENET_MEAN, IMAGENET_STD)
])
def plot_alexnet_image(img_tensor: torch.Tensor) -> None:
    # change from PyTorch's (channels, height, width) layout to matplotlib's (height, width, channels)
    img_tensor = img_tensor.permute(1, 2, 0)
    img_np = img_tensor.detach().cpu().numpy()
    # _STD[None, None] and _MEAN[None, None] add 2 dimensions, so we undo the normalization along the color dimension
    img_np = (img_np * IMAGENET_STD[None, None]) + IMAGENET_MEAN[None, None]
    img_np = np.clip(img_np, a_min=0.0, a_max=1.0)
    # we plot the image
    plt.figure()
    plt.axis('off')
    plt.imshow(img_np)
Now we will choose a random image to start with.
# select one random image
random_file_name = np.random.choice(list(images.keys()))
random_image = images[random_file_name]['image']
original_img_tensor = preprocess_alexnet(random_image)
plot_alexnet_image(original_img_tensor)
print(f'Filename: {random_file_name}')
Filename: bus

Using the model to predict the content of the image¶
The function predict_image_top_categories uses the model to predict the content of the image. It returns the num_top_cat categories with the highest probabilities predicted by the model.
def predict_image_top_categories(
    img_tensor: torch.Tensor,
    model: nn.Module,
    labels: List[str],
    device: torch.device,
    num_top_cat: int = 5
) -> Tuple[torch.Tensor, torch.Tensor]:
    # create a mini-batch as expected by the model:
    # add an extra batch dimension since pytorch treats all images as batches
    input_batch = img_tensor.unsqueeze(0)
    # we send it to the device
    input_batch = input_batch.to(device)
    # forward pass, it returns unnormalized scores
    output = model(input_batch)
    # we use the softmax function to get the probability distribution over categories
    probabilities = torch.nn.functional.softmax(output[0], dim=0).cpu()
    # top categories for the image
    top_prob, top_catid = torch.topk(probabilities, num_top_cat)
    return top_prob, top_catid
Now we can use this function to predict the content of our random image.
print(f'Prediction of image {random_file_name}')
confidences, cat_ids = predict_image_top_categories(original_img_tensor, net, synset_words, device, num_top_cat=5)
top_pred_id = cat_ids[0]
for conf, cat_id in zip(confidences, cat_ids):
    print(f'Confidence {conf:.2%}\t{cat_id}\t{synset_words[cat_id]}')
Prediction of image bus
Confidence 40.86% 874 trolleybus, trolley coach, trackless trolley
Confidence 20.97% 867 trailer truck, tractor trailer, trucking rig, rig, articulated lorry, semi
Confidence 15.22% 654 minibus
Confidence 7.89% 705 passenger car, coach, carriage
Confidence 7.77% 555 fire engine, fire truck
The FGSM attack¶
Now it’s time to define the function that will perform the FGSM adversarial attack. This function takes as input the image (as a PyTorch tensor), the model, the ID of the true category corresponding to the image, the device, and a factor (eps) that determines the strength of the adversarial noise applied to the image.
Although by default this value is set to \(0.007\), a higher value will often increase the chances of performing a successful attack, but it will also make the adversarial noise more noticeable in the generated adversarial image.
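In other words, FGSM takes a single step of size \(\epsilon\) in the direction of the sign of the gradient of the loss with respect to the input, as described in [GSS15]:

\[x_{adv} = x + \epsilon \cdot \mathrm{sign}\left(\nabla_x J(\theta, x, y)\right)\]

where \(x\) is the input image, \(y\) its label, \(\theta\) the model parameters, and \(J(\theta, x, y)\) the loss function used to train the network.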
def fgsm(
    img_tensor: torch.Tensor,
    model: nn.Module,
    image_pred_label_idx: int,
    device: torch.device,
    eps: float = 0.007,  # ".007 corresponds to the magnitude of the smallest bit of an 8 bit image encoding after GoogLeNet’s conversion to real numbers."
) -> Tuple[torch.Tensor, torch.Tensor, torch.Tensor]:
    img_tensor.requires_grad_()  # we need the gradient with respect to the input
    # create a mini-batch as expected by the model and send it to the device
    input_batch = img_tensor.unsqueeze(0).to(device)
    model.zero_grad()  # reset gradients
    x = model(input_batch)  # forward pass
    # define the loss function
    loss = nn.CrossEntropyLoss()
    # we create the label tensor and send it to the device
    label = torch.tensor([image_pred_label_idx], dtype=torch.long).to(device)
    # calculate the loss
    loss_cal = loss(x, label)
    # perform a backward pass in order to get the gradients
    loss_cal.backward()
    # sign of the gradient of the loss with respect to the input, as described in the paper
    data_grad_sign = img_tensor.grad.sign()
    # to generate the adversarial noise, we multiply the gradient sign by epsilon
    adv_noise = eps * data_grad_sign
    adv_noise_full = data_grad_sign
    # and we add it to the original image
    adv_img_tensor = img_tensor + adv_noise
    return adv_img_tensor, adv_noise, adv_noise_full
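One detail worth noting: because the attack operates on the normalized tensor, the perturbed values can end up slightly outside the range that corresponds to valid \([0, 1]\) pixels (our plotting function simply clips them for display). A minimal sketch of an optional clamping step, assuming the IMAGENET_MEAN and IMAGENET_STD constants defined above (clamp_to_valid_range is a hypothetical helper, not part of the original notebook):

def clamp_to_valid_range(adv_img_tensor: torch.Tensor) -> torch.Tensor:
    # bounds of valid pixel values (0.0 and 1.0) after normalization
    mean = torch.tensor(IMAGENET_MEAN, dtype=adv_img_tensor.dtype).view(3, 1, 1)
    std = torch.tensor(IMAGENET_STD, dtype=adv_img_tensor.dtype).view(3, 1, 1)
    low = (0.0 - mean) / std
    high = (1.0 - mean) / std
    # keep every channel within its valid normalized range
    return torch.max(torch.min(adv_img_tensor, high), low)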
Now that we have defined the function, we can call it to conduct an adversarial attack on the loaded image. The function returns the adversarial image and two versions of the generated adversarial noise: the noise multiplied by the eps factor, and the full (unscaled) noise.
Now it's time to perform the attack.
# adv_tensor_img, adv_tensor_noise, _ = fgsm(original_img_tensor, net, top_pred_id, device, 0.13)  # a stronger attack
adv_tensor_img, adv_tensor_noise, _ = fgsm(original_img_tensor, net, top_pred_id, device, 0.05)
print("adv image")
plot_alexnet_image(adv_tensor_img)
plot_alexnet_image(adv_tensor_noise+0.0)
adv image


Now we can pass the adversarial image that we just generated to the model and see whether it succeeds in misleading it.
confidences, cat_ids = predict_image_top_categories(adv_tensor_img, net, synset_words, device, num_top_cat=5)
for conf, cat_id in zip(confidences, cat_ids):
    print(f'Confidence {conf:.2%}\t{cat_id}\t{synset_words[cat_id]}')
Confidence 86.89% 867 trailer truck, tractor trailer, trucking rig, rig, articulated lorry, semi
Confidence 3.26% 864 tow truck, tow car, wrecker
Confidence 2.39% 595 harvester, reaper
Confidence 2.15% 569 garbage truck, dustcart
Confidence 1.09% 555 fire engine, fire truck
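The attack succeeded: the trolleybus category no longer appears in the top predictions. To check programmatically how the attack strength affects the outcome, we can compare the top-1 prediction before and after the attack for a range of eps values. A small sketch using the functions defined above (the loop and variable names are our own additions, not part of the original notebook):

for eps in [0.007, 0.02, 0.05, 0.1]:
    # detach and clone so each attack starts from a fresh leaf tensor
    adv_img, _, _ = fgsm(original_img_tensor.detach().clone(), net, top_pred_id, device, eps)
    conf, cats = predict_image_top_categories(adv_img, net, synset_words, device, num_top_cat=1)
    fooled = (cats[0] != top_pred_id).item()
    print(f'eps={eps}: fooled={fooled}, top-1: {synset_words[cats[0]]} ({conf[0]:.2%})')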
References¶
Bibliography¶
- GSS15
Ian J. Goodfellow, Jonathon Shlens, and Christian Szegedy. Explaining and harnessing adversarial examples. 2015. arXiv:1412.6572.