I. Introduction to Hardswish
Hardswish is an activation function derived from Swish and designed to make neural-network computation cheaper. Swish is a smooth, ReLU-like activation that has outperformed ReLU in a number of experiments. Hardswish modifies Swish by replacing its sigmoid with a piecewise-linear approximation, which improves computational efficiency while keeping accuracy essentially intact.
Concretely, Hardswish can be evaluated with a simple closed-form expression, Hardswish(x) = x * ReLU6(x + 3) / 6, which maps well onto common hardware. It also introduces no extra learnable parameters, which keeps training simple and efficient.
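As a quick illustration (not from the original text), the following minimal sketch evaluates both definitions on a small tensor; the helper names swish_ref and hardswish_ref are illustrative only:

import torch
import torch.nn.functional as F

def swish_ref(x):
    # Swish (also known as SiLU): x * sigmoid(x)
    return x * torch.sigmoid(x)

def hardswish_ref(x):
    # Hardswish: x * ReLU6(x + 3) / 6, a piecewise-linear approximation of Swish
    return x * F.relu6(x + 3) / 6

x = torch.linspace(-4, 4, steps=9)
print(swish_ref(x))
print(hardswish_ref(x))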
II. Comparing Hardswish and Swish
To verify the advantages of Hardswish, we ran a few comparison experiments. The results are shown below.
1. Computational efficiency
import torch
from time import time
batch_size = 128
num_channels = 128
input_shape = (32, 32)
num_iterations = 100
# Swish (exposed in PyTorch as nn.SiLU, i.e. x * sigmoid(x))
swish = torch.nn.SiLU()
x = torch.randn(batch_size, num_channels, *input_shape)
start = time()
for i in range(num_iterations):
    y = swish(x)
end = time()
swish_time = end - start
# Hardswish
hardswish = torch.nn.Hardswish()
x = torch.randn(batch_size, num_channels, *input_shape)
start = time()
for i in range(num_iterations):
    y = hardswish(x)
end = time()
hardswish_time = end - start
print("Swish time:", swish_time) # 0.38s
print("Hardswish time:", hardswish_time) # 0.19s
The code above compares the forward-pass time of Swish (SiLU) and Hardswish. In this run Hardswish was roughly twice as fast as Swish, showing that Hardswish does improve computational efficiency.
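For more stable timings than a hand-rolled loop, PyTorch's torch.utils.benchmark module can be used instead. The following is a minimal sketch under the same tensor shape as above; the loop and variable names are illustrative:

import torch
from torch.utils import benchmark

x = torch.randn(128, 128, 32, 32)
for name, act in [("SiLU (Swish)", torch.nn.SiLU()), ("Hardswish", torch.nn.Hardswish())]:
    # Timer handles warm-up and averaging; timeit(100) runs the statement 100 times
    timer = benchmark.Timer(stmt="act(x)", globals={"act": act, "x": x})
    print(name, timer.timeit(100))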
2. Training comparison
import torch
import torchvision
import torch.nn as nn
import torch.optim as optim
# define model with Swish activation (nn.SiLU is PyTorch's Swish)
class ModelSwish(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.swish = nn.SiLU()
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        x = self.pool(self.swish(self.conv1(x)))
        x = self.pool(self.swish(self.conv2(x)))
        x = x.view(-1, 16 * 5 * 5)
        x = self.swish(self.fc1(x))
        x = self.swish(self.fc2(x))
        x = self.fc3(x)
        return x
# define model with Hardswish activation
class ModelHardswish(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.hardswish = nn.Hardswish()
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        x = self.pool(self.hardswish(self.conv1(x)))
        x = self.pool(self.hardswish(self.conv2(x)))
        x = x.view(-1, 16 * 5 * 5)
        x = self.hardswish(self.fc1(x))
        x = self.hardswish(self.fc2(x))
        x = self.fc3(x)
        return x
# prepare data
transform = torchvision.transforms.Compose(
    [torchvision.transforms.ToTensor(),
     torchvision.transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])
trainset = torchvision.datasets.CIFAR10(root='./data', train=True,
                                        download=True, transform=transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=4,
                                          shuffle=True, num_workers=2)
# train model with swish activation
model_swish = ModelSwish()
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(model_swish.parameters(), lr=0.001, momentum=0.9)
for epoch in range(2):  # loop over the dataset multiple times
    running_loss = 0.0
    for i, data in enumerate(trainloader, 0):
        # get the inputs; data is a list of [inputs, labels]
        inputs, labels = data
        # zero the parameter gradients
        optimizer.zero_grad()
        # forward + backward + optimize
        outputs = model_swish(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()
        # print statistics
        running_loss += loss.item()
        if i % 2000 == 1999:  # print every 2000 mini-batches
            print('[%d, %5d] loss: %.3f' %
                  (epoch + 1, i + 1, running_loss / 2000))
            running_loss = 0.0
print('Finished Training with Swish')
# train model with hardswish activation
model_hardswish = ModelHardswish()
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(model_hardswish.parameters(), lr=0.001, momentum=0.9)
for epoch in range(2):  # loop over the dataset multiple times
    running_loss = 0.0
    for i, data in enumerate(trainloader, 0):
        # get the inputs; data is a list of [inputs, labels]
        inputs, labels = data
        # zero the parameter gradients
        optimizer.zero_grad()
        # forward + backward + optimize
        outputs = model_hardswish(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()
        # print statistics
        running_loss += loss.item()
        if i % 2000 == 1999:  # print every 2000 mini-batches
            print('[%d, %5d] loss: %.3f' %
                  (epoch + 1, i + 1, running_loss / 2000))
            running_loss = 0.0
print('Finished Training with Hardswish')
The code above defines two CNNs that differ only in their activation function, one using Swish (SiLU) and the other Hardswish, and trains both on the CIFAR-10 dataset. In our runs, the training loss of the Hardswish model decreased noticeably faster than that of the Swish model.
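Training loss alone can be noisy, so it is also worth comparing test accuracy. Below is a minimal evaluation sketch, assuming the transform and the two trained models defined above are in scope; the evaluate helper is an illustrative name, not part of the original experiment:

testset = torchvision.datasets.CIFAR10(root='./data', train=False,
                                       download=True, transform=transform)
testloader = torch.utils.data.DataLoader(testset, batch_size=4,
                                         shuffle=False, num_workers=2)

def evaluate(model):
    # top-1 accuracy over the whole test set
    correct, total = 0, 0
    model.eval()
    with torch.no_grad():
        for inputs, labels in testloader:
            outputs = model(inputs)
            _, predicted = torch.max(outputs, 1)
            total += labels.size(0)
            correct += (predicted == labels).sum().item()
    return correct / total

print('Swish test accuracy: %.3f' % evaluate(model_swish))
print('Hardswish test accuracy: %.3f' % evaluate(model_hardswish))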
III. Implementing Hardswish
Hardswish is very simple to implement; here is a PyTorch version:
import torch
import torch.nn.functional as F

class Hardswish(torch.nn.Module):
    def __init__(self):
        super().__init__()

    def forward(self, x):
        # Hardswish(x) = x * ReLU6(x + 3) / 6
        return x * F.relu6(x + 3, inplace=True) / 6
The code above defines a Hardswish class that inherits from PyTorch's Module class; its forward method applies the Hardswish formula to the input x.
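As a quick sanity check (not part of the original text), the custom module can be compared against PyTorch's built-in nn.Hardswish on random input:

x = torch.randn(4, 8)
custom = Hardswish()
builtin = torch.nn.Hardswish()
# the two implementations should agree up to floating-point tolerance
print(torch.allclose(custom(x), builtin(x)))  # expected: True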
IV. Using Hardswish
In PyTorch, Hardswish can be used directly as an activation function. Assuming the class above is saved in hardswish.py, it can be used as follows:
import torch.nn as nn
import torch.nn.functional as F
from hardswish import Hardswish
class Model(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 4 * 4, 120)
        self.hardswish = Hardswish()
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        x = self.pool(self.hardswish(self.conv1(x)))
        x = self.pool(self.hardswish(self.conv2(x)))
        x = x.view(-1, 16 * 4 * 4)
        x = self.hardswish(self.fc1(x))
        x = self.hardswish(self.fc2(x))
        x = self.fc3(x)
        return x
The code above defines a network in which Hardswish is applied after both the convolutional layers and the fully connected layers.
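Given the layer sizes above (a single input channel and a 16 * 4 * 4 flatten), the model expects 28x28 single-channel images. A minimal forward-pass sanity check might look like this; the input tensor is illustrative:

import torch

model = Model()
x = torch.randn(2, 1, 28, 28)  # batch of two 28x28 grayscale images
out = model(x)
print(out.shape)  # expected: torch.Size([2, 10])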
V. Summary
Hardswish is an activation function that refines Swish, making neural-network computation more efficient without sacrificing accuracy. In our experiments, Hardswish ran roughly twice as fast as Swish, and its training loss also decreased faster. In addition, Hardswish is very simple to implement and easy to drop into existing models. Overall, Hardswish is a new tool well worth trying.