A Quick Start with PyTorch: Building a CNN Image Classification Model

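This walkthrough builds an image classifier with a CNN. It covers the theoretical background, model construction, the training process, the code, and visualizations of the results.

Without further ado, let's dive in.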

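Theoretical Background

Overview of Convolutional Neural Networks (CNNs)

Convolutional Neural Networks (CNNs) are one of the main architectures in deep learning and are particularly well suited to grid-structured data such as images. By applying convolution operations, a CNN learns to extract features from the data automatically, which avoids the burden of manual feature engineering.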

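Convolutional Layer:

The convolutional layer is the core of a CNN. It convolves the input with several kernels (filters), and each kernel produces one feature map:

$$Y_k = X * W_k$$

where $X$ is the input data, $W_k$ is the $k$-th convolution kernel, $Y_k$ is the resulting feature map, and $*$ denotes the convolution operation.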
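To make the convolution described above concrete, here is a minimal sketch using `torch.nn.functional.conv2d`. The 4x4 input and the two hand-written kernels are made up for illustration; in a real network the kernels are learned. Note that PyTorch's `conv2d` computes a cross-correlation, meaning the kernel is not flipped.

```python
import torch
import torch.nn.functional as F

# One 1-channel 4x4 input; shape is (batch, channels, height, width).
x = torch.arange(16, dtype=torch.float32).reshape(1, 1, 4, 4)

# Two illustrative 3x3 kernels; shape is (out_channels, in_channels, kH, kW).
kernels = torch.zeros(2, 1, 3, 3)
kernels[0, 0, 1, 1] = 1.0   # identity kernel: copies the centre pixel of each window
kernels[1, 0] = 1.0 / 9.0   # averaging kernel: mean of each 3x3 window

# No padding, stride 1: a 4x4 input and a 3x3 kernel give a 2x2 feature map per kernel.
feature_maps = F.conv2d(x, kernels)
print(feature_maps.shape)   # torch.Size([1, 2, 2, 2])
print(feature_maps[0, 0])   # feature map from the identity kernel
print(feature_maps[0, 1])   # feature map from the averaging kernel
```

Each kernel yields its own feature map, which is why the output has two channels here.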

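Pooling Layer:

A pooling layer shrinks the feature maps, which lowers the model's computational cost and helps reduce overfitting. Common choices are max pooling and average pooling. Max pooling is defined as:

$$y_{i,j} = \max_{(m,n) \in R_{i,j}} x_{m,n}$$

where $R_{i,j}$ is the pooling window that corresponds to output position $(i, j)$.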
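As a small illustration of pooling, the sketch below applies 2x2 max pooling and 2x2 average pooling to a hand-written 4x4 input. The input values are arbitrary and chosen only so that the result is easy to check by eye.

```python
import torch
import torch.nn as nn

# A single 1-channel 4x4 feature map; shape is (batch, channels, height, width).
x = torch.tensor([[1.0, 2.0, 5.0, 6.0],
                  [3.0, 4.0, 7.0, 8.0],
                  [9.0, 10.0, 13.0, 14.0],
                  [11.0, 12.0, 15.0, 16.0]]).reshape(1, 1, 4, 4)

max_pool = nn.MaxPool2d(kernel_size=2, stride=2)
avg_pool = nn.AvgPool2d(kernel_size=2, stride=2)

pooled_max = max_pool(x)   # keeps the largest value in each 2x2 window
pooled_avg = avg_pool(x)   # keeps the mean of each 2x2 window

print(pooled_max[0, 0].tolist())  # [[4.0, 8.0], [12.0, 16.0]]
print(pooled_avg[0, 0].tolist())  # [[2.5, 6.5], [10.5, 14.5]]
```

Both operations halve the height and width, so the 4x4 input becomes 2x2.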

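Activation Function:

The most commonly used activation function is ReLU (Rectified Linear Unit), which introduces non-linearity by setting negative values to zero:

$$\mathrm{ReLU}(x) = \max(0, x)$$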
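A quick check of what ReLU does to a few sample values (the values themselves are arbitrary):

```python
import torch

x = torch.tensor([-2.0, -0.5, 0.0, 1.5, 3.0])

relu_out = torch.relu(x)
# The same result written as an element-wise max(0, x).
manual_out = torch.clamp(x, min=0)

print(relu_out.tolist())  # [0.0, 0.0, 0.0, 1.5, 3.0]
```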

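Fully Connected Layer:

Fully connected layers map the features extracted by the convolutional and pooling layers to the space of class labels, and they usually form the last few layers of the network:

$$y = f(Wx + b)$$

where $W$ and $b$ are the weights and bias, and $f$ is the activation function.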
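The fully connected layer described above corresponds to `nn.Linear` followed by an activation. The sketch below, with arbitrary layer sizes chosen for illustration, checks that the module output matches the computation written out by hand, taking ReLU as the activation.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

fc = nn.Linear(in_features=4, out_features=3)
x = torch.randn(2, 4)  # a batch of 2 input vectors

y_module = torch.relu(fc(x))

# nn.Linear stores W with shape (out_features, in_features),
# so the batched form of Wx + b is x @ W.T + b.
y_manual = torch.relu(x @ fc.weight.T + fc.bias)

print(y_module.shape)                      # torch.Size([2, 3])
print(torch.allclose(y_module, y_manual))  # True
```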

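Hands-On Project

Dataset

We will use the CIFAR-10 dataset, which contains 60,000 32x32 color images in 10 classes, with 6,000 images per class. The dataset is split into 50,000 training images and 10,000 test images.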

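Download cifar-10-python.tar.gz and extract it into the ./data/ directory of your project. The archive already contains a top-level cifar-10-batches-py/ folder, so the files end up in ./data/cifar-10-batches-py/:

mkdir -p ./data
tar -xzvf cifar-10-python.tar.gz -C ./data

Note that with download=False, torchvision reads the dataset from these local files and does not download anything. If the files are missing or in the wrong place, it raises an error instead.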
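Before loading the dataset, it can help to confirm that the extracted files are where torchvision will look for them. The helper below is a hypothetical convenience function, not part of torchvision. It only checks that the files exist, whereas torchvision also verifies their checksums.

```python
from pathlib import Path

# Files that the extracted CIFAR-10 Python archive is expected to contain.
EXPECTED_FILES = [f"data_batch_{i}" for i in range(1, 6)] + ["test_batch", "batches.meta"]

def missing_cifar10_files(root="./data"):
    """Return the expected files that are absent under root/cifar-10-batches-py."""
    base = Path(root) / "cifar-10-batches-py"
    return [name for name in EXPECTED_FILES if not (base / name).is_file()]

missing = missing_cifar10_files("./data")
if missing:
    print("Missing files:", missing)
else:
    print("CIFAR-10 files found.")
```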

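import torch
import torchvision
import torchvision.transforms as transforms

# Per-channel mean and standard deviation used to normalize the images
mean = (0.4914, 0.4822, 0.4465)
std = (0.2023, 0.1994, 0.2010)

# Training transform: random augmentation, then tensor conversion and normalization
transform = transforms.Compose([
    transforms.RandomHorizontalFlip(),
    transforms.RandomCrop(32, padding=4),
    transforms.ToTensor(),
    transforms.Normalize(mean, std),
])

# Test transform: no random augmentation, so evaluation is deterministic
test_transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(mean, std),
])

train_dataset = torchvision.datasets.CIFAR10(root='./data', train=True, download=False, transform=transform)
train_loader = torch.utils.data.DataLoader(train_dataset, batch_size=100, shuffle=True)

test_dataset = torchvision.datasets.CIFAR10(root='./data', train=False, download=False, transform=test_transform)
test_loader = torch.utils.data.DataLoader(test_dataset, batch_size=100, shuffle=False)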

Model Construction

We build a CNN with two convolutional layers, each followed by max pooling, and two fully connected layers:

import torch.nn as nn

class CNN(nn.Module):
    def __init__(self):
        super().__init__()
        # Two 3x3 convolutions; padding=1 keeps the spatial size unchanged
        self.conv1 = nn.Conv2d(3, 32, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(32, 64, kernel_size=3, padding=1)
        # 2x2 max pooling halves the height and width each time it is applied
        self.pool = nn.MaxPool2d(kernel_size=2, stride=2)
        # After two poolings a 32x32 image becomes 8x8 with 64 channels
        self.fc1 = nn.Linear(64 * 8 * 8, 512)
        self.fc2 = nn.Linear(512, 10)

    def forward(self, x):
        x = torch.relu(self.conv1(x))
        x = self.pool(x)
        x = torch.relu(self.conv2(x))
        x = self.pool(x)
        x = x.view(-1, 64 * 8 * 8)  # flatten to (batch, 4096)
        x = torch.relu(self.fc1(x))
        x = self.fc2(x)  # raw logits; the loss function applies softmax internally
        return x

model = CNN()
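To see where the `64 * 8 * 8` input size of `fc1` comes from, the sketch below pushes a dummy batch through layers configured like those in the model and records the shape after each step. The batch size of 4 is arbitrary, and ReLU is left out because it does not change shapes.

```python
import torch
import torch.nn as nn

conv1 = nn.Conv2d(3, 32, kernel_size=3, padding=1)
conv2 = nn.Conv2d(32, 64, kernel_size=3, padding=1)
pool = nn.MaxPool2d(kernel_size=2, stride=2)

x = torch.randn(4, 3, 32, 32)  # dummy batch of 4 CIFAR-10-sized images
shapes = [tuple(x.shape)]

with torch.no_grad():
    for layer in (conv1, pool, conv2, pool):
        x = layer(x)
        shapes.append(tuple(x.shape))

for shape in shapes:
    print(shape)
# (4, 3, 32, 32)
# (4, 32, 32, 32)
# (4, 32, 16, 16)
# (4, 64, 16, 16)
# (4, 64, 8, 8)

flat_features = x.flatten(1).shape[1]
print(flat_features)  # 4096, i.e. 64 * 8 * 8
```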

Loss Function and Optimizer

We use the cross-entropy loss and the Adam optimizer:

import torch.optim as optim

criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)
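One detail worth checking is that `nn.CrossEntropyLoss` expects raw logits, not probabilities, because it applies log-softmax internally. That is why the model's last layer has no softmax. The sketch below verifies this on random logits and made-up labels.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

logits = torch.randn(5, 10)               # 5 samples, 10 classes, raw scores
labels = torch.tensor([3, 0, 9, 1, 4])    # made-up class indices

loss = nn.CrossEntropyLoss()(logits, labels)

# Equivalent computation: log-softmax followed by negative log-likelihood.
manual = F.nll_loss(F.log_softmax(logits, dim=1), labels)

print(torch.allclose(loss, manual))  # True
```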

Training the Model

During training we record the average training loss and the test accuracy for each epoch:

# Train the model
num_epochs = 10
train_losses = []      # average training loss per epoch
test_accuracies = []   # test accuracy (%) per epoch

for epoch in range(num_epochs):
    model.train()
    running_loss = 0.0
    for images, labels in train_loader:
        optimizer.zero_grad()
        outputs = model(images)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()
        running_loss += loss.item()

    epoch_loss = running_loss / len(train_loader)
    train_losses.append(epoch_loss)
    print(f"Epoch [{epoch+1}/{num_epochs}], Loss: {epoch_loss:.4f}")

    # Evaluate the model on the test set
    model.eval()
    correct = 0
    total = 0
    with torch.no_grad():
        for images, labels in test_loader:
            outputs = model(images)
            predicted = outputs.argmax(dim=1)
            total += labels.size(0)
            correct += (predicted == labels).sum().item()

    accuracy = 100 * correct / total
    test_accuracies.append(accuracy)
    print(f"Accuracy of the model on the test images: {accuracy:.2f}%")

Plotting the Loss and Accuracy Curves

We can use Matplotlib to plot how the training loss and the test accuracy change over the epochs.

import matplotlib.pyplot as plt

# train_losses and test_accuracies were filled in by the training loop above
epochs = range(1, num_epochs + 1)

# Plot the loss curve
plt.figure(figsize=(12, 4))
plt.subplot(1, 2, 1)
plt.plot(epochs, train_losses, label='Training Loss')
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.title('Training Loss over Epochs')
plt.legend()

# Plot the accuracy curve
plt.subplot(1, 2, 2)
plt.plot(epochs, test_accuracies, label='Test Accuracy')
plt.xlabel('Epochs')
plt.ylabel('Accuracy (%)')
plt.title('Test Accuracy over Epochs')
plt.legend()

plt.tight_layout()
plt.show()
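The evaluation step inside the training loop can also be factored into a reusable helper. The sketch below is one possible way to write it. The name `evaluate`, the small stand-in model, and the random data are all assumptions made so that the example runs without CIFAR-10.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

def evaluate(model, loader):
    """Return the accuracy (in percent) of model over all batches in loader."""
    was_training = model.training
    model.eval()
    correct, total = 0, 0
    with torch.no_grad():
        for inputs, labels in loader:
            predicted = model(inputs).argmax(dim=1)
            correct += (predicted == labels).sum().item()
            total += labels.size(0)
    model.train(was_training)  # restore whichever mode the model was in
    return 100 * correct / total

# Stand-in data and model (hypothetical, for demonstration only).
torch.manual_seed(0)
inputs = torch.randn(20, 3, 32, 32)
labels = torch.randint(0, 10, (20,))
demo_loader = DataLoader(TensorDataset(inputs, labels), batch_size=8)
demo_model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))

demo_accuracy = evaluate(demo_model, demo_loader)
print(f"Accuracy on the stand-in data: {demo_accuracy:.1f}%")
```

With a helper like this, the training loop only needs a single call per epoch, and the model's train/eval mode is restored automatically.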

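Visualizing Filters and Feature Maps

Visualizing the convolution kernels and feature maps helps in understanding how a CNN extracts and uses features.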

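# Visualize the kernels of the first convolutional layer
def visualize_filters(layer):
    filters = layer.weight.data.clone()
    # Rescale all weights to [0, 1] so they can be displayed as RGB images
    filters = filters - filters.min()
    filters = filters / filters.max()
    # conv1 has 3 input channels, so each kernel can be shown as a small RGB image
    filter_img = torchvision.utils.make_grid(filters, nrow=8, padding=1)
    # make_grid returns (C, H, W); imshow expects (H, W, C)
    return filter_img.permute(1, 2, 0).cpu().numpy()

conv1_filters = visualize_filters(model.conv1)
plt.imshow(conv1_filters)
plt.title('Conv1 Filters')
plt.show()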

# Visualize the feature maps of the first convolutional layer
def visualize_feature_maps(images, model):
    with torch.no_grad():
        x = torch.relu(model.conv1(images))
        feature_maps = x[0, :8]  # first 8 feature maps of the first image in the batch
        # normalize=True rescales the values to [0, 1] so imshow does not clip them
        feature_maps = torchvision.utils.make_grid(feature_maps.unsqueeze(1), nrow=8, padding=1, normalize=True)
        return feature_maps.permute(1, 2, 0).cpu().numpy()

sample_images, _ = next(iter(train_loader))
feature_maps = visualize_feature_maps(sample_images, model)
plt.imshow(feature_maps)
plt.title('Feature Maps after Conv1')
plt.show()
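The function above re-runs `conv1` by hand. An alternative that also works for deeper layers is a forward hook, which captures a layer's output during a normal forward pass. The sketch below uses a small stand-in network and a random input. The names `net`, `captured`, and `save_output` are illustrative, not part of the article's model.

```python
import torch
import torch.nn as nn

# Stand-in network (hypothetical); the same pattern applies to any nn.Module.
net = nn.Sequential(
    nn.Conv2d(3, 32, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(2),
)

captured = {}

def save_output(name):
    """Build a hook that stores the layer's output under the given name."""
    def hook(module, inputs, output):
        captured[name] = output.detach()
    return hook

# Capture the activations right after the ReLU.
handle = net[1].register_forward_hook(save_output("relu1"))
with torch.no_grad():
    net(torch.randn(1, 3, 32, 32))
handle.remove()  # remove the hook once the activations are captured

print(captured["relu1"].shape)  # torch.Size([1, 32, 32, 32])
```

The captured tensor can then be passed to `torchvision.utils.make_grid` in the same way as in the function above.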

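Conclusion

In this project we used PyTorch to build a simple convolutional neural network for image classification on the CIFAR-10 dataset.

We started with the theoretical background of CNNs, covering how convolutional layers, pooling layers, activation functions, and fully connected layers work, and then walked through the complete code and the training process. Finally, we visualized the training loss, the test accuracy, the convolution kernels, and the feature maps to show how a CNN processes and learns from image data.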

This article is reposted from the WeChat public account @深夜努力写Python.
