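CNN Image Classification: A Core Deep Learning Technique

Image classification is one of the central problems in computer vision, with applications ranging from obstacle recognition in self-driving cars to medical image analysis and image tagging on social media. CNNs (Convolutional Neural Networks) perform exceptionally well on image data and have been at the center of the deep learning revolution. This post takes a detailed look at how CNNs solve the image classification problem.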

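1. History and Evolution of CNNs

Early discoveries: biological inspiration (1959-1980s)

The origins of CNNs trace back to Hubel and Wiesel's 1959 study of the cat visual cortex. They found that neurons in the visual cortex have local receptive fields and respond to lines of a particular orientation. This finding later inspired the convolution operation in CNNs.

In 1980, Kunihiko Fukushima drew on these biological findings to propose the Neocognitron. It is often regarded as the first neural network with a convolutional structure, and it introduced the idea of hierarchical pattern recognition.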

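The birth of the modern CNN: LeNet (1989-1998)

In 1989, Yann LeCun applied the backpropagation algorithm to a convolutional network, producing the first practical CNN, which was used to recognize handwritten ZIP codes. LeNet-5, published in 1998, was designed for handwritten digit and document recognition and was deployed commercially to read bank checks.

LeNet-5 architecture:

Input: 32x32 grayscale image
C1: 6 convolution filters of size 5x5
S2: 2x2 average pooling
C3: 16 convolution filters of size 5x5
S4: 2x2 average pooling
C5: 120 convolution filters of size 5x5
F6: fully connected layer with 84 units
Output: 10 classes (digits 0-9)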

The deep learning revival: AlexNet (2012)

At the 2012 ImageNet Large Scale Visual Recognition Challenge (ILSVRC), AlexNet, developed by Alex Krizhevsky, Ilya Sutskever, and Geoffrey Hinton, won by a wide margin, and the deep learning era began in earnest.

AlexNet's innovations:

Deeper architecture (8 learned layers)
ReLU activation function
Regularization with Dropout
Data augmentation
Parallel training on GPUs
Local Response Normalization (LRN)

AlexNet lowered the top-5 error rate to 15.3%, far ahead of the runner-up's 26.2%. This result made deep learning the standard approach in computer vision.

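Accelerating progress (2013-present)

VGGNet (2014): showed that stacking many layers of small 3x3 filters is effective
GoogLeNet/Inception (2014): efficient computation through Inception modules
ResNet (2015): skip connections make it possible to train very deep networks of 152 layers and more
DenseNet (2017): feature reuse through dense connections
EfficientNet (2019): balanced scaling of depth, width, and resolution
Vision Transformer (2020): applies the Transformer architecture to images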

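2. Core Concepts and Mathematical Principles of CNNs

A CNN is a deep learning model specialized for image data and is very effective at learning the spatial hierarchy of an image. It is built from three main kinds of layers: convolutional layers, pooling layers, and fully connected layers.

Convolutional Layer

The convolutional layer is the core of a CNN and is responsible for extracting features from the image. Each layer applies several filters (also called kernels) to its input through the convolution operation.

Mathematical definition

The 2D convolution used in CNNs is defined as follows (strictly speaking this is cross-correlation, because the kernel is not flipped; deep learning frameworks follow the same convention):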

S(i, j) = (I * K)(i, j) = \sum_m \sum_n I(i + m, j + n) \cdot K(m, n)

where:

$I$ is the input image
$K$ is the filter, or kernel
$S$ is the output feature map
$(i, j)$ is a position in the output feature map
$(m, n)$ is a position within the kernel
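The double sum above can be written out directly as nested loops. The following is a minimal pure-Python sketch, assuming no padding and stride 1; the function name `conv2d_valid` and the edge-detecting kernel are illustrative choices, not part of any library.

```python
def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and return the feature map."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # S(i, j) = sum_m sum_n I(i + m, j + n) * K(m, n)
            s = sum(image[i + m][j + n] * kernel[m][n]
                    for m in range(kh) for n in range(kw))
            row.append(s)
        output.append(row)
    return output


# A 4x4 image whose right half is bright.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
# Hypothetical 2x2 kernel that responds to a dark-to-bright vertical edge.
kernel = [
    [-1, 1],
    [-1, 1],
]

feature_map = conv2d_valid(image, kernel)
print(feature_map)  # [[0, 2, 0], [0, 2, 0], [0, 2, 0]]
```

The feature map is non-zero only in the column where the edge sits, which is the sense in which a filter "detects" a feature.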

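Properties of convolution

Parameter sharing: the same filter is applied across the whole image, which greatly reduces the number of parameters.
- Example: a 200x200 image mapped to a 200x200 output, with a 5x5 filter
- Fully connected: $(200 \times 200) \times (200 \times 200) = 1,600,000,000$ parameters
- Convolution: $5 \times 5 = 25$ parameters
Sparse connectivity: each output neuron is connected only to a small region of the input (its receptive field).
Translation equivariance: when an object shifts, the same filter detects the same feature at the shifted position. Combined with pooling, this gives the network approximate translation invariance.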
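The parameter counts in the example can be checked with a few lines of arithmetic. This is a small sketch; the helper names `fc_params` and `conv_params` are made up for illustration, and biases are ignored unless requested.

```python
def fc_params(in_h, in_w, out_h, out_w):
    """Weights needed to connect every input pixel to every output unit."""
    return (in_h * in_w) * (out_h * out_w)


def conv_params(kernel_size, in_channels=1, out_channels=1, bias=False):
    """Weights in a conv layer: one kernel per (input channel, output channel) pair."""
    weights = kernel_size * kernel_size * in_channels * out_channels
    return weights + (out_channels if bias else 0)


fc = fc_params(200, 200, 200, 200)
conv = conv_params(5)
print(f"fully connected: {fc:,}")  # fully connected: 1,600,000,000
print(f"convolution:     {conv:,}")  # convolution:     25

# A more realistic layer: 3x3 kernels, 64 -> 128 channels, with bias.
print(conv_params(3, in_channels=64, out_channels=128, bias=True))  # 73856
```

Note that the convolution count does not depend on the image size at all, only on the kernel size and the channel counts.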

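Key hyperparameters

Kernel size: typically 3x3, 5x5, or 7x7
- Small filters: capture finer detail and allow deeper networks
- Large filters: wider receptive field with fewer layers
Stride: the step by which the filter moves
- Stride 1: dense feature extraction
- Stride 2 or more: downsamples the output
Padding: values added around the border of the input
- Valid padding: no padding, so the output shrinks
- Same padding: pads so that the output size equals the input size (at stride 1)

Output size calculation (the floor discards a final partial step when the stride does not divide evenly):

\text{Output Size} = \left\lfloor \frac{W - K + 2P}{S} \right\rfloor + 1

where:

$W$ : input size
$K$ : kernel size
$P$ : padding size
$S$ : stride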
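As a quick check of the formula, the sketch below traces the spatial size of a 32x32 input through the LeNet-5 layers listed earlier. The helper name `conv_output_size` is an illustrative choice, and pooling layers are treated as convolutions with a 2x2 kernel and stride 2, which gives the same output size.

```python
def conv_output_size(w, k, p=0, s=1):
    """floor((W - K + 2P) / S) + 1 for one spatial dimension."""
    return (w - k + 2 * p) // s + 1


# (layer name, kernel size, padding, stride) for the LeNet-5 feature extractor
lenet5_layers = [
    ("C1", 5, 0, 1),
    ("S2", 2, 0, 2),
    ("C3", 5, 0, 1),
    ("S4", 2, 0, 2),
    ("C5", 5, 0, 1),
]

size = 32
trace = [size]
for name, k, p, s in lenet5_layers:
    size = conv_output_size(size, k, p, s)
    trace.append(size)

print(trace)  # [32, 28, 14, 10, 5, 1]

# "Same" padding for a 3x3 kernel at stride 1 keeps the size unchanged.
print(conv_output_size(28, 3, p=1))  # 28
```

By C5 the 5x5 kernel covers the whole 5x5 feature map, so each of its 120 filters produces a single value.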

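Activation functions

After the convolution, an activation function is applied to introduce non-linearity:

A(i, j) = f(S(i, j) + b)

where $b$ is the bias term and $f$ is the activation function.

Common activation functions:

ReLU (Rectified Linear Unit):

\text{ReLU}(x) = \max(0, x)

The most widely used
Computationally cheap
Suffers from the dying ReLU problem (units that always output 0 stop receiving gradient)

Leaky ReLU:

\text{LeakyReLU}(x) = \max(0.01x, x)

Mitigates the dying ReLU problem

ELU (Exponential Linear Unit):

\text{ELU}(x) = \begin{cases} x & \text{if } x > 0 \\ \alpha(e^x - 1) & \text{if } x \leq 0 \end{cases}

where $\alpha > 0$ is a hyperparameter (commonly 1).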
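The three definitions differ only in how they treat negative inputs, which is easy to see by evaluating them side by side. A minimal scalar sketch, assuming a Leaky ReLU slope of 0.01 and $\alpha = 1$ for ELU:

```python
import math


def relu(x):
    return max(0.0, x)


def leaky_relu(x, slope=0.01):
    # For 0 < slope < 1, max(slope * x, x) picks x when x > 0 and slope * x otherwise.
    return max(slope * x, x)


def elu(x, alpha=1.0):
    return x if x > 0 else alpha * (math.exp(x) - 1.0)


for x in (-2.0, 0.0, 3.0):
    print(f"x={x:+.1f}  relu={relu(x):.4f}  leaky={leaky_relu(x):.4f}  elu={elu(x):.4f}")
```

For positive inputs all three return `x` unchanged. For negative inputs ReLU returns exactly 0, Leaky ReLU keeps a small negative slope, and ELU saturates smoothly toward `-alpha`.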

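Pooling Layer

Pooling layers shrink the feature maps, which reduces computation, helps limit overfitting, and provides some invariance to small shifts in the input.

Max Pooling

Selects the largest value within a region $R$:

\text{MaxPool}(R) = \max_{(i,j) \in R} A(i, j)

Preserves the strongest activation
The most widely used pooling method
Typically uses a 2x2 kernel with stride 2

Average Pooling

Computes the mean value of a region:

\text{AvgPool}(R) = \frac{1}{|R|} \sum_{(i,j) \in R} A(i, j)

Gives a smoother summary of the overall features
Global Average Pooling is used as a replacement for fully connected layers

Global Pooling

Reduces each feature map (channel) to a single value:

Global Average Pooling (GAP): greatly reduces the number of parameters
Global Max Pooling (GMP): selects the strongest feature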
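The pooling variants above can be compared on a single small feature map. This is a pure-Python sketch for one channel; `pool2d` and `global_average_pool` are illustrative helper names, and the input values are arbitrary.

```python
def pool2d(feature_map, size=2, stride=2, mode="max"):
    """Apply max or average pooling over size x size windows."""
    out = []
    for i in range(0, len(feature_map) - size + 1, stride):
        row = []
        for j in range(0, len(feature_map[0]) - size + 1, stride):
            window = [feature_map[i + m][j + n]
                      for m in range(size) for n in range(size)]
            row.append(max(window) if mode == "max" else sum(window) / len(window))
        out.append(row)
    return out


def global_average_pool(feature_map):
    """Collapse a whole feature map into its mean value."""
    values = [v for row in feature_map for v in row]
    return sum(values) / len(values)


fm = [
    [1, 3, 2, 0],
    [5, 7, 4, 2],
    [0, 2, 8, 6],
    [4, 2, 6, 4],
]

max_pooled = pool2d(fm, mode="max")
avg_pooled = pool2d(fm, mode="avg")
gap = global_average_pool(fm)

print(max_pooled)  # [[7, 4], [4, 8]]
print(avg_pooled)  # [[4.0, 2.0], [2.0, 6.0]]
print(gap)         # 3.5
```

Both 2x2 poolings halve each spatial dimension, while global average pooling leaves one number per channel regardless of the input size.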

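Fully Connected Layer

Fully connected layers form the final part of the network and perform the classification based on the extracted features.

y = f(W \cdot x + b)

where:

$x$ : input vector (the flattened feature maps)
$W$ : weight matrix
$b$ : bias vector
$f$ : activation function (usually Softmax at the output layer)

Softmax function

The output-layer activation function for multi-class classification:

\text{Softmax}(z_i) = \frac{e^{z_i}}{\sum_{j=1}^{K} e^{z_j}}

The outputs sum to 1, so they form a probability distribution
Here $K$ is the number of classes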
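A direct implementation of the Softmax formula overflows for large logits, so practical code usually subtracts the maximum logit first, which leaves the result unchanged. A minimal sketch with made-up logits:

```python
import math


def softmax(logits):
    """Numerically stable softmax: shift by the max logit before exponentiating."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


probs = softmax([2.0, 1.0, 0.1])
print([round(p, 3) for p in probs])  # [0.659, 0.242, 0.099]
print(sum(probs))

# Large logits would overflow math.exp without the shift.
print(softmax([1000.0, 1000.0]))  # [0.5, 0.5]
```

The largest logit always receives the largest probability, and shifting all logits by the same constant does not change the output.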

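Batch Normalization

Stabilizes training by normalizing activations with the mean and variance of each mini-batch:

\hat{x} = \frac{x - \mu_B}{\sqrt{\sigma_B^2 + \epsilon}}

y = \gamma \hat{x} + \beta

where:

$\mu_B$ : batch mean
$\sigma_B^2$ : batch variance
$\epsilon$ : small constant for numerical stability
$\gamma, \beta$ : learnable parameters

Benefits:

Faster training
Allows higher learning rates
Less sensitive to initialization
Mild regularization effect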
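The two equations can be applied by hand to a tiny batch of scalar activations. The sketch below covers only the training-time computation for a single feature; the running statistics that frameworks keep for inference are left out, and `batch_norm` is an illustrative name.

```python
def batch_norm(batch, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize one feature across a mini-batch, then scale and shift."""
    n = len(batch)
    mean = sum(batch) / n
    var = sum((x - mean) ** 2 for x in batch) / n  # biased (population) variance
    return [gamma * (x - mean) / (var + eps) ** 0.5 + beta for x in batch]


batch = [1.0, 2.0, 3.0, 4.0]
normalized = batch_norm(batch)
scaled = batch_norm(batch, gamma=2.0, beta=5.0)

print([round(v, 3) for v in normalized])
print(sum(normalized) / len(normalized))  # close to 0
print(sum(scaled) / len(scaled))          # close to 5
```

With `gamma=1` and `beta=0` the output has roughly zero mean and unit variance; the learnable `gamma` and `beta` let the network undo the normalization where that helps.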

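Dropout

A regularization technique for preventing overfitting:

r_i \sim \text{Bernoulli}(p)

\tilde{y} = r \odot y

Randomly deactivates a subset of neurons during training; at inference time all neurons are used
Acts like an ensemble of sub-networks
Here $p$ is the probability of keeping a unit, and $p = 0.5$ is a common choice for hidden layers. Note that PyTorch's nn.Dropout(p) takes the probability of dropping a unit instead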
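The masking step can be sketched in a few lines. This version uses the "inverted dropout" convention, in which kept activations are rescaled by `1 / keep_prob` during training so that nothing needs to change at inference time; that convention is an assumption of this sketch rather than something the equations above spell out, and the argument here is the drop probability, as in PyTorch.

```python
import random


def dropout(values, drop_prob=0.5, training=True, rng=random):
    """Inverted dropout: zero each value with probability drop_prob, rescale the rest."""
    if not training or drop_prob == 0.0:
        return list(values)
    keep_prob = 1.0 - drop_prob
    return [v / keep_prob if rng.random() < keep_prob else 0.0 for v in values]


rng = random.Random(0)  # fixed seed so the run is repeatable
activations = [1.0] * 10000
dropped = dropout(activations, drop_prob=0.5, rng=rng)

kept_fraction = sum(1 for v in dropped if v != 0.0) / len(dropped)
print(f"kept fraction: {kept_fraction:.3f}")
print(f"mean activation after dropout: {sum(dropped) / len(dropped):.3f}")

# At inference time the layer is the identity.
print(dropout([1.0, 2.0, 3.0], training=False))  # [1.0, 2.0, 3.0]
```

Roughly half of the units are zeroed, but the expected activation stays close to its original value because of the rescaling.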

3. Major CNN Architectures

LeNet-5 (1998)

One of the first successful CNN architectures:

import torch
import torch.nn as nn

class LeNet5(nn.Module):
    def __init__(self, num_classes=10):
        super(LeNet5, self).__init__()
        self.features = nn.Sequential(
            # padding=2 lets 28x28 inputs (e.g. MNIST) stand in for the original 32x32 inputs
            nn.Conv2d(1, 6, kernel_size=5, stride=1, padding=2),
            nn.Tanh(),
            nn.AvgPool2d(kernel_size=2, stride=2),
            nn.Conv2d(6, 16, kernel_size=5, stride=1),
            nn.Tanh(),
            nn.AvgPool2d(kernel_size=2, stride=2)
        )
        self.classifier = nn.Sequential(
            # C5: on 5x5 feature maps, 120 filters of size 5x5 are equivalent to this linear layer
            nn.Linear(16 * 5 * 5, 120),
            nn.Tanh(),
            nn.Linear(120, 84),
            nn.Tanh(),
            nn.Linear(84, num_classes)
        )

    def forward(self, x):
        x = self.features(x)
        x = torch.flatten(x, 1)
        x = self.classifier(x)
        return x

AlexNet (2012)

The start of the deep learning revolution:

class AlexNet(nn.Module):
    def __init__(self, num_classes=1000):
        super(AlexNet, self).__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 96, kernel_size=11, stride=4, padding=2),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
            nn.Conv2d(96, 256, kernel_size=5, padding=2),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
            nn.Conv2d(256, 384, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(384, 384, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(384, 256, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
        )
        self.avgpool = nn.AdaptiveAvgPool2d((6, 6))
        self.classifier = nn.Sequential(
            nn.Dropout(p=0.5),
            nn.Linear(256 * 6 * 6, 4096),
            nn.ReLU(inplace=True),
            nn.Dropout(p=0.5),
            nn.Linear(4096, 4096),
            nn.ReLU(inplace=True),
            nn.Linear(4096, num_classes),
        )

    def forward(self, x):
        x = self.features(x)
        x = self.avgpool(x)
        x = torch.flatten(x, 1)
        x = self.classifier(x)
        return x

VGGNet (2014)

Repeated use of small filters:

class VGG16(nn.Module):
    def __init__(self, num_classes=1000):
        super(VGG16, self).__init__()
        self.features = nn.Sequential(
            # Block 1
            nn.Conv2d(3, 64, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(64, 64, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2, stride=2),

            # Block 2
            nn.Conv2d(64, 128, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(128, 128, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2, stride=2),

            # Block 3
            nn.Conv2d(128, 256, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(256, 256, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(256, 256, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2, stride=2),

            # Block 4
            nn.Conv2d(256, 512, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(512, 512, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(512, 512, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2, stride=2),

            # Block 5
            nn.Conv2d(512, 512, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(512, 512, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(512, 512, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2, stride=2),
        )

        self.avgpool = nn.AdaptiveAvgPool2d((7, 7))

        self.classifier = nn.Sequential(
            nn.Linear(512 * 7 * 7, 4096),
            nn.ReLU(inplace=True),
            nn.Dropout(p=0.5),
            nn.Linear(4096, 4096),
            nn.ReLU(inplace=True),
            nn.Dropout(p=0.5),
            nn.Linear(4096, num_classes),
        )

    def forward(self, x):
        x = self.features(x)
        x = self.avgpool(x)
        x = torch.flatten(x, 1)
        x = self.classifier(x)
        return x

Key characteristics of VGG:

Uses only 3x3 filters
Demonstrated that depth matters (16-19 layers)
Simple, uniform structure
Very large number of parameters (138M)

ResNet (2015)

Very deep networks through residual learning:

class ResidualBlock(nn.Module):
    def __init__(self, in_channels, out_channels, stride=1):
        super(ResidualBlock, self).__init__()
        self.conv1 = nn.Conv2d(in_channels, out_channels, kernel_size=3,
                               stride=stride, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(out_channels)
        self.relu = nn.ReLU(inplace=True)
        self.conv2 = nn.Conv2d(out_channels, out_channels, kernel_size=3,
                               stride=1, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_channels)

        # Skip connection
        self.shortcut = nn.Sequential()
        if stride != 1 or in_channels != out_channels:
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_channels, out_channels, kernel_size=1,
                         stride=stride, bias=False),
                nn.BatchNorm2d(out_channels)
            )

    def forward(self, x):
        identity = self.shortcut(x)

        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)

        out = self.conv2(out)
        out = self.bn2(out)

        out += identity  # Skip connection
        out = self.relu(out)

        return out

class ResNet18(nn.Module):
    def __init__(self, num_classes=1000):
        super(ResNet18, self).__init__()

        self.conv1 = nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3, bias=False)
        self.bn1 = nn.BatchNorm2d(64)
        self.relu = nn.ReLU(inplace=True)
        self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)

        # ResNet blocks
        self.layer1 = self._make_layer(64, 64, 2, stride=1)
        self.layer2 = self._make_layer(64, 128, 2, stride=2)
        self.layer3 = self._make_layer(128, 256, 2, stride=2)
        self.layer4 = self._make_layer(256, 512, 2, stride=2)

        self.avgpool = nn.AdaptiveAvgPool2d((1, 1))
        self.fc = nn.Linear(512, num_classes)

    def _make_layer(self, in_channels, out_channels, num_blocks, stride):
        layers = []
        layers.append(ResidualBlock(in_channels, out_channels, stride))
        for _ in range(1, num_blocks):
            layers.append(ResidualBlock(out_channels, out_channels, 1))
        return nn.Sequential(*layers)

    def forward(self, x):
        x = self.conv1(x)
        x = self.bn1(x)
        x = self.relu(x)
        x = self.maxpool(x)

        x = self.layer1(x)
        x = self.layer2(x)
        x = self.layer3(x)
        x = self.layer4(x)

        x = self.avgpool(x)
        x = torch.flatten(x, 1)
        x = self.fc(x)

        return x

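The core idea of ResNet:

y = F(x) + x

Skip connections improve gradient flow
Mitigates the vanishing gradient and degradation problems
Makes it possible to train very deep networks of 152 layers and more
Each block only has to learn a residual on top of the identity mapping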
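To see why the skip connection helps gradients flow, it is enough to differentiate the block output with respect to its input. A short derivation, where $\mathcal{L}$ denotes the training loss and $I$ the identity matrix (both symbols are introduced here for illustration):

```latex
y = F(x) + x
\quad\Longrightarrow\quad
\frac{\partial \mathcal{L}}{\partial x}
= \frac{\partial \mathcal{L}}{\partial y}
  \left( \frac{\partial F(x)}{\partial x} + I \right)
= \frac{\partial \mathcal{L}}{\partial y}\,\frac{\partial F(x)}{\partial x}
  + \frac{\partial \mathcal{L}}{\partial y}
```

The second term passes the upstream gradient through unchanged, so the signal still reaches earlier layers even when $\partial F / \partial x$ is small. This assumes an identity shortcut; with a 1x1 projection shortcut, $I$ is replaced by the Jacobian of the projection.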

Inception (GoogLeNet) (2014)

Filters of several sizes applied in parallel:

class InceptionModule(nn.Module):
    def __init__(self, in_channels, ch1x1, ch3x3red, ch3x3, ch5x5red, ch5x5, pool_proj):
        super(InceptionModule, self).__init__()

        # 1x1 convolution branch
        self.branch1 = nn.Sequential(
            nn.Conv2d(in_channels, ch1x1, kernel_size=1),
            nn.ReLU(inplace=True)
        )

        # 1x1 -> 3x3 convolution branch
        self.branch2 = nn.Sequential(
            nn.Conv2d(in_channels, ch3x3red, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(ch3x3red, ch3x3, kernel_size=3, padding=1),
            nn.ReLU(inplace=True)
        )

        # 1x1 -> 5x5 convolution branch
        self.branch3 = nn.Sequential(
            nn.Conv2d(in_channels, ch5x5red, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(ch5x5red, ch5x5, kernel_size=5, padding=2),
            nn.ReLU(inplace=True)
        )

        # 3x3 pooling -> 1x1 convolution branch
        self.branch4 = nn.Sequential(
            nn.MaxPool2d(kernel_size=3, stride=1, padding=1),
            nn.Conv2d(in_channels, pool_proj, kernel_size=1),
            nn.ReLU(inplace=True)
        )

    def forward(self, x):
        branch1 = self.branch1(x)
        branch2 = self.branch2(x)
        branch3 = self.branch3(x)
        branch4 = self.branch4(x)

        outputs = [branch1, branch2, branch3, branch4]
        return torch.cat(outputs, 1)

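Characteristics of Inception:

Learns features at several scales simultaneously
Uses 1x1 convolutions for dimensionality reduction
Improves computational efficiency
22 layers deep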
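The saving from the 1x1 reduction can be estimated by counting multiply-accumulate operations (MACs). The sketch below compares a direct 5x5 convolution with the "1x1 reduce, then 5x5" branch; the 28x28 feature map and the channel counts (192 in, 16 after reduction, 32 out) are illustrative numbers chosen for this example, and biases and activations are ignored.

```python
def conv_macs(height, width, in_channels, out_channels, kernel_size):
    """Multiply-accumulates for a stride-1, same-padded convolution."""
    return height * width * in_channels * out_channels * kernel_size * kernel_size


# Direct 5x5 convolution: 192 -> 32 channels.
direct = conv_macs(28, 28, 192, 32, 5)

# Bottleneck: 1x1 convolution 192 -> 16, followed by 5x5 convolution 16 -> 32.
reduced = conv_macs(28, 28, 192, 16, 1) + conv_macs(28, 28, 16, 32, 5)

print(f"direct 5x5:       {direct:,} MACs")   # direct 5x5:       120,422,400 MACs
print(f"1x1 reduce + 5x5: {reduced:,} MACs")  # 1x1 reduce + 5x5: 12,443,648 MACs
print(f"saving: {direct / reduced:.1f}x")
```

Under these assumptions the bottleneck needs roughly a tenth of the computation while producing an output of the same shape.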

4. Implementing a CNN in PyTorch

Basic CNN model

import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torch.utils.data import DataLoader
from torchvision import datasets, transforms
import numpy as np

# A simple CNN (expects 1x28x28 inputs such as MNIST)
class SimpleCNN(nn.Module):
    def __init__(self, num_classes=10):
        super(SimpleCNN, self).__init__()

        # Convolutional layers
        self.conv1 = nn.Conv2d(in_channels=1, out_channels=32,
                               kernel_size=3, padding=1)
        self.bn1 = nn.BatchNorm2d(32)

        self.conv2 = nn.Conv2d(in_channels=32, out_channels=64,
                               kernel_size=3, padding=1)
        self.bn2 = nn.BatchNorm2d(64)

        self.conv3 = nn.Conv2d(in_channels=64, out_channels=128,
                               kernel_size=3, padding=1)
        self.bn3 = nn.BatchNorm2d(128)

        # Pooling
        self.pool = nn.MaxPool2d(kernel_size=2, stride=2)

        # Dropout
        self.dropout1 = nn.Dropout(0.25)
        self.dropout2 = nn.Dropout(0.5)

        # Fully connected layers (a 28x28 input shrinks to 3x3 after three 2x2 poolings)
        self.fc1 = nn.Linear(128 * 3 * 3, 256)
        self.fc2 = nn.Linear(256, num_classes)

    def forward(self, x):
        # Conv block 1
        x = self.conv1(x)
        x = self.bn1(x)
        x = F.relu(x)
        x = self.pool(x)

        # Conv block 2
        x = self.conv2(x)
        x = self.bn2(x)
        x = F.relu(x)
        x = self.pool(x)

        # Conv block 3
        x = self.conv3(x)
        x = self.bn3(x)
        x = F.relu(x)
        x = self.pool(x)

        x = self.dropout1(x)

        # Flatten
        x = torch.flatten(x, 1)

        # FC layers
        x = self.fc1(x)
        x = F.relu(x)
        x = self.dropout2(x)
        x = self.fc2(x)

        return x

# Initialize the model
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = SimpleCNN(num_classes=10).to(device)

# Model summary
def count_parameters(model):
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

print(f"Total trainable parameters: {count_parameters(model):,}")

Data preparation and preprocessing

# Data augmentation and normalization
train_transform = transforms.Compose([
    transforms.RandomRotation(10),  # random rotation of up to 10 degrees
    transforms.RandomAffine(0, translate=(0.1, 0.1)),  # random shift of up to 10%
    transforms.ToTensor(),
    transforms.Normalize((0.1307,), (0.3081,))  # MNIST mean and standard deviation
])

test_transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.1307,), (0.3081,))
])

# Load the datasets
train_dataset = datasets.MNIST(
    root='./data',
    train=True,
    download=True,
    transform=train_transform
)

test_dataset = datasets.MNIST(
    root='./data',
    train=False,
    download=True,
    transform=test_transform
)

# Create the data loaders
batch_size = 128
train_loader = DataLoader(
    train_dataset,
    batch_size=batch_size,
    shuffle=True,
    num_workers=4,
    pin_memory=True  # page-locked host memory for faster host-to-GPU copies
)

test_loader = DataLoader(
    test_dataset,
    batch_size=batch_size,
    shuffle=False,
    num_workers=4,
    pin_memory=True
)

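Implementing the training loop

# Loss function and optimizer
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)

# Learning rate scheduler: halve the learning rate when the monitored loss plateaus.
# The `verbose` argument is omitted because recent PyTorch releases deprecate it.
scheduler = optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode='min', factor=0.5, patience=5
)

# Training function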
def train_epoch(model, device, train_loader, optimizer, criterion, epoch):
    model.train()
    running_loss = 0.0
    correct = 0
    total = 0

    for batch_idx, (data, target) in enumerate(train_loader):
        data, target = data.to(device), target.to(device)

        # Forward pass
        optimizer.zero_grad()
        output = model(data)
        loss = criterion(output, target)

        # Backward pass
        loss.backward()
        optimizer.step()

        # Statistics
        running_loss += loss.item()
        _, predicted = output.max(1)
        total += target.size(0)
        correct += predicted.eq(target).sum().item()

        # Print progress
        if batch_idx % 100 == 0:
            print(f'Epoch: {epoch} [{batch_idx * len(data)}/{len(train_loader.dataset)} '
                  f'({100. * batch_idx / len(train_loader):.0f}%)]\t'
                  f'Loss: {loss.item():.6f}')

    epoch_loss = running_loss / len(train_loader)
    epoch_acc = 100. * correct / total

    return epoch_loss, epoch_acc

# Validation function
def validate(model, device, test_loader, criterion):
    model.eval()
    test_loss = 0
    correct = 0
    total = 0

    with torch.no_grad():
        for data, target in test_loader:
            data, target = data.to(device), target.to(device)
            output = model(data)
            test_loss += criterion(output, target).item()

            _, predicted = output.max(1)
            total += target.size(0)
            correct += predicted.eq(target).sum().item()

    test_loss /= len(test_loader)
    test_acc = 100. * correct / total

    print(f'\nTest set: Average loss: {test_loss:.4f}, '
          f'Accuracy: {correct}/{total} ({test_acc:.2f}%)\n')

    return test_loss, test_acc

# Full training process
num_epochs = 30
best_acc = 0.0

train_losses = []
train_accs = []
test_losses = []
test_accs = []

for epoch in range(1, num_epochs + 1):
    # Train
    train_loss, train_acc = train_epoch(
        model, device, train_loader, optimizer, criterion, epoch
    )
    train_losses.append(train_loss)
    train_accs.append(train_acc)

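    # Validate (the test split doubles as the validation set here;
    # hold out a separate split for an unbiased final estimate)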
    test_loss, test_acc = validate(model, device, test_loader, criterion)
    test_losses.append(test_loss)
    test_accs.append(test_acc)

    # Adjust the learning rate
    scheduler.step(test_loss)

    # Save the best model so far
    if test_acc > best_acc:
        best_acc = test_acc
        torch.save({
            'epoch': epoch,
            'model_state_dict': model.state_dict(),
            'optimizer_state_dict': optimizer.state_dict(),
            'test_acc': test_acc,
            'test_loss': test_loss,
        }, 'best_model.pth')
        print(f'Saved best model with accuracy: {best_acc:.2f}%')

print(f'Best Test Accuracy: {best_acc:.2f}%')

Visualizing training results

import matplotlib.pyplot as plt

# Loss and accuracy curves
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 4))

# Loss curves
ax1.plot(train_losses, label='Train Loss')
ax1.plot(test_losses, label='Test Loss')
ax1.set_xlabel('Epoch')
ax1.set_ylabel('Loss')
ax1.set_title('Training and Test Loss')
ax1.legend()
ax1.grid(True)

# Accuracy curves
ax2.plot(train_accs, label='Train Accuracy')
ax2.plot(test_accs, label='Test Accuracy')
ax2.set_xlabel('Epoch')
ax2.set_ylabel('Accuracy (%)')
ax2.set_title('Training and Test Accuracy')
ax2.legend()
ax2.grid(True)

plt.tight_layout()
plt.savefig('training_history.png')
plt.show()

Model prediction and evaluation

# Confusion matrix
from sklearn.metrics import confusion_matrix, classification_report
import seaborn as sns

def evaluate_model(model, device, test_loader):
    model.eval()
    all_preds = []
    all_targets = []

    with torch.no_grad():
        for data, target in test_loader:
            data = data.to(device)
            output = model(data)
            _, predicted = output.max(1)

            all_preds.extend(predicted.cpu().numpy())
            all_targets.extend(target.numpy())

    # Confusion matrix
    cm = confusion_matrix(all_targets, all_preds)

    plt.figure(figsize=(10, 8))
    sns.heatmap(cm, annot=True, fmt='d', cmap='Blues')
    plt.title('Confusion Matrix')
    plt.ylabel('True Label')
    plt.xlabel('Predicted Label')
    plt.savefig('confusion_matrix.png')
    plt.show()

    # Classification report
    print(classification_report(all_targets, all_preds,
                                target_names=[str(i) for i in range(10)]))

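# Load the best checkpoint (map_location keeps this working on CPU-only machines)
checkpoint = torch.load('best_model.pth', map_location=device)
model.load_state_dict(checkpoint['model_state_dict'])

# Evaluate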
evaluate_model(model, device, test_loader)

Predicting a single image

def predict_image(model, image_path, device, transform):
    from PIL import Image

    # Load and preprocess the image
    image = Image.open(image_path).convert('L')  # convert to grayscale
    image = transform(image).unsqueeze(0).to(device)

    model.eval()
    with torch.no_grad():
        output = model(image)
        probabilities = F.softmax(output, dim=1)
        predicted_class = output.argmax(1).item()
        confidence = probabilities[0][predicted_class].item()

    print(f"Predicted class: {predicted_class}")
    print(f"Confidence: {confidence * 100:.2f}%")

    # Print the probability of every class
    print("\nAll class probabilities:")
    for i, prob in enumerate(probabilities[0]):
        print(f"Class {i}: {prob.item() * 100:.2f}%")

    return predicted_class, confidence

# Usage example
# predicted_class, confidence = predict_image(model, 'test_image.png', device, test_transform)

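5. Using Transfer Learning

Transfer learning applies a pretrained model to a new task, which makes it possible to reach high performance with little data and limited compute.

Advantages of transfer learning

Shorter training time: no need to train from scratch
Works with little data: reuses pretrained features
High performance: benefits from features learned on a large dataset (ImageNet)
Less overfitting: generally better generalization

Transfer learning strategies

Feature extraction: freeze the pretrained weights and train only the final classifier
Fine-tuning: retrain some or all layers with a low learning rate

Implementing transfer learning in PyTorch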

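import torchvision.models as models

# 1. Load a pretrained ResNet18
# (torchvision >= 0.13 uses `weights=`; the older `pretrained=True` is deprecated)
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)

# Inspect the model structure
print(model)

# 2. Use it as a feature extractor (freeze all parameters)
for param in model.parameters():
    param.requires_grad = False

# 3. Replace only the final classifier
num_features = model.fc.in_features
num_classes = 10  # number of classes in the new task

model.fc = nn.Sequential(
    nn.Linear(num_features, 512),
    nn.ReLU(),
    nn.Dropout(0.5),
    nn.Linear(512, num_classes)
)

# Only the newly added layers are trainable (new modules default to requires_grad=True)
model = model.to(device)

# 4. Set up the optimizer (train the new layers only)
optimizer = optim.Adam(model.fc.parameters(), lr=0.001)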

Fine-tuning

# 1. Load the full pretrained model
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)

# 2. Replace the final classifier
num_features = model.fc.in_features
model.fc = nn.Linear(num_features, num_classes)
model = model.to(device)

# 3. Discriminative learning rates
# - pretrained layers: low learning rate
# - new layers: higher learning rate
# Parameters not listed here (conv1, bn1, layer1, layer2) are not updated by this optimizer.
optimizer = optim.Adam([
    {'params': model.layer4.parameters(), 'lr': 1e-4},
    {'params': model.layer3.parameters(), 'lr': 1e-5},
    {'params': model.fc.parameters(), 'lr': 1e-3}
])

# 4. Gradual unfreezing
def unfreeze_layers(model, num_layers_to_unfreeze):
    # Freeze all parameters first
    for param in model.parameters():
        param.requires_grad = False

    # Make only the last few stages trainable, starting from the deepest
    layers = [model.layer4, model.layer3, model.layer2, model.layer1]
    for i in range(num_layers_to_unfreeze):
        if i < len(layers):
            for param in layers[i].parameters():
                param.requires_grad = True

    # The classifier is always trainable
    for param in model.fc.parameters():
        param.requires_grad = True

# Train only the classifier at first
unfreeze_layers(model, 0)
# After some training, unfreeze more layers step by step

Using other pretrained models

# ResNet family
resnet50 = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
resnet101 = models.resnet101(weights=models.ResNet101_Weights.DEFAULT)

# VGG family
vgg16 = models.vgg16(weights=models.VGG16_Weights.DEFAULT)
vgg19 = models.vgg19(weights=models.VGG19_Weights.DEFAULT)

# DenseNet family
densenet121 = models.densenet121(weights=models.DenseNet121_Weights.DEFAULT)
densenet169 = models.densenet169(weights=models.DenseNet169_Weights.DEFAULT)

# MobileNet (lightweight model)
mobilenet_v2 = models.mobilenet_v2(weights=models.MobileNet_V2_Weights.DEFAULT)

# EfficientNet
from torchvision.models import efficientnet_b0, EfficientNet_B0_Weights
efficientnet = efficientnet_b0(weights=EfficientNet_B0_Weights.DEFAULT)

# Replace the classifier head to match each architecture
def modify_classifier(model, num_classes, model_type='resnet'):
    if model_type == 'resnet':
        num_features = model.fc.in_features
        model.fc = nn.Linear(num_features, num_classes)
    elif model_type == 'vgg':
        num_features = model.classifier[6].in_features
        model.classifier[6] = nn.Linear(num_features, num_classes)
    elif model_type == 'densenet':
        num_features = model.classifier.in_features
        model.classifier = nn.Linear(num_features, num_classes)
    elif model_type in ('mobilenet', 'efficientnet'):
        # Both MobileNetV2 and EfficientNet use classifier = Sequential(Dropout, Linear)
        num_features = model.classifier[1].in_features
        model.classifier[1] = nn.Linear(num_features, num_classes)
    else:
        raise ValueError(f"Unsupported model_type: {model_type}")

    return model

Transfer learning with a custom dataset

from torch.utils.data import Dataset
from PIL import Image
import os

class CustomImageDataset(Dataset):
    def __init__(self, root_dir, transform=None):
        self.root_dir = root_dir
        self.transform = transform
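        # Treat each subdirectory of root_dir as one class (stray files are skipped)
        self.classes = sorted(
            d for d in os.listdir(root_dir)
            if os.path.isdir(os.path.join(root_dir, d))
        )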
        self.class_to_idx = {cls_name: i for i, cls_name in enumerate(self.classes)}

        self.images = []
        self.labels = []

        for class_name in self.classes:
            class_dir = os.path.join(root_dir, class_name)
            for img_name in os.listdir(class_dir):
                if img_name.lower().endswith(('.jpg', '.jpeg', '.png')):
                    self.images.append(os.path.join(class_dir, img_name))
                    self.labels.append(self.class_to_idx[class_name])

    def __len__(self):
        return len(self.images)

    def __getitem__(self, idx):
        img_path = self.images[idx]
        image = Image.open(img_path).convert('RGB')
        label = self.labels[idx]

        if self.transform:
            image = self.transform(image)

        return image, label

# Use the ImageNet normalization statistics
imagenet_mean = [0.485, 0.456, 0.406]
imagenet_std = [0.229, 0.224, 0.225]

train_transform = transforms.Compose([
    transforms.Resize(256),
    transforms.RandomCrop(224),
    transforms.RandomHorizontalFlip(),
    transforms.RandomRotation(15),
    transforms.ColorJitter(brightness=0.2, contrast=0.2, saturation=0.2),
    transforms.ToTensor(),
    transforms.Normalize(mean=imagenet_mean, std=imagenet_std)
])

test_transform = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=imagenet_mean, std=imagenet_std)
])

# Create the datasets
# train_dataset = CustomImageDataset('path/to/train', transform=train_transform)
# test_dataset = CustomImageDataset('path/to/test', transform=test_transform)

6. Practical Tips and Best Practices

Data Augmentation

# Advanced data augmentation
advanced_transform = transforms.Compose([
    transforms.Resize(256),
    transforms.RandomCrop(224),
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.RandomVerticalFlip(p=0.2),
    transforms.RandomRotation(degrees=30),
    transforms.RandomAffine(
        degrees=0,
        translate=(0.1, 0.1),
        scale=(0.9, 1.1),
        shear=10
    ),
    transforms.ColorJitter(
        brightness=0.3,
        contrast=0.3,
        saturation=0.3,
        hue=0.1
    ),
    transforms.RandomGrayscale(p=0.1),
    transforms.ToTensor(),
    transforms.Normalize(mean=imagenet_mean, std=imagenet_std),
    transforms.RandomErasing(p=0.2)  # Cutout
])

Weight initialization

def init_weights(m):
    if isinstance(m, nn.Conv2d):
        nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')
        if m.bias is not None:
            nn.init.constant_(m.bias, 0)
    elif isinstance(m, nn.BatchNorm2d):
        nn.init.constant_(m.weight, 1)
        nn.init.constant_(m.bias, 0)
    elif isinstance(m, nn.Linear):
        nn.init.normal_(m.weight, 0, 0.01)
        if m.bias is not None:
            nn.init.constant_(m.bias, 0)

# Apply to the model
model.apply(init_weights)

Learning Rate Scheduling

# The schedulers below are alternatives; pick one per training run.

# 1. Step LR: decay the learning rate every fixed number of epochs
scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=30, gamma=0.1)

# 2. MultiStep LR: decay the learning rate at the given epochs
scheduler = optim.lr_scheduler.MultiStepLR(optimizer, milestones=[30, 60, 90], gamma=0.1)

# 3. Exponential LR: decay the learning rate exponentially
scheduler = optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.95)

# 4. Cosine Annealing: follow a cosine curve down to eta_min
scheduler = optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=50, eta_min=1e-6)

# 5. Reduce on Plateau: lower the learning rate when the metric stops improving
scheduler = optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode='min', factor=0.5, patience=5
)

# 6. One Cycle LR: super-convergence (call scheduler.step() after every batch, not every epoch)
scheduler = optim.lr_scheduler.OneCycleLR(
    optimizer, max_lr=0.1, steps_per_epoch=len(train_loader), epochs=num_epochs
)
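It can help to see what two of these schedules compute. The sketch below evaluates the closed-form learning rate for step decay and for cosine annealing in plain Python, using the same hyperparameters as above. The function names are illustrative, the base learning rate of 0.1 is an assumed value, and the cosine formula is the standard closed form for epochs up to `t_max`, not a copy of PyTorch's internal implementation.

```python
import math


def step_lr(base_lr, epoch, step_size=30, gamma=0.1):
    """Multiply the learning rate by gamma once every step_size epochs."""
    return base_lr * gamma ** (epoch // step_size)


def cosine_annealing_lr(base_lr, epoch, t_max=50, eta_min=1e-6):
    """Cosine curve from base_lr at epoch 0 down to eta_min at epoch t_max."""
    return eta_min + (base_lr - eta_min) * (1 + math.cos(math.pi * epoch / t_max)) / 2


for epoch in (0, 25, 30, 50):
    print(f"epoch {epoch:2d}: step={step_lr(0.1, epoch):.6f}  "
          f"cosine={cosine_annealing_lr(0.1, epoch):.6f}")
```

Step decay holds the rate constant and then drops it abruptly, while cosine annealing lowers it smoothly and reaches `eta_min` exactly at `t_max`.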

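Choosing a loss function

# 1. Cross Entropy Loss (the default choice)
criterion = nn.CrossEntropyLoss()

# 2. Weighted Cross Entropy (for imbalanced datasets)
# The `...` below is a placeholder: supply exactly one weight per class.
class_weights = torch.tensor([1.0, 2.0, 1.5, ...])  # per-class weights
criterion = nn.CrossEntropyLoss(weight=class_weights.to(device))

# 3. Label Smoothing
# (recent PyTorch versions also offer nn.CrossEntropyLoss(label_smoothing=0.1))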
class LabelSmoothingCrossEntropy(nn.Module):
    def __init__(self, epsilon=0.1):
        super().__init__()
        self.epsilon = epsilon

    def forward(self, output, target):
        n_classes = output.size(-1)
        log_probs = F.log_softmax(output, dim=-1)
        loss = -log_probs.sum(dim=-1).mean()
        nll = F.nll_loss(log_probs, target)
        return (1 - self.epsilon) * nll + self.epsilon * loss / n_classes

criterion = LabelSmoothingCrossEntropy(epsilon=0.1)

# 4. Focal Loss (focuses training on hard examples)
class FocalLoss(nn.Module):
    def __init__(self, alpha=1, gamma=2):
        super().__init__()
        self.alpha = alpha
        self.gamma = gamma

    def forward(self, inputs, targets):
        ce_loss = F.cross_entropy(inputs, targets, reduction='none')
        pt = torch.exp(-ce_loss)
        focal_loss = self.alpha * (1 - pt) ** self.gamma * ce_loss
        return focal_loss.mean()

criterion = FocalLoss(alpha=1, gamma=2)
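A worked number makes the effect of the focal term concrete. The sketch below applies the same formula as the `FocalLoss` class above to a single sample, where `pt` is the predicted probability of the true class; the two probabilities (0.9 for an easy sample, 0.1 for a hard one) are made-up values.

```python
import math


def focal_loss_single(pt, alpha=1.0, gamma=2.0):
    """Focal loss for one sample: alpha * (1 - pt)^gamma * cross_entropy."""
    ce = -math.log(pt)
    return alpha * (1.0 - pt) ** gamma * ce


easy = focal_loss_single(0.9)  # confidently correct prediction
hard = focal_loss_single(0.1)  # badly wrong prediction

print(f"easy sample: CE={-math.log(0.9):.4f}  focal={easy:.6f}")
print(f"hard sample: CE={-math.log(0.1):.4f}  focal={hard:.6f}")
print(f"hard/easy ratio: CE={math.log(0.1) / math.log(0.9):.1f}  focal={hard / easy:.1f}")
```

Plain cross entropy already weights the hard sample about 22 times more than the easy one; with `gamma=2` the ratio grows to well over a thousand, so easy samples contribute almost nothing to the gradient. Setting `gamma=0` recovers plain cross entropy.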

Model ensembling

class EnsembleModel(nn.Module):
    def __init__(self, models):
        super().__init__()
        self.models = nn.ModuleList(models)

    def forward(self, x):
        outputs = [model(x) for model in self.models]
        # Average the outputs (logits) of all models
        avg_output = torch.mean(torch.stack(outputs), dim=0)
        return avg_output

# Create several models
model1 = SimpleCNN(num_classes=10)
model2 = SimpleCNN(num_classes=10)
model3 = SimpleCNN(num_classes=10)

# Build the ensemble
ensemble = EnsembleModel([model1, model2, model3])

Gradient clipping

# Prevent exploding gradients
max_grad_norm = 1.0

for epoch in range(num_epochs):
    for data, target in train_loader:
        data, target = data.to(device), target.to(device)
        optimizer.zero_grad()
        output = model(data)
        loss = criterion(output, target)
        loss.backward()

        # Clip the global gradient norm before the optimizer step
        torch.nn.utils.clip_grad_norm_(model.parameters(), max_grad_norm)

        optimizer.step()

Early Stopping

class EarlyStopping:
    def __init__(self, patience=7, min_delta=0):
        self.patience = patience
        self.min_delta = min_delta
        self.counter = 0
        self.best_loss = None
        self.early_stop = False

    def __call__(self, val_loss):
        if self.best_loss is None:
            self.best_loss = val_loss
        elif val_loss > self.best_loss - self.min_delta:
            self.counter += 1
            if self.counter >= self.patience:
                self.early_stop = True
        else:
            self.best_loss = val_loss
            self.counter = 0

# Usage example
early_stopping = EarlyStopping(patience=10, min_delta=0.001)

for epoch in range(num_epochs):
    train_loss, train_acc = train_epoch(model, device, train_loader, optimizer, criterion, epoch)
    val_loss, val_acc = validate(model, device, test_loader, criterion)

    early_stopping(val_loss)
    if early_stopping.early_stop:
        print(f"Early stopping at epoch {epoch}")
        break

Mixed Precision Training

from torch.cuda.amp import autocast, GradScaler

# Create the gradient scaler
scaler = GradScaler()

for epoch in range(num_epochs):
    for data, target in train_loader:
        data, target = data.to(device), target.to(device)

        optimizer.zero_grad()

        # Forward pass in mixed precision
        with autocast():
            output = model(data)
            loss = criterion(output, target)

        # Backward pass with scaling
        scaler.scale(loss).backward()
        scaler.step(optimizer)
        scaler.update()

Managing model checkpoints

def save_checkpoint(model, optimizer, epoch, loss, accuracy, filename):
    checkpoint = {
        'epoch': epoch,
        'model_state_dict': model.state_dict(),
        'optimizer_state_dict': optimizer.state_dict(),
        'loss': loss,
        'accuracy': accuracy,
    }
    torch.save(checkpoint, filename)
    print(f"Checkpoint saved: {filename}")

def load_checkpoint(model, optimizer, filename):
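    # Load onto the CPU first; load_state_dict then copies values to the model's device
    checkpoint = torch.load(filename, map_location='cpu')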
    model.load_state_dict(checkpoint['model_state_dict'])
    optimizer.load_state_dict(checkpoint['optimizer_state_dict'])
    epoch = checkpoint['epoch']
    loss = checkpoint['loss']
    accuracy = checkpoint['accuracy']

    print(f"Checkpoint loaded: epoch {epoch}, loss {loss:.4f}, accuracy {accuracy:.2f}%")
    return epoch, loss, accuracy

# Save a checkpoint periodically (place this inside the training loop)
if epoch % 5 == 0:
    save_checkpoint(model, optimizer, epoch, train_loss, train_acc,
                   f'checkpoint_epoch_{epoch}.pth')

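Performance optimization tips

Batch size: use as much GPU memory as possible without running out of memory
num_workers: tune to speed up data loading (the number of CPU cores is a common starting point)
pin_memory: speeds up transfers to the GPU
JIT compilation: optimize the model with torch.jit.script
Gradient accumulation: emulates a larger batch size when memory is limited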

# Gradient accumulation example
accumulation_steps = 4

for epoch in range(num_epochs):
    optimizer.zero_grad()  # start each epoch with clean gradients
    for i, (data, target) in enumerate(train_loader):
        data, target = data.to(device), target.to(device)

        output = model(data)
        loss = criterion(output, target)
        loss = loss / accumulation_steps  # scale the loss so accumulated gradients average out

        loss.backward()

        # Update the weights once every `accumulation_steps` mini-batches
        if (i + 1) % accumulation_steps == 0:
            optimizer.step()
            optimizer.zero_grad()
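The reason for dividing the loss by `accumulation_steps` is that the summed micro-batch gradients then equal the gradient of one large batch. The sketch below checks this by hand for a one-parameter model `y = w * x` with a mean squared error loss; the data, the starting weight, and the helper name `mse_grad` are all made up for illustration, and the equality relies on the micro-batches having equal size.

```python
def mse_grad(w, xs, ts):
    """d/dw of mean((w * x - t)^2) over one batch."""
    return sum(2 * (w * x - t) * x for x, t in zip(xs, ts)) / len(xs)


xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
ts = [2.0, 4.0, 6.0, 8.0, 10.0, 12.0, 14.0, 16.0]
w = 0.5
accumulation_steps = 4
micro_batch = len(xs) // accumulation_steps

# Gradient computed on the full batch of 8 samples at once.
full_grad = mse_grad(w, xs, ts)

# Gradient accumulated over 4 micro-batches of 2 samples, each scaled by 1/4.
accumulated_grad = 0.0
for k in range(accumulation_steps):
    chunk = slice(k * micro_batch, (k + 1) * micro_batch)
    accumulated_grad += mse_grad(w, xs[chunk], ts[chunk]) / accumulation_steps

print(full_grad)         # -76.5
print(accumulated_grad)  # matches the full-batch gradient up to rounding
```

Layers that depend on batch statistics, such as batch normalization, still see only the small micro-batches, so accumulation is not identical to a true large batch in every respect.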

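7. Conclusion

CNNs are a very powerful tool for image classification and are used widely across many industries. This post covered the history and evolution of CNNs, their core concepts and mathematical principles, the major architectures, an implementation in PyTorch, transfer learning, and practical tips and best practices.

Key takeaways

Evolution of CNNs: continuous innovation from LeNet to modern ResNet and EfficientNet
Mathematical foundations: understanding convolution, pooling, and activation functions matters
Architecture choice: picking an architecture that fits the problem is key to success
Transfer learning: high performance is achievable even with limited data and resources
Practical techniques: make appropriate use of data augmentation, regularization, and optimization techniques

Where to go next

The world of CNNs is broad and deep, with many variants and advanced techniques:

Object detection: YOLO, R-CNN, SSD, and others
Image segmentation: U-Net, DeepLab, Mask R-CNN, and others
Generative models: GAN, VAE, Diffusion Models
Attention mechanisms: Self-Attention, Vision Transformers
Model compression: MobileNet, ShuffleNet, Knowledge Distillation
Self-supervised learning: SimCLR, MoCo, BYOL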

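Image classification with CNNs offers endless possibilities, and we hope this post helps with your research and projects. Balance theoretical understanding with hands-on experience, and keep up with the latest research. Keep exploring, and build even better models!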