In object detection, the YOLO family of algorithms is widely used for its speed and accuracy. For small objects, however, YOLO still has clear limitations: small objects occupy only a tiny fraction of the image, their features are weak, and they are easily missed or misclassified. This article shows how to improve YOLOv11's small-object detection by adding an extra small-object detection head and a CBAM attention module.

Adding a Small-Object Detection Head

The original YOLO11 has only three output levels: P3, P4, and P5. For a 640 × 640 input, successive downsampling in the backbone produces output feature maps of 80 × 80 (P3, stride 8), 40 × 40 (P4, stride 16), and 20 × 20 (P5, stride 32). The smaller the feature map, the larger the input region each grid cell covers, so the ability to resolve small objects decreases from P3 to P5. To keep small-object features from being washed out by repeated convolution and downsampling, a new detection head, P2 (stride 4, 160 × 160), can be added on top of the existing network.
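To make the scale relationship concrete, the short sketch below (plain Python, strides taken from the layer comments in the YAML that follows) prints the grid size each output level produces for a 640 × 640 input, including the proposed P2 level:

# Grid size and per-cell coverage for a 640x640 input at each output stride.
img_size = 640
for name, stride in [("P2", 4), ("P3", 8), ("P4", 16), ("P5", 32)]:
    grid = img_size // stride
    print(f"{name}: stride {stride:>2} -> {grid}x{grid} grid, "
          f"each cell maps to a {stride}x{stride} patch of the input")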

First, locate the yolo11.yaml file in the ultralytics-main/ultralytics/cfg/models/11 directory, make a copy, and give it a new name, e.g. yolo11-improved.yaml. Delete the contents of the new YAML file and paste in the following:

# Ultralytics YOLO 🚀, AGPL-3.0 license
# YOLO11 object detection model with P2-P5 outputs. For Usage examples see https://docs.ultralytics.com/tasks/detect

# Parameters
nc: 80 # number of classes
scales: # model compound scaling constants, i.e. 'model=yolo11n.yaml' will call yolo11.yaml with scale 'n'
  # [depth, width, max_channels]
  n: [0.50, 0.25, 1024]
  s: [0.50, 0.50, 1024]
  m: [0.50, 1.00, 512]
  l: [1.00, 1.00, 512]
  x: [1.00, 1.50, 512]

# YOLO11-P2 backbone
backbone:
  # [from, repeats, module, args]
  - [-1, 1, Conv, [64, 3, 2]] # 0-P1/2
  - [-1, 1, Conv, [128, 3, 2]] # 1-P2/4
  - [-1, 2, C3k2, [256, False, 0.25]]
  - [-1, 1, Conv, [256, 3, 2]] # 3-P3/8
  - [-1, 2, C3k2, [512, False, 0.25]]
  - [-1, 1, Conv, [512, 3, 2]] # 5-P4/16
  - [-1, 2, C3k2, [512, True]]
  - [-1, 1, Conv, [1024, 3, 2]] # 7-P5/32
  - [-1, 2, C3k2, [1024, True]]
  - [-1, 1, SPPF, [1024, 5]] # 9
  - [-1, 2, C2PSA, [1024]] # 10

# YOLO11-P2 head
head:
  - [-1, 1, nn.Upsample, [None, 2, "nearest"]]
  - [[-1, 6], 1, Concat, [1]] # cat backbone P4
  - [-1, 2, C3k2, [512, False]] # 13

  - [-1, 1, nn.Upsample, [None, 2, "nearest"]]
  - [[-1, 4], 1, Concat, [1]] # cat backbone P3
  - [-1, 2, C3k2, [256, False]] # 16 (P3/8-small)

  - [-1, 1, nn.Upsample, [None, 2, "nearest"]]
  - [[-1, 2], 1, Concat, [1]] # cat backbone P2
  - [-1, 2, C3k2, [128, False]] # 19 (P2/4-xsmall)

  - [-1, 1, Conv, [128, 3, 2]]
  - [[-1, 16], 1, Concat, [1]] # cat head P3
  - [-1, 2, C3k2, [256, False]] # 22 (P3/8-small)

  - [-1, 1, Conv, [256, 3, 2]]
  - [[-1, 13], 1, Concat, [1]] # cat head P4
  - [-1, 2, C3k2, [512, False]] # 25 (P4/16-medium)

  - [-1, 1, Conv, [512, 3, 2]]
  - [[-1, 10], 1, Concat, [1]] # cat head P5
  - [-1, 2, C3k2, [1024, True]] # 28 (P5/32-large)

  - [[19, 22, 25, 28], 1, Detect, [nc]] # Detect(P2, P3, P4, P5)
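With the configuration saved, the model can be built and trained through the standard Ultralytics Python API. The snippet below is a minimal sketch; the dataset file coco128.yaml, the epoch count, and the image size are placeholders rather than recommendations from this article:

from ultralytics import YOLO

# Build the P2-enabled model from the modified configuration file.
model = YOLO("yolo11-improved.yaml")

# Optional sanity check: print a model summary.
model.info()

# Train; replace data/epochs/imgsz with your own dataset and settings.
model.train(data="coco128.yaml", epochs=100, imgsz=640)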

Adding the CBAM Module

Adding the CBAM Code

Create a new file named CBAM.py in the ultralytics/nn/modules directory and paste in the following code:

import torch
import torch.nn as nn


class ChannelAttention(nn.Module):
    """Channel attention: a shared MLP over global average- and max-pooled features."""

    def __init__(self, channels: int) -> None:
        super().__init__()
        self.avg = nn.AdaptiveAvgPool2d(1)
        self.max = nn.AdaptiveMaxPool2d(1)
        # Shared bottleneck MLP (1x1 convolutions) with a reduction ratio of 16.
        self.fc1 = nn.Conv2d(channels, channels // 16, 1, 1, 0, bias=True)
        self.fc2 = nn.Conv2d(channels // 16, channels, 1, 1, 0, bias=True)
        self.relu = nn.ReLU()
        self.sigmoid = nn.Sigmoid()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        avg_out = self.fc2(self.relu(self.fc1(self.avg(x))))
        max_out = self.fc2(self.relu(self.fc1(self.max(x))))
        out = avg_out + max_out
        return self.sigmoid(out)


class SpatialAttention(nn.Module):
    """Spatial attention: a convolution over channel-wise average and max maps."""

    def __init__(self, kernel_size=7):
        super().__init__()
        assert kernel_size in {3, 7}, "kernel size must be 3 or 7"
        padding = 3 if kernel_size == 7 else 1
        self.conv1 = nn.Conv2d(2, 1, kernel_size, padding=padding, bias=False)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        avg_out = torch.mean(x, dim=1, keepdim=True)
        max_out, _ = torch.max(x, dim=1, keepdim=True)
        x = torch.cat([avg_out, max_out], dim=1)
        x = self.conv1(x)
        return self.sigmoid(x)


class myCBAM(nn.Module):
    """Convolutional Block Attention Module: channel attention followed by spatial attention."""

    def __init__(self, c1, kernel_size=7):
        """Initialize CBAM with given input channels (c1) and spatial kernel size."""
        super().__init__()
        self.ca = ChannelAttention(c1)
        self.sa = SpatialAttention(kernel_size)
        self.relu = nn.ReLU()

    def forward(self, x):
        out = self.ca(x) * x     # reweight channels
        out = self.sa(out) * out # reweight spatial locations
        out = self.relu(out)
        return out
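Before wiring the module into the framework, a quick standalone check confirms that it runs and that it preserves tensor shape, since CBAM only reweights features. A minimal sketch, assuming it is run from the directory containing CBAM.py (adjust the import path otherwise):

import torch
from CBAM import myCBAM  # adjust the import path to wherever CBAM.py lives

x = torch.randn(1, 256, 80, 80)  # e.g. a P3-sized feature map with 256 channels
cbam = myCBAM(256)
y = cbam(x)
print(y.shape)  # expected: torch.Size([1, 256, 80, 80]), same as the input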

Modifying tasks.py

Find the tasks.py file in the ultralytics/nn directory. First, import the module we just created in CBAM.py:

from ultralytics.nn.modules.CBAM import myCBAM  # import the CBAM module

Then find the parse_model function and add the following branch to its module-parsing if/elif chain:

elif m in {myCBAM}:
    args = [ch[f], *args]
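Two details are worth knowing about this change: parse_model resolves each module name from the YAML against names visible in tasks.py, which is why the explicit import above is required; and the new branch prepends ch[f], the channel count of the layer CBAM reads from, so that myCBAM(c1, kernel_size) receives its input width automatically. Depending on the Ultralytics version, you may also want to set c2 = ch[f] inside the same branch so the output channel count is tracked explicitly (CBAM does not change the number of channels). The toy snippet below, outside the framework, mirrors what the branch does for an entry like [-1, 1, myCBAM, []] placed after a 512-channel layer:

from ultralytics.nn.modules.CBAM import myCBAM

ch = [512]        # channel counts of previously built layers
f, args = -1, []  # "from" index and YAML args for the CBAM entry

# What the new parse_model branch does: prepend the input channel count.
args = [ch[f], *args]  # -> [512]
layer = myCBAM(*args)  # equivalent to myCBAM(512)
print(layer)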

Modifying the YAML File

Go back to the yolo11-improved.yaml file created earlier. In the head section, simply add a myCBAM entry wherever you want to use CBAM. Note that every inserted layer shifts the indices of all layers after it, so the Concat and Detect entries that reference layers by index must be updated to match (here the lateral Concat connections and the Detect head point at the CBAM outputs, which have the same channel count as the C3k2 blocks they follow). For example:

# YOLO11-P2 head with CBAM
head:
  - [-1, 1, nn.Upsample, [None, 2, "nearest"]]
  - [[-1, 6], 1, Concat, [1]] # 12 cat backbone P4
  - [-1, 2, C3k2, [512, False]] # 13
  - [-1, 1, myCBAM, []] # 14

  - [-1, 1, nn.Upsample, [None, 2, "nearest"]]
  - [[-1, 4], 1, Concat, [1]] # 16 cat backbone P3
  - [-1, 2, C3k2, [256, False]] # 17
  - [-1, 1, myCBAM, []] # 18 (P3/8-small)

  - [-1, 1, nn.Upsample, [None, 2, "nearest"]]
  - [[-1, 2], 1, Concat, [1]] # 20 cat backbone P2
  - [-1, 2, C3k2, [128, False]] # 21
  - [-1, 1, myCBAM, []] # 22 (P2/4-xsmall)

  - [-1, 1, Conv, [128, 3, 2]]
  - [[-1, 18], 1, Concat, [1]] # 24 cat head P3
  - [-1, 2, C3k2, [256, False]] # 25
  - [-1, 1, myCBAM, []] # 26 (P3/8-small)

  - [-1, 1, Conv, [256, 3, 2]]
  - [[-1, 14], 1, Concat, [1]] # 28 cat head P4
  - [-1, 2, C3k2, [512, False]] # 29
  - [-1, 1, myCBAM, []] # 30 (P4/16-medium)

  - [-1, 1, Conv, [512, 3, 2]]
  - [[-1, 10], 1, Concat, [1]] # 32 cat head P5
  - [-1, 2, C3k2, [1024, True]] # 33 (P5/32-large)

  - [[22, 26, 30, 33], 1, Detect, [nc]] # Detect(P2, P3, P4, P5)

Here, five CBAM modules are inserted in the feature-fusion (head) part of the network, one after each of the first five C3k2 blocks; the final P5 block is left unchanged.
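Before training, it is worth confirming that the modified configuration still parses; if any Concat or Detect index is off, Ultralytics will either raise an error or build the wrong graph at this point. A quick check, assuming yolo11-improved.yaml, CBAM.py and tasks.py have all been saved as described above:

from ultralytics import YOLO

# Pass the full path to the YAML if the bare file name is not found.
model = YOLO("yolo11-improved.yaml")
model.info()  # prints a model summary (layers, parameters, GFLOPs)

# Confirm where the CBAM layers ended up in the built network.
for i, m in enumerate(model.model.model):
    if type(m).__name__ == "myCBAM":
        print(f"layer {i}: myCBAM")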

Results

Detection results of the original YOLO11

Detection results of the improved YOLO11