A Multimodal Inspection Technology for Substations Based on Infrared-Acoustic Attention Fusion
Fund Project: Science and Technology Project of State Grid Sichuan Electric Power Company (521997240003)


    Abstract:

    Overheating defects in substation equipment exhibit complementary characteristics across modalities, so fusing multi-source information can significantly improve detection performance. However, existing multimodal models generally have large parameter counts, which makes them difficult to deploy on edge devices. To address the dual challenges of edge-computing constraints and insufficient single-modality information, an improved lightweight infrared-acoustic multimodal defect inspection method is proposed. For visual processing, a lightweight convolutional block attention module (CBAM) is embedded in the backbone of YOLOv5s to strengthen feature extraction for small targets such as power equipment, and MobileNetV3 is combined with it for deep extraction of heating features. For acoustic analysis, to handle the transient nature of defect signals, a temporal self-attention mechanism is introduced after the classic CRNN-LSTM (convolutional recurrent neural network with long short-term memory) structure to capture key acoustic events. Finally, an attention-based adaptive fusion network dynamically weights the two enhanced modality features and performs joint discrimination, further improving sensitivity to early or hidden overheating defects. Validation on a self-built substation dataset shows that the proposed method outperforms the single-modality models and all baseline models on every metric, reaching an accuracy of 96.50% and an F1-score of 94.20%. Meanwhile, the model size is about 12.3 MB and a single inference takes about 20 ms, so the method improves performance substantially while still meeting the stringent real-time requirements of edge-side deployment.
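The abstract states that a lightweight CBAM is embedded in the YOLOv5s backbone but does not give its internal design. The sketch below follows the commonly used CBAM formulation: channel attention from average- and max-pooled descriptors passed through a shared MLP, followed by spatial attention from a convolution over channel-wise average and max maps. The reduction ratio, 7×7 kernel, tensor sizes, and random weights are all assumptions for illustration; a real implementation would live inside a deep-learning framework rather than plain Python.

```python
import math
import random
from typing import List

Tensor3 = List[List[List[float]]]  # feature map indexed [channel][row][col]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def shared_mlp(v: List[float], w1: List[List[float]], w2: List[List[float]]) -> List[float]:
    """Two-layer MLP shared by both pooled descriptors: C -> C/r (ReLU) -> C."""
    hidden = [max(0.0, sum(w * x for w, x in zip(row, v))) for row in w1]
    return [sum(w * h for w, h in zip(row, hidden)) for row in w2]


def channel_attention(f: Tensor3, w1, w2) -> List[float]:
    """M_c = sigmoid(MLP(AvgPool(F)) + MLP(MaxPool(F))), one weight per channel."""
    avg = [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in f]
    mx = [max(map(max, ch)) for ch in f]
    return [sigmoid(a + m) for a, m in zip(shared_mlp(avg, w1, w2), shared_mlp(mx, w1, w2))]


def spatial_attention(f: Tensor3, kernel: Tensor3) -> List[List[float]]:
    """M_s = sigmoid(conv_kxk([AvgPool_c(F); MaxPool_c(F)])); zero padding keeps H x W."""
    c, h, w = len(f), len(f[0]), len(f[0][0])
    avg = [[sum(f[k][i][j] for k in range(c)) / c for j in range(w)] for i in range(h)]
    mx = [[max(f[k][i][j] for k in range(c)) for j in range(w)] for i in range(h)]
    maps, ks = (avg, mx), len(kernel[0])
    pad = ks // 2
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            s = 0.0
            for m in range(2):
                for di in range(ks):
                    for dj in range(ks):
                        y, x = i + di - pad, j + dj - pad
                        if 0 <= y < h and 0 <= x < w:
                            s += kernel[m][di][dj] * maps[m][y][x]
            out[i][j] = sigmoid(s)
    return out


def cbam(f: Tensor3, w1, w2, kernel: Tensor3) -> Tensor3:
    """Channel attention first, then spatial attention (standard CBAM ordering)."""
    mc = channel_attention(f, w1, w2)
    f1 = [[[v * mc[k] for v in row] for row in ch] for k, ch in enumerate(f)]
    ms = spatial_attention(f1, kernel)
    return [[[v * ms[i][j] for j, v in enumerate(row)] for i, row in enumerate(ch)] for ch in f1]


# Hypothetical sizes and random weights, for illustration only.
rng = random.Random(0)
C, H, W, R, K = 4, 5, 5, 2, 7  # channels, height, width, reduction ratio, kernel size
feat = [[[rng.uniform(-1, 1) for _ in range(W)] for _ in range(H)] for _ in range(C)]
w1 = [[rng.gauss(0, 0.5) for _ in range(C)] for _ in range(C // R)]
w2 = [[rng.gauss(0, 0.5) for _ in range(C // R)] for _ in range(C)]
kernel = [[[rng.gauss(0, 0.1) for _ in range(K)] for _ in range(K)] for _ in range(2)]
refined = cbam(feat, w1, w2, kernel)
```

Because both attention maps lie in (0, 1), the module only rescales activations: it keeps the feature-map shape unchanged, which is why it can be dropped into an existing backbone at low parameter cost.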
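On the acoustic side, the abstract places a temporal self-attention layer after the CRNN-LSTM so that short, transient defect sounds can dominate the clip representation. A minimal sketch, assuming standard scaled dot-product self-attention over the sequence of LSTM hidden states followed by mean pooling over time, is shown below. The projection sizes, random weights, and the pooling choice are hypothetical; the abstract does not specify them.

```python
import math
import random
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product a @ b."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row: List[float]) -> List[float]:
    m = max(row)  # subtract the max for numerical stability
    e = [math.exp(x - m) for x in row]
    s = sum(e)
    return [x / s for x in e]


def temporal_self_attention(h: Matrix, wq: Matrix, wk: Matrix, wv: Matrix):
    """h: T x D LSTM hidden states -> (T x d attended sequence, T x T attention weights)."""
    q, k, v = matmul(h, wq), matmul(h, wk), matmul(h, wv)
    scale = math.sqrt(len(q[0]))
    scores = [[sum(a * b for a, b in zip(qi, kj)) / scale for kj in k] for qi in q]
    attn = [softmax(r) for r in scores]  # each frame attends over all frames
    return matmul(attn, v), attn


def clip_embedding(ctx: Matrix) -> List[float]:
    """Mean-pool the attended sequence over time into one clip-level vector."""
    t = len(ctx)
    return [sum(col) / t for col in zip(*ctx)]


rng = random.Random(42)
T, D, DK = 6, 8, 4  # frames, LSTM hidden size, attention size (all hypothetical)
hidden = [[rng.gauss(0, 1) for _ in range(D)] for _ in range(T)]
wq, wk, wv = ([[rng.gauss(0, 0.3) for _ in range(DK)] for _ in range(D)] for _ in range(3))
ctx, attn = temporal_self_attention(hidden, wq, wk, wv)
emb = clip_embedding(ctx)
```

Inspecting `attn` shows which frames each time step draws on; a frame containing a discharge-like transient would be expected to receive high weight from many rows once the projections are trained.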
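The abstract describes the fusion stage only as an adaptive attention network that dynamically weights the two modality features. One common way to realize this, sketched here as an assumption rather than the authors' design, is additive attention across modalities: each embedding is scored by a shared `tanh` projection and a context vector, the scores are softmax-normalized into per-sample weights, and the fused vector is the weighted sum. The embedding values and parameters `w`, `b`, `u` below are hypothetical stand-ins for learned weights.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def hidden_projection(z: Sequence[float], w: List[Vector], b: Vector) -> Vector:
    """tanh(W z + b): shared projection applied to each modality embedding."""
    return [math.tanh(sum(wi * zi for wi, zi in zip(row, z)) + bj) for row, bj in zip(w, b)]


def attention_fusion(feats: List[Vector], w: List[Vector], b: Vector, u: Vector) -> Tuple[Vector, Vector]:
    """Score each modality, softmax the scores into weights, return (fused vector, weights)."""
    scores = [sum(ui * hi for ui, hi in zip(u, hidden_projection(z, w, b))) for z in feats]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    alphas = [e / total for e in exps]
    fused = [sum(a * z[i] for a, z in zip(alphas, feats)) for i in range(len(feats[0]))]
    return fused, alphas


# Hypothetical 4-D embeddings, assumed already projected to a common size.
visual = [0.9, 0.1, 0.4, 0.7]    # stands in for pooled infrared (YOLOv5s + CBAM) features
acoustic = [0.2, 0.8, 0.5, 0.1]  # stands in for the CRNN-LSTM + temporal-attention embedding
w = [[0.5, -0.2, 0.1, 0.3], [-0.4, 0.6, 0.2, -0.1], [0.3, 0.3, -0.5, 0.2]]
b = [0.0, 0.1, -0.1]
u = [1.0, -0.5, 0.8]
fused, alphas = attention_fusion([visual, acoustic], w, b, u)
```

The weights are "dynamic" in the sense that they are recomputed from each sample's own features, so a clip with a weak thermal signature but a strong acoustic anomaly can shift weight toward the acoustic branch. In a full model, `fused` would feed a classification head that outputs the defect decision.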
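The reported accuracy (96.50%) is higher than the F1-score (94.20%). The short sketch below recalls the standard definitions for a binary defect/normal setting (the abstract does not say whether F1 is binary or macro-averaged over several defect classes) and uses a small hypothetical label set, not data from the paper, to show why accuracy tends to exceed F1 when normal samples dominate.

```python
from typing import Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> Tuple[int, int, int, int]:
    """Return (TP, FP, FN, TN) treating `positive` (here: defect) as the positive class."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive and t == positive:
            tp += 1
        elif p == positive:
            fp += 1
        elif t == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def accuracy_f1(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> Tuple[float, float]:
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    total = tp + fp + fn + tn
    acc = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return acc, f1


# Hypothetical labels: 1 = defect, 0 = normal; 3 defects among 10 samples.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 1, 0]
y_pred = [1, 0, 0, 0, 0, 0, 1, 0, 1, 0]
acc, f1 = accuracy_f1(y_true, y_pred)
```

Here one missed defect and one false alarm give accuracy 0.8 but F1 only 2/3, because the six correctly classified normal samples raise accuracy while F1 depends only on the defect class.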
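The deployment figures can be put in rough perspective with back-of-the-envelope arithmetic. Assuming, since the abstract does not say, that the 12.3 MB figure is the storage size of 32-bit floating-point weights (4 bytes per parameter) with 1 MB = 10^6 bytes, and that the 20 ms inference time is for one sequential sample with no batching or preprocessing included:

```latex
N_{\text{param}} \approx \frac{12.3 \times 10^{6}\ \text{B}}{4\ \text{B/param}} \approx 3.1 \times 10^{6},
\qquad
f_{\max} = \frac{1}{20\ \text{ms}} = \frac{1}{0.020\ \text{s}} = 50\ \text{samples/s}.
```

If the weights were instead stored in 16-bit precision, the same file size would correspond to roughly 6.2 × 10^6 parameters. The 50 samples/s figure is an upper bound for the model alone, since infrared frame capture and acoustic feature extraction add their own latency.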

Cite this article:

ZHANG Linghao, KUANG Junwei, TENG Yufei, XIANG Siyu, LI Lin, ZHOU Yingjie. A Multimodal Inspection Technology for Substations Based on Infrared-Acoustic Attention Fusion[J]. SICHUAN ELECTRIC POWER TECHNOLOGY, 2026, 49(2): 78-83.


History
  • Published online: 2026-05-09