
  • ISSN 1006-3080
  • CN 31-1691/TQ

Architecture Design of Target Detection Hardware Accelerator Based on Heterogeneous FPGA

XIA Qidi, YAN Bingyong, ZHOU Jiale, WANG Huifeng

Citation: XIA Qidi, YAN Bingyong, ZHOU Jiale, WANG Huifeng. Architecture Design of Target Detection Hardware Accelerator Based on Heterogeneous FPGA[J]. Journal of East China University of Science and Technology. doi: 10.14135/j.cnki.1006-3080.20201027003

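Funding: National Key Research and Development Program of China (2018YFC1803306); Young Scientists Fund project (61906068)
    About the author:

    XIA Qidi (born 1998), male, from Bozhou, Anhui; master's student; main research interests: FPGA and embedded systems. E-mail: 2389240877@qq.com

    Corresponding author:

    YAN Bingyong, E-mail: byyan@ecust.edu.cn

  • CLC number: TP183; TP368.2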


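  • Abstract: Using several hardware acceleration methods, including coarse-grained and fine-grained optimization, fixed-point quantization of parameters, and reordering, this paper proposes a low-power target detection accelerator architecture on a heterogeneous FPGA+SoC platform. To address the design limitations of existing work, the YOLOv2 algorithm is accelerated along multiple dimensions on a Zynq 7000 series FPGA, and the accelerator's performance and resource consumption are analyzed and modeled to verify that the architecture is sound. Each module is individually optimized to make full use of the on-chip hardware resources, and the accelerator's memory access mechanism is improved to handle the tedious low-level data accesses that are often overlooked, which reduces the system's transfer latency. Experimental results show that the architecture achieves 26.98 GOPs on the PYNQ-Z2 platform, about 38.71% higher than existing FPGA-based target detection platforms, with a power consumption of only 2.96 W, which makes it relevant to the practical deployment of target detection algorithms.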
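The fixed-point quantization of parameters mentioned in the abstract can be illustrated with a minimal Python sketch. The page does not state the bit allocation used by the accelerator, so the choice of 8 fractional bits and the helper names `to_fixed16`, `from_fixed16`, and `fixed_mul` are assumptions made only for illustration, not the authors' implementation.

```python
# Minimal sketch of signed 16-bit fixed-point arithmetic.
# ASSUMPTION: 8 fractional bits; the paper's actual format is not given here.
INT16_MIN, INT16_MAX = -32768, 32767

def saturate(value):
    """Clamp an integer into the signed 16-bit range."""
    return max(INT16_MIN, min(INT16_MAX, value))

def to_fixed16(x, frac_bits=8):
    """Quantize a float: scale by 2**frac_bits, round, then saturate."""
    return saturate(round(x * (1 << frac_bits)))

def from_fixed16(q, frac_bits=8):
    """Convert a fixed-point integer back to a float."""
    return q / (1 << frac_bits)

def fixed_mul(a, b, frac_bits=8):
    """Multiply two fixed-point values.

    The raw product carries 2 * frac_bits fractional bits, so it is
    shifted right by frac_bits (floor division) before saturating.
    """
    return saturate((a * b) >> frac_bits)

w = to_fixed16(1.5)   # 384
x = to_fixed16(2.0)   # 512
y = from_fixed16(fixed_mul(w, x))  # 3.0
```

With this format, any in-range value is represented with an absolute rounding error of at most 2**-9, while each multiply works on 16-bit integers instead of 32-bit floats.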

     

  • Figure 1. YOLOv2 network structure

    Figure 2. Accelerator data flow framework

    Figure 3. Schematic diagram of coarse-grained optimization

    Figure 4. Comparison of multiply-add operations in the FPGA before and after optimization

    Figure 5. Comparison between 32-bit floating-point and 16-bit fixed-point number representations

    Figure 6. Schematic diagram of convolution module unrolling

    Figure 7. Schematic diagram of pooling module

    Figure 8. Schematic diagram of reordering

    Figure 9. System hardware accelerator architecture

    Figure 10. YOLOv2 network hardware acceleration system

    Figure 11. Experimental environment and detection results

    Table 1. Comparison of resource consumption at different data precisions

    Resource (data precision)   DSP   LUT
    Adder (Float-32)            2     214
    Multiplier (Float-32)       3     135
    Adder (Fixed-16)            -     47
    Multiplier (Fixed-16)       1     101

    Table 2. Accelerator resource consumption

    Resource   Used     Available   Utilization/%
    LUTs       35 977   53 200      67.62
    FF         32 049   106 400     30.12
    BRAM_18K   178      280         63.57
    DSP48E     152      220         69.09
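The per-operator costs in Table 1 and the device budget in the "Available" column of Table 2 can be combined into a rough upper bound on how many multiply-accumulate (MAC) units fit on the device at each precision. The sketch below is a back-of-the-envelope estimate, not the resource model used in the paper: it assumes one MAC is one adder plus one multiplier, reads the "-" entry in Table 1 as zero DSPs, and ignores BRAM, control logic, and routing. The names `COST`, `BUDGET`, `mac_cost`, and `max_macs` are hypothetical.

```python
# (DSP, LUT) cost per operator, taken from Table 1.
# ASSUMPTION: the "-" entry for the Fixed-16 adder is read as 0 DSPs.
COST = {
    "float32": {"adder": (2, 214), "multiplier": (3, 135)},
    "fixed16": {"adder": (0, 47), "multiplier": (1, 101)},
}

# Device budget from the "Available" column of Table 2.
BUDGET = {"DSP": 220, "LUT": 53200}

def mac_cost(precision):
    """Return (DSP, LUT) for one MAC, modeled as one adder plus one multiplier."""
    add_dsp, add_lut = COST[precision]["adder"]
    mul_dsp, mul_lut = COST[precision]["multiplier"]
    return add_dsp + mul_dsp, add_lut + mul_lut

def max_macs(precision):
    """Upper bound on parallel MACs, limited by whichever resource runs out first."""
    dsp, lut = mac_cost(precision)
    return min(BUDGET["DSP"] // dsp, BUDGET["LUT"] // lut)

float_macs = max_macs("float32")  # 44, limited by DSPs (5 per MAC)
fixed_macs = max_macs("fixed16")  # 220, limited by DSPs (1 per MAC)
```

Under these assumptions both precisions are DSP-bound, and the 16-bit fixed-point format allows roughly five times as many parallel MAC units as 32-bit floating point.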

    Table 3. Comparison with other FPGA accelerator designs

    Accelerator   CNN Model     Platform     Frequency/MHz   BRAM   DSP   Power/W   Performance/GOPs
    Ref[22]       YOLOv1        ZC706        200             N/A    800   2.17      18.82
    Ref[13]       LW YOLOv2     Zynq Ultra   300             1706   377   4.5       N/A
    Ref[23]       YOLOv2 Tiny   Cyclone V    117             N/A    122   N/A       19.45
    This Work     YOLOv2        PYNQ-Z2      150             178    152   2.96      26.98
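The 38.71% improvement quoted in the abstract appears to correspond to the comparison with Ref[23] in Table 3. The short check below reproduces that figure and also derives an energy-efficiency value for this work from the reported throughput and power. The function names are hypothetical, and the GOPs/W figure is computed here rather than quoted from the paper.

```python
def relative_gain_percent(new, baseline):
    """Relative improvement of `new` over `baseline`, in percent."""
    return (new - baseline) / baseline * 100.0

def energy_efficiency(gops, watts):
    """Throughput per watt (GOPs/W)."""
    return gops / watts

THIS_WORK_GOPS = 26.98   # Table 3, This Work
THIS_WORK_WATTS = 2.96   # Table 3, This Work
REF23_GOPS = 19.45       # Table 3, Ref[23]

gain = round(relative_gain_percent(THIS_WORK_GOPS, REF23_GOPS), 2)         # 38.71
efficiency = round(energy_efficiency(THIS_WORK_GOPS, THIS_WORK_WATTS), 2)  # 9.11
```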
  • [1] SOKOLOVA A D, SAVCHENKO A V. Computation-efficient face recognition algorithm using a sequential analysis of high dimensional neural-net features[J]. Optical Memory and Neural Networks, 2020, 29(1): 19-29. doi: 10.3103/S1060992X2001004X
    [2] OH K, KIM S, OH I S. Salient explanation for fine-grained classification[J]. IEEE Access, 2020, 8: 61433-61441.
    [3] YANG R, SINGH S K, TAVAKKOLI M, et al. CNN-LSTM deep learning architecture for computer vision-based modal frequency detection[J]. Mechanical Systems and Signal Processing, 2020, 144: 106885. doi: 10.1016/j.ymssp.2020.106885
    [4] WANG D, XU K, JIA Q, et al. ABM-SpConv: A novel approach to FPGA-based acceleration of convolutional neural network inference[C]//Proceedings of the 56th Annual Design Automation Conference. Las Vegas, USA: ACM, 2019: 1-6.
    [5] CHEN X Z, KUNDU K, ZHU Y. 3D Object proposals for accurate object class detection[C]//International Conference on Neural Information Processing Systems. USA: MIT Press, 2015: 424-432.
    [6] DU Z, FASTHUBER R, CHEN T, et al. ShiDianNao: Shifting vision processing closer to the sensor[J]. ACM Sigarch Computer Architecture News, 2015, 43(3): 92-104.
    [7] ZHANG C, LI P, SUN G G, et al. Optimizing FPGA-based accelerator design for deep convolutional neural networks[C]//Proceedings of the 2015 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays. USA: ACM, 2015: 161-170.
    [8] LI H, FAN X, JIAO L, et al. A high performance FPGA-based accelerator for large-scale convolutional neural networks[C]//2016 26th International Conference on Field Programmable Logic and Applications (FPL). Switzerland: IEEE, 2016: 1-9.
    [9] QIU J, WANG J, YAO S, et al. Going deeper with embedded FPGA platform for convolutional neural network[C]//Proceedings of the 2016 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays. USA: ACM, 2016: 26-35.
    [10] REDMON J, DIVVALA S, GIRSHICK R, et al. You only look once: Unified, real-time object detection[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. USA: IEEE, 2016: 779-788.
    [11] PEEMEN M, SETIO A, MESMAN B, et al. Memory-centric accelerator design for convolutional neural networks[C]// IEEE International Conference on Computer Design. USA: IEEE, 2013: 13-19.
    [12] NGUYEN D, NGUYEN T, KIM H, et al. A high-throughput and power-efficient FPGA implementation of YOLO CNN for object detection[J]. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 2019, 27(8): 1861-1873. doi: 10.1109/TVLSI.2019.2905242
    [13] NAKAHARA H, YONEKAWA H, FUJII T, et al. A lightweight YOLOv2: A binarized CNN with a parallel support vector regression for an FPGA[C]//Proceedings of the 2018 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays. USA: ACM, 2018: 31-40.
    [14] HE K, ZHANG X, REN S, et al. Deep residual learning for image recognition[C]//Proceedings of the IEEE conference on computer vision and pattern recognition. USA: IEEE, 2016: 770-778.
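    [15] ZHAO Lantao, LIN Jiajun. Multi-pose face recognition method based on dual-path CNN[J]. Journal of East China University of Science and Technology (Natural Science Edition), 2019, 45(3): 466-470. (in Chinese)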
    [16] LARKIN D, KINANE A, O’CONNOR N. Towards hardware acceleration of neuroevolution for multimedia processing applications on mobile devices[C]//International Conference on Neural Information Processing. Berlin, Heidelberg: Springer, 2006: 1178-1188.
    [17] FARABET C, LECUN Y, KAVUKCUOGLU K, et al. Large-scale FPGA-based convolutional networks[J]. Scaling up Machine Learning: Parallel and Distributed Approaches, 2011, 13(3): 399-419.
    [18] CHEN T, DU Z, SUN N, et al. Diannao: A small-footprint high-throughput accelerator for ubiquitous machine-learning[J]. ACM Sigarch Computer Architecture News, 2014, 42(1): 269-284. doi: 10.1145/2654822.2541967
    [19] CHEN Y, LUO T, LIU S, et al. Dadiannao: A machine-learning supercomputer[C]//2014 47th Annual IEEE/ACM International Symposium on Microarchitecture. UK: IEEE, 2014: 609-622.
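    [20] ZHU Wenwen, YE Xining. Gesture recognition algorithm based on convolutional neural network[J]. Journal of East China University of Science and Technology (Natural Science Edition), 2018, 44(2): 260-269. (in Chinese)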
    [21] IOFFE S, SZEGEDY C. Batch normalization: Accelerating deep network training by reducing internal covariate shift[C]// Proceedings of the 32nd International Conference on Machine Learning. Brookline: JMLR, 2015: 448-456.
    [22] ZHAO R Z, NIU X Y, WU Y J, et al. Optimizing CNN-based object detection algorithms on embedded FPGA platforms[C]//Proceedings of the 13th International Symposium on Applied Reconfigurable Computing. Berlin, Heidelberg: Springer, 2017: 255-267.
    [23] WAI Y J, BIN MOHD YUSSOF Z, BIN SALIM S I, et al. Fixed point implementation of Tiny-Yolo-v2 using OpenCL on FPGA[J]. International Journal of Advanced Computer Science and Applications, 2018, 9(10): 506-512.
Publication history
  • Received:  2020-10-27
  • Published online:  2021-01-19