A 98 GMACs/W 32-core vector processor in 65 nm CMOS

Xun He*, Xin Jin, Minghui Wang, Dajiang Zhou, Satoshi Goto

*Corresponding author for this work

    Research output: Contribution to journal / Article / Peer-review

    6 Citations (Scopus)


    This paper presents a high-performance dual-issue 32-core SIMD platform for image and video processing. The SIMD cores support 8/16-bit SIMD MAC instructions and vertical vector access. Eight cores with a 4-port L2 cache are connected by a CIB bus to form a cluster, and four clusters are connected by a mesh network. This hierarchical network can provide more than 192 GB/s of low-latency inter-core bandwidth on average. The 4-port L2 cache architecture is also designed to provide 192 GB/s of L2 cache bandwidth. To reduce coherence operations in large-scale SMP, an application-specific protocol is proposed; compared with MOESI, it saves 67.8% of L1 cache energy in the 32-core case. The whole system, including 32 vector cores, 256 KB of L2 cache, a 64-bit DDRII PHY and two PLL units, occupies 25 mm² in 65 nm CMOS. It achieves a peak performance of 375 GMACs and an efficiency of 98 GMACs/W at 1.2 V.
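    As a rough reading of the headline figures, this short Python sketch divides the quoted peak throughput by the quoted efficiency and by the core count. It assumes, which the abstract does not state explicitly, that the 375 GMACs peak and the 98 GMACs/W efficiency refer to the same operating point at 1.2 V, and that the peak is spread evenly over the 32 cores. The function names are illustrative only.

```python
# Figures quoted in the abstract (peak, at 1.2 V).
PEAK_GMACS = 375.0             # giga multiply-accumulate operations per second
EFFICIENCY_GMACS_PER_W = 98.0  # GMACs per watt
NUM_CORES = 32


def implied_power_w(gmacs: float, gmacs_per_w: float) -> float:
    """Power implied by throughput / efficiency.

    Assumption: both figures describe the same operating point.
    """
    return gmacs / gmacs_per_w


def per_core_gmacs(gmacs: float, cores: int) -> float:
    """Average peak throughput per core.

    Assumption: the peak is distributed evenly across all cores.
    """
    return gmacs / cores


if __name__ == "__main__":
    print(f"implied power: {implied_power_w(PEAK_GMACS, EFFICIENCY_GMACS_PER_W):.2f} W")
    print(f"per-core peak: {per_core_gmacs(PEAK_GMACS, NUM_CORES):.2f} GMACs")
```

    Under those assumptions the figures correspond to roughly 3.8 W of total power and about 11.7 GMACs per core. These are back-of-envelope values derived from the abstract, not measurements reported in the paper.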

    Original language: English
    Pages (from-to): 2609-2618
    Number of pages: 10
    Journal: IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences
    Issue number: 12
    Publication status: Published, December 2011


    Keywords

    • Cache coherence
    • GMACs
    • Multicore processor
    • NoC
    • SIMD

    ASJC Scopus subject areas

    • Electrical and Electronic Engineering
    • Computer Graphics and Computer-Aided Design
    • Applied Mathematics
    • Signal Processing

