A 45nm 37.3GOPS/W heterogeneous multi-core SoC

Yoichi Yuyama*, Masayuki Ito, Yoshikazu Kiyoshige, Yusuke Nitta, Shigezumi Matsui, Osamu Nishii, Atsushi Hasegawa, Makoto Ishikawa, Tetsuya Yamada, Junichi Miyakoshi, Koichi Terada, Tohru Nojiri, Makoto Satoh, Hiroyuki Mizuno, Kunio Uchiyama, Yasutaka Wada, Keiji Kimura, Hironori Kasahara, Hideo Maejima

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceedingConference contribution

33 Citations (Scopus)

Abstract

We develop a heterogeneous multi-core SoC for applications, such as digital TV systems with IP networks (IP-TV) including image recognition and database search. Figure 5.3.1 shows the chip features. This SoC is capable of decoding 1080i audio/video data using a part of SoC (one general-purpose CPU core, video processing unit called VPU5 and sound processing unit called SPU) [1]. Four dynamically reconfigurable processors called FE [2] are integrated and have a total theoretical performance of 41.5GOPS and power consumption of 0.76W. Two 1024-way matrix-processors called MX-2 [3] are integrated and have a total theoretical performance of 36.9GOPS and power consumption of 1.10W. Overall, the performance per watt of our SoC is 37.3GOPS/W at 1.15V, the highest among comparable processors [4-6] excluding special-purpose codecs. The operation granularity of the CPU, FE and MX-2 are 32bit, 16bit, and 4bit respectively, and thus we can assign the appropriate processor for each task in an effective manner. A heterogeneous multi-core approach is one of the most promising approaches to attain high performance with low frequency, or low power, for consumer electronics application and scientific applications, compared to homogeneous multi-core SoCs [4]. For example, for image-recognition application in the IP-TV system, the FEs are assigned to calculate optical flow operation [7] of VGA (640x480) size video data at 15fps, which requires 0.62GOPS. The MX-2s are used for face detection and calculation of the feature quantity of the VGA video data at 15fps, which requires 30.6GOPS. In addition, general-purpose CPU cores are used for database search using the results of the above operations, which requires further enhancement of CPU. The automatic parallelization compilers analyze parallelism of the data flow, generate coarse grain tasks, schedule tasks to minimize execution time considering data transfer overhead for general-purpose CPU and FE.

Original languageEnglish
Title of host publication2010 IEEE International Solid-State Circuits Conference, ISSCC 2010 - Digest of Technical Papers
Pages100-101
Number of pages2
DOIs
Publication statusPublished - 2010
Event2010 IEEE International Solid-State Circuits Conference, ISSCC 2010 - San Francisco, CA, United States
Duration: 2010 Feb 72010 Feb 11

Publication series

NameDigest of Technical Papers - IEEE International Solid-State Circuits Conference
Volume53
ISSN (Print)0193-6530

Conference

Conference2010 IEEE International Solid-State Circuits Conference, ISSCC 2010
Country/TerritoryUnited States
CitySan Francisco, CA
Period10/2/710/2/11

ASJC Scopus subject areas

  • Electronic, Optical and Magnetic Materials
  • Electrical and Electronic Engineering

Fingerprint

Dive into the research topics of 'A 45nm 37.3GOPS/W heterogeneous multi-core SoC'. Together they form a unique fingerprint.

Cite this