A 45nm 37.3GOPS/W heterogeneous multi-core SoC

Yoichi Yuyama*, Masayuki Ito, Yoshikazu Kiyoshige, Yusuke Nitta, Shigezumi Matsui, Osamu Nishii, Atsushi Hasegawa, Makoto Ishikawa, Tetsuya Yamada, Junichi Miyakoshi, Koichi Terada, Tohru Nojiri, Makoto Satoh, Hiroyuki Mizuno, Kunio Uchiyama, Yasutaka Wada, Keiji Kimura, Hironori Kasahara, Hideo Maejima

*この研究の対応する著者

研究成果: Conference contribution

33 被引用数 (Scopus)

抄録

We develop a heterogeneous multi-core SoC for applications, such as digital TV systems with IP networks (IP-TV) including image recognition and database search. Figure 5.3.1 shows the chip features. This SoC is capable of decoding 1080i audio/video data using a part of SoC (one general-purpose CPU core, video processing unit called VPU5 and sound processing unit called SPU) [1]. Four dynamically reconfigurable processors called FE [2] are integrated and have a total theoretical performance of 41.5GOPS and power consumption of 0.76W. Two 1024-way matrix-processors called MX-2 [3] are integrated and have a total theoretical performance of 36.9GOPS and power consumption of 1.10W. Overall, the performance per watt of our SoC is 37.3GOPS/W at 1.15V, the highest among comparable processors [4-6] excluding special-purpose codecs. The operation granularity of the CPU, FE and MX-2 are 32bit, 16bit, and 4bit respectively, and thus we can assign the appropriate processor for each task in an effective manner. A heterogeneous multi-core approach is one of the most promising approaches to attain high performance with low frequency, or low power, for consumer electronics application and scientific applications, compared to homogeneous multi-core SoCs [4]. For example, for image-recognition application in the IP-TV system, the FEs are assigned to calculate optical flow operation [7] of VGA (640x480) size video data at 15fps, which requires 0.62GOPS. The MX-2s are used for face detection and calculation of the feature quantity of the VGA video data at 15fps, which requires 30.6GOPS. In addition, general-purpose CPU cores are used for database search using the results of the above operations, which requires further enhancement of CPU. The automatic parallelization compilers analyze parallelism of the data flow, generate coarse grain tasks, schedule tasks to minimize execution time considering data transfer overhead for general-purpose CPU and FE.

本文言語English
ホスト出版物のタイトル2010 IEEE International Solid-State Circuits Conference, ISSCC 2010 - Digest of Technical Papers
ページ100-101
ページ数2
DOI
出版ステータスPublished - 2010 5月 18
イベント2010 IEEE International Solid-State Circuits Conference, ISSCC 2010 - San Francisco, CA, United States
継続期間: 2010 2月 72010 2月 11

出版物シリーズ

名前Digest of Technical Papers - IEEE International Solid-State Circuits Conference
53
ISSN(印刷版)0193-6530

Conference

Conference2010 IEEE International Solid-State Circuits Conference, ISSCC 2010
国/地域United States
CitySan Francisco, CA
Period10/2/710/2/11

ASJC Scopus subject areas

  • 電子材料、光学材料、および磁性材料
  • 電子工学および電気工学

フィンガープリント

「A 45nm 37.3GOPS/W heterogeneous multi-core SoC」の研究トピックを掘り下げます。これらがまとまってユニークなフィンガープリントを構成します。

引用スタイル