TY - GEN
T1 - 32-parallel SAD tree hardwired engine for variable block size motion estimation in HDTV1080p real-time encoding application
AU - Liu, Zhenyu
AU - Song, Yang
AU - Shao, Ming
AU - Li, Shen
AU - Li, Ngfeng
AU - Goto, Satoshi
AU - Ikenaga, Takeshi
PY - 2007/12/1
Y1 - 2007/12/1
N2 - H.264/AVC coding standard incorporates variable block size (VBS) motion estimation (ME) to improve the compression efficiency. For HDTV-1080p application, the massive computation and huge memory bandwidth by the large video frame size and the wide search range are two critical impediments to the real-time hardwired VB-SME engine design. In this paper, we present six techniques to circumvent these difficulties. First, the inter modes bellow 8 × 8 are eliminated in our design to reduce the hardware cost. Second, the low-pass filter based 4:1 down-sampling algorithm successfully reduces about 75% arithmetic computation in each search position. Third, the coarse to fine search scheme is made use of to reduce 25%-50% search candidates. Fourth, C+ memory organization is adopted to reduce the external IO bandwidth. Fifth, horizontal zigzag scan mode optimizes the search window memories. Finally, in circuit design, 4:2 compressor based CSA tree, multi-cycle path delay and 2 pipeline stage SAD tree techniques are utilized to improve the speed and reduce the hardware of each SAD tree. The hardwired integer motion estimation (IME) engine with 192 ×128 search range for HDTV1080p@30Hz is demonstrated in this paper. With TSMC 0.18μm 1P6M CMOS technology, it is implemented with 485.7k gates standard cells and 327.68k bit on chip memories. The power dissipation is 729mw at 200MHz clock speed.
AB - H.264/AVC coding standard incorporates variable block size (VBS) motion estimation (ME) to improve the compression efficiency. For HDTV-1080p application, the massive computation and huge memory bandwidth by the large video frame size and the wide search range are two critical impediments to the real-time hardwired VB-SME engine design. In this paper, we present six techniques to circumvent these difficulties. First, the inter modes bellow 8 × 8 are eliminated in our design to reduce the hardware cost. Second, the low-pass filter based 4:1 down-sampling algorithm successfully reduces about 75% arithmetic computation in each search position. Third, the coarse to fine search scheme is made use of to reduce 25%-50% search candidates. Fourth, C+ memory organization is adopted to reduce the external IO bandwidth. Fifth, horizontal zigzag scan mode optimizes the search window memories. Finally, in circuit design, 4:2 compressor based CSA tree, multi-cycle path delay and 2 pipeline stage SAD tree techniques are utilized to improve the speed and reduce the hardware of each SAD tree. The hardwired integer motion estimation (IME) engine with 192 ×128 search range for HDTV1080p@30Hz is demonstrated in this paper. With TSMC 0.18μm 1P6M CMOS technology, it is implemented with 485.7k gates standard cells and 327.68k bit on chip memories. The power dissipation is 729mw at 200MHz clock speed.
KW - H.264/AVC
KW - HDTV1080p
KW - Integer motion estimation
KW - VLSI
KW - Variable block size
UR - http://www.scopus.com/inward/record.url?scp=47949087862&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=47949087862&partnerID=8YFLogxK
U2 - 10.1109/SIPS.2007.4387630
DO - 10.1109/SIPS.2007.4387630
M3 - Conference contribution
AN - SCOPUS:47949087862
SN - 1424412226
SN - 9781424412228
T3 - IEEE Workshop on Signal Processing Systems, SiPS: Design and Implementation
SP - 675
EP - 680
BT - 2007 IEEE Workshop on Signal Processing Systems, SiPS 2007, Proceedings
T2 - 2007 IEEE Workshop on Signal Processing Systems, SiPS 2007
Y2 - 17 October 2007 through 19 October 2007
ER -