TY - JOUR
T1 - High-Throughput Power-Efficient VLSI Architecture of Fractional Motion Estimation for Ultra-HD HEVC Video Encoding
AU - He, Gang
AU - Zhou, Dajiang
AU - Li, Yunsong
AU - Chen, Zhixiang
AU - Zhang, Tianruo
AU - Goto, Satoshi
PY - 2015/3/19
Y1 - 2015/3/19
N2 - Fractional motion estimation (FME) significantly enhances video compression efficiency, but its high computational complexity also limits the real-time processing capability. In this brief, we present a VLSI implementation of FME design in High Efficiency Video Coding for ultrahigh definition video applications. We first propose a bilinear quarter pixel approximation, together with a search pattern based on it to reduce the complexity of interpolation and fractional search process. Furthermore, a data reuse strategy is exploited to reduce the hardware cost of transform. In addition, using the considered pixel parallelism and dedicated access pattern for memory, we fully pipeline the computation and achieve high hardware utilization. This design has been implemented as a 65-nm CMOS chip and verified. The measured throughput reaches 995 Mpixels/s for 7680 x 4320 30 frames/s at 188 MHz, at least 4.7 times faster than prior arts. The corresponding power dissipation is 198.6 mW, with a power efficiency of 0.2 nJ/pixel. Due to the optimization, our work achieves more than 52% improvement on power efficiency, relative to previous works in H.264.
AB - Fractional motion estimation (FME) significantly enhances video compression efficiency, but its high computational complexity also limits the real-time processing capability. In this brief, we present a VLSI implementation of FME design in High Efficiency Video Coding for ultrahigh definition video applications. We first propose a bilinear quarter pixel approximation, together with a search pattern based on it to reduce the complexity of interpolation and fractional search process. Furthermore, a data reuse strategy is exploited to reduce the hardware cost of transform. In addition, using the considered pixel parallelism and dedicated access pattern for memory, we fully pipeline the computation and achieve high hardware utilization. This design has been implemented as a 65-nm CMOS chip and verified. The measured throughput reaches 995 Mpixels/s for 7680 x 4320 30 frames/s at 188 MHz, at least 4.7 times faster than prior arts. The corresponding power dissipation is 198.6 mW, with a power efficiency of 0.2 nJ/pixel. Due to the optimization, our work achieves more than 52% improvement on power efficiency, relative to previous works in H.264.
UR - http://www.scopus.com/inward/record.url?scp=84925435248&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84925435248&partnerID=8YFLogxK
U2 - 10.1109/TVLSI.2014.2386897
DO - 10.1109/TVLSI.2014.2386897
M3 - Article
AN - SCOPUS:84959461430
SN - 1063-8210
JO - IEEE Transactions on Very Large Scale Integration (VLSI) Systems
JF - IEEE Transactions on Very Large Scale Integration (VLSI) Systems
ER -