TY - JOUR
T1 - A low-cost VLSI architecture of multiple-Size IDCT for H.265/HEVC
AU - Sun, Heming
AU - Zhou, Dajiang
AU - Liu, Peilin
AU - Goto, Satoshi
N1 - Publisher Copyright: Copyright © 2014 The Institute of Electronics, Information and Communication Engineers.
PY - 2014/12/1
Y1 - 2014/12/1
N2 - In this paper, we present an area-efficient 4/8/16/32-point inverse discrete cosine transform (IDCT) architecture for an HEVC decoder. Compared with previous work, this work reduces the hardware cost in two respects. First, we reduce the logic cost of the 1D IDCT by proposing a reordered parallel-in serial-out (RPISO) scheme. By using the RPISO scheme, we can reduce the calculations required for the butterfly inputs in each cycle. Second, we reduce the area of the transpose architecture by proposing a cyclic data mapping scheme that can achieve 100% I/O utilization of each SRAM. To design a fully pipelined 2D IDCT architecture, we propose a pipelining schedule for the row and column transforms. The results show that the area of the IDCT logic, normalized by maximum throughput, can be reduced by 25%, and the memory area can be reduced by 62%. The maximum throughput reaches 1248 Mpixels/s, which can support real-time decoding of a 4K × 2K, 60 fps video sequence.
AB - In this paper, we present an area-efficient 4/8/16/32-point inverse discrete cosine transform (IDCT) architecture for an HEVC decoder. Compared with previous work, this work reduces the hardware cost in two respects. First, we reduce the logic cost of the 1D IDCT by proposing a reordered parallel-in serial-out (RPISO) scheme. By using the RPISO scheme, we can reduce the calculations required for the butterfly inputs in each cycle. Second, we reduce the area of the transpose architecture by proposing a cyclic data mapping scheme that can achieve 100% I/O utilization of each SRAM. To design a fully pipelined 2D IDCT architecture, we propose a pipelining schedule for the row and column transforms. The results show that the area of the IDCT logic, normalized by maximum throughput, can be reduced by 25%, and the memory area can be reduced by 62%. The maximum throughput reaches 1248 Mpixels/s, which can support real-time decoding of a 4K × 2K, 60 fps video sequence.
KW - Area-efficient
KW - HEVC
KW - IDCT
KW - SRAM
KW - Video coding
UR - http://www.scopus.com/inward/record.url?scp=84924551001&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84924551001&partnerID=8YFLogxK
U2 - 10.1587/transfun.E97.A.2467
DO - 10.1587/transfun.E97.A.2467
M3 - Article
AN - SCOPUS:84924551001
SN - 0916-8508
VL - E97A
SP - 2467
EP - 2476
JO - IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences
JF - IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences
IS - 12
ER -