TY - JOUR
T1 - A 48 cycles/MB H.264/AVC deblocking filter architecture for ultra high definition applications
AU - Zhou, Dajiang
AU - Zhou, Jinjia
AU - Zhu, Jiayi
AU - Goto, Satoshi
PY - 2009/12
Y1 - 2009/12
N2 - In this paper, a highly parallel deblocking filter architecture for H.264/AVC is proposed to process one macroblock in 48 clock cycles and give real-time support to QFHD@60 fps sequences at less than 100 MHz. 4 edge filters organized in 2 groups for simultaneously processing vertical and horizontal edges are applied in this architecture to enhance its throughput. While parallelism increases, pipeline hazards arise owing to the latency of edge filters and data dependency of deblocking algorithm. To solve this problem, a zig-zag processing schedule is proposed to eliminate the pipeline bubbles. Data path of the architecture is then derived according to the processing schedule and optimized through data flow merging, so as to minimize the cost of logic and internal buffer. Meanwhile, the architecture's data input rate is designed to be identical to its throughput, while the transmission order of input data can also match the zig-zag processing schedule. Therefore no intercommunication buffer is required between the deblocking filter and its previous component for speed matching or data reordering. As a result, only one 24x64 two-port SRAM as internal buffer is required in this design. When synthesized with SMIC 130 nm process, the architecture costs a gate count of 30.2 k, which is competitive considering its high performance.
AB - In this paper, a highly parallel deblocking filter architecture for H.264/AVC is proposed to process one macroblock in 48 clock cycles and give real-time support to QFHD@60 fps sequences at less than 100 MHz. 4 edge filters organized in 2 groups for simultaneously processing vertical and horizontal edges are applied in this architecture to enhance its throughput. While parallelism increases, pipeline hazards arise owing to the latency of edge filters and data dependency of deblocking algorithm. To solve this problem, a zig-zag processing schedule is proposed to eliminate the pipeline bubbles. Data path of the architecture is then derived according to the processing schedule and optimized through data flow merging, so as to minimize the cost of logic and internal buffer. Meanwhile, the architecture's data input rate is designed to be identical to its throughput, while the transmission order of input data can also match the zig-zag processing schedule. Therefore no intercommunication buffer is required between the deblocking filter and its previous component for speed matching or data reordering. As a result, only one 24x64 two-port SRAM as internal buffer is required in this design. When synthesized with SMIC 130 nm process, the architecture costs a gate count of 30.2 k, which is competitive considering its high performance.
KW - Deblocking
KW - H.264/AVC
KW - Parallel
KW - QFHD
KW - Ultra high resolution
UR - http://www.scopus.com/inward/record.url?scp=84857572417&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84857572417&partnerID=8YFLogxK
U2 - 10.1587/transfun.E92.A.3203
DO - 10.1587/transfun.E92.A.3203
M3 - Article
AN - SCOPUS:84857572417
SN - 0916-8508
VL - E92-A
SP - 3203
EP - 3210
JO - IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences
JF - IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences
IS - 12
ER -