Video Coding Overview

In earlier posts, we looked at color spaces (parts 1 and 2) and image compression. In this post, I will provide an introduction to video coding, and get into more detail in later posts.

Consider a video at a resolution of 640×480@30fps (frames per second): each second of video contains 30 frames (images), each at a resolution of 640×480. Representing each of these images as (R, G, B) data would require 640 * 480 * 3 bytes per frame, assuming 1 byte per primary color. A second of video would thus take up 640 * 480 * 3 * 30 bytes; that's about 27.6 MB per second, or roughly 100GB of data per hour. The goal, then, is to compress the video to a smaller size.

One straightforward approach would be to compress each of the frames using the JPEG techniques described earlier (this is the MJPEG codec). JPEG typically achieves a compression factor of 10, which would still leave about 10GB of video data per hour. Using video coding techniques, we should be able to get down to 2GB of data per hour, a compression factor of about 50.
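The arithmetic above can be checked in a few lines (a quick sketch; the 50x factor for inter-frame codecs is implied by the 2GB/hour target, not a quoted specification):

```python
# Raw size of 640x480 @ 30 fps video with 3 bytes/pixel (1 byte per primary color).
WIDTH, HEIGHT, FPS, BYTES_PER_PIXEL = 640, 480, 30, 3

bytes_per_frame = WIDTH * HEIGHT * BYTES_PER_PIXEL       # 921,600 bytes
raw_gb_per_hour = bytes_per_frame * FPS * 3600 / 1e9     # ~99.5 GB/hour

mjpeg_factor = 10   # typical JPEG compression factor
codec_factor = 50   # implied by the ~2 GB/hour target

print(f"raw:   {raw_gb_per_hour:.1f} GB/hour")
print(f"MJPEG: {raw_gb_per_hour / mjpeg_factor:.1f} GB/hour")
print(f"inter-frame codec: {raw_gb_per_hour / codec_factor:.1f} GB/hour")
```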

Note that MJPEG's poor performance is due to its lack of inter-frame prediction. Say the video content is a static image repeated over and over: MJPEG encodes each frame independently, and does not make use of the fact that the frames are all identical. Video codecs like MPEG-1, MPEG-2, MPEG-4, and H.264 (which is MPEG-4 Part 10) all make use of inter-frame prediction.

In MPEG-1 through MPEG-4 (excluding H.264), the frame is the fundamental codec element. Each frame is encoded as one of the following:

  • I-frame (all the macroblocks are I-blocks, which do not refer to other frames)
  • P-frame (the macroblocks are either I-blocks or P-blocks). P stands for "predicted": P-blocks make reference to I- or P-frames in the past. I will go into the details of inter-frame prediction in a later post.
  • B-frame (the macroblocks are I-blocks, P-blocks, or B-blocks). B stands for "bi-predicted": B-blocks make reference to I- and P-frames in the past and in the future.
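The frame types above can be summarized as a small table. One consequence worth noting: because B-frames reference a future frame, the bitstream (decode) order differs from display order. The GOP pattern below is a typical example, not something mandated by the standards (a sketch):

```python
# Macroblock types allowed in each frame type, per the list above.
ALLOWED_BLOCKS = {
    "I": {"I"},             # intra-coded only
    "P": {"I", "P"},        # may also predict from past frames
    "B": {"I", "P", "B"},   # may also predict from past and future frames
}

# A typical display-order group of pictures (GOP). A B-frame needs its
# *future* I- or P-frame reference before it can be decoded, so the
# encoder transmits that reference frame ahead of the B-frames:
display_order = ["I", "B", "B", "P", "B", "B", "P"]
decode_order  = ["I", "P", "B", "B", "P", "B", "B"]

for ftype, blocks in ALLOWED_BLOCKS.items():
    print(f"{ftype}-frame may contain: {sorted(blocks)}")
```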

H.264 is the most advanced of the aforementioned codecs. It typically provides 1.5 times better compression than MPEG-4 Part 2, and adds better error resilience over lossy networks, better parallelization of decoding, and in-loop deblocking to smooth block edges, among several other improvements. From now on, I will present most of the video coding techniques in the context of H.264.

In H.264, the slice is the fundamental codec element (as opposed to the frame in MPEG-1 through MPEG-4); an entire frame can be treated as one slice (one slice per frame), or a frame can be split into multiple slices (multi-slicing). Each slice consists of a set of macroblocks and can be decoded independently of other slices. Multi-slicing provides error resilience over lossy networks, and also aids parallelization when decoding a frame. I will get to different slice allocation schemes like Flexible Macroblock Ordering (FMO), Arbitrary Slice Ordering (ASO), and Redundant Slices (RS) in a later post.
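To make the numbers concrete: a 640×480 frame covered by 16×16 macroblocks has 40×30 = 1200 macroblocks, which might be split into, say, four slices. The row-wise split below is a simple raster-order allocation for illustration, not one of the FMO/ASO schemes discussed later (a sketch):

```python
# Split a frame's macroblocks into independently decodable slices.
MB_SIZE = 16
mb_cols, mb_rows = 640 // MB_SIZE, 480 // MB_SIZE          # 40 x 30 macroblocks
macroblocks = [(r, c) for r in range(mb_rows) for c in range(mb_cols)]

num_slices = 4
per_slice = len(macroblocks) // num_slices                  # 300 macroblocks each
slices = [macroblocks[i * per_slice:(i + 1) * per_slice]
          for i in range(num_slices)]

# Each slice can be decoded (or lost in transit) without affecting the others.
print(len(macroblocks), "macroblocks in", [len(s) for s in slices])
```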

Each slice, then, consists of a sequence of macroblocks. There are 5 types of slices possible:

  • I-slices consist only of I-blocks (which can be decoded without referring to other frames)
  • P-slices consist of I- and P-blocks
  • B-slices consist of I-, P-, and B-blocks
  • SI slices
  • SP slices

SI and SP slices consist of a special kind of I- and P-blocks respectively to facilitate switching between different bitrates of the same stream. We will look at SI and SP slices in a later post.

In the next series of posts, I will take a deeper look at various aspects of H.264 in the following order:

  • Overview of the H.264 codec pipeline 
  • Intra-frame prediction schemes
  • Inter-frame prediction schemes (will include a discussion of motion vectors)
  • In-loop deblocking filter
  • Slice allocation schemes
  • SI and SP slices
  • Entropy coding techniques (Variable Length Coding (VLC) and Binary Arithmetic Coding (BAC))
  • Network Abstraction Layer (NAL) units


About annapureddy

Sidd is the VP of Engineering at Dyyno Inc. He received his Ph.D. and M.S. degrees from NYU, and Bachelor's from IIT Madras, all in Computer Science. http://www.scs.stanford.edu/~reddy/