RGB, YUV, and Color Spaces

This post is the first in what will hopefully be a series on video coding. A raw video is a series of frames at a particular resolution; a frame at 640×480 resolution, for example, has 307,200 pixels. Each pixel can be represented by a three-tuple (R, G, B) corresponding to the primary colors. A typical scheme uses 1 byte (256 values) for each color, so a pixel can be represented with 3 bytes[1]. Thus, a 640×480 video at 30 fps (frames per second) produces 640 × 480 × 3 × 30 bytes of data per second, roughly 27.6 MB/s. Several encoding techniques are used to capture the elements important for human perception (anything not important for human perception can be discarded to reduce the amount of data for broadcasting to TV, recording to DVD, etc.). This is where YUV, YCbCr, and related formats come into play, as discussed below.
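As a quick sanity check, the raw data rate is just the arithmetic from the paragraph above:

```python
# Raw data rate of uncompressed 640x480 RGB video at 30 frames per second.
width, height = 640, 480
bytes_per_pixel = 3                       # 1 byte each for R, G, B
fps = 30

pixels_per_frame = width * height         # 307,200 pixels
bytes_per_frame = pixels_per_frame * bytes_per_pixel
bytes_per_second = bytes_per_frame * fps  # 27,648,000 bytes (~27.6 MB/s)
```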

Now for some history. TVs were initially black and white, so only gray values were transmitted in the (then analog) broadcast signal. Note that gray values are simply equal parts of the (R, G, B) colors: black is (0, 0, 0), white is (255, 255, 255), and there are 254 gray values in between. This 1-byte representation of gray values is called luminance, and the symbol Y or Y' is typically used to represent it (we will use Y in this post[2]). Note that ignoring color and using just the Y values requires one-third the data of the original.

Then color TVs burst onto the scene, and a TV signal was needed that would work on both the older black-and-white sets and the newer color sets. Color TVs were designed to use the same circuitry as the black-and-white sets for the luminance part (Y), with additional circuitry for rendering colors. The color information is called chrominance, and it has two components, U and V. U is computed as the difference between the blue color and Y, and V as the difference between the red color and Y (roughly speaking; the exact calculations will be given in a later post). This U and V information is then added to the main Y signal using a sub-carrier (a sub-carrier is a signal that is grafted onto a main signal and carries additional data, in this case the color information). A color TV receives the YUV signal, computes the original R, G, B values, and renders them on the screen. Note that at this point we are back to 3 bytes per pixel.
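As a sketch of the conversion (the exact calculations are deferred to a later post; the scale factors below are the common BT.601 analog coefficients, used here as an assumption):

```python
def rgb_to_yuv(r, g, b):
    """Convert 8-bit R, G, B to Y, U, V (assumed BT.601 analog coefficients)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b  # luminance: weighted sum of R, G, B
    u = 0.492 * (b - y)                    # U: scaled blue-minus-luminance
    v = 0.877 * (r - y)                    # V: scaled red-minus-luminance
    return y, u, v

# A gray pixel, e.g. (128, 128, 128), has Y near 128 and U, V near zero,
# matching the black-and-white case above.
```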

Given that the bandwidth of the TV signal is limited, there is a need to reduce the number of bits per pixel. Also, the quality of picture on the initial color TVs was constrained more by monitor technology than by the bandwidth of the TV signal, so fidelity to the original signal was not an emphasis. Since the human eye is typically more sensitive to luminance than to chrominance, the 1 byte of luminance per pixel is not sacrificed any further. Instead, the chrominance components are sub-sampled to reduce the number of bits per pixel (and thus the bandwidth of the TV signal). This is done in two ways[3]. First, only the chrominance components for alternate scan lines are retained (this already cuts the amount of UV data in half). Second, for every group of 4 pixels, only 2 UV values are retained instead of 4 (this cuts the UV data in half again). With these two techniques, the amount of UV data that gets transmitted is one-fourth the original amount.
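The two reductions can be illustrated on a toy chroma plane (the 4×4 array and its values below are made up purely for illustration):

```python
# A toy 4x4 plane of U values (the same treatment applies to V).
u_plane = [
    [10, 11, 12, 13],
    [20, 21, 22, 23],
    [30, 31, 32, 33],
    [40, 41, 42, 43],
]

# Step 1: keep chrominance only for alternate scan lines (halves the UV data).
alternate_rows = u_plane[::2]

# Step 2: in each remaining row, keep 2 values per group of 4 pixels,
# i.e. one per horizontal pair (halves the UV data again).
subsampled = [row[::2] for row in alternate_rows]

# Only one-fourth of the original chroma samples remain (4 of 16 here).
```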

Footnotes:

[1] Alpha: A fourth byte is sometimes added to represent the alpha value of a pixel. The alpha value indicates the transparency of the pixel. This is useful, for example, to overlay transparent logos on top of video.
[2] Luminance/Luma: Luminance is the relative intensity of a pixel with respect to a reference white color. It is a photometric term and is typically represented by the symbol Y. In video engineering, because the intensity of the electron gun in the original cathode ray tubes was non-linear with respect to the applied voltage, the RGB values are typically gamma-encoded into R'G'B' values (to cancel out the non-linearity). The weighted sum of the R'G'B' values is called luma, and is typically represented by the symbol Y'.
[3] Chroma Subsampling: The encoding described above is called YUV420. There are other encodings, such as 411 (UV data cut to one-fourth) and 422 (UV data cut to one-half). In the J:a:b notation, for a region J pixels wide and 2 pixels high, 'a' is the number of U (and of V) samples in the first row, and 'b' is the number in the second row. Thus J = 4, a = 2, b = 0 represents the YUV420 encoding.
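The luma computation and the J:a:b notation can be sketched in code. The BT.601 weights and the plain power-law gamma below are assumptions for illustration; exact formulas vary by standard and are deferred to later posts:

```python
def luma(r, g, b, gamma=2.2):
    """Y': weighted sum of gamma-encoded R'G'B' values, normalized to [0, 1].
    BT.601 weights and a plain power-law gamma are assumed for illustration."""
    # Gamma-encode linear 8-bit RGB into R'G'B' to cancel the CRT's
    # non-linear voltage-to-intensity response.
    rp, gp, bp = ((c / 255.0) ** (1.0 / gamma) for c in (r, g, b))
    return 0.299 * rp + 0.587 * gp + 0.114 * bp

def chroma_fraction(j, a, b):
    """Fraction of chroma samples kept under J:a:b subsampling, where the
    reference region is J pixels wide and 2 pixels high."""
    return (a + b) / (2 * j)

# YUV420 (4:2:0) and 4:1:1 keep one-fourth of the chroma samples;
# 4:2:2 keeps one-half.
```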


About annapureddy

Sidd is the VP of Engineering at Dyyno Inc. He received his Ph.D. and M.S. degrees from NYU, and Bachelor's from IIT Madras, all in Computer Science. http://www.scs.stanford.edu/~reddy/
