Jpeg Analyzer Documentation

The purpose of the jpeg analyzer program is to decode a jpeg image into rgb pixels, and then to analyze those pixels. At this time, the program only decodes the image into pixels; no analysis is done. The program is called as follows:

JpegAnalyzer [filename] [# processes] [# threads]

There's no output at this point, because no processing is being done. Were I to implement processing functionality, I'd do one of three things, depending on how the program is to be used:

1. Put the processing inside of PixelData.populatePixels(), but this isn't a good OO solution.
2. Put the processing in the JpegAnalyzer class, and call it after PixelData.pixels is populated. This approach makes sense if we want to run JpegAnalyzer by itself. The program can then return or display any data that we want.
3. Have JpegAnalyzer return the pixel array, then write a program that creates an instance of JpegAnalyzer. That program would run JpegAnalyzer, then do whatever it wants with the pixels. This is probably the most robust option, although it may be appropriate to rename JpegAnalyzer to JpegReader or something similar.

All parallel MASS calls occur in the PixelData class, when the IDCT is run on the MCUs.

How the Decoding Process Works

The decoding process can be described in three steps: reading the meta-data, reading the scan, and converting the scan data. Before describing these steps, a bit of information about how jpegs work is necessary.

Jpeg format background

Jpegs consist of segments of information, identified by markers. These segments contain either scan data or meta-data; the meta-data tells us how to read the scan. The markers tell us how to read the following segment, and each one has a unique format. Jpegs are created as follows:

1. Convert rgb to YCbCr
The rgb values of the input image (or any other color band, for that matter) are converted to YCbCr. Y represents luminance (light/dark values), while Cb and Cr represent the blue and red chrominance values. Green isn't explicitly stored, but can be calculated. This is done so that more Y information can be stored than Cb and Cr: the human eye sees light and dark values much more accurately than color, so jpegs save space by leaving out some color information. This is called chroma subsampling. Chroma subsampling is done by averaging chrominance values together. Ratios are generally 2:1 or 2:2; these indicate the dimensions of the area that the chrominance is averaged over. So if you were using 2:1, you'd store each Y value, the average of each horizontal pair of Cb pixels, and the average of each horizontal pair of Cr pixels.

2. Encode data
The YCbCr values are grouped into blocks called MCUs, which stands for Minimum Coded Unit. The size of an MCU depends on the chroma subsampling ratio: each dimension is 8 times the ratio in that direction. So 1:1 sampling yields 8x8 pixels, 2:1 yields 16x8, 2:2 yields 16x16, and so forth. The values are then coded using a function called the Discrete Cosine Transform (DCT). The DCT stores waves by converting them to a set of coefficients. When multiplied by a set of pre-defined cosine waves of varying frequency, the coefficients yield the original wave.

3. Calculate and store DC values
The upper-left corner of each MCU is called the DC value. DC stands for Direct Current, because the DCT was first used to encode electrical waves; the name is meaningless in the context of jpegs. The DC is the most significant value in the DCT, because it stores the lowest frequency wave. Consequently, it tends to be the largest coefficient and to vary the most across the image, so it is stored a bit differently: neighboring MCUs usually have similar DC values, so storing differences keeps the numbers small. The DC is stored using Differential Pulse Code Modulation, or DPCM. This means that its value is stored as the difference between itself and the previous DC value. So, if the first three DC values stored in the first three MCUs were:

MCU0: 20
MCU1: 30
MCU2: -65

their actual values are as follows (note that the first MCU's previous value is defined to be 0):

MCU0: 20
MCU1: 50
MCU2: -15

The rest of the coefficients are stored directly rather than as differences, and are called AC values. As with DC, AC stands for Alternating Current, and is meaningless in the context of jpegs. A short sketch of how a decoder undoes this differencing appears below.
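As an illustration of the DPCM step, here is a minimal sketch (in Java, since that is the program's language) of how stored differences are turned back into actual DC values. The class and method names are made up for this example and do not correspond to anything in the actual program.

```java
// Minimal sketch of undoing DPCM on DC values. Names here are illustrative
// only and are not taken from the JpegAnalyzer source.
public class DpcmSketch {
    // storedDc holds the differences read from the scan; the "previous" value
    // for the very first MCU is defined to be 0.
    static int[] decodeDcValues(int[] storedDc) {
        int[] actualDc = new int[storedDc.length];
        int previous = 0;
        for (int i = 0; i < storedDc.length; i++) {
            actualDc[i] = previous + storedDc[i];
            previous = actualDc[i];
        }
        return actualDc;
    }

    public static void main(String[] args) {
        // The example from the text: stored 20, 30, -65 decodes to 20, 50, -15.
        int[] actual = decodeDcValues(new int[] {20, 30, -65});
        System.out.println(java.util.Arrays.toString(actual)); // [20, 50, -15]
    }
}
```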
4. Quantize MCUs
MCUs are then quantized using quantization tables. The tables are defined during the encoding process, and will vary from image to image. The tables are 8x8, each value corresponding to a coordinate in the MCU. To quantize, divide the data value by the quantization value and round to the nearest integer. The quantization is done so that the jpeg can ignore high-frequency information: quantization tables typically have higher values for the higher-frequency data so that it can be eliminated. The eye can't see it anyway. Lastly, there is usually one quantization table for the Y values, and one shared by the Cb and Cr values.

5. Create huffman tables
The quantized data is then used to create huffman codes, which will store the data. There are four different huffman tables in a jpeg, each specified by ID and class (class 0 holds DC tables and class 1 holds AC tables; ID 0 is used for Y and ID 1 for Cb/Cr). They are used as follows:

ID: 0  Class: 0  Y DC
ID: 0  Class: 1  Y AC
ID: 1  Class: 0  CbCr DC
ID: 1  Class: 1  CbCr AC

6. Entropy code the data
The quantized data is then stored in an entropy-coded zig-zag that uses run-length encoding to group 0s together. Values are stored in the following order:

Y DC, Y AC, Cb DC, Cb AC, Cr DC, Cr AC

The values can be thought of as pairs of data. Each decoded value first codes the number of zeros before the next non-zero value (this is the run-length encoding mentioned earlier), and then the number of bits which store that next non-zero value. An example is the easiest way to explain this:

1. Say we've decoded the value 0x2A.
2. Divide by 0x10 (which is to say 16, because we're working in hex) to get 2, which is the number of 0s.
3. Mod by 0x10 to get 0xA (10), which is the number of bits of the next non-zero value.
4. Read the next 10 bits.
5. Decode those bits as follows, where n is the number of bits and x is the value of those bits:
   if x >= 2^(n-1), then the value is x
   if x < 2^(n-1), then the value is -(2^n - x - 1)

So if the next ten bits were 0000000011, we'd have -(2^10 - 3 - 1) = -1020.

There are two notable values that can be encountered in the coded data. The first is the end-of-block (EOB) marker, which decodes to 0. This indicates that there are only zeros left in that band, so just write the rest out as 0s and move on to the next band. The other interesting value is 0xF0, which indicates a run of 16 zeros. The high nibble gives a run of 15 zeros, and modding by 16 leaves zero, which indicates that the next value's length is 0; by convention this combination is treated as 16 zeros in total, so we just write 16 0s and move on to the next RLE value. A small sketch of decoding one of these run/size pairs follows below.
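To make the run/size decoding above concrete, here is a small Java sketch of splitting a decoded value into its zero run and bit count, and applying the sign rule to the bits that follow. The names are hypothetical, and a real decoder would pull the bits from the scan's bit stream rather than take them as an argument.

```java
// Illustrative sketch of decoding one run-length/magnitude pair.
// Names are made up for the example; they are not from the actual program.
public class CoefficientSketch {
    // Split a decoded value such as 0x2A into its zero run and bit count.
    static int zeroRun(int decoded)  { return decoded / 0x10; }
    static int bitCount(int decoded) { return decoded % 0x10; }

    // Apply the sign rule from the text: n = number of bits, x = the raw bits.
    static int extend(int x, int n) {
        if (n == 0) return 0;                    // no coefficient bits follow
        if (x >= (1 << (n - 1))) return x;       // value is positive as-is
        return -((1 << n) - x - 1);              // value is negative
    }

    public static void main(String[] args) {
        int decoded = 0x2A;
        System.out.println(zeroRun(decoded));    // 2 zeros precede the value
        System.out.println(bitCount(decoded));   // the value is stored in 10 bits
        // The ten bits 0000000011 have raw value 3, which decodes to -1020.
        System.out.println(extend(0b0000000011, 10)); // -1020
    }
}
```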
That should be enough information to get a basic idea of how the program works. There are some links at the end of this document which provide additional information.

Decoding a Jpeg

1. Read meta-data
To read a jpeg, we need to have huffman tables, quantization tables, and scan data. These can be stored in any order in the jpeg file, but all of the jpegs I've encountered store the tables before the scan data. So we parse the file, looking at markers and decoding the meta-data appropriately. The program stores quantization table and huffman table information in corresponding data structures.

2. Read the scan data
With the necessary meta-data stored, the program can begin reading the scan data. It starts by reading Y DC values, as per the storage ordering specified in step 6 of the previous section. It reads bit by bit, checking after each bit whether the accumulated code and its length correspond to a huffman code. Note that the huffman table's getValue() method requires both a code and a length; huffman codes can have the same literal value but different lengths, so the length parameter is necessary to avoid collisions. The program writes values into Mcu objects for the standard version, or McuDataBlock objects for the parallel version (more on that later). When it encounters an EOB, it moves on to the next data category. At the end of a scan, there may be a run of 1s. This is because markers have to be stored on byte boundaries, so the scan is padded with up to seven 1 bits, which need to be discarded.

3. Convert the scan data
At this point, we have an array of Mcus or McuDataBlocks full of data that needs to be run through the Inverse DCT. In the regular version of the program, each MCU is converted one by one. In the parallelized version, an array of Mcus is created, each of which extends Place. These are then populated by calling the populate function with callAll; the data is passed to populate as an array of McuDataBlocks. Each parallelized Mcu reads its data out of the McuDataBlock array by looking up its own position with the Place index[0] member variable, and then accessing that array index. For the IDCT, each MCU needs to know its chroma subsampling ratio, because it needs to return the correct size of pixel array. The algorithm is written such that the MCU calculates each Y value necessary, then one Cb and one Cr value. So, for 2:2 subsampling, the program calculates four Y values, then one Cb and one Cr, since the Cb and Cr values are averaged over a 2x2 pixel area. The Pixel object converts the YCbCr data to rgb in its constructor.
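For reference, a YCbCr-to-rgb conversion like the one the Pixel constructor performs typically looks something like the sketch below. It uses the standard JFIF conversion constants; the actual Pixel class may round or clamp differently, and the class and method names here are made up for illustration.

```java
// Sketch of a standard JFIF-style YCbCr -> rgb conversion. This is
// illustrative only; the real Pixel class may differ in its details.
public class YCbCrSketch {
    static int clamp(double v) {
        return (int) Math.max(0, Math.min(255, Math.round(v)));
    }

    // y is in [0, 255]; cb and cr are centered on 128.
    static int[] toRgb(double y, double cb, double cr) {
        int r = clamp(y + 1.402 * (cr - 128));
        int g = clamp(y - 0.344136 * (cb - 128) - 0.714136 * (cr - 128));
        int b = clamp(y + 1.772 * (cb - 128));
        return new int[] {r, g, b};
    }

    public static void main(String[] args) {
        // With cb = cr = 128 (no chrominance), r, g, and b all equal y.
        System.out.println(java.util.Arrays.toString(toRgb(100, 128, 128)));
    }
}
```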