DV vs MJPEG compression 

The following discussion by Guy Bonneau was developed from a posting on DVL in August of 1998.
Guy Bonneau is a physicist engineer and earned an M.Sc.A. in Bioengineering from Montreal Polytechnique School in 1981. Since then he has been mainly involved in realtime software and hardware design. Currently he is working for a large multimedia company as a technical leader and video codec expert.
Guy lives in Montreal, Québec and can be reached at "gbonneau at mailcity dot com" (convert " at " to "@" and " dot " to "." to get the real eddress).
There has been a lot of discussion recently about DV vs MJPEG compression/decompression artifacts. I would like to point out some key issues that are involved with losses of information in either method. Each issue brings its own considerations when analyzing what causes the losses. I will discuss the key points from a compression point of view. Unfortunately, I will have to be somewhat technical with MJPEG jargon.
1 Most software DV codecs decompress/compress data originating in an RGB color space. However, the DV algorithm uses data in the YUV 4:1:1 (525/59.94) or 4:2:0 (625/50) color spaces. The process of converting between RGB and YUV spaces is a lossy process. Two reasons are involved in this loss of information.
First, the RGB-to-YUV transcoding matrix introduces small losses due to the finite nature of digital data using 8 bits of precision (the finite precision limits the accuracy, and rounding errors will occur).
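This first loss is easy to reproduce. The sketch below is an illustration only: it assumes full-range BT.601-style coefficients (a real DV codec works with studio-range levels and its own scaling), and the helper names are invented for this example. It converts a grid of RGB colors to 8-bit YCbCr and back, then counts how many colors fail to return to their original values.

```python
def clamp8(x):
    """Round to the nearest integer and clamp to the 8-bit range 0..255."""
    return max(0, min(255, int(round(x))))

def rgb_to_ycbcr(r, g, b):
    # Assumed full-range BT.601-style coefficients, for illustration only.
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 + 0.564 * (b - y)
    cr = 128 + 0.713 * (r - y)
    return clamp8(y), clamp8(cb), clamp8(cr)

def ycbcr_to_rgb(y, cb, cr):
    r = y + 1.402 * (cr - 128)
    g = y - 0.344 * (cb - 128) - 0.714 * (cr - 128)
    b = y + 1.772 * (cb - 128)
    return clamp8(r), clamp8(g), clamp8(b)

def count_changed(step=17):
    """Count sampled RGB triples that do not survive one round trip."""
    changed = total = 0
    for r in range(0, 256, step):
        for g in range(0, 256, step):
            for b in range(0, 256, step):
                total += 1
                if ycbcr_to_rgb(*rgb_to_ycbcr(r, g, b)) != (r, g, b):
                    changed += 1
    return changed, total

changed, total = count_changed()
print(f"{changed} of {total} sampled colors changed after one round trip")
```

Neutral grays tend to survive the trip, while saturated colors are the ones most likely to be clipped or rounded to a neighboring value.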
Second, the UV component of the YUV data must be downsampled from 4:4:4 sampling to 4:1:1 or 4:2:0 sampling following RGB-to-YUV conversion. This can be done by decimation or through digital filtering, but in any case this introduces a further loss. This can lead to odd "color edging" artifacts and to "color spreading" or smear in multipass compression/decompression. The magnitude of spreading depends on the design of the digital filter. To really see the effect of this chroma filter or decimation I suggest that the same picture be compressed in multipass with and without the chroma information (i.e., compress both a color and a black-and-white version of the same picture).
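The difference between decimation and filtering over several generations can be sketched on a single row of chroma samples. Everything here is an assumption made for illustration: the 4:1 horizontal ratio mimics 4:1:1, the box filter and linear interpolator are deliberately naive, and none of it is taken from a real codec.

```python
def down_decimate(row):
    """Keep every 4th chroma sample (4:4:4 to 4:1:1 by decimation)."""
    return row[::4]

def down_average(row):
    """Average each group of 4 chroma samples (a crude box filter)."""
    return [round(sum(row[i:i + 4]) / 4) for i in range(0, len(row), 4)]

def up_replicate(sub):
    """Rebuild full-rate chroma by repeating each sample 4 times."""
    return [v for v in sub for _ in range(4)]

def up_linear(sub):
    """Rebuild full-rate chroma by linear interpolation between samples."""
    out = []
    for i, v in enumerate(sub):
        nxt = sub[min(i + 1, len(sub) - 1)]
        out.extend(round(v + (nxt - v) * k / 4) for k in range(4))
    return out

def generations(row, down, up, passes):
    """Apply down/up sampling repeatedly, returning every generation."""
    history = [row]
    for _ in range(passes):
        row = up(down(row))
        history.append(row)
    return history

# A sharp chroma edge that does not line up with the 4-sample grid.
edge = [16] * 6 + [240] * 10

for name, down, up in (("decimate", down_decimate, up_replicate),
                       ("filter", down_average, up_linear)):
    print(name)
    for gen in generations(edge, down, up, 3):
        print("  ", gen)
```

With this toy pair, decimation moves the edge once ("color edging") and then stays put, while the filtered version keeps changing with every generation, which is the smear described above. A better-matched filter pair would spread far less.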
2 The DV standard uses DCT (Discrete Cosine Transform) to compress the pixel data. Again, this is a lossy process because of the finite precision of the digital data. Note that many digital algorithms exist to compute the DCT. Some DCT algorithms emphasize precision at the expense of speed of computation; others do the opposite, trading accuracy for speed.
The "Blue Book" specification regulates the precision of the DCT algorithm to maximize the DCT accuracy. However, it is quite possible to compromise the DCT computation when implementing a software codec to give a flavor of speed at the expense of precision. Not all DV codecs necessarily follow the Blue Book to the letter.
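To give a feel for the speed/precision trade-off, the sketch below compares an 8-point DCT computed with full floating-point constants against the same transform with its cosine constants rounded to a few fractional bits, as a fast fixed-point implementation might do. The sample values and bit widths are arbitrary assumptions; the Blue Book's actual accuracy requirements are not reproduced here.

```python
import math

N = 8

def dct_basis(bits=None):
    """8-point DCT-II basis; optionally rounded to `bits` fractional bits."""
    basis = []
    for u in range(N):
        scale = math.sqrt((1 if u == 0 else 2) / N)
        row = [scale * math.cos((2 * x + 1) * u * math.pi / (2 * N))
               for x in range(N)]
        if bits is not None:
            # Simulate coarse fixed-point constants.
            row = [round(c * (1 << bits)) / (1 << bits) for c in row]
        basis.append(row)
    return basis

def dct_1d(samples, basis):
    """Multiply the sample vector by the (possibly rounded) basis."""
    return [sum(b * s for b, s in zip(row, samples)) for row in basis]

def worst_error(bits, samples=(52, 55, 61, 66, 70, 61, 64, 73)):
    """Largest coefficient error caused by rounding the constants."""
    exact = dct_1d(samples, dct_basis())
    coarse = dct_1d(samples, dct_basis(bits))
    return max(abs(a - b) for a, b in zip(exact, coarse))

for bits in (4, 8, 12):
    print(f"{bits:2d}-bit constants: worst error {worst_error(bits):.4f}")
```

The error shrinks as more bits are kept, at the cost of wider multiplies; where a given codec sits on that curve is an implementation choice.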
Besides, the DV standard requires that the video stream be compressed on a frame basis, while most MJPEG codecs compress the video stream on a field basis. Because of the interlaced nature of a frame, the two merged fields may contain quite different information if the scene has any motion, and this gives rise to a problem when the pixel data are processed by DCT. In this case, the standard DCT algorithm used with MJPEG will perform badly because of the mixing of different field information. The standard DCT will produce a lot of inflated AC coefficients (related to the spurious high-frequency vertical detail caused by the differing interleaved fields) which are very difficult to compress efficiently even with high quantization values.
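The inflated coefficients are easy to see on one vertical column of eight samples. In this sketch the sample values are invented: the "moving" column simply alternates between two levels, as it would if an object had shifted between the two fields.

```python
import math

def dct_1d(samples):
    """Orthonormal DCT-II of a short list of samples."""
    n = len(samples)
    out = []
    for u in range(n):
        scale = math.sqrt((1 if u == 0 else 2) / n)
        out.append(scale * sum(
            s * math.cos((2 * x + 1) * u * math.pi / (2 * n))
            for x, s in enumerate(samples)))
    return out

# One vertical column of 8 luma samples from an interlaced frame.
# Even rows come from one field, odd rows from the other.
static_column = [100, 100, 100, 100, 100, 100, 100, 100]  # no motion
moving_column = [100, 180, 100, 180, 100, 180, 100, 180]  # fields differ

for name, column in (("static", static_column), ("moving", moving_column)):
    print(name, [round(c) for c in dct_1d(column)])
```

The static column yields only a DC term. The moving column puts most of its AC energy into the highest vertical frequency, exactly the coefficient that quantization would normally hope to discard.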
To overcome this problem the DV standard makes provision for a second DCT "mode" which is fine-tuned for this situation and doesn't produce inflated AC coefficients. Rather than process the pixel data on an 8 x 8 block basis with the standard DCT, the new DCT mode deinterlaces the pixel data into two independent 4 x 8 blocks and computes two DCTs on the 4 x 8 pixel blocks, still producing a total of 64 DC and AC coefficients. DV jargon calls this new DCT mode the "2-4-8-DCT mode", versus the standard "8-8-DCT mode".
The DV standard specifies how to tell the decompressor which DCT mode is used. However, the algorithms that choose the DCT modes during compression are not defined by the Blue Book specification, and are proprietary to the companies developing them. Most algorithms use a motion estimation technique to determine which DCT mode to use. It is also quite possible to use a brute force technique: computing both DCT modes and using the one that gives the less inflated AC coefficients. Discriminating between the two DCT modes really gives a software codec a hard time, and is in part responsible for the longer time it takes to compress a frame than to decompress it. A good DCT mode algorithm should choose the appropriate DCT mode most of the time. This could provide an overall gain in compression space of 10% or more, and thus provide a better overall artifact-free picture.
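The brute force approach can be sketched as follows. This is a simplified illustration, not any vendor's algorithm: the field mode is modeled as described above (one 4 x 8 DCT per field) rather than with the specification's exact arithmetic, and `ac_energy` is a hypothetical cost measure chosen for this example.

```python
import math

def dct_2d(block):
    """Orthonormal 2-D DCT-II of a rows x cols block (list of lists)."""
    rows, cols = len(block), len(block[0])

    def scale(k, n):
        return math.sqrt((1 if k == 0 else 2) / n)

    out = []
    for v in range(rows):
        out_row = []
        for u in range(cols):
            total = sum(
                block[y][x]
                * math.cos((2 * y + 1) * v * math.pi / (2 * rows))
                * math.cos((2 * x + 1) * u * math.pi / (2 * cols))
                for y in range(rows) for x in range(cols))
            out_row.append(scale(v, rows) * scale(u, cols) * total)
        out.append(out_row)
    return out

def ac_energy(coeffs):
    """Sum of absolute values of every coefficient except the DC term."""
    flat = [c for row in coeffs for c in row]
    return sum(abs(c) for c in flat[1:])

def encode_block(block):
    """Try both DCT modes and keep the one with the smaller AC energy."""
    frame = [dct_2d(block)]                              # one 8 x 8 DCT
    field = [dct_2d(block[0::2]), dct_2d(block[1::2])]   # two 4 x 8 DCTs
    frame_cost = sum(ac_energy(c) for c in frame)
    field_cost = sum(ac_energy(c) for c in field)
    if field_cost < frame_cost:
        return "2-4-8", field
    return "8-8", frame

# A smooth static block, and a block whose two fields hold different content.
static_block = [[100 + 4 * y + 2 * x for x in range(8)] for y in range(8)]
moving_block = [[100 + 2 * x if y % 2 == 0 else 180 - 2 * x for x in range(8)]
                for y in range(8)]

for name, block in (("static", static_block), ("moving", moving_block)):
    mode, coeffs = encode_block(block)
    count = sum(len(row) for c in coeffs for row in c)
    print(f"{name}: chose {mode} mode, {count} coefficients")
```

Computing both transforms for every block roughly doubles the DCT work on the encoder side, which fits the observation that compression takes longer than decompression.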
3 Once the DCT is computed, the Blue Book specifies that the DCT's DC and AC coefficients must be weighted against a well-defined complex mathematical relationship. This is in fact a "prequantization" process that is mandatory in DV.
Note that this weighting process doesn't exist in MJPEG. It is in fact not quite a quantization step, since the DC/AC coefficients are weighted with a floating point value, not an integer value as in MJPEG. This weighting process, like quantization, makes some AC coefficients drop to 0 when rounding to integer values. Thus this is also a lossy process.
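The effect of weighting followed by rounding can be shown with a toy weight function. The weights below are purely hypothetical (1.0 for DC, shrinking with frequency); the real relationship is the one defined in the specification and is not reproduced here. The coefficient values are also invented.

```python
def weight(u, v):
    """Hypothetical weight: 1.0 at DC, smaller for higher frequencies."""
    return 1.0 / (1.0 + 0.25 * (u + v))

def apply_weighting(coeffs):
    """Scale each coefficient by its floating-point weight, then round."""
    return [[round(c * weight(u, v)) for u, c in enumerate(row)]
            for v, row in enumerate(coeffs)]

def count_zeros(coeffs):
    return sum(1 for row in coeffs for c in row if c == 0)

# A small corner of an invented coefficient block (row 0 holds the DC term).
coeffs = [[400.0, -31.2, 6.4, 0.8],
          [12.7, -3.9, 0.7, 0.6]]

unweighted = [[round(c) for c in row] for row in coeffs]
weighted = apply_weighting(coeffs)

print("rounded only:", unweighted, "zeros:", count_zeros(unweighted))
print("weighted:    ", weighted, "zeros:", count_zeros(weighted))
```

Small high-frequency coefficients that would have survived plain rounding fall to zero once weighted, and that information cannot be recovered on decode.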
4 Up to this point, the Blue Book specification frames everything. From this point on the data must be further compressed using the same process as MJPEG: quantization of the AC coefficients and entropy coding. However, the quantization processing is more sophisticated in DV than in MJPEG.
In MJPEG the quantization factor is chosen as a single value for the whole frame. (Worse, most MJPEG algorithms compute this value against the preceding frame's data, which may not give an optimum quantization value for the current frame.) In DV, the DCT data of the whole frame are subdivided into 270 video segments (in 525/59.94 video). Each video segment is further subdivided into 5 areas called "macroblocks". The DV specification allows every macroblock to have its own quantization value. This means that a DV frame has 270 x 5 = 1350 quantization values which can be defined, vs. the 1 quantization value for an MJPEG frame. This is why DV is a lot better than MJPEG for the same data rate; DV allows fine tuning of individual parts of the frame.
Now, the DV algorithm doesn't define the quantization value of a video segment that must be used when a frame is compressed; this is left to the codec implementation. In order to have a fixed compressed data rate of 25 Mbits/sec, the compressed video segment, once quantized and encoded with the "Huffman Entropy Algorithm", must be constrained inside a maximum of 2560 bits of data space, and this is where every codec is different.
How should the quantization factor be chosen so that, once Huffman entropy encoded, the data will fit the 2560 bits of data space? Depending on the image complexity, if you choose too low a quantization factor the codec will overrun the data space and some AC coefficients won't get stored, thus giving rise to excessive artifacts when decoded. If you choose too high a quantization factor the codec will underrun the data space, dropping AC coefficients which could have been coded and stored inside the allocated data space, thus also giving rise to artifacts. When you choose the quantization factor properly, the coded data will include as many of the AC coefficients as possible and still fit into the allocated space, minimizing artifacts.
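One simple way to approach this is to try quantization steps from finest to coarsest and keep the first one that fits. The sketch below assumes a great deal: `coded_bits` is a rough stand-in for the real entropy coder (zeros are treated as free and each nonzero level costs a length that grows with its magnitude), the candidate steps 1 to 64 are arbitrary, and the coefficients are synthetic. Only the 2560-bit budget comes from the text.

```python
BUDGET_BITS = 2560  # per video segment, the figure quoted above

def coded_bits(coeffs, q):
    """Hypothetical bit cost after dividing by q; not the real Huffman coder."""
    bits = 0
    for c in coeffs:
        level = abs(round(c / q))
        if level:
            bits += 2 * level.bit_length() + 1
    return bits

def pick_quantizer(coeffs, budget=BUDGET_BITS, candidates=range(1, 65)):
    """Return the smallest candidate step whose coded size fits the budget."""
    for q in candidates:
        if coded_bits(coeffs, q) <= budget:
            return q
    return candidates[-1]  # nothing fits: fall back to the coarsest step

# Synthetic segment: 30 blocks of 64 coefficients that decay with frequency.
coeffs = [300.0 / (1 + (i % 64)) for i in range(30 * 64)]

best = pick_quantizer(coeffs)
for q in (1, best, 64):
    used = coded_bits(coeffs, q)
    print(f"q={q:2d}: {used:5d} bits, {100 * used / BUDGET_BITS:.0f}% of budget")
```

The finest step overruns the budget many times over, the coarsest leaves much of it unused, and the selected step lands just under the limit. A real encoder would also have to decide how to split the budget among the macroblocks of the segment.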
Moreover, if the algorithm can detect that among the 5 areas of a video segment some contain details which are more likely to be sensitive to DCT artifacts, then quantization may be fine tuned among the 5 areas to give more quantization to less sensitive areas and use the data space saved to better encode the sensitive material.
The algorithms that choose these quantization factors are not defined by the Blue Book specification and are proprietary to the companies who are developing them (this is also true for MJPEG algorithms that choose a quantization value from frame to frame). A good DV algorithm will maximize the use of the 2560 bits and will be able to discriminate among the 5 areas of a video segment as to which ones are most sensitive to DCT artifacts. It is quite possible that a very bad algorithm will use only 50% of the 2560 bits of allocated space, thus wasting 50% of the space, and yet be fully DV compliant. Most software and hardware DV codecs on the market today make use of over 70% of the allocated space; the better ones make use of over 80% of the allocated space (however, the raw percentage of used space alone can be misleading, depending on how well the algorithm can detect sensitive areas within a video segment).
The algorithms which choose the quantization value are quite a complex matter. There are many issues involved that can be based on probability, statistics, heuristics, and image-processing mathematics. Because quantization is a lossy process, decoding and re-encoding the same frame in multipass testing may or may not stabilize, depending on the DV algorithm used.
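Why multipass results may or may not settle can be seen in a toy model that re-quantizes a handful of coefficients over four generations. This is a deliberate simplification: it skips the pixel-domain round trip a real codec goes through, and the coefficient values and step sizes (6 and 10) are arbitrary assumptions.

```python
def requantize(values, q):
    """Quantize with step q and reconstruct, as one encode/decode pass."""
    return [round(v / q) * q for v in values]

def run_generations(values, steps):
    """Apply one pass per entry in `steps`, recording every generation."""
    history = []
    for q in steps:
        values = requantize(values, q)
        history.append(values)
    return history

coeffs = [47, -23, 12, 5, -3]

fixed = run_generations(coeffs, [6, 6, 6, 6])       # same choice every pass
varying = run_generations(coeffs, [6, 10, 6, 10])   # choice changes per pass

print("fixed step:")
for gen in fixed:
    print("  ", gen)
print("varying step:")
for gen in varying:
    print("  ", gen)
```

When the encoder makes the same quantization choice on every pass, the data stop changing after the first generation. When the choice differs from pass to pass, as it can when the algorithm reacts to the already-degraded picture, the values keep moving and never return to those of the first generation.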
Points 1 and 4 are those that most affect picture quality and losses in multipass generation testing.
Last updated 15 August 1998.