Background
Digital audio in the form of compact discs started a revolution in car audio, and in consumer audio in general. Never before had music of such a high sound quality been affordable as well as portable. With the media attention on MPEG Layer 3 (or MP3) audio and its popularity, one is tempted to look at this format as a similar revolution. By now, alas, most consumers realize that the MP3 codec - "codec" stands for "compressor/decompressor" and refers to any technology designed to compress and decompress digital data - compromises the sound quality of the original recording and is therefore inferior to CD audio. Indeed, the goal of MP3 is to radically reduce the size of the audio file at the expense of sound quality.
The key word with MP3 compression, though, is "compromise." Naturally, the audiophile with the super-duper, high fidelity mobile multimedia system would never consider using MP3 as software: that would be blasphemous! But what about the guy who just wants woofers and 10,000 watts to wake the neighbors, or the commuter with a simple CD/speaker upgrade to the factory system in their daily driver? Would MP3-encoded audio, either played on an MP3 player or burned onto CDs from a computer, be an acceptable format for their vehicles given their preferences in music software? We wanted to address this question in a short series of articles about MP3.
This first article is meant to be a brief description of CD audio and MP3 technology. Understanding the technology behind this codec allows the consumer to know what is being excluded from their music when using this format. The first part of this article covers principles of converting an analog waveform into digital data. The second part discusses psychoacoustic concepts used to compress audio data in codecs like MP3. The third part is a description of the MP3 encoder itself, with features specific to the codec. Finally, we look forward to future articles where we subject MP3 data to real-world listening tests.
Analog-to-Digital Conversion
Analog recording consists of converting sound waves into electrical signals, represented as fluctuations in voltage, which in turn are written onto a storage medium such as analog tape. For digital audio, the analog signal is sampled to create a stream of numbers representing the sound wave.
Consider converting a 1 kHz sine wave into a set of data (Figure 1). To create the data stream, we take a "measurement" of the wave amplitude at fixed time intervals; the number of measurements taken per second is referred to as the sampling frequency. The standard sampling frequency of CD audio is 44.1 kHz, but for this example we'll use a sampling rate of 8 kHz. Assume the amplitude ranges from -15 to +15 volts, with the value at zero for time zero. For one cycle of our 1 kHz sine wave, we get the following values for the first few data points:
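For readers who want to reproduce the numbers in this example, here is a minimal Python sketch of the sampling step. It assumes an ideal sine wave and perfect measurements; the names `sample_sine`, `SIGNAL_HZ`, `SAMPLE_RATE_HZ` and `PEAK_VOLTS` are made up for illustration and do not come from any audio tool.

```python
import math

SIGNAL_HZ = 1_000        # frequency of the sine wave in the example
SAMPLE_RATE_HZ = 8_000   # sampling rate used in the example (CD audio uses 44,100)
PEAK_VOLTS = 15.0        # the wave swings between -15 V and +15 V


def sample_sine(num_samples):
    """Return (time in ms, amplitude in volts) for the first num_samples samples."""
    samples = []
    for n in range(num_samples):
        t = n / SAMPLE_RATE_HZ  # time of the n-th measurement, in seconds
        volts = PEAK_VOLTS * math.sin(2 * math.pi * SIGNAL_HZ * t)
        samples.append((t * 1000, volts))
    return samples


# One full cycle of a 1 kHz wave sampled at 8 kHz is eight measurements.
for time_ms, volts in sample_sine(8):
    print(f"t = {time_ms:.3f} ms   amplitude = {volts:+.2f} V")
```

Under these assumptions the eight samples in one cycle come out to approximately 0, 10.61, 15, 10.61, 0, -10.61, -15 and -10.61 volts, taken 0.125 milliseconds apart.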
The accuracy of these values depends upon the measuring instrument, but this data is to be stored digitally and must be represented in terms of combinations of zeros and ones.
Since we're dealing with binary data, here's a quick blurb about binary numbers for the uninitiated. A single binary digit is called a bit and can represent two numbers, a "0" for zero or "1" for one. Two bits can represent four numbers: "00" as zero in decimal notation, "01" as one, "10" as two, and "11" as three. Three bits can account for eight numbers: the four in the two bit case plus "100" as four, "101" as five, "110" as six, and "111" as seven. Sixteen bits - the standard used for compact disc audio - can represent 2^16 or 65,536 numbers. Note that the more bits used to represent the data, the more accurate the digital representation of the analog waveform.
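To make the link between bit count and accuracy concrete, here is a rough Python sketch. It counts the values available at each bit depth, lists the three-bit patterns, and then rounds one sample (about 10.6066 volts, the value of a 15-volt sine wave one eighth of the way through its cycle) to the nearest available level. The evenly spaced mapping over -15 to +15 volts and the helper names `levels` and `quantize` are assumptions made for illustration; they are not how a CD player or any particular converter is specified.

```python
def levels(bits):
    """Number of distinct values that `bits` binary digits can represent."""
    return 2 ** bits


def quantize(volts, bits, peak=15.0):
    """Round a voltage in [-peak, +peak] to the nearest of 2**bits evenly spaced levels.

    Returns the integer code and the voltage that code stands for.
    """
    step = 2 * peak / (levels(bits) - 1)  # spacing between neighboring levels
    code = round((volts + peak) / step)   # index of the nearest level
    return code, code * step - peak


for bits in (1, 2, 3, 16):
    print(f"{bits:2d} bits -> {levels(bits):,} values")

# The eight three-bit patterns, "000" through "111".
print([format(n, "03b") for n in range(8)])

sample = 10.6066  # illustrative sample value, in volts
for bits in (3, 8, 16):
    code, approx = quantize(sample, bits)
    error = abs(approx - sample)
    print(f"{bits:2d} bits: stored as {approx:+.4f} V, error {error:.4f} V")
```

With three bits the stored value can be off by a sizable fraction of a volt, while with sixteen bits the rounding error in this sketch is never more than half of one level spacing, or 30/65,535/2 volts.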