HCI/CprE/ComS 575: Homework #4

List of Audio Libraries and References

The following list of C / C++ libraries and references for audio processing is provided for your convenience only. You are not required to use any of these libraries / references for this homework.

Fastest Fourier Transform in the West library.
The Fourier Transform and its Applications: A Video Course.
Keep It Simple, Stupid FFT library.
Audacity. Useful for analyzing raw audio (waves) and modifying audio files (ie. stereo <--> mono).

Part 1a: Slow DFT

Implement the Discrete Fourier Transform (DFT) in C, C++, MATLAB, Java, or Python. Implement the slow version that multiplies the transform matrix by the input vector in O(N²) time. Your code should support input vectors of size up to 2048.

Source

// Insert your code here

Part 1b: Slow IDFT

Implement the Inverse Discrete Fourier Transform (IDFT) in C, C++, MATLAB, Java, or Python. Implement the slow version that multiplies the transform matrix by the input vector in O(N²) time. Your code should support input vectors of size up to 2048.

Source

// Insert your code here

Part 2a: FFT

Implement the Fast Fourier Transform (FFT) in C, C++, MATLAB, Java, or Python. Implement the fast version that uses recursion and runs in O(n log₂ n) time. Note that you are not allowed to use MATLAB's implementation nor any other existing library for this problem. Your code should support input vectors of size up to 2048. Use your code from Part 1a to cross-check your implementation.

Source

// Insert your code here

Part 2b: IFFT

Implement the Inverse Fast Fourier Transform (IFFT) in C, C++, MATLAB, Java, or Python. Implement the fast version that uses recursion and runs in O(n log₂ n) time. Note that you are not allowed to use MATLAB's implementation nor any other existing library for this problem. Your code should support input vectors of size up to 2048. Use your code from Part 1b to cross-check your implementation.

Source

// Insert your code here

Part 3a: FFT check

Using your implementation from Part 2a, compute the Discrete Fourier Transform of each of the following two vectors 'x':

Vector 'x'

[0, 0.7071, 1, 0.7071, 0, -0.7071, -1, -0.7071]

Note: you may want to use sqrt(2)/2 instead of 0.7071.

Result

// Insert your resulting vector here

Vector 'x'

[-0.0144+0.0709i, -0.1586-0.0502i, 0.5764, -0.1586-0.0502i, -0.4631-0.0709i, -0.1586-0.0502i, 0.5764, -0.1586-0.0502i]

Result

       // Insert your resulting vector here

Compare your output with the output generated by MATLAB's fft() method for the same two vectors 'x'. Include the result below, and any discrepancies. You may also use one of the FFT libraries above, if you choose. Is there anything special about the second vector 'x' in this problem? What happens if you divide each element of the resulting vector by the corresponding element of x?

Matlab Result & Analysis

// Insert your two new resulting vectors here

// Insert any comments here

Part 3b: IFFT check

Using your implementation from Part 2b, compute the inverse Discrete Fourier Transform of the following vector:

Vector 'X'

[0, -4i, 0, 0, 0, 0, 0, 4i]

Result

// Insert your resulting vector here

Vector 'X'

[0.0409-0.2008i, 0.4487+0.1418i, -1.6303+0.0000i, 0.4487+0.1418i, 1.3097+0.2008i, 0.4487+0.1418i, -1.6303+0.0000i, 0.4487+0.1418i]

Result

     // Insert your resulting vector here

Compare your output with the output generated by MATLAB's ifft() method for the same two vectors 'X'. Include the result below, and any discrepancies. You may also use one of the IFFT libraries above, if you choose.

Result & Analysis

// Insert your two new resulting vectors here

// Insert any comments here

Part 4

Using your FFT implementation from Part 2, compute and plot the spectrograms for the following 3 audio files (see also Part 5 below). Reminder: the spectrogram is computed by: 1) splitting the input vector into small overlapping segments (e.g., 0.25s with an overlap of 0.1s), 2) applying the window function (e.g., the Hamming window) to each segment, 3) computing the FFTs of the resulting vectors, 4) discarding the redundant frequency bins for the negative frequencies from these vectors (because the input is real), 5) computing the logarithms of the magnitudes of the frequency bins.

The result is a 2D matrix of logarithms where one dimension corresponds to time and another corresponds to frequency. This matrix is the spectrogram, which you can visualize as an image where the colors of rectangular patches are determined by the matrix elements. For better visualization, you may want to discard negative values of logarithms and set them to zero.

Audio Data	Spectrogram

Source

// Insert your code here

Part 5

Repeat Part 4, but now use a standard spectrogram function in MATLAB or any other visualization package instead of your implementation.

Audio Data	Spectrogram

Source

// Insert your code here.

// Are your spectrograms from Part 4 different or similar from Part 5? If they are different, then why?

Part 6

Count the number of coins collected in the Super Mario game using only the audio and insert a running counter of coins into the output video. The suggested approach is to generate the audio spectrogram from the soundtrack. The coin collection generates a distinct spectral signature that your code can detect in the spectgrogram. Its location indicates the time for incrementing the coin counter. The coin counter generated by your program should agree with the built-in coin counter in the game, although there can be minor differences in the timing of increments. You can use any spectrogram implementation.

Input Video and Audio	Output Video

Source

// Insert your code here

Extra Credit

Repeat Part 6 for the following two videos, that also include the in-game music.

Input Video and Audio	Output Video

Source

    // Insert your code here

HCI/CprE/ComS 575: Homework #4

Part 1a: Slow DFT

Part 1b: Slow IDFT

Part 2a: FFT

Part 2b: IFFT

Part 3a: FFT check

`[0, 0.7071, 1, 0.7071, 0, -0.7071, -1, -0.7071]`

`[-0.0144+0.0709i, -0.1586-0.0502i, 0.5764, -0.1586-0.0502i, -0.4631-0.0709i, -0.1586-0.0502i, 0.5764, -0.1586-0.0502i]`

Part 3b: IFFT check

`[0, -4i, 0, 0, 0, 0, 0, 4i]`

`[0.0409-0.2008i, 0.4487+0.1418i, -1.6303+0.0000i, 0.4487+0.1418i, 1.3097+0.2008i, 0.4487+0.1418i, -1.6303+0.0000i, 0.4487+0.1418i]`

Part 4

Part 5

Part 6

Extra Credit