Convolutional operators in the time-frequency domain. Applications to audio classification and te

Seminar: 
Applied Mathematics/Analysis Seminar
Event time: 
Monday, November 13, 2017 - 3:50pm to 5:00pm
Location: 
LOM 206
Speaker: 
Vincent Lostanlen
Speaker affiliation: 
NYU
Event description: 

Audio classification is the problem of automatically retrieving the source of a sound according to a predefined taxonomy. In this talk, I will address the design of signal representations which satisfy appropriate invariants while preserving inter-class variability. First, I will present time-frequency scattering, a deep wavelet-based operator which extracts modulations at various scales and rates in a similar way to idealized models of spectrotemporal receptive fields in auditory neuroscience. Second, I will introduce spiral scattering, an improvement over time-frequency scattering which follows the geometry of the musical spiral of pitches, making one full turn at every octave. Unlike time-frequency scattering, spiral scattering disentangles and linearizes variations in both pitch and spectral envelope through time. I will present applications of these methods to the audio classification of urban sounds and musical instruments, and compare them with deep convolutional neural networks in the time-frequency domain. In addition, these representations can be used as summary statistics for audio texture synthesis. Unlike purely temporal methods, time-frequency scattering and spiral scattering are able to capture the coherence of spectrotemporal patterns, such as those arising in bioacoustics or speech, up to a scale of about 500 ms. Based on this analysis-synthesis framework, I have undertaken a collaboration with composer Florian Hecker, which has led to the creation of five computer music pieces.