AudioLabs - Lecture: Advanced Speech Processing, Winter Term 2025/2026

Instructor: Prof. Dr. Emanuël Habets
Teaching Assistant: TBD
Time: Winter Term 2025/2026, Tuesdays, 14:15-15:45
Place: Am Wolfsmantel 33, Erlangen-Tennenlohe, Room 3R4.04
Format: Lecture
Credits: 2.5 ECTS
Exam (graded): Oral examination at the end of the term

News

NEW The lecture on January 13th, 2026 will run from 13:30 to 16:45.
NEW There will be no lecture on December 16th, 2025.
The first lecture will be held on October 28th at 14:15.

Format

The lecture has the following format:

Each meeting lasts 90 minutes.

For further information, please contact Prof. Dr. Emanuël Habets.

Content

Speech is at the core of human communication and increasingly central to our interaction with technology. From voice assistants and teleconferencing to hearing aids, security applications, and immersive media, speech technologies must perform robustly in real-world acoustic environments. These environments are often far from ideal: noise, reverberation, and interfering sources can severely degrade the quality and intelligibility of speech signals. At the same time, advances in machine learning and signal processing have opened new opportunities for creating, modifying, and analyzing speech in powerful ways.

This lecture provides a comprehensive introduction to advanced speech processing, covering both classical and modern neural approaches. Topics include:

Speech quality and intelligibility assessment: objective and subjective methods for evaluating speech processing algorithms.
Speech enhancement: noise reduction and dereverberation with classical signal processing and deep learning.
Speech extraction and separation: isolating target speakers or signals from complex mixtures.
Beamforming: spatial filtering to enhance speech captured by microphone arrays using classical array processing and deep learning.
Speaker identification and verification: modeling and recognizing speaker characteristics for personalization and security.
Text-to-speech synthesis (TTS): generating natural and expressive speech from text with modern neural architectures.
Voice anonymization: transforming speech signals to protect privacy while preserving intelligibility.
Self-supervised and foundation speech models: learning robust speech representations from unlabeled data, and leveraging large-scale pre-trained models (e.g., wav2vec 2.0, HuBERT, Whisper) for transfer to downstream speech tasks.

The lecture combines theoretical foundations, algorithmic insights, and practical demonstrations. Students will gain an understanding of both classical methods and cutting-edge neural approaches, and their application in real-world scenarios.

Target audience: This lecture is designed for graduate students and researchers interested in speech and audio technology. By the end of the lecture, participants will have a strong foundation to understand, design, and critically evaluate advanced speech processing methods.

Complementary courses:

Speech and Language Understanding by Prof. Dr.-Ing. Andreas Maier, which includes automatic speech recognition.
Generative Models for Signal Processing by Dr.-Ing. Andreas Brendel and Dr. Nicola Pia, which includes neural speech coding.

Course Material

The lecture slides can be downloaded from StudOn.

International Audio Laboratories Erlangen
