While there isn't a single famous paper with that exact title, several research papers specifically address the challenge of generating accurate timestamps for time-compressed (sped-up) audio, often using techniques like SOLAFS or modern AI alignment.

Key Research Papers
: A 2025 paper that introduces a data-driven approach using the Canary model. It uses a <|timestamp|> token to predict start and end times for words with high precision (80–90%), even as audio characteristics change.
: This research focuses on predicting timestamps directly within an end-to-end speech recognition system, ensuring that word duration and placement remain accurate during processing.

Common Technical Approaches
WhisperX: Automatic Speech Recognition with Word ... - GitHub
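
As a rough illustration of the underlying problem, word timestamps produced on sped-up audio can be mapped back to the original timeline by multiplying by the speed factor. The sketch below assumes a single uniform speed factor and ignores the small local offsets that overlap-add methods such as SOLAFS can introduce. The `Word` type and the `rescale_to_original` function are hypothetical names chosen for this example; they are not part of WhisperX, Canary, or any other library.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Word:
    """A recognized word with start/end times in seconds (hypothetical type)."""
    text: str
    start: float
    end: float


def rescale_to_original(words: List[Word], speed: float) -> List[Word]:
    """Map timestamps from time-compressed audio back to the original timeline.

    Assumes uniform compression: audio played at `speed`x covers `speed`
    seconds of original audio per second of compressed audio.
    """
    if speed <= 0:
        raise ValueError("speed must be positive")
    return [Word(w.text, w.start * speed, w.end * speed) for w in words]


if __name__ == "__main__":
    # Timestamps as they might come back from an aligner run on 2x audio.
    compressed = [Word("hello", 0.0, 0.4), Word("world", 0.5, 1.0)]
    for w in rescale_to_original(compressed, speed=2.0):
        print(f"{w.text}: {w.start:.2f}s to {w.end:.2f}s")
```

Under this assumption the correction is a pure scaling, so any alignment error made on the compressed audio is also multiplied by the speed factor, which may be part of why accuracy on sped-up audio matters.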