Speech to Speech Synthesis for Voice Impersonation

• Introduces the Speech to Speech Synthesis Network (STSSN) for voice impersonation via style transfer.
• Combines state-of-the-art speech recognition and synthesis models into a unified architecture.
• Demonstrates realistic audio generation despite limited capacity, outperforming comparable GAN-based approaches.
• Benchmarks against a generative adversarial model, showing higher fidelity and more convincing impersonation.
• Provides experimental PDF and HTML access, encouraging reproducibility and community collaboration.
• Highlights potential applications in entertainment, accessibility, and security, while noting ethical considerations.

Article Summaries:

Computer Science > Sound
[Submitted on 13 Feb 2026]
Title: Speech to Speech Synthesis for Voice Impersonation
Abstract: Numerous models have shown great success in the fields of speech recognition as well as speech synthesis, but models for speech to speech processing have not been heavily explored. We propose Speech to Speech Synthesis Network (STSSN), a model based on current state of the art systems that fuses the two disciplines in order to perform effective speech to speech style transfer for the purpose of voice impersonation. We show that our proposed model is

Sources:

https://arxiv.org/abs/2602.16721