Zero-shot Voice Conversion: Comparing HierSpeech++ to Other Basemodels | HackerNoon
Table of Links Abstract and 1 Introduction 2 Related Work 2.1 Neural Codec Language Models and 2.2 Non-autoregressive Models 2.3 Diffusion Models and 2.4 Zero-shot Voice Cloning 3 Hierspeech++ and 3.1 Speech Representations 3.2 Hierarchical Speech Synthesizer 3.3 Text-to-Vec 3.4 Speech Super-resolution 3.5 Model Architecture 4 Speech Synthesis Tasks 4.1 Voice Conversion and 4.2 Text-to-Speech 4.3 Style Prompt Replication 5 Experiment and Result, and Dataset 5.2 Preprocessing and 5.3 Training 5.4 Evaluation Metrics 5.5 Ablation