I am currently a member of the Next-gen Kaldi team at Xiaomi Corp., working under the supervision of Dr. Daniel Povey.

I received my Ph.D. degree from the Institute of Acoustics, Chinese Academy of Sciences (IOACAS) and the University of Chinese Academy of Sciences (UCAS) in 2024, and my bachelor’s degree from the University of Electronic Science and Technology of China (UESTC) in 2019.

My research primarily focuses on text-to-speech (TTS) and automatic speech recognition (ASR). My work has been published in top speech and AI venues, including IEEE/ACM TASLP, INTERSPEECH, ICASSP, ASRU, ACL, and ICLR. I built the open-source TTS projects OmniVoice and ZipVoice.

🔥 News

2026.04: 🎉 We released OmniVoice, a SOTA voice-cloning TTS model supporting 600+ languages.
2026.04: 🎉 ZipVoice-Dialog is accepted by ACL 2026 (Findings).
2026.01: 🎉 Flow2GAN is accepted by ICLR 2026.
2025.08: 🎉 ZipVoice is accepted by ASRU 2025.
2025.01: 🎉 CR-CTC is accepted by ICLR 2025.
2024.07: 😄 I joined the Next-gen Kaldi team at Xiaomi Corp.
2024.06: 🎓 I received my Ph.D. degree from the Institute of Acoustics, Chinese Academy of Sciences (IOACAS) and the University of Chinese Academy of Sciences (UCAS).

📝 Selected Publications

Full list on Google Scholar.

🔊 Text-to-Speech (TTS)

[Preprint 2026] OmniVoice: Towards Omnilingual Zero-Shot Text-to-Speech with Diffusion Language Models
H Zhu, L Ye, W Kang, Z Yao, L Guo, F Kuang, Z Han, W Zhuang, L Lin, D Povey
Paper Code Demo
[ASRU 2025] ZipVoice: Fast and High-Quality Zero-Shot Text-to-Speech with Flow Matching
H Zhu, W Kang, Z Yao, L Guo, F Kuang, Z Li, W Zhuang, L Lin, D Povey
Paper Code Demo
[ACL 2026 Findings] ZipVoice-Dialog: Non-Autoregressive Spoken Dialogue Generation with Flow Matching
H Zhu, W Kang, L Guo, Z Yao, F Kuang, W Zhuang, Z Li, Z Han, D Zhang, X Zhang, X Song, L Ye, L Lin, D Povey
Paper Code Demo Dataset
[ICLR 2026] Flow2GAN: Hybrid Flow Matching and GAN with Multi-Resolution Network for Few-step High-Fidelity Audio Generation
Z Yao, W Kang, H Zhu, L Guo, L Ye, F Kuang, W Zhuang, Z Li, Z Han, L Lin, D Povey
Paper Code Demo

🎙️ Automatic Speech Recognition (ASR)

[IEEE/ACM TASLP 2024] Boosting Cross-Domain Speech Recognition with Self-Supervision
H Zhu, G Cheng, J Wang, W Hou, P Zhang, Y Yan
Paper
[IEEE/ACM TASLP 2023] Alternative Pseudo-Labeling for Semi-Supervised Automatic Speech Recognition
H Zhu, D Gao, G Cheng, D Povey, P Zhang, Y Yan
Paper
[INTERSPEECH 2022] Decoupled Federated Learning for ASR with Non-IID Data
H Zhu, J Wang, G Cheng, P Zhang, Y Yan
Paper
[INTERSPEECH 2022] Wav2vec-S: Semi-Supervised Pre-Training for Low-Resource ASR
H Zhu, L Wang, J Wang, G Cheng, P Zhang, Y Yan
Paper
[INTERSPEECH 2020] Domain Adaptation Using Class Similarity for Robust Speech Recognition
H Zhu, J Zhao, Y Ren, L Wang, P Zhang
Paper
[INTERSPEECH 2019] Multi-Accent Adaptation Based on Gate Mechanism
H Zhu, L Wang, P Zhang, Y Yan
Paper
[ICLR 2025] CR-CTC: Consistency Regularization on CTC for Improved Speech Recognition
Z Yao, W Kang, X Yang, F Kuang, L Guo, H Zhu, Z Jin, Z Li, L Lin, D Povey
Paper Code
[IEEE/ACM TASLP 2026] Pseudo-Labeling Based Unsupervised Domain Adaptation for LLM-Based ASR
L Zheng, H Zhu, X Wang, X Li, T Li, Y Yan
Paper
[ICASSP 2025] Hybrid Pseudo-Labeling for Semi-Supervised Automatic Speech Recognition
L Zheng, H Zhu, C Yang, X Wang, G Cheng, T Li
Paper
[IEEE SPL 2024] Unsupervised Domain Adaptation on End-to-End Multi-Talker Overlapped Speech Recognition
L Zheng, H Zhu, S Tian, Q Zhao, T Li
Paper
[IEEE/ACM TASLP 2022] Exploiting Adapters for Cross-Lingual Low-Resource Speech Recognition
W Hou, H Zhu, Y Wang, J Wang, T Qin, R Xu, T Shinozaki
Paper
[ICASSP 2021] Pre-Training Transformer Decoder for End-to-End ASR Model with Unpaired Text Data
C Gao, G Cheng, R Yang, H Zhu, P Zhang, Y Yan
Paper

💻 Open Source Projects

OmniVoice

A SOTA voice-cloning TTS model supporting 600+ languages, powered by a novel diffusion language model-style architecture.

ZipVoice⚡️


A series of fast and high-quality zero-shot text-to-speech models built with the flow matching objective and the Zipformer backbone, including ZipVoice, ZipVoice-Distill and ZipVoice-Dialog.

📖 Education

Ph.D., Institute of Acoustics, Chinese Academy of Sciences (IOACAS), and University of Chinese Academy of Sciences (UCAS)
2019.09 – 2024.06
B.Eng., University of Electronic Science and Technology of China (UESTC)
2015.09 – 2019.06

💼 Work Experience

Next-gen Kaldi Team, Xiaomi Corp.
Supervised by Dr. Daniel Povey
2024.07 – Present