📝 Publications
🎙 Machine Hearing and Vision

Coal Gangue Recognition in the Strong Background Noise using Two-level Auditory Feature Fusion with Attention Mechanism
Yang Z, Wang SB, Yang SG, Liu SY, Zhang ZP, Liu HG*
- This work is included in many famous speech synthesis open-source projects, such as PaddlePaddle/Parakeet, ESPNet, and fairseq.

Intelligent Coal Gangue Identification: A Novel Amplitude Frequency Sensitive Neural Network
Zhang ZP, Zhu ZC, Meng B, Yang Z, Wu MK, Cheng XY, Li BH, Liu HG*
- This work has been deployed on many TikTok products.
- Advanced zero-shot voice cloning model.

CFENet: A Contrastive Frequency-sensitive Learning Method for Gas-insulated Switch-gear Fault Detection under Varying Operating Conditions using Acoustic Signals
Zhang ZP, Liu HG*, Shao YY, et al.
- Many video demos created by the DiffSinger community are released.
- DiffSinger was introduced in a very popular video (1600k+ views) on Bilibili!

Hierarchical Spiking Neural Network Auditory Feature Based Dry-type Transformer Fault Diagnosis using Convolutional Neural Network
Zhao HY, Yang Y, Liu HG*, et al.
AAAI 2024
Emotion Rendering for Conversational Speech Synthesis with Heterogeneous Graph-Based Context Modeling, Rui Liu, Yifan Hu, Yi Ren, et al.
ICML 2023
Make-An-Audio: Text-To-Audio Generation with Prompt-Enhanced Diffusion Models, Rongjie Huang, Jiawei Huang, Dongchao Yang, Yi Ren, et al.
ACL 2023
CLAPSpeech: Learning Prosody from Text Context with Contrastive Language-Audio Pre-Training, Zhenhui Ye, Rongjie Huang, Yi Ren, et al.
ACL 2023
FluentSpeech: Stutter-Oriented Automatic Speech Editing with Context-Aware Diffusion Models, Ziyue Jiang, Qian Yang, Jialong Zuo, Zhenhui Ye, Rongjie Huang, Yi Ren and Zhou Zhao
ACL 2023
Revisiting and Incorporating GAN and Diffusion Models in High-Fidelity Speech Synthesis, Rongjie Huang, Yi Ren, Ziyue Jiang, et al.
ACL 2023
Improving Prosody with Masked Autoencoder and Conditional Diffusion Model For Expressive Text-to-Speech, Rongjie Huang, Chunlei Zhang, Yi Ren, et al.
ICLR 2023
Bag of Tricks for Unsupervised Text-to-Speech, Yi Ren, Chen Zhang, Shuicheng Yan
INTERSPEECH 2023
StyleS2ST: zero-shot style transfer for direct speech-to-speech translation, Kun Song, Yi Ren, Yi Lei, et al.
INTERSPEECH 2023
GenerTTS: Pronunciation Disentanglement for Timbre and Style Generalization in Cross-Lingual Text-to-Speech, Yahuan Cong, Haoyu Zhang, Haopeng Lin, Shichao Liu, Chunfeng Wang, Yi Ren, et al.
NeurIPS 2022
Dict-TTS: Learning to Pronounce with Prior Dictionary Knowledge for Text-to-Speech, Ziyue Jiang, Zhe Su, Zhou Zhao, Qian Yang, Yi Ren, et al.
NeurIPS 2022
GenerSpeech: Towards Style Transfer for Generalizable Out-Of-Domain Text-to-Speech, Rongjie Huang, Yi Ren, et al.
NeurIPS 2022
M4Singer: a Multi-Style, Multi-Singer and Musical Score Provided Mandarin Singing Corpus, Lichao Zhang, Ruiqi Li, Shoutong Wang, Liqun Deng, Jinglin Liu, Yi Ren, et al. (Datasets and Benchmarks Track)
ACM-MM 2022
ProDiff: Progressive Fast Diffusion Model for High-Quality Text-to-Speech, Rongjie Huang, Zhou Zhao, Huadai Liu, Jinglin Liu, Chenye Cui, Yi Ren
ACM-MM 2022
SingGAN: Generative Adversarial Network For High-Fidelity Singing Voice Generation, Rongjie Huang, Chenye Cui, Chen Feiyang, Yi Ren, et al.
IJCAI 2022
SyntaSpeech: Syntax-Aware Generative Adversarial Text-to-Speech, Zhenhui Ye, Zhou Zhao, Yi Ren, et al.
IJCAI 2022
(Oral) EditSinger: Zero-Shot Text-Based Singing Voice Editing System with Diverse Prosody Modeling, Lichao Zhang, Zhou Zhao, Yi Ren, et al.
IJCAI 2022
FastDiff: A Fast Conditional Diffusion Model for High-Quality Speech Synthesis, Rongjie Huang, Max W. Y. Lam, Jun Wang, Dan Su, Dong Yu, Yi Ren, Zhou Zhao (Oral)
NAACL 2022
A Study of Syntactic Multi-Modality in Non-Autoregressive Machine Translation, Kexun Zhang, Rui Wang, Xu Tan, Junliang Guo, Yi Ren, et al.
ACL 2022
Revisiting Over-Smoothness in Text to Speech, Yi Ren, Xu Tan, Tao Qin, et al.
ACL 2022
Learning the Beauty in Songs: Neural Singing Voice Beautifier, Jinglin Liu, Chengxi Li, Yi Ren, et al.
ICASSP 2022
ProsoSpeech: Enhancing Prosody With Quantized Vector Pre-training in Text-to-Speech, Yi Ren, et al.
INTERSPEECH 2021
EMOVIE: A Mandarin Emotion Speech Dataset with a Simple Emotional Text-to-Speech Model, Chenye Cui, Yi Ren, et al.
INTERSPEECH 2021
(best student paper award candidate) WSRGlow: A Glow-based Waveform Generative Model for Audio Super-Resolution, Kexun Zhang, Yi Ren, Changliang Xu and Zhou Zhao
ICASSP 2021
Denoising Text to Speech with Frame-Level Noise Modeling, Chen Zhang, Yi Ren, Xu Tan, et al.
ACM-MM 2021
Multi-Singer: Fast Multi-Singer Singing Voice Vocoder With A Large-Scale Corpus, Rongjie Huang, Feiyang Chen, Yi Ren, et al. (Oral)
IJCAI 2021
FedSpeech: Federated Text-to-Speech with Continual Learning, Ziyue Jiang, Yi Ren, et al.
KDD 2020
DeepSinger: Singing Voice Synthesis with Data Mined From the Web, Yi Ren, Xu Tan, Tao Qin, et al.
KDD 2020
LRSpeech: Extremely Low-Resource Speech Synthesis and Recognition, Jin Xu, Xu Tan, Yi Ren, et al.
INTERSPEECH 2020
MultiSpeech: Multi-Speaker Text to Speech with Transformer, Mingjian Chen, Xu Tan, Yi Ren, et al.
ICML 2019
(Oral) Almost Unsupervised Text to Speech and Automatic Speech Recognition, Yi Ren, Xu Tan, Tao Qin, et al.
🎼 Active Middle-ear Implant and Speech Enhancement


Modeling and Analysis of Ear Dynamics with a Round-window Stimulating Active Middle Ear Implant, Kou YX, Wang J, Liu ZH, Liu SY, Guo WW, Chen W, Liu HG*.

Effect of Electromagnetic Transducer Design Parameters on Round-window Stimulation in Otosclerosis: a Nonlinear Dynamic Analysis, Liu HG, Liu ZH, Liu JS, Thomas Lenarz, Hannes Maier.

Loudness Model For Round Window Stimulation Based On Human Ear Physiology, Liu ZH, Liu HG*, Thomas Lenarz, Hannes Maier.
Eur J Mech A-solid 2025
Nonlinear Electromechanical Analysis of Middle Ear Motion and Stability Induced by the Vibrant Soundbridge Coupled to the Stapes Head, Kou YX, Liu HG*, Guo WW, et al.
Nonlinear Dyn 2025
Nonlinear dynamic response and stability of the stapes driven by a floating mass type piezoelectric transducer with a nonlinear coupler, Kou YX, Liu HG*, Guo WW, et al.
Comput Speech Lang 2025
LRetUNet: A U-Net-based retentive network for single-channel speech enhancement, Zhang YX, Zhang ZP, Guo WW, Chen W, Liu ZH, Liu HG, et al.
CDBME 2024
Finite element modeling of a piezoelectric actuator coupled to the cochlear round window, Liu HG*, Kou YX, Wang J, Thomas Lenarz, Hannes Maier.
INT J NUMER METH BIO 2024
Effect of electromagnetic middle-ear implant simulating sites on the stapes spatial motion: A finite element analysis, Zhang YX, Liu HG*, Zhou L, et al.
📚 NVH of Vehicle
MSSP 2025
A Comprehensive Sound Quality Evaluation Method For Periodic And Transient Nonlinear Noise Utilizing An Optimized Wavelet Scattering Network, Xu MQ, Zhang S, Liu HG, et al.
ICLR 2023
TranSpeech: Speech-to-Speech Translation With Bilateral Perturbation, Rongjie Huang, Jinglin Liu, Huadai Liu, Yi Ren, Lichao Zhang, Jinzheng He, Zhou Zhao
Chinese Journal of Scientific Instrument (仪器仪表学报) 2024
Sound quality prediction of vehicle interior noise based on physiological structure of human ear (基于人耳生理结构的车内噪声声品质预测), Liu ZH, Zhang B, He ZH, Zhao Y, Liu HG*
IJCAI 2020
Task-Level Curriculum Learning for Non-Autoregressive Neural Machine Translation, Jinglin Liu, Yi Ren, Xu Tan, et al.
ACL 2020
SimulSpeech: End-to-End Simultaneous Speech to Text Translation, Yi Ren, Jinglin Liu, Xu Tan, et al.
ACL 2020
A Study of Non-autoregressive Model for Sequence Generation, Yi Ren, Jinglin Liu, Xu Tan, et al.
ICLR 2019
Multilingual Neural Machine Translation with Knowledge Distillation, Xu Tan, Yi Ren, Di He, et al.
🧑🎨 Machine Dynamics and Fault Diagnosis
IEEE TMM
SDMuse: Stochastic Differential Music Editing and Generation via Hybrid Representation, Chen Zhang, Yi Ren, Kejun Zhang, Shuicheng Yan.
AAAI 2021
SongMASS: Automatic Song Writing with Pre-training and Alignment Constraint, Zhonghao Sheng, Kaitao Song, Xu Tan, Yi Ren, et al.
ACM-MM 2020
(Oral) PopMAG: Pop Music Accompaniment Generation, Yi Ren, Jinzheng He, Xu Tan, et al.
Others
IEEE TASE 2025
Modeling of Asynchronous Mode-Dependent Delays in Stochastic Markovian Jumping Modes Based on Static Neural Networks for Robotic Manipulators, Shamrooz S, Aslam MS, Liu HG, et al.
ACM-MM 2022
Video-Guided Curriculum Learning for Spoken Video Grounding, Yan Xia, Zhou Zhao, Shangwei Ye, Yang Zhao, Haoyuan Li, Yi Ren