Zengyi Qin

qinzy [at] mit.edu

I am an MIT PhD student affiliated with MIT CSAIL. My current research focus is LLM Reasoning in Multi-modal Context (vision, language, audio and spatial). My ultimate goal is to develop general-purpose intelligence that advances how machines understand, interact, and create in both the digital and physical worlds. I am fortunate to work with Dr. Brian Anthony and Prof. William T. Freeman at MIT.

I have solid experience pre-training and post-training LLMs. I was the project lead of JetMoE, an MoA+MoE LLM pre-trained and post-trained from scratch (not fine-tuned from existing models) for less than 0.1M USD, yet it outperforms LLaMA2-7B.

I am also the main author of several popular open-source projects. One has 28k stars and trended 1st on GitHub; another receives >4M average monthly downloads on Hugging Face (more than Stable Diffusion).

Previously, I was a visiting researcher in the Stanford Vision and Learning Lab, where I had the privilege of working with Prof. Fei-Fei Li and Prof. Silvio Savarese.

Beyond research, I also have extensive experience turning cutting-edge research into practical applications. I co-founded MyShell.ai and developed its agentic framework, which lets anyone build AI agents without coding. The platform now has >3M users, who have built more than 10K apps.

News

  • [Apr 2024] We released JetMoE-8B, an MoA+MoE LLM pre-trained and post-trained from scratch for less than 0.1M USD, yet it outperforms LLaMA2-7B. It democratizes high-performance LLM pre-training and post-training and has received strong positive feedback from the field.
      Technical blog
      MIT CSAIL posts
      Comments from the field (1 2 3)
      "The breakthrough represented by JetMoE-8B signals a significant democratization of AI technology" (1)
  • [Jan 2024] We released OpenVoice, allowing users to clone any voice and generate speech in various styles and languages.
      Technical blog
      Trended 1st on GitHub. Now 27k stars
      Serving >3M users on MyShell with a solid, production-grade algorithm
      Covered by VentureBeat, HyScaler, and other media outlets
      "AI Voice Cloning Redefined: OpenVoice Unveils Revolutionary Open-Source Technology" (1)

Projects in Generative Models

Visual Reasoning by Learning Latent Symbolization
Despite the impressive capabilities of LLMs in text-based reasoning, they remain far from comparable proficiency in visual reasoning. We identify the core issue as their inability to symbolize visual input and propose learning latent symbolization to enhance their visual reasoning capabilities.
paper coming soon

JetMoE: Reaching LLaMA2 Performance with 0.1M Dollars
Yikang Shen, Zhen Guo, Tianle Cai and Zengyi Qin
Technical Report, 2024

JetMoE is pre-trained and post-trained from scratch for less than 0.1M USD, yet it outperforms LLaMA2-7B, democratizing high-performance LLM pre-training and post-training with remarkable cost efficiency.
website | github | tech report
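For readers unfamiliar with sparse Mixture-of-Experts layers, the sketch below shows a generic top-k MoE feed-forward block in PyTorch. It only illustrates the sparse-activation idea that MoE architectures such as JetMoE rely on; the layer sizes, routing, and the absence of load balancing are illustrative assumptions, not JetMoE's actual implementation.

```python
# Generic top-k Mixture-of-Experts feed-forward block (illustrative only,
# not the JetMoE implementation). Each token is routed to its top-k experts,
# so only a fraction of the parameters is activated per token.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseMoE(nn.Module):
    def __init__(self, d_model=512, d_hidden=2048, n_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.GELU(),
                          nn.Linear(d_hidden, d_model))
            for _ in range(n_experts)
        ])

    def forward(self, x):                        # x: (tokens, d_model)
        scores = self.router(x)                  # (tokens, n_experts)
        topk_val, topk_idx = scores.topk(self.top_k, dim=-1)
        gates = F.softmax(topk_val, dim=-1)      # renormalize over chosen experts
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            idx = topk_idx[:, slot]
            for e, expert in enumerate(self.experts):
                mask = idx == e                  # tokens routed to expert e in this slot
                if mask.any():
                    out[mask] += gates[mask, slot:slot + 1] * expert(x[mask])
        return out

tokens = torch.randn(4, 512)
print(SparseMoE()(tokens).shape)  # torch.Size([4, 512])
```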

OpenVoice: Versatile Instant Voice Cloning
Zengyi Qin, Wenliang Zhao, Xumin Yu and Xin Sun
Technical Report, 2024

Instantly clone any voice to generate speech in various styles and languages.
paper | website | source code

Trended 1st on GitHub
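For a sense of how the released code is used, here is a minimal sketch of the two-stage OpenVoice V1 pipeline: synthesize speech with a base speaker, then convert its tone color to match a short reference clip. The checkpoint paths and exact class/method names follow the repository's demo as I recall it and are assumptions that may differ across versions.

```python
# Minimal OpenVoice V1 voice-cloning sketch (follows the repo's demo;
# checkpoint paths and exact APIs are assumptions and may differ by version).
import torch
from openvoice import se_extractor
from openvoice.api import BaseSpeakerTTS, ToneColorConverter

device = 'cuda:0' if torch.cuda.is_available() else 'cpu'

# Base speaker TTS produces speech in a default voice and style.
base_tts = BaseSpeakerTTS('checkpoints/base_speakers/EN/config.json', device=device)
base_tts.load_ckpt('checkpoints/base_speakers/EN/checkpoint.pth')

# Tone color converter transfers the timbre of the reference speaker.
converter = ToneColorConverter('checkpoints/converter/config.json', device=device)
converter.load_ckpt('checkpoints/converter/checkpoint.pth')

source_se = torch.load('checkpoints/base_speakers/EN/en_default_se.pth').to(device)
# Extract a tone-color embedding from a short reference recording.
target_se, _ = se_extractor.get_se('reference.mp3', converter,
                                   target_dir='processed', vad=True)

base_tts.tts("This audio is generated by OpenVoice.", 'tmp.wav',
             speaker='default', language='English', speed=1.0)
converter.convert(audio_src_path='tmp.wav', src_se=source_se,
                  tgt_se=target_se, output_path='cloned.wav')
```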

DreamVoice: Text-Guided Voice Conversion
Jiarui Hai, Karan Thakkar, Helin Wang, Zengyi Qin, Mounya Elhilali
Interspeech, 2024

Convert any voice into a target voice described by the input text prompt.
paper | website | source code

MeloTTS: A high-quality multi-lingual multi-accent text-to-speech library
Wenliang Zhao, Xumin Yu, Zengyi Qin

High-quality multi-lingual text-to-speech library that supports English (American, British, Australian, and Indian accents), Spanish, French, Chinese, Japanese, and Korean.
source code

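A minimal usage sketch with the library's Python API; the speaker key and language code below follow the repository's examples and are assumptions that may vary by release.

```python
# Minimal MeloTTS usage sketch (follows the repository's examples;
# exact speaker keys and language codes may vary by release).
from melo.api import TTS

model = TTS(language='EN', device='auto')   # picks GPU if available, else CPU
speaker_ids = model.hps.data.spk2id         # accent keys, e.g. 'EN-US', 'EN-BR'
model.tts_to_file("Text to speech with an American accent.",
                  speaker_ids['EN-US'], 'en_us.wav', speed=1.0)
```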

Projects in 3D Computer Vision

MonoGRNet: A General Framework for Monocular 3D Object Detection
Zengyi Qin, Jinglu Wang and Yan Lu
The IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2021

A general monocular 3D object detection framework that flexibly adapts to both fully and weakly supervised learning, which alleviates the need for extensive 3D labels and requires only ground-truth 2D bounding boxes during training.

paper

Weakly Supervised 3D Object Detection from Point Clouds
Zengyi Qin, Jinglu Wang and Yan Lu
ACM Multimedia (ACM MM), 2020

A state-of-the-art framework for weakly supervised 3D object detection from point clouds that uses no ground-truth 3D bounding boxes for training. The core of our method is an unsupervised 3D object proposal module and a cross-modal knowledge distillation strategy.

paper | code

Triangulation Learning Network: from Monocular to Stereo 3D Object Detection
Zengyi Qin, Jinglu Wang and Yan Lu
The IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2019

A pioneering work on stereo-image-based 3D object detection that does not compute pixel-level depth maps. We proposed a triangulation learning method that learns object-level stereo geometric correspondences for 3D object detection.

paper | video | code | website

MonoGRNet: A Geometric Reasoning Network for Monocular 3D Object Localization
Zengyi Qin, Jinglu Wang and Yan Lu
The Thirty-Third AAAI Conference on Artificial Intelligence (AAAI), 2019, Oral Presentation, Acceptance Rate < 8%

A state-of-the-art monocular 3D object detection approach based on geometric reasoning. We proposed to decompose the whole task into four progressive sub-tasks, which significantly facilitates monocular 3D object detection.

paper | video | code | website

Projects in Robotics

SABLAS: Learning Safe Control for Black-Box Dynamical Systems
Zengyi Qin, Dawei Sun and Chuchu Fan
IEEE Robotics and Automation Letters (RA-L), 2022

Learning control barrier functions (CBFs) for safe control of black-box systems. CBFs are a powerful tool for providing safety guarantees, but before this work they could not be directly applied to black-box systems whose dynamics models are unavailable.

paper | code
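For context, the generic control barrier function condition that such safe controllers enforce is sketched below; this is the textbook form from the CBF literature, not necessarily the exact formulation used in the paper.

```latex
% Generic CBF safety condition (textbook form, not the paper's exact notation).
% Safe set: C = {x : h(x) >= 0}. The set C stays forward invariant if, at every
% state x, the controller picks a control u satisfying
\[
  \dot h(x, u) \;=\; \nabla h(x)^{\top} f(x, u) \;\ge\; -\alpha\big(h(x)\big),
\]
% where f is the (possibly black-box) dynamics and \alpha is an extended
% class-K function. SABLAS addresses the setting where f is available only
% as a black-box simulator, so this condition cannot be checked analytically.
```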

KETO: Learning Keypoint Representations for Tool Manipulation
Zengyi Qin, Kuan Fang, Yuke Zhu, Li Fei-Fei and Silvio Savarese
The International Conference on Robotics and Automation (ICRA), 2020

KETO is a framework for robots to manipulate unseen objects as tools to complete diverse tasks. We proposed a method to learn keypoint representations of objects, which simplifies manipulation and improves generalization to novel objects.

paper | video | website | code

Learning Safe Multi-agent Control with Decentralized Neural Barrier Certificates
Zengyi Qin, Kaiqing Zhang, Yuxiao Chen, Jingkai Chen and Chuchu Fan
The International Conference on Learning Representations (ICLR), 2021

We study the multi-agent safe control problem where agents should avoid any collision while reaching their goals. Our method can scale up to an arbitrarily large number of agents (e.g., >1000 in our experiments) and achieve a 99-100% safety rate.

paper | video | code | website

Reactive and Safe Road User Simulations using Neural Barrier Certificate
Yue Meng, Zengyi Qin and Chuchu Fan
The International Conference on Intelligent Robots and Systems (IROS), 2021

Reactive and safe agent modeling is important for modern traffic simulator design and safe planning applications. We propose a control-barrier-function-based method to simulate traffic agents that behave like humans or human-controlled vehicles and react to other road participants.

paper | website

Density Constrained Reinforcement Learning
Zengyi Qin, Yuxiao Chen and Chuchu Fan
The International Conference on Machine Learning (ICML), 2021

We study constrained reinforcement learning (CRL) from a novel perspective by setting constraints directly on state density functions, rather than the value functions considered by previous work. State density has a clear physical and mathematical interpretation, and is able to express a wide variety of constraints such as resource limits and safety requirements.

paper | code | website
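In generic form, the problem setting can be written as below; the notation is an illustrative sketch, not the paper's exact formulation.

```latex
% Density-constrained RL, generic form (illustrative notation).
% Maximize expected return subject to an upper bound c(s) on the state
% density rho_pi induced by the policy pi:
\[
  \max_{\pi}\; \mathbb{E}_{\pi}\!\Big[\sum_{t=0}^{\infty} \gamma^{t}\, r(s_t, a_t)\Big]
  \quad \text{s.t.} \quad \rho_{\pi}(s) \,\le\, c(s) \;\; \forall\, s \in \mathcal{S}.
\]
% Setting c(s) = 0 on unsafe states encodes a safety requirement; a finite
% c(s) caps how often a state can be visited, encoding a resource limit.
```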

Safe Nonlinear Control Using Robust Neural Lyapunov-Barrier Functions
Charles Dawson, Zengyi Qin, Sicun Gao and Chuchu Fan
The Conference on Robot Learning (CoRL), 2021

Safety and stability are common requirements for robotic control systems. We propose a feedback control method based on robust control Lyapunov-barrier functions that generalizes despite model uncertainty and provides safety and stability guarantees.

paper

Controller synthesis for linear system with reach-avoid specifications
Chuchu Fan, Zengyi Qin, Umang Mathur, Qiang Ning, Sayan Mitra, and Mahesh Viswanathan
IEEE Transactions on Automatic Control (TAC), 2021

We address the problem of synthesizing provably correct controllers for linear systems with reach-avoid specifications. Our solution decomposes the overall synthesis problem into two smaller and more tractable problems, achieving a 2-150x speedup over previous techniques.

paper

Projects in AI for Healthcare

Learning fine-grained estimation of physiological states from coarse-grained labels by distribution restoration
Zengyi Qin, Jiansheng Chen, Zhenyu Jiang, Xumin Yu, Chunhua Hu, Yu Ma, Suhua Miao and Rongsong Zhou
Scientific Reports, 2020

Our method allows machine learning algorithms to perform fine-grained estimation of physiological states (e.g., sleep depth) even if the training labels are coarse-grained.

paper | code

sEMG-based Tremor Severity Evaluation for Parkinson's Disease using a Light-weight CNN
Zengyi Qin*, Zhenyu Jiang*, Jiansheng Chen, Chunhua Hu and Yu Ma
IEEE Signal Processing Letters (SPL), 2019

A machine learning framework to assist the diagnosis of Parkinson's Disease by assessing the pathological tremor. We proposed a light-weight convolutional neural network and a similarity learning strategy to handle the scarcity of medical data.

paper | website