Paralinguistic vocalizations, including non-verbal sounds such as laughter and breathing as well as lexicalized interjections such as “uhm” and “oh”, are integral to natural spoken communication. Despite their importance in conveying affect, intent, and interactional cues, they remain largely overlooked in conventional automatic speech recognition (ASR) and text-to-speech (TTS) systems. We present NVSpeech, an integrated and scalable pipeline that bridges the recognition and synthesis of paralinguistic vocalizations, encompassing dataset construction, ASR modeling, and controllable TTS. (1) We introduce a manually annotated dataset of 48,430 human-spoken utterances with 18 word-level paralinguistic categories. (2) We develop a paralinguistic-aware ASR model that treats paralinguistic cues as inline decodable tokens (e.g., “You’re so funny [Laughter]”), enabling joint lexical and non-verbal transcription. We then use this model to automatically annotate a large corpus, yielding the first large-scale Chinese dataset of 174,179 utterances (573 hours) with word-level alignment and paralinguistic cues. (3) We fine-tune zero-shot TTS models on both human- and auto-labeled data to enable explicit control over paralinguistic vocalizations, allowing context-aware insertion at arbitrary token positions for human-like speech synthesis. By unifying the recognition and generation of paralinguistic vocalizations, NVSpeech offers the first open, large-scale, word-level annotated pipeline for expressive speech modeling in Mandarin.
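
The inline-token format described above (tags such as `[Laughter]` embedded directly in the transcript) can be illustrated with a small sketch. This is not the NVSpeech code: the names `TAGS`, `split_transcript`, and `insert_tags` are hypothetical, and the tag list is simply the 18 labels that appear in the demo tables on this page. The sketch separates a tagged transcript into plain text plus positioned events, and re-inserts events at arbitrary character positions, which is roughly the kind of text a tag-aware TTS front end would consume.

```python
import re

# Assumption: the 18 tag labels as they appear in the demo tables on this page.
TAGS = [
    "Breathing", "Laughter", "Cough", "Sigh", "Crying", "Shh", "Uhm",
    "Confirmation-en", "Dissatisfaction-hnn",
    "Surprise-oh", "Surprise-ah", "Surprise-yo", "Surprise-wa",
    "Question-oh", "Question-ah", "Question-ei", "Question-en", "Question-yi",
]

# Matches any known tag written inline as "[Tag]".
TAG_PATTERN = re.compile(r"\[(" + "|".join(re.escape(t) for t in TAGS) + r")\]")


def split_transcript(text):
    """Split a tagged transcript into (plain_text, events).

    Each event is (offset, tag), where offset is the character position
    in plain_text at which the tag occurred.
    """
    plain_parts = []
    events = []
    pos = 0      # read position in the tagged text
    offset = 0   # write position in the plain text
    for match in TAG_PATTERN.finditer(text):
        chunk = text[pos:match.start()]
        plain_parts.append(chunk)
        offset += len(chunk)
        events.append((offset, match.group(1)))
        pos = match.end()
    plain_parts.append(text[pos:])
    return "".join(plain_parts), events


def insert_tags(plain, events):
    """Inverse of split_transcript: put tags back at the given offsets."""
    out = []
    pos = 0
    # sorted() is stable, so tags sharing an offset keep their given order.
    for offset, tag in sorted(events, key=lambda e: e[0]):
        out.append(plain[pos:offset])
        out.append(f"[{tag}]")
        pos = offset
    out.append(plain[pos:])
    return "".join(out)


if __name__ == "__main__":
    plain, events = split_transcript("You're so funny [Laughter]")
    print(repr(plain), events)
    print(insert_tags("Still need... adjustment", [(10, "Breathing")]))
```

Keeping events as offsets into the plain text, instead of leaving the tags in the string, makes it easy to move or add a vocalization before synthesis without touching the lexical content.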

Tag categories: non-verbal vocalizations; prosodic/attitudinal cues; discourse-like markers.

Paralinguistic Tags | CosyVoice Demo | CosyVoice2 Demo | Ints Demo | Chinese Target Text | Translations |
---|---|---|---|---|---|
[Breathing] | | | | 还需要[Breathing]…调整。 | Still need [Breathing]... adjustment. |
[Confirmation-en], [Breathing] | | | | [Confirmation-en],[Breathing]你说的也很有可能,但不论如何,我相信所有事情都应该有一个确切的「真相」,[Breathing]而「真相」总会被揭开。 | [Confirmation-en], [Breathing] What you said could very well be true, but in any case, I believe there must be an exact “truth” to everything, [Breathing] and the “truth” will always come to light. |
[Surprise-oh], [Breathing] | | | | [Surprise-oh],这才对嘛![Breathing]纳西妲终于不是任人欺负的纳西妲啦[Breathing],可喜可贺! | [Surprise-oh], that’s more like it! [Breathing] Finally, Nahida isn’t the one people can just pick on anymore [Breathing]—what a joyous moment! |
[Dissatisfaction-hnn] | | | | [Dissatisfaction-hnn]自己的朋友麻烦你亲自招待。 | [Dissatisfaction-hnn] Please entertain your own friends yourself. |
[Laughter] | | | | 既然是你亲手做的,我就尝尝看吧,[Laughter] | Since you made it yourself, I'll give it a try, [Laughter] |
[Surprise-ah], [Breathing] | | | | [Surprise-ah],裟罗![Breathing]我们在找心海,你有见到过她吗? | [Surprise-ah], Sara! [Breathing] We’re looking for Kokomi—have you seen her? |
[Question-oh] | | | | [Question-oh],是吗也好,既是指控人,又是见证者,还是我命中注定的对手。 | [Question-oh], is that so? Very well—both accuser and witness, and my destined adversary at that. |
[Sigh] | | | | [Sigh]…咱就不该相信业余摄影师…… | [Sigh] ...We really shouldn't have trusted the amateur photographer... |
[Surprise-yo] | | | | [Surprise-yo]!自动铅笔芯盒里混着彩色粉笔 | [Surprise-yo]! The mechanical pencil lead box is mixed with colored chalk! |
[Question-ei] | | | | 教育援助[Question-ei]那你一定很熟悉这附近的情况? | Education aid [Question-ei] You must be familiar with the situation around here? |
[Uhm] | | | | 下班后去朱特那买点什么吧…[Uhm]…买点什么好呢? | After work, let’s go to Zhute’s to buy something… [Uhm]… What should we get? |
[Question-en], [Dissatisfaction-hnn], [Breathing] | | | | 我制定了荒泷派的「派中法度」,他们就应该谨遵法度行事。[Question-en]?你问他们为什么愿意听话[Dissatisfaction-hnn]?[Breathing]因为我比他们更有说服力。我的语言和拳头,姑且还算是有点分量。 | I’ve laid down the Arataki Gang’s “Code of Conduct,” so they ought to follow it to the letter. [Question-en]? You’re asking why they’re so willing to obey [Dissatisfaction-hnn]? [Breathing] Because I’m more persuasive than they are—my words and my fists still carry some weight. |
[Question-ah] | | | | [Question-ah],真的吗?!谢谢你,大哥哥,非常谢谢你! | [Question-ah], Really?! Thank you, big brother, thank you so much! |
[Shh], [Breathing] | | | | [Shh]你还写诗啦[Breathing]?听,当然听[Breathing]…你的每句话我都听… | [Shh] You even write poetry now? [Breathing] I’m listening—of course I’m listening [Breathing]… I hear every single word you say… |
[Question-yi], [Breathing] | | | | [Question-yi][Breathing]?这地儿聚了好些人[Breathing]。看来今天罗浮宜歇业,忌开工。 | [Question-yi][Breathing]? There are quite a few people gathered here [Breathing]. It seems that today, according to Luofu, it’s auspicious to close shop and inauspicious to start work. |
[Surprise-wa], [Breathing] | | | | [Surprise-wa][Breathing],厨具这么齐全的话,就能做很多好吃的呢! | [Surprise-wa][Breathing], with such complete kitchenware, you can make a lot of delicious food! |
[Cough] | | | | 就拿和朋友打招呼来举例子吧,[Cough]「以拂晓的晨露向你致以问候,我的挚友。」 | Let’s take greeting a friend as an example, [Cough] ‘With the morning dew at dawn, I extend my greetings to you, my dearest friend.’ |

Tag categories: non-verbal vocalizations; prosodic/attitudinal cues; discourse-like markers.

Paralinguistic Tags | ASR Audio Demos | Chinese Transcription | English Translations |
---|---|---|---|
[Breathing] | | 慧星是追求远方的家伙,[Breathing]每隔一段时间就要去太阳附近转一圈,彗星掉落的小石子就是流星体,[Breathing]聚在一起就叫做「流星群」。 | Comets are the ones chasing the distant; [Breathing] every so often they swing by the Sun, and the little stones that fall off them are meteoroids; [Breathing] when they gather, they’re called a “meteor shower.” |
[Confirmation-en] | | 什么,你喝不完,我能跟你一起喝吗?我跟你讲,我一个人能喝完一瓶啊…[Confirmation-en]。 | What, you can’t finish it? Can I drink with you? I’ll tell you, I can finish a bottle on my own…[Confirmation-en]. |
[Question-ei], [Laughter] | | [Question-ei]?前面有一个吃饭的地方,我先进去吃吃看,也许柴犬小姐丢失的就是勺子或者碗筷,她不好意思说是吃饭的家伙,[Laughter]。 | [Question-ei]? There’s a restaurant ahead; I’ll go in and see—perhaps Miss Shiba Inu lost her spoon or chopsticks and was too embarrassed to admit it, [Laughter]. |
[Surprise-yo] | | [Surprise-yo],这二小姐走了一遭回来都不会行礼啦。 | [Surprise-yo], Second Miss—gone off on your little trip and come back without even bothering to bow properly. |
[Surprise-oh] | | [Surprise-oh]我这个,我这个麦克风有时候就是这样,它指向性蛮强的。有的时候比如说「丘丘或者牙在旁边说话,比较小声的时候就收不到。 | [Surprise-oh], this microphone of mine sometimes acts up—it’s very directional. When Qiqi or Yan speak quietly nearby, it just won’t pick up. |
[Crying] | | 你死了还要连累我…[Crying]妈你别哭嘛! | Even in death you’re still dragging me down… [Crying] Mom, don’t cry! |
[Sigh] | | 为了写这篇作文,我完全忘记了睡觉,不知道打了多少次草稿。这么晚了还不睡觉?[Sigh]这关又没过,还得从头开始,太辛苦了。 | I totally forgot to sleep working on this essay; I’ve lost count of how many drafts I’ve written. It’s so late and you’re still not sleeping? [Sigh] I didn’t get past this part again, so I have to start over from scratch—this is brutal. |
[Surprise-ah] | | [Surprise-ah]他原来是做贼心虚,以为鸳鸯全看清楚了,怕他真的是喊出声来,自己立马就大祸临头了。 | [Surprise-ah], it turned out he was guilty as a thief—thinking Yuanyang had seen it all, afraid that if she actually cried out, disaster would strike him immediately. |
[Surprise-oh] | | 我想要大提琴,它底下长了根刺,真好玩,等等拉琴的琴功呢… | I want the cello; there’s a spike growing underneath it, how fun—wait, what about the bowing technique… |
[Cough] | | 下面,有请校长为第一届水球大战的举行致辞[Cough] | Now, please welcome the principal to deliver a speech for the inaugural Water Polo Battle. [Cough] |
[Laughter], [Question-ah] | | [Laughter],误会了,误会了,我可不是神仙啊。美猴王问,[Question-ah]? | [Laughter], it was a misunderstanding, a misunderstanding—I’m not an immortal. The Monkey King asks, [Question-ah]? |
[Question-oh] | | 这种好酒在你们齐国是喝不到的吧,多喝一些。你们绑着这个人做什么?他犯了罪,请大王发配,[Question-oh],什么罪?偷窃?是哪国人?回大王,是齐国人? | This kind of fine wine you can’t get in your State of Qi, can you? Drink up. What are you binding this man for? He’s committed a crime—your majesty, have him exiled. [Question-oh], what crime? Theft? What countryman is he? Reply to the king: He’s from Qi. |
[Question-yi] | | 我没有,[Question-yi],外面下雪了吗?没有啊,那你身上都是雪花,是头屑。 | I don’t. [Question-yi], is it snowing outside? No? Then all those snowflakes on you are dandruff. |
[Question-en] | | 不是我要批评你啊,我还以为是什么难题,这么简单你也不会吗?[Question-en]写出来了? | I’m not trying to criticize you, but I thought it was some tough problem—something this simple and you still can’t do it? [Question-en] Did you write it out? |
[Surprise-wa] | | [Surprise-wa]班长,你们带了那么多好吃的呀。 | [Surprise-wa], class president, you all brought so many delicious treats! |
[Dissatisfaction-hnn] | | 这才是我都愿意接受的人生。安之若命,[Dissatisfaction-hnn]那是什么鬼?要做什么白日梦啦? | This is the kind of life I’m willing to accept—taking it as fate. [Dissatisfaction-hnn] What’s that nonsense? Are you going to start daydreaming? |
[Uhm] | | [Uhm],即便如此,现在也是形容女生的,你不可以说我。 | [Uhm], even so, that’s something used to describe girls now—you can’t call me that. |
[Shh] | | 我看一定是…[Shh],这种话你也敢乱说,要是被有心人听到了搬一句是非,你小命还要不要啊? | I’m thinking it’s gotta be… [Shh], you actually dare to say stuff like that recklessly? If someone with ill intent hears it and starts spreading rumors, will you even have a life left? |