Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share

Google Text-to-Speech occasionally ignores SSML pauses for unknown reasons

I've had this problem since I started using Google's Text-to-Speech, and now I'm determined to fix it. About 22% of the time the SSML markup is not honored, and a chunk of text is rendered without any pauses, for no reason I can identify. I really wish Google would just insert the pauses automatically; audio without pauses is virtually unlistenable. In short, the service sometimes ignores this tag:

<break time="0.4s"/>

It only happens with some of the texts, though. I split the text into chunks of (I think) 3000 characters, and for any given chunk the service either obeys every break time in it or ignores all of them.
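The chunking code isn't shown here, so the following is only a hypothetical sketch of one way to build chunks so that each one is a complete SSML document: take the text as a list of sentences (how it is split is left open), XML-escape each sentence with xml.sax.saxutils.escape, append the 0.4 s break, and start a new <speak> document whenever the next sentence would push the chunk past the character limit. The function name build_ssml_chunks and the sentence-list input are assumptions, not the asker's code.

```python
from xml.sax.saxutils import escape

# The pause used in the question
BREAK_TAG = '<break time="0.4s"/>'


def build_ssml_chunks(sentences, max_chars=3000):
    """Hypothetical helper: group sentences into <speak> documents of at most max_chars.

    Each sentence is XML-escaped (so &, < and > in the prose cannot break the
    markup) and followed by a 0.4 s break. A single sentence longer than
    max_chars still ends up in its own, over-length chunk.
    """
    overhead = len('<speak></speak>')
    chunks = []
    current = []
    current_len = 0
    for sentence in sentences:
        piece = escape(sentence.strip()) + BREAK_TAG
        # Flush the current chunk if adding this sentence would exceed the limit
        if current and current_len + len(piece) + overhead > max_chars:
            chunks.append('<speak>' + ''.join(current) + '</speak>')
            current, current_len = [], 0
        current.append(piece)
        current_len += len(piece)
    if current:
        chunks.append('<speak>' + ''.join(current) + '</speak>')
    return chunks
```

For example, build_ssml_chunks(["Tom & Jerry ran.", "Then they stopped."], max_chars=60) yields two chunks, the first being <speak>Tom &amp; Jerry ran.<break time="0.4s"/></speak>. Each returned string could then be passed as the ssml argument of SynthesisInput.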

Because Stack Overflow won't display text between < and >, I can't post the text that causes the problem inline. The exact text I converted into audio is here:

problematic texts

Each chunk is preceded by its number surrounded by __. I converted the whole text twice, and the following chunks failed both times:

16 41 46 58 59 61 65 74 80 85 86 87 90 91 92 94 95 96 97 98

The following chunks failed once out of the two tries:

40 45 47 81 82 89
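Since every break in a chunk is either honored or ignored together, one cheap thing to rule out is the markup of the failing chunks themselves. This is a diagnostic, not a confirmed cause. The helpers below are hypothetical: they parse each chunk with the standard library's ElementTree and flag chunks that are not well-formed XML (for example an unescaped & in the prose), whose root is not <speak>, or that contain no <break>. They assume the SSML uses no XML namespace, and the chunks mapping is an assumed dict from chunk number to SSML string.

```python
import xml.etree.ElementTree as ET


def audit_chunk(ssml):
    """Return (ok, reason, n_breaks) for one SSML chunk, without calling the API.

    Assumes plain SSML with no XML namespace on <speak> or <break>.
    """
    try:
        root = ET.fromstring(ssml)
    except ET.ParseError as err:
        return False, f'not well-formed XML: {err}', 0
    # Count <break> elements at any depth below the root
    n_breaks = len(root.findall('.//break'))
    if root.tag != 'speak':
        return False, f'root element is <{root.tag}>, not <speak>', n_breaks
    if n_breaks == 0:
        return False, 'no <break> elements', 0
    return True, 'ok', n_breaks


def audit_failures(chunks, numbers):
    """chunks: hypothetical dict mapping chunk number -> SSML string."""
    return {n: audit_chunk(chunks[n]) for n in numbers if n in chunks}


# Chunk numbers reported in the question
FAILED_BOTH = [16, 41, 46, 58, 59, 61, 65, 74, 80, 85,
               86, 87, 90, 91, 92, 94, 95, 96, 97, 98]
FAILED_ONCE = [40, 45, 47, 81, 82, 89]
```

Running audit_failures over FAILED_BOTH + FAILED_ONCE, and over a few chunks that always worked, would show quickly whether the failing chunks differ structurally from the good ones.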

Here is the code I'm using:

import os

from google.cloud import texttospeech

# Path to the service-account key file
str1 = 'my_credentials.json'
os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = str1
client = texttospeech.TextToSpeechClient()

# txt1 holds the SSML markup for one chunk (defined elsewhere)
input_text = texttospeech.SynthesisInput(ssml=txt1)
voice = texttospeech.VoiceSelectionParams(
    language_code='en-US',
    name='en-US-Wavenet-C',
    ssml_gender=texttospeech.SsmlVoiceGender.FEMALE,
)

audio_config = texttospeech.AudioConfig(
    audio_encoding=texttospeech.AudioEncoding.MP3,
    speaking_rate=0.85,
)

response = client.synthesize_speech(input=input_text,
                                    voice=voice,
                                    audio_config=audio_config)

# self.folder and idx come from the enclosing method (output folder, chunk number)
with open(f'{self.folder}{idx}.mp3', 'wb') as out:
    out.write(response.audio_content)

I'm using version 2.2.0 of the google-cloud-texttospeech Python library.
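Because the failures are silent, finding them means listening to every chunk. One hypothetical way to flag them automatically is to request texttospeech.AudioEncoding.LINEAR16 instead of MP3 for a test run (the AudioEncoding reference describes this as 16-bit little-endian PCM returned with a WAV header) and count long silent gaps with the standard library. The function names, the 0.3 s minimum gap, and the amplitude threshold are all assumptions that would need tuning against real output. Natural sentence pauses will be counted too, so the useful signal is a chunk whose count falls far below the number of <break> tags it contains.

```python
import array
import io
import sys
import wave


def count_pauses_in_samples(samples, rate, min_pause_s=0.3, threshold=500):
    """Count interior runs of near-silence lasting at least min_pause_s seconds.

    samples: 16-bit PCM amplitudes; rate: samples per second.
    Leading and trailing silence are not counted. threshold is a guessed
    amplitude cut-off, not a value from Google's documentation.
    """
    min_run = int(min_pause_s * rate)
    pauses = 0
    run = 0
    heard_sound = False
    for s in samples:
        if abs(s) < threshold:
            run += 1
        else:
            # A silent run that ends in sound counts as a pause
            if heard_sound and run >= min_run:
                pauses += 1
            heard_sound = True
            run = 0
    return pauses


def count_pauses(wav_bytes, min_pause_s=0.3, threshold=500):
    """Decode 16-bit mono WAV bytes (e.g. LINEAR16 audio_content) and count pauses."""
    with wave.open(io.BytesIO(wav_bytes), 'rb') as wav:
        if wav.getsampwidth() != 2 or wav.getnchannels() != 1:
            raise ValueError('expected 16-bit mono PCM')
        rate = wav.getframerate()
        samples = array.array('h', wav.readframes(wav.getnframes()))
    if sys.byteorder == 'big':
        samples.byteswap()  # WAV sample data is little-endian
    return count_pauses_in_samples(samples, rate, min_pause_s, threshold)
```

With this, count_pauses(response.audio_content) for a LINEAR16 response could be logged next to each chunk's <break> count, and chunks with a large shortfall re-synthesized or inspected.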

