I've had this problem since I began using Google's Text-to-Speech, and now I'm determined to fix it. About 22% of the time, the SSML markup is not honored and the text is rendered without pauses, for no reason I'm aware of. I really wish Google would just insert the pauses automatically. Audio without pauses is virtually unlistenable. In short, the service ignores this tag:
<break time="0.4s"/>
It only does this for some texts, though. I should also add that I split the text into chunks of, I think, 3000 characters, and for each chunk the service either obeys all of the break times or, less often, ignores all of them.
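Since the goal is a pause between sentences anyway, one way to generate the chunks is to escape each sentence and insert the break tag yourself, so that every chunk is a well-formed SSML document with its own `<speak>` root. This is a minimal sketch, not a known fix for the ignored breaks. It assumes plain-text input (existing SSML tags would be escaped), a naive regex sentence splitter, and the roughly 3000-character chunk size mentioned above; `build_ssml_chunks` is a hypothetical helper, not part of the Google library.

```python
import re
from xml.sax.saxutils import escape

MAX_CHARS = 3000  # assumption: the ~3000-character chunk size described above
BREAK = '<break time="0.4s"/>'


def build_ssml_chunks(text, max_chars=MAX_CHARS, pause=BREAK):
    """Split plain text into SSML documents of at most max_chars characters.

    Each sentence is XML-escaped, sentences are joined with an explicit
    <break/> tag, and every chunk gets its own <speak> root element.
    A single sentence longer than max_chars still ends up in its own chunk.
    """
    # Naive sentence split: after '.', '!' or '?' followed by whitespace
    sentences = [s for s in re.split(r'(?<=[.!?])\s+', text.strip()) if s]
    overhead = len('<speak></speak>')
    chunks, current = [], []
    length = overhead
    for sentence in sentences:
        piece = escape(sentence)  # turns &, < and > into entities
        # A break tag is only needed between sentences, not before the first
        extra = len(piece) + (len(pause) if current else 0)
        if current and length + extra > max_chars:
            # Close the current chunk and start a new one with this sentence
            chunks.append('<speak>' + pause.join(current) + '</speak>')
            current, length = [], overhead
            extra = len(piece)
        current.append(piece)
        length += extra
    if current:
        chunks.append('<speak>' + pause.join(current) + '</speak>')
    return chunks
```

For example, `build_ssml_chunks("Hello there. How are you? Fine.")` returns a single chunk with a break tag between each pair of sentences. Escaping matters because a stray `&` or `<` in the source text makes the chunk invalid XML.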
Because Stack Overflow will not display text between < and >, I cannot post the actual text that is causing the problem here. The exact text I converted into audio is available at this link:
problematic texts
Each chunk is preceded by its number surrounded by __. I ran the whole text twice, and the following chunks failed both times:
16 41 46 58 59 61 65 74 80 85 86 87 90 91 92 94 95 96 97 98
The following chunks failed in one of the two runs:
40 45 47 81 82 89
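To check whether the failing chunks differ structurally from the ones that worked, one could parse each chunk locally before sending it. This is a sketch, assuming the chunks are available as a dict mapping chunk number to SSML string; `check_ssml` and `summarize_chunks` are hypothetical names. It uses only the standard library's `xml.etree.ElementTree` and records, per chunk, whether it is well-formed XML, its root element, and how many `<break>` elements it contains. It cannot by itself show why the service ignores the breaks.

```python
import xml.etree.ElementTree as ET

# Chunk numbers copied from the two runs reported above
FAILED_BOTH = {16, 41, 46, 58, 59, 61, 65, 74, 80, 85, 86, 87,
               90, 91, 92, 94, 95, 96, 97, 98}
FAILED_ONCE = {40, 45, 47, 81, 82, 89}


def _local_name(tag):
    # Strip a namespace prefix such as '{http://...}break' -> 'break'
    return tag.rsplit('}', 1)[-1]


def check_ssml(ssml):
    """Return (well_formed, root_tag, break_count) for one SSML string."""
    try:
        root = ET.fromstring(ssml)
    except ET.ParseError:
        return False, None, 0
    breaks = sum(1 for el in root.iter() if _local_name(el.tag) == 'break')
    return True, _local_name(root.tag), breaks


def summarize_chunks(chunks):
    """chunks: dict of chunk number -> SSML string (hypothetical input format)."""
    rows = []
    for number in sorted(chunks):
        if number in FAILED_BOTH:
            status = 'failed both'
        elif number in FAILED_ONCE:
            status = 'failed once'
        else:
            status = 'no failure reported'
        rows.append((number, status) + check_ssml(chunks[number]))
    return rows
```

If every chunk, failing or not, comes back well-formed with a `speak` root and a similar break count, that would at least rule out malformed markup on the client side.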
Here is the code I'm using:
import os

from google.cloud import texttospeech

# Point the client library at the service-account key file
str1 = 'my_credentials.json'
os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = str1

client = texttospeech.TextToSpeechClient()

# txt1 holds one chunk of SSML text (about 3000 characters)
input_text = texttospeech.SynthesisInput(ssml=txt1)
voice = texttospeech.VoiceSelectionParams(
    language_code='en-US',
    name='en-US-Wavenet-C',
    ssml_gender=texttospeech.SsmlVoiceGender.FEMALE,
)
audio_config = texttospeech.AudioConfig(
    audio_encoding=texttospeech.AudioEncoding.MP3,
    speaking_rate=.85
)
response = client.synthesize_speech(input=input_text,
                                    voice=voice,
                                    audio_config=audio_config)
# self.folder and idx come from the enclosing method (output folder, chunk number)
with open(f'{self.folder}{idx}.mp3', 'wb') as out:
    out.write(response.audio_content)
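Because the failures are reported per chunk number, it can help to drive the snippet above from a loop that can re-run only selected chunks. A sketch, assuming the chunks are kept in a dict of chunk number to SSML string; `synthesize_chunks` is a hypothetical helper, and `synthesize` stands in for any callable that takes an SSML string and returns MP3 bytes (for example a thin wrapper around `client.synthesize_speech(...).audio_content`). It also builds the output path with `os.path.join`, so a folder name without a trailing slash still works.

```python
import os


def synthesize_chunks(chunks, folder, synthesize, only=None):
    """Write one MP3 per chunk as <folder>/<number>.mp3 and return the paths.

    chunks:     dict mapping chunk number -> SSML string
    synthesize: callable taking an SSML string and returning MP3 bytes
    only:       optional iterable of chunk numbers to (re)run; None means all
    """
    os.makedirs(folder, exist_ok=True)
    if only is None:
        selected = sorted(chunks)
    else:
        # Ignore requested numbers that have no corresponding chunk
        selected = sorted(set(only) & set(chunks))
    written = []
    for number in selected:
        path = os.path.join(folder, f'{number}.mp3')
        with open(path, 'wb') as out:
            out.write(synthesize(chunks[number]))
        written.append(path)
    return written
```

Passing `only=[16, 41, 46]`, for instance, would regenerate just those chunks without touching the files that already have pauses.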
I'm using version 2.2.0 of the google-cloud-texttospeech library.