Posts: 1,548 | Thanked: 7,510 times | Joined on Apr 2010 @ Czech Republic
#1992
Originally Posted by rinigus View Post
Prerecorded voices would require limited phrases. This assumption doesn't hold for routing engines adding street names into the instructions.
Actually, I think in many cases the local names are not that important or even useful, especially if you are in unknown terrain and the names are in a foreign language (and possibly even in a different alphabet). Knowing that the next maneuver is a right turn, taking the 5th exit, etc. can be enough.
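A prerecorded-phrase approach along those lines could be sketched as a simple lookup from maneuver type to a canned audio clip. A minimal sketch; the maneuver codes and file names below are made up for illustration and don't correspond to any particular routing engine:

```python
# Hypothetical mapping from routing-engine maneuver codes to
# prerecorded audio clips; codes and file names are illustrative only.
PHRASES = {
    "turn_right": "turn_right.wav",
    "turn_left": "turn_left.wav",
    "roundabout_exit_5": "take_the_5th_exit.wav",
}

def phrase_for(maneuver: str):
    """Return the clip to play, or None if the maneuver has no phrase."""
    return PHRASES.get(maneuver)
```

The `None` case is exactly the limitation discussed below: any maneuver the phrase set doesn't cover gets no announcement at all.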

Originally Posted by rinigus View Post
Also supporting multiple routing engines doesn't help either ...
Yeah, you might not have phrases matching all the maneuvers your routing engine could generate. Also, the voice samples could be sizeable and their distribution would have to be solved somehow. A properly working TTS is just so much more flexible.

BTW, the Marble project created a rather comprehensive set of navigation voice samples in many languages.


Originally Posted by rinigus View Post
In theory, it's possible. I don't know whether Valhalla imports multiple names (if available) when making the tiles. As soon as the names are available, it should be possible to tag them in the instructions. It's not done yet, as far as I know.
Interesting! So provided we have a TTS that can use different languages within a single voice message (for example: instructions in English with local names in German), it could make sense to open an RFE with them.
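Such a mixed-language message could be assembled as an SSML string, wrapping the local name in a `<voice>` element. A minimal sketch; the function name is made up, and `"de"` is assumed to select espeak's German voice (other engines may use different voice names):

```python
# Minimal sketch: wrap a local street name in an SSML <voice> element
# so the TTS engine can switch languages mid-sentence.  Voice names
# vary per engine; "de" is assumed to be espeak's German voice here.
from xml.sax.saxutils import escape

def mixed_language_instruction(text_before, local_name,
                               local_voice="de", main_lang="en"):
    # escape() protects against names containing &, < or >
    return (
        f'<speak xml:lang="{main_lang}">{escape(text_before)} '
        f'<voice name="{local_voice}">{escape(local_name)}</voice>.'
        f'</speak>'
    )

ssml = mixed_language_instruction("In 300 meters turn right to",
                                  "Schloss Schönbrunn")
```

The resulting string could then be piped to `espeak -m`, which interprets the SSML markup.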

Originally Posted by rinigus View Post
I haven't filed the issue describing it in Valhalla's repo since I am not sure we can actually make use of this output (yet?). I think I saw somewhere mimic/flite issue regarding reading multi-language text, but it wasn't resolved at that time.
Espeak can definitely do that, for example:
Code:
espeak -s 120 -m '<speak xml:lang="en">In 300 meters turn right to <voice name="de">Schloss Schönbrunn</voice> Then continue straight and watch for horse carriages.</speak>'
This says the main message in English with the local name in German, and it can easily be extended to test other combinations. It would still be good if other TTS engines could do this as well, ideally using the same SSML syntax espeak supports. Espeak can do much more than just change voices; for a more comprehensive example, save the following to a file:

Code:
<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.w3.org/2001/10/synthesis http://www.w3.org/TR/speech-synthesis/synthesis.xsd" xml:lang="en-US">
This is not <break strength='none' /> a pause.
This is a <break strength='x-weak' /> phrase break.
This is a <break strength='weak' /> phrase break.
This is a <break strength='medium' /> sentence break.
This is a <break strength='strong' /> paragraph break.
This is a <break strength='x-strong' /> paragraph break.
This is a <break time='3s' /> three second pause.
This is a <break time='4500ms' /> 4.5 second pause.
This is a <break /> sentence break.
<!-- Changing Voices -->
This is the default voice. <voice name="en-sc">This is David.</voice> This is the default again.
<voice name="Callie">Callie here.</voice>
<!-- Adjusting Speech Rate -->
I am now <prosody rate='x-slow'>speaking at half speed.</prosody>
I am now <prosody rate='slow'>speaking at 2/3 speed.</prosody>
I am now <prosody rate='medium'>speaking at normal speed.</prosody>
I am now <prosody rate='fast'>speaking 33% faster.</prosody>
I am now <prosody rate='x-fast'>speaking twice as fast</prosody>
I am now <prosody rate='default'>speaking at normal speed.</prosody>
I am now <prosody rate='.42'>speaking at 42% of normal speed.</prosody>
I am now <prosody rate='2.8'>speaking 2.8 times as fast</prosody>
I am now <prosody rate='-0.3'>speaking 30% more slowly.</prosody>
I am now <prosody rate='+0.3'>speaking 30% faster.</prosody>
</speak>
And then play it with:

Code:
espeak -m -f <path to file>
Originally Posted by rinigus View Post
Which brings up the question regarding partitioning: is it modRana/Poor Maps/... who is supposed to split the sentence and later glue it together, or the TTS? I would expect the TTS to do that, but maybe it's naïve.
I'm not sure I understand - TTS just converts text to speech, so as long as the input text is correctly marked up (say with SSML), all should be fine.

Or do you mean a case where the TTS engine does not support changing settings per "session" and you would basically have to stitch multiple sound files together to achieve the same end result?
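In that fallback case, the stitching itself would not be hard: the engine writes each segment to a separate WAV (espeak can do this with its `-w` option) and the application concatenates them. A minimal sketch using Python's standard `wave` module, assuming all segments share the same channel count, sample width, and sample rate (which would not hold when mixing different engines or voices with different output rates):

```python
# Concatenate per-segment WAV files (e.g. one per voice/language)
# into a single clip.  Assumes all inputs share identical audio
# parameters (channels, sample width, frame rate); resampling would
# be needed otherwise.
import wave

def stitch_wavs(inputs, output):
    with wave.open(output, "wb") as out:
        for i, path in enumerate(inputs):
            with wave.open(path, "rb") as seg:
                if i == 0:
                    # Copy audio parameters from the first segment;
                    # the wave module fixes up the frame count on close.
                    out.setparams(seg.getparams())
                out.writeframes(seg.readframes(seg.getnframes()))
```

This is roughly what the application would have to do itself when the engine can't switch voices mid-utterance, which is why pushing the problem down to an SSML-aware TTS is so much cleaner.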
__________________
modRana: a flexible GPS navigation system
Mieru: a flexible manga and comic book reader
Universal Components - a solution for native-looking yet component-set-independent QML applications (QtQuick Controls 2 & Silica supported as backends)
 

The Following 7 Users Say Thank You to MartinK For This Useful Post: