Prerecorded voices would require a limited set of phrases. That assumption doesn't hold for routing engines that insert street names into the instructions.
Supporting multiple routing engines doesn't help either.
In theory, it's possible. I don't know whether Valhalla imports multiple names (where available) when building its tiles. Once the names are available, it should be possible to tag them in the instructions. As far as I know, that isn't done yet.
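To make "tag them in the instructions" concrete, here is a minimal sketch in Python (the language modRana and Poor Maps are written in) that turns an instruction with language-tagged names into SSML of the kind espeak accepts with `-m`. The `Name` class, the `{placeholder}` template convention and the `instruction_to_ssml` function are hypothetical, and the assumption that the routing engine reports a language code per name is exactly the part Valhalla doesn't provide yet, as far as I know.

```python
from dataclasses import dataclass
from typing import Dict
from xml.sax.saxutils import escape, quoteattr


@dataclass
class Name:
    """Hypothetical: a street/place name plus the language it is written in."""
    text: str
    lang: str  # e.g. "de"; assumed to come from the routing engine


def instruction_to_ssml(template: str, names: Dict[str, Name],
                        default_lang: str = "en") -> str:
    """Fill {placeholders} in `template`, wrapping foreign names in <voice>."""
    parts = {}
    for key, name in names.items():
        text = escape(name.text)  # escape &, <, > in the name itself
        if name.lang == default_lang:
            parts[key] = text
        else:
            # as in the espeak example, a language code is used as the voice name
            parts[key] = f"<voice name={quoteattr(name.lang)}>{text}</voice>"
    # Escape the literal instruction text first, then insert the markup.
    body = escape(template).format(**parts)
    return f"<speak xml:lang={quoteattr(default_lang)}>{body}</speak>"


if __name__ == "__main__":
    print(instruction_to_ssml(
        "In 300 meters turn right to {dest}. Then continue straight.",
        {"dest": Name("Schloss Schönbrunn", "de")}))
```

Names in the default language are left untagged, so the markup only changes where a language switch is actually needed.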
I haven't filed an issue about multi-language name tagging in Valhalla's repo, since I am not sure we could actually make use of such output (yet?). I think I saw a mimic/flite issue somewhere about reading multi-language text, but it wasn't resolved at the time.
espeak -s 120 -m '<speak xml:lang="en">In 300 meters turn right to <voice name="de">Schloss Schönbrunn</voice>. Then continue straight and watch for horse carriages.</speak>'
<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.w3.org/2001/10/synthesis http://www.w3.org/TR/speech-synthesis/synthesis.xsd" xml:lang="en-US">
This is not <break strength='none' /> a pause.
This is a <break strength='x-weak' /> phrase break.
This is a <break strength='weak' /> phrase break.
This is a <break strength='medium' /> sentence break.
This is a <break strength='strong' /> paragraph break.
This is a <break strength='x-strong' /> paragraph break.
This is a <break time='3s' /> three second pause.
This is a <break time='4500ms' /> 4.5 second pause.
This is a <break /> sentence break.
<!-- Changing Voices -->
This is the default voice. <voice name="en-sc">This is David.</voice>
This is the default again. <voice name="Callie">Callie here.</voice>
<!-- Adjusting Speech Rate -->
I am now <prosody rate='x-slow'>speaking at half speed.</prosody>
I am now <prosody rate='slow'>speaking at 2/3 speed.</prosody>
I am now <prosody rate='medium'>speaking at normal speed.</prosody>
I am now <prosody rate='fast'>speaking 33% faster.</prosody>
I am now <prosody rate='x-fast'>speaking twice as fast.</prosody>
I am now <prosody rate='default'>speaking at normal speed.</prosody>
I am now <prosody rate='.42'>speaking at 42% of normal speed.</prosody>
I am now <prosody rate='2.8'>speaking 2.8 times as fast.</prosody>
I am now <prosody rate='-0.3'>speaking 30% more slowly.</prosody>
I am now <prosody rate='+0.3'>speaking 30% faster.</prosody>
</speak>
speak -m -f <path to file>
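From Python, the same thing could be done roughly like this. The sketch writes the SSML to a temporary file and passes it with `-m -f`, which avoids shell-quoting the mixed `'` and `"` in the markup. The function names are hypothetical; it assumes an `espeak` (or `speak`) binary on PATH and uses only the `-m`, `-s`, `-f` and `-w` options.

```python
import os
import shutil
import subprocess
import tempfile
from typing import List, Optional


def espeak_command(ssml_path: str, speed: int = 120,
                   wav_path: Optional[str] = None,
                   binary: str = "espeak") -> List[str]:
    """Build the argv for `espeak -m -s SPEED -f FILE [-w WAV]`."""
    # -m: interpret SSML markup, -s: words per minute, -f: read text from file
    cmd = [binary, "-m", "-s", str(speed), "-f", ssml_path]
    if wav_path is not None:
        cmd += ["-w", wav_path]  # write a WAV file instead of playing it
    return cmd


def speak_ssml(ssml: str, speed: int = 120,
               wav_path: Optional[str] = None) -> None:
    """Write `ssml` to a temporary file and pass it to espeak.

    Assumes an `espeak` (or `speak`) binary is on PATH.
    """
    binary = shutil.which("espeak") or shutil.which("speak")
    if binary is None:
        raise FileNotFoundError("espeak/speak not found on PATH")
    with tempfile.NamedTemporaryFile("w", suffix=".ssml", encoding="utf-8",
                                     delete=False) as f:
        f.write(ssml)
        path = f.name
    try:
        subprocess.run(espeak_command(path, speed, wav_path, binary),
                       check=True)
    finally:
        os.remove(path)  # clean up the temporary SSML file
```

For example, `speak_ssml('<speak xml:lang="en">Hello</speak>', wav_path="hello.wav")` would render to a file instead of the sound card.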
That raises a question about partitioning: who is supposed to split the sentence and glue it back together, modRana/Poor Maps/... or the TTS engine? I would expect the TTS engine to do that, but maybe that's naïve.
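If the application ended up doing the partitioning, a minimal sketch could split the markup into (language, text) chunks. Each chunk would then go to a single-language engine and the audio would be concatenated afterwards. The `split_by_voice` function is hypothetical and not part of modRana or Poor Maps. It only handles `<voice>` elements that are direct children of `<speak>` and treats their `name` attribute as a language code, as in the espeak example above.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional, Tuple

# ElementTree exposes xml:lang under the XML namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def split_by_voice(ssml: str) -> List[Tuple[str, str]]:
    """Split a flat <speak> document into (language, text) chunks.

    Only direct <voice> children of <speak> switch language; their `name`
    attribute is treated as a language code. Other tags are dropped and
    their text stays in the default language.
    """
    root = ET.fromstring(ssml)
    default = root.get(XML_LANG, "en")
    chunks: List[Tuple[str, str]] = []

    def add(lang: str, text: Optional[str]) -> None:
        text = " ".join((text or "").split())  # collapse whitespace
        if not text:
            return
        if chunks and chunks[-1][0] == lang:
            # merge with the previous chunk when the language is unchanged
            chunks[-1] = (lang, chunks[-1][1] + " " + text)
        else:
            chunks.append((lang, text))

    add(default, root.text)
    for child in root:
        local = child.tag.rsplit("}", 1)[-1]  # drop the SSML namespace, if any
        lang = child.get("name", default) if local == "voice" else default
        add(lang, "".join(child.itertext()))
        add(default, child.tail)
    return chunks
```

Applied to the espeak example, this yields an English chunk, a German chunk for "Schloss Schönbrunn", and another English chunk. That is the splitting and gluing work that would otherwise have to live inside the TTS engine.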