Posts: 7 | Thanked: 47 times | Joined on Aug 2010
#1
Hello there, just a quick post on how I got PocketSphinx to work on my N900, as well as a basic Python application to test your setup. I take no credit for anything on this thread, except the time spent putting it all together.


I downloaded all the .debs from http://repository.maemo.org/extras-d.../pocketsphinx/ into a new directory, removing any i386-specific .debs.

As root, I ran "dpkg -i *" and the packages tried to install, but were stopped due to unmet dependencies (for me it was just python2.5-dbg).

Once the missing dependency was installed, this ran successfully and installed pocketsphinx.

To try it out and make sure everything has installed correctly, run "pocketsphinx_continuous" and wait for everything to load. When prompted with "Ready...", say something clearly in the phone's direction (I used "Hello"). After another load of text there should be a line like "000000001: hello (-12345676)".
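
If you want to pick the result lines out of all the other text that pocketsphinx_continuous prints, something like the following might do. It is only a sketch: the line format is assumed from the single sample above (utterance id, recognised text, score in brackets), and the names RESULT_LINE and parse_result are made up for this example.

```python
import re

# Matches lines shaped like "000000001: hello (-12345676)":
# utterance id, recognised text, then the score in parentheses.
RESULT_LINE = re.compile(r"^(\d+):\s*(.*?)\s*\((-?\d+)\)\s*$")


def parse_result(line):
    """Return (utterance_id, text, score), or None if the line is not a result."""
    match = RESULT_LINE.match(line.strip())
    if match is None:
        return None
    uttid, text, score = match.groups()
    return uttid, text, int(score)
```

Feed it each line of the program's output and ignore anything that comes back as None.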


To get the gstreamer hooks working, I had to install the package "gstreamer-tools".

After this I ran the script here, taken from the CMUSphinx example and tweaked to work with PulseAudio: http://pastebin.com/zCYzX65Z

Press the "Speak" button, then say your few words, and the textbox will update to show what you have said.
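
For anyone wondering what the PulseAudio tweak roughly amounts to, here is a sketch of the pipeline description. This is not the pastebin script: the element chain is the one from the CMUSphinx GStreamer tutorial as I remember it, with the audio source swapped for pulsesrc, and build_pipeline is a made-up helper name.

```python
def build_pipeline(source="pulsesrc"):
    """Return a gst-launch style description for live recognition.

    Assumption: the same elements as the CMUSphinx GStreamer tutorial,
    with only the audio source changed to PulseAudio.
    """
    elements = [
        source,                                # audio input
        "audioconvert",                        # convert the sample format
        "audioresample",                       # resample to what the decoder wants
        "vader name=vad auto-threshold=true",  # voice activity detection
        "pocketsphinx name=asr",               # the recogniser element
        "fakesink",                            # discard the audio afterwards
    ]
    return " ! ".join(elements)
```

The resulting string is what a script would hand to gst.parse_launch() in the Python GStreamer bindings.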

N.B. It uses the en_US acoustic model by default, so I had a good few mistakes at first, which I attribute to my Irish accent.



This is another little sample that uses the JSGF grammar specification and tries to interpret speech from a .wav file saved locally. (The file needs to be recorded at 8 kHz, mono.)

==File grammar.jsgf==
Code:
#JSGF V1.0;
grammar goforward;
public <move> = go <direction> <distance> [meter | meters];
<direction> = forward | backward;
<distance> = (one | two | three | four | five | six | seven | eight | nine | ten | twenty)+;
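==File speechtest.py== (with myrecording.wav as the recording to interpret)
Code:
#!/usr/bin/python
import pocketsphinx as ps

# Build a decoder from the JSGF grammar; the audio is sampled at 8 kHz.
decoder = ps.Decoder(jsgf='/path/to/your/jsgf/grammar.jsgf', samprate='8000')
# Decode the whole recording in one call.
fh = open("myrecording.wav", "rb")
nsamp = decoder.decode_raw(fh)
fh.close()
# get_hyp() returns the recognised text, the utterance id and the score.
hyp, uttid, score = decoder.get_hyp()
print "Got result %s %d" % (hyp, score)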
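
Since the recording has to be 8 kHz mono, it may save some head-scratching to check the file before decoding it. A minimal sketch using only the standard wave module; check_recording is a made-up name, and the defaults simply mirror the requirement stated above.

```python
import wave


def check_recording(path, rate=8000, channels=1):
    """Return a list of problems found; an empty list means the file looks fine."""
    wav = wave.open(path, "rb")
    try:
        problems = []
        if wav.getframerate() != rate:
            problems.append("sample rate is %d Hz, expected %d"
                            % (wav.getframerate(), rate))
        if wav.getnchannels() != channels:
            problems.append("%d channel(s), expected %d"
                            % (wav.getnchannels(), channels))
        return problems
    finally:
        wav.close()
```

Calling check_recording("myrecording.wav") before the decoder runs gives a readable complaint instead of a puzzling recognition result.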

Last edited by chemist; 2011-04-27 at 15:17. Reason: topic
 

The Following 35 Users Say Thank You to mc_teo For This Useful Post:
Boemien's Avatar
Posts: 770 | Thanked: 558 times | Joined on Mar 2010 @ Abidjan
#2
Yeah it seems interesting but noobs, like me of course, need some screenshots. Thanks in advance!!!
 
joerg_rw's Avatar
Posts: 2,222 | Thanked: 12,651 times | Joined on Mar 2010 @ SOL 3
#3
Many thanks for this kickoff. I think this can be the start of a nice project to bring a missing feature to the N900.

/j
__________________
Maemo Community Council member [2012-10, 2013-05, 2013-11, 2014-06 terms]
Hildon Foundation Council inaugural member.
MCe.V. foundation member

EX Hildon Foundation approved
Maemo Administration Coordinator (stepped down due to bullying 2014-04-05)
aka "techstaff" - the guys who keep your infra running - Devotion to Duty http://xkcd.com/705/

IRC(freenode): DocScrutinizer*
First USB hostmode fanatic, father of H-E-N
 

The Following 5 Users Say Thank You to joerg_rw For This Useful Post:
Posts: 482 | Thanked: 550 times | Joined on Oct 2010
#4
So...would it be possible to use this for voice dialing via a dbus call?
 

The Following User Says Thank You to skykooler For This Useful Post:
Posts: 105 | Thanked: 99 times | Joined on Feb 2011 @ India
#5
great, now if only someone could integrate it with the text editor of the phone
 
Posts: 102 | Thanked: 23 times | Joined on Apr 2010
#6
This is just great news, and thanks mc_teo...
Now that joerg_rw is interested in this project, there will be even greater news soon :-)

Last edited by leojab; 2011-04-27 at 17:15.
 
cfh11's Avatar
Posts: 1,062 | Thanked: 961 times | Joined on May 2010 @ Boston, MA
#7
Awesome! Now if this becomes feature complete and incorporated into the CSSU that would be a dream come true...
__________________
Want to browse streamlined versions of websites automatically when in 2g? Vote for this brainstorm.

Sick of your cell signal not reconnecting after coming out of a bad signal area? Vote for this bug.
 
joerg_rw's Avatar
Posts: 2,222 | Thanked: 12,651 times | Joined on Mar 2010 @ SOL 3
#8
voice-call via dbus: should be rather simple, as long as you start the speech input engine on the headset pushbutton and use a small pretrained vocabulary of contact names.

integration with text editor: an ambitious project, as the vocabulary is virtually unlimited

@leojab: I'm planning to come up with a system architecture RFC eventually, so this could actually integrate into hildon/maemo seamlessly. NB you want to both a) use speech input with unpatched, possibly even closed-source apps, and b) work with several concurrent apps without multiple instances of pocketsphinx fighting each other.
@cfh11: regarding my comments one line above, I think we might integrate this in a way that lets us deploy it via extras, so no need for CSSU. Well, maybe hildon-desktop needs some hooks for cooperating with speech-controlled task switching etc.

/j
 

The Following 11 Users Say Thank You to joerg_rw For This Useful Post:
Posts: 7 | Thanked: 47 times | Joined on Aug 2010
#9
So, I haven't been working too hard on this, due to school and all, but I have put together this demo of what can be done.

I have attached player.zip. The archive contains three files: "player.py", which is the main script; "dict.lm", which contains the language model; and "dict.dic", which contains the dictionary.

Make sure pocketsphinx is installed, as outlined in my first post, before running the script.

If the default media player is not open, the script will attempt to open it via a dbus command (and complain about a file not being found), so opening it beforehand is probably the best solution.

Then start the script, and you will be presented with a simple form. Press "Enable" to turn recognition on, and then say one of play/stop/pause/resume/next/previous to run a command.
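
To give an idea of the command handling, here is a rough sketch of turning a recognised phrase into one of the six commands. It is not the code from player.zip: COMMANDS, pick_command and dispatch are names made up for this example, and the handlers (which in the real script would wrap the dbus calls to the media player) are left to the caller.

```python
COMMANDS = ("play", "stop", "pause", "resume", "next", "previous")


def pick_command(hypothesis):
    """Return the first known command word in the recognised text, or None."""
    if not hypothesis:
        return None
    for word in hypothesis.lower().split():
        if word in COMMANDS:
            return word
    return None


def dispatch(hypothesis, handlers):
    """Call handlers[command]() if a command was recognised; return the command."""
    command = pick_command(hypothesis)
    if command is not None and command in handlers:
        handlers[command]()
    return command
```

Keeping the vocabulary this small is what makes the recognition reasonably reliable, since the decoder only has six words to choose between.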

English only supported at the moment.

happy speaking

~mc_teo
Attached Files
File Type: zip player.zip (2.9 KB, 165 views)
 

The Following 9 Users Say Thank You to mc_teo For This Useful Post:
Flandry's Avatar
Posts: 1,559 | Thanked: 1,786 times | Joined on Oct 2009 @ Boston
#10
Good to see this getting some attention after it was passed over for the GSoC last year *.

A possibly less cumbersome alternative for the curious is to install using fapman (choose "All packages (ADVANCED)" under Category Filters and then search for sphinx). You don't need any of the debug packages or the two Chinese model packages; install all the others. I did notice that the packages aren't optified, which means that with the available acoustic and language models you could eat up over 13 MB of root space. Consider yourself warned. I haven't access to my Linux box to re-upload the packages with optification.

Worth a giggle if nothing else. With the provided large dictionary and language model the result of talking to your N900 is rather comical.

Edit: Removed command -- the default works fine.
__________________

Unofficial PR1.3/Meego 1.1 FAQ

***
Classic example of arbitrary Nokia decision making. Couldn't just fallback to the no brainer of tagging with lat/lon if network isn't accessible, could you Nokia?
MAME: an arcade in your pocket
Accelemymote: make your accelerometer more joy-ful

Last edited by Flandry; 2011-06-16 at 19:09.
 

The Following 3 Users Say Thank You to Flandry For This Useful Post:

All times are GMT.