The steps to do that are these:

- You need a Linux environment (I'm using Arch Linux, but Ubuntu or another distro works too).
- Download the tarball first: http://git.tuxfamily.org/okboard/okb...master.tar.bz2 and uncompress it in your home directory (e.g. /home/username).
- You need the dictionaries. I took mine from https://github.com/titoBouzout/Dictionaries, but it needs to be adjusted, so I attach the already processed file (see Spanish.dic.txt.zip on this post).
- You need the corpora files for your language (e.g. Spanish):
  http://corpora2.informatik.uni-leipzig.de/download.html
  http://www.cs.upc.edu/~nlp/wikicorpus/
  http://opus.lingfil.uu.se/OpenSubtitles2016.php
  http://www.lllf.uam.es/ESP/Corlec.html
  https://tatoeba.org/spa/downloads
- You need the "aspell-es" package (in the case of Spanish) installed from your distro's repos.
- You need the "lbzip2" package installed on your system too.
- You need "rsync" installed on your system.
- You need Qt5 installed on your system.
- Create a folder somewhere and put the dictionary inside (e.g. /home/username/okboard/langs).
- If you have several corpora files, merge them into one:
Code:
cat file1 file2 file3 file4 file5 > corpus-es.txt
- Open a terminal window and set the two environment variables:
Code:
export CORPUS_DIR=/home/username/okboard/langs
Code:
export WORK_DIR=/home/username/okboard/langs
- You can check those variables with
Code:
echo $VARIABLE_NAME
if you're curious.
- Compress the file (Spanish.dic.txt) that you put in /home/username/okboard/langs earlier:
Code:
bzip2 Spanish.dic.txt
- The compressed file should be named corpus-$LANG.txt.bz2, in our case corpus-es.txt.bz2 for Spanish (bzip2 only appends .bz2 to the original name, so rename the file if needed).
- There should be a single file inside.
- Next, in the same terminal window, move into the folder with okboard's source code, in our case "/home/username/okb-engine-master/":
Code:
cd /home/username/okb-engine-master/
- In the 'db' folder you must first create a lang-es.cf file. You can copy it from another .cf file in the same folder (e.g. copy lang-en.cf and rename the copy to lang-es.cf).
- Leave only ASCII characters in the corpus files:
Code:
lbzip2 -d < corpus.txt.bz2 | clean_corpus.py | lbzip2 > new_corpus.txt.bz2
- Run the build script ("es" in the case of Spanish):
Code:
db/build.sh es
- After this, the script has created the dictionaries for OKBoard, and the folder contains the following files:
  add-words-fr.txt   es-predict.dict          lang-fr.cf
  clusters-es.log    es-test.txt.bz2          lang-nl.cf
  clusters-es.txt    es.tre                   predict-es.db
  corpus-es.txt.bz2  grams-es-full.csv.bz2    predict-es.ng
  db.version         grams-es-learn.csv.bz2   predict-es.rpt.bz2
  es-full.dict       grams-es-test.csv.bz2    predict-es.txt.bz2
  es-full.tre        lang-en.cf               words-es.txt
  es-learn.txt.bz2   lang-es.cf

So now we have the Spanish dictionary created. After this, I don't know what to do with these files, so any help is welcome.
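The corpus preparation steps above (merge the files, compress, name the result corpus-es.txt.bz2) could be rolled into one small helper. This is only a sketch: make_corpus and the COMPRESSOR variable are names I made up for illustration, they are not part of okb-engine, and the paths in the example call are placeholders.

```shell
#!/bin/sh
# Sketch: merge plain-text corpus files into corpus-<lang>.txt.bz2.
# make_corpus and COMPRESSOR are hypothetical names, not part of okb-engine.

make_corpus() {
    # Need a language code, an output directory and at least one input file.
    if [ "$#" -lt 3 ]; then
        echo "usage: make_corpus LANG OUT_DIR FILE..." >&2
        return 1
    fi

    lang=$1      # language code, e.g. "es"
    out_dir=$2   # the directory CORPUS_DIR will point to
    shift 2      # the remaining arguments are the input text files

    mkdir -p "$out_dir" || return 1

    # Concatenate in the order given and compress in one pass,
    # so no uncompressed intermediate file is left behind.
    cat "$@" | "${COMPRESSOR:-bzip2}" -c > "$out_dir/corpus-$lang.txt.bz2"
}

# Example call (placeholder paths):
# make_corpus es "$HOME/okboard/langs" file1 file2 file3
# export CORPUS_DIR="$HOME/okboard/langs" WORK_DIR="$HOME/okboard/langs"
```

Writing straight to the final name avoids the separate rename step. Setting COMPRESSOR=lbzip2 should also work, since lbzip2 accepts -c as well, but I have only sketched this and not run it against the real build.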
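Before running db/build.sh it may be worth confirming that the command-line tools from the requirements list are actually installed. A minimal sketch, assuming a POSIX shell; check_tools is a hypothetical helper name, not something okb-engine provides.

```shell
#!/bin/sh
# Sketch: report which required commands are missing from PATH.
# check_tools is a hypothetical helper, not part of okb-engine.

check_tools() {
    missing=0
    for tool in "$@"; do
        # command -v succeeds only if the shell can resolve the name.
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool" >&2
            missing=1
        fi
    done
    # Returns 0 when every tool was found, 1 otherwise.
    return "$missing"
}

# Example call with the tools named in the steps above:
# check_tools aspell lbzip2 bzip2 rsync && echo "all tools found"
```

This only checks that the commands exist. It does not verify that the Spanish aspell dictionary is installed, and it leaves out Qt5 because the name of its build tool differs between distros.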