Poll: What advanced text entry method(s) would you like to see on Sailfish?
Posts: 27 | Thanked: 35 times | Joined on Jan 2016 @ Sweden
#241
Originally Posted by itdoesntmatt View Post
Hello to you too, and many thanks for your efforts!
Ciao! No problem... it's getting more complicated than I thought
I get what you mean now. It's an option but this is up to Eber

I tried the various approaches suggested here. I'll try further with this:

Code:
tr '[:upper:]' '[:lower:]' < sanitized.list > sanitized.lower
tr -s '[:space:]' '\n' < sanitized.lower | sort | uniq > sanitized.uniq
So, essentially: lowercase everything, put one word per line, sort, remove duplicates.
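For anyone who prefers to do the same normalisation in Python, here is a minimal sketch of those steps. The function name `normalise_words` is made up for illustration. Note that `str.lower()` also lowercases accented letters, which `tr` may not do depending on the locale, so the results can differ slightly from the shell pipeline.

```python
def normalise_words(lines):
    """Lowercase every line, split it on whitespace, and return the
    sorted list of unique words (one entry per distinct word)."""
    words = set()
    for line in lines:
        # split() with no argument collapses runs of spaces, tabs and newlines
        words.update(line.lower().split())
    return sorted(words)


if __name__ == "__main__":
    sample = ["Roma  Milano", "roma\tnapoli", ""]
    for word in normalise_words(sample):
        print(word)
```

To run it on the real file, pass an open file object instead of the sample list, e.g. `normalise_words(open("sanitized.list", encoding="utf-8"))`.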

It is a bit better:
Code:
wc sanitized.uniq
 13615329  13615329 236920596 sanitized.uniq
I'll then fetch a list of Italian proper nouns (names of people and cities) and add them to the file, so as to preserve some basic capitalisation.

I'll try to generate the file again tomorrow. Any ideas or help with the dict are more than welcome
 

The Following 2 Users Say Thank You to spidernik84 For This Useful Post:
Posts: 102 | Thanked: 187 times | Joined on Jan 2010
#242
Originally Posted by spidernik84 View Post
1) it's getting more complicated than I thought

2) It is a bit better:
Code:
wc sanitized.uniq
 13615329  13615329 236920596 sanitized.uniq
1) it usually is.
2) Well, this is what I did yesterday:
Code:
aspell -l it dump master | aspell -l it expand | aspell -l it clean | sort | sort -uf | wc -l
8565009
which folds but keeps as much of capitalisation as possible.
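A rough Python sketch of the same idea (deduplicate case-insensitively while keeping one spelling of each word) is below. It is not a re-implementation of `sort -uf`: which variant survives in the shell pipeline depends on the locale, whereas this sketch assumes plain code point ordering, where ASCII uppercase sorts before lowercase. The name `fold_unique` is hypothetical.

```python
def fold_unique(words):
    """Return one spelling per case-insensitive word.

    Words are visited in code point order, so for ASCII letters the
    capitalised spelling is seen first and is the one that is kept.
    """
    kept = {}
    for word in sorted(words):
        # setdefault only stores the first spelling seen for each folded key
        kept.setdefault(word.casefold(), word)
    return sorted(kept.values(), key=str.casefold)


if __name__ == "__main__":
    for word in fold_unique(["roma", "Roma", "casa", "milano"]):
        print(word)
```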
 

The Following 2 Users Say Thank You to ljo For This Useful Post:
Posts: 86 | Thanked: 362 times | Joined on Dec 2007 @ Paris / France
#243
Check out the last commit: You can now provide your own dictionary file instead of using the aspell one. You just need to create a words-it.txt file in $CORPUS_DIR with one word per line.

Also, if your input corpus uses the word "dall'oceano" and your dictionary contains "dall" and "oceano" and not "dall'oceano", they will be handled as two different words.
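To illustrate the apostrophe behaviour described above, here is a small sketch. It is only a model of the rule as stated in this post, not OKBoard's actual tokenizer: the function `lookup_units` and its fallback for unknown tokens are assumptions made for the example.

```python
def lookup_units(token, dictionary):
    """Return the dictionary units a corpus token would map to.

    If the whole token is in the dictionary it is one word; otherwise,
    if every apostrophe-separated part is known, the parts are treated
    as separate words. Anything else is returned unchanged (assumption).
    """
    if token in dictionary:
        return [token]
    parts = [part for part in token.split("'") if part]
    if parts and all(part in dictionary for part in parts):
        return parts
    return [token]


if __name__ == "__main__":
    print(lookup_units("dall'oceano", {"dall", "oceano"}))
    print(lookup_units("dall'oceano", {"dall'oceano"}))
```

With only "dall" and "oceano" in the dictionary the token comes back as two words; adding "dall'oceano" itself makes it a single entry.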
 

The Following 5 Users Say Thank You to eber42 For This Useful Post:
Posts: 529 | Thanked: 988 times | Joined on Mar 2015
#244
Eber, sorry for my ignorance, but what do you mean by input corpus? Also, is it possible to show both dall and dall' when swiping d-a-l-l? Thanks

PS: any news on a possible solution to keep the prediction bar from being hidden by the WhatsApp input field in Dalvik full-screen mode?
 

The Following User Says Thank You to itdoesntmatt For This Useful Post:
Posts: 86 | Thanked: 362 times | Joined on Dec 2007 @ Paris / France
#245
Originally Posted by itdoesntmatt View Post
Eber, sorry for my ignorance, but what do you mean by input corpus? Also, is it possible to show both dall and dall' when swiping d-a-l-l? Thanks
You will have to input word separators manually for now. Handling this automatically requires a larger change (it is on the roadmap, but no promises or time estimate)

Originally Posted by itdoesntmatt View Post
PS: any news on a possible solution to keep the prediction bar from being hidden by the WhatsApp input field in Dalvik full-screen mode?
If you are talking about the transparency issue, it is a compatibility issue with SFOS 2. As I'm not in a hurry to upgrade, I will try to make a fix that works with both versions
 

The Following 4 Users Say Thank You to eber42 For This Useful Post:
Posts: 27 | Thanked: 35 times | Joined on Jan 2016 @ Sweden
#246
Originally Posted by ljo View Post
1) it usually is.
2) Well, this is what I did yesterday:
Code:
aspell -l it dump master | aspell -l it expand | aspell -l it clean | sort | sort -uf | wc -l
8565009
which folds but keeps as much of capitalisation as possible.
Thank you! I'll use your variant.
Back to crunching the numbers. Let's see if this time it goes through
 

The Following User Says Thank You to spidernik84 For This Useful Post:
Posts: 27 | Thanked: 35 times | Joined on Jan 2016 @ Sweden
#247
Hello. The last attempt failed with an overflow, despite limiting the dict:

Code:
13566000
13567000
13568000
13569000
13570000
13571000
13572000
Traceback (most recent call last):
  File "/home/nicvol/okboard/db/../tools/loadkb.py", line 28, in <module>
    t.endLoad()
  File "/home/nicvol/okboard/tools/gribouille.py", line 120, in endLoad
    self._rec_load(self.tree)
  File "/home/nicvol/okboard/tools/gribouille.py", line 147, in _rec_load
    child_index = self._rec_load(child, pre + letter)
  File "/home/nicvol/okboard/tools/gribouille.py", line 147, in _rec_load
    child_index = self._rec_load(child, pre + letter)
  File "/home/nicvol/okboard/tools/gribouille.py", line 147, in _rec_load
    child_index = self._rec_load(child, pre + letter)
  File "/home/nicvol/okboard/tools/gribouille.py", line 147, in _rec_load
    child_index = self._rec_load(child, pre + letter)
  File "/home/nicvol/okboard/tools/gribouille.py", line 147, in _rec_load
    child_index = self._rec_load(child, pre + letter)
  File "/home/nicvol/okboard/tools/gribouille.py", line 147, in _rec_load
    child_index = self._rec_load(child, pre + letter)
  File "/home/nicvol/okboard/tools/gribouille.py", line 147, in _rec_load
    child_index = self._rec_load(child, pre + letter)
  File "/home/nicvol/okboard/tools/gribouille.py", line 147, in _rec_load
    child_index = self._rec_load(child, pre + letter)
  File "/home/nicvol/okboard/tools/gribouille.py", line 147, in _rec_load
    child_index = self._rec_load(child, pre + letter)
  File "/home/nicvol/okboard/tools/gribouille.py", line 147, in _rec_load
    child_index = self._rec_load(child, pre + letter)
  File "/home/nicvol/okboard/tools/gribouille.py", line 147, in _rec_load
    child_index = self._rec_load(child, pre + letter)
  File "/home/nicvol/okboard/tools/gribouille.py", line 147, in _rec_load
    child_index = self._rec_load(child, pre + letter)
  File "/home/nicvol/okboard/tools/gribouille.py", line 147, in _rec_load
    child_index = self._rec_load(child, pre + letter)
  File "/home/nicvol/okboard/tools/gribouille.py", line 147, in _rec_load
    child_index = self._rec_load(child, pre + letter)
  File "/home/nicvol/okboard/tools/gribouille.py", line 147, in _rec_load
    child_index = self._rec_load(child, pre + letter)
  File "/home/nicvol/okboard/tools/gribouille.py", line 133, in _rec_load
    self._write_node(index, letter = None, last_child = (nchilds == 0), payload = True, dest_index = self.cur_index)
  File "/home/nicvol/okboard/tools/gribouille.py", line 61, in _write_node
    if dest_index >= (1 << 24) - 10: raise Exception("overflow")
Exception: overflow
make: *** [it.tre] Error 1
+ rsync -av '*.tre' '*.db' '*.ng' '*.rpt.bz2' /home/nicvol/okboard/db/
sending incremental file list
rsync: link_stat "/media/storage/nicvol/corpus/work/*.tre" failed: No such file or directory (2)
rsync: link_stat "/media/storage/nicvol/corpus/work/*.db" failed: No such file or directory (2)
rsync: link_stat "/media/storage/nicvol/corpus/work/*.ng" failed: No such file or directory (2)
rsync: link_stat "/media/storage/nicvol/corpus/work/*.rpt.bz2" failed: No such file or directory (2)

sent 12 bytes  received 12 bytes  48.00 bytes/sec
total size is 0  speedup is 0.00
rsync error: some files/attrs were not transferred (see previous errors) (code 23) at main.c(1070) [sender=3.0.9]
Did anyone encounter this with the other languages?
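For context, the check in the traceback (`dest_index >= (1 << 24) - 10`) suggests the tree file uses 24-bit indexes. The sketch below estimates whether a word list could fit, under the loud assumption that the index grows by roughly one per trie node, i.e. one per distinct prefix. The real encoding in gribouille.py is not shown in this thread, so treat the result as a rough indicator only. `count_prefixes` is a hypothetical helper.

```python
# Limit taken from the check shown in the traceback above
INDEX_LIMIT = (1 << 24) - 10  # 16777206


def count_prefixes(words):
    """Count distinct non-empty prefixes, i.e. nodes in a plain trie
    (root excluded). Memory-hungry for very large word lists."""
    prefixes = set()
    for word in words:
        for end in range(1, len(word) + 1):
            prefixes.add(word[:end])
    return len(prefixes)


if __name__ == "__main__":
    nodes = count_prefixes(["casa", "case", "cane"])
    print(nodes, "nodes; limit is", INDEX_LIMIT)
    print("fits" if nodes < INDEX_LIMIT else "would overflow (under this assumption)")
```

Every word adds at least one distinct prefix (itself), so a list of about 13.5 million unique words needs at least that many nodes and probably more, which puts it close to or beyond the limit under this assumption.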
 
Posts: 635 | Thanked: 1,535 times | Joined on Feb 2014 @ Germany
#248
What is the size of your dictionary file and how many words does it contain?
 

The Following User Says Thank You to mautz For This Useful Post:
Posts: 27 | Thanked: 35 times | Joined on Jan 2016 @ Sweden
#249
There we go!

Code:
#cat words-it.txt | wc -l
13572262

#ls -lrth
-rw-rw-r-- 1 nico nico  41M Jan 23 21:35 corpus-it.txt
-rw-rw-r-- 1 nico nico 226M Jan 26 21:12 words-it.txt
 
Posts: 635 | Thanked: 1,535 times | Joined on Feb 2014 @ Germany
#250
Your corpus file is way too small. How many sentences does it include? And on the other hand your dictionary is way too big. Even if it does compile, OKBoard will crash with such a huge dictionary.

My corpus has a file size of about 200MB and contains around 2,000,000 sentences.
My dictionary has a size of nearly 1MB and contains around 100,000 words. I tried a dictionary with 17 million words (size was around 30MB, I think) and OKBoard crashed every time I started it.
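One possible way to get a dictionary down to that order of magnitude is to keep only the words that actually occur most often in the corpus. The sketch below is just one approach, not something OKBoard's tools do for you as far as this thread shows. The function name `top_words` and the default limit of 100,000 (taken from the figure above) are assumptions, and punctuation stripping is left out for brevity.

```python
from collections import Counter


def top_words(corpus_lines, dictionary_words, limit=100_000):
    """Return up to `limit` dictionary words, most frequent in the corpus first.

    Words that never appear in the corpus are dropped entirely.
    """
    allowed = set(dictionary_words)
    counts = Counter()
    for line in corpus_lines:
        for token in line.split():
            if token in allowed:
                counts[token] += 1
    return [word for word, _ in counts.most_common(limit)]


if __name__ == "__main__":
    corpus = ["la casa la", "la strada"]
    dictionary = ["la", "casa", "strada", "zzz"]
    for word in top_words(corpus, dictionary, limit=3):
        print(word)
```

For real use, the corpus and dictionary would be read from files such as corpus-it.txt and words-it.txt, and the result written back with one word per line.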

Last edited by mautz; 2016-01-27 at 17:51.
 

The Following User Says Thank You to mautz For This Useful Post:

Tags
bettertxtentry, huntnpeck sucks, okboard, sailfish, swype


 