OK, now I get it. Using a text corpus for prediction makes sense. I have access to a text corpus of about 60 GB. Is that enough, or should I look for a bigger one? The data I found is split across several zip files.

EDIT: Here is more information about the data. It is in the VRT file format: https://www.kielipankki.fi/developme...-input-format/

I would be really grateful for help, since my programming skills are very limited.
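For reference, here is a minimal Python sketch of how VRT files could be read straight out of the zip archives without unpacking 60 GB to disk first. It rests on several assumptions that may not hold for this corpus: each token is on its own line with the word form as the first tab-separated field, lines starting with `<` are structural tags or comments, sentences are closed by a `</sentence>` tag, the files end in `.vrt`, and the text is UTF-8. The function names `iter_vrt_sentences` and `iter_zip_sentences` are made up for this example.

```python
import io
import zipfile
from typing import Iterable, Iterator, List


def iter_vrt_sentences(lines: Iterable[str]) -> Iterator[List[str]]:
    """Yield one list of word forms per sentence from VRT-style lines."""
    sentence: List[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line:
            continue
        if line.startswith("<"):
            # Assumption: structural tags and comments start with "<",
            # and a literal "<" inside a token is escaped (e.g. as &lt;).
            if line.startswith("</sentence") and sentence:
                yield sentence
                sentence = []
            continue
        # Assumption: the first tab-separated field is the word form.
        sentence.append(line.split("\t")[0])
    if sentence:
        # Emit any tokens left over if the last sentence was not closed.
        yield sentence


def iter_zip_sentences(zip_path) -> Iterator[List[str]]:
    """Stream sentences from every .vrt member of one zip archive."""
    with zipfile.ZipFile(zip_path) as archive:
        for name in archive.namelist():
            if not name.endswith(".vrt"):
                continue
            with archive.open(name) as binary:
                # Decode line by line so a whole file never sits in memory.
                text = io.TextIOWrapper(binary, encoding="utf-8")
                yield from iter_vrt_sentences(text)
```

Usage would be something like `for words in iter_zip_sentences("corpus_part1.zip"): ...`, where the file name is a placeholder. Because both functions are generators, only one sentence is held in memory at a time, which matters with a corpus of this size.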