Devoted to Information Technology. Contributing my knowledge to my community.
Similar Post :: http://crblpocr.blogspot.com/2008/07/how-to-train-bangla-and-devanagari.html
Hi Vanna, Thanks for that. Ms. Sochenda from PAN Localization Cambodia is also testing the Khmer Kep font with Tesseract. Please keep in touch, ING LengIeng, PAN Localization Cambodia
Dear Lengleng, You're welcome. I am doing Khmer OCR research for my master's thesis.
Thanks for posting here.
Thanks for this good tutorial, it helped me a lot! On my machine, cnTraining.exe and mfTraining.exe from v2.04 did not work; I had to use the ones from v2.00.
Simple recap of the instructions (worked great, thanks).

You need to create 8 files:
1, freq-dawg
2, word-dawg
3, user-words (can be an empty file)
4, inttemp
5, normproto
6, pffmtable
7, unicharset
8, DangAmbigs (can be an empty file)

Steps to get them, starting with a file.tif ready to train for OCR:
-----------------
1: `tesseract file.tif file batch.nochop makebox`
2: `mv file.txt file.box`
3: edit file.box to match the appropriate text, or use a helper tool:
   - Windows tool: sites.google.com/site/spilkaondrej
   - Linux tool: tesseractTrainer.py (download section)
4: `tesseract file.tif junk nobatch box.train`
5: `mftraining file.tr`
6: `cntraining file.tr`
7: `unicharset_extractor file.box`
8: create 'frequent_words_list' and 'words_list' with at least 1 word in each text file (words separated by newlines)
9: `wordlist2dawg frequent_words_list freq-dawg`
10: `wordlist2dawg words_list word-dawg`
11: `touch DangAmbigs user-words`
12: `mv DangAmbigs xxx.DangAmbigs; mv freq-dawg xxx.freq-dawg; ...` # move all 8 files listed above
13: `mv xxx.* /usr/share/tesseract/tessdata/`
14: `tesseract file.tif output -l xxx` should now produce a correct output.txt file
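Step 12 of the recap above is easy to get wrong by hand, since a single missing or unrenamed file leaves the language unusable. The following is only a minimal sketch of that step, assuming the eight files from the recap sit in one directory; `prefix_traineddata` is a hypothetical helper name of my own and not part of Tesseract, and `xxx` stands for whatever language code you chose.

```shell
#!/bin/sh
# Sketch only: prefix_traineddata is a hypothetical helper, not a Tesseract tool.
# It checks that all eight files from the recap exist in a directory, then
# renames each one to <lang>.<name>, as in step 12.
prefix_traineddata() {
    lang="$1"   # language code, e.g. xxx
    dir="$2"    # directory that holds the eight files
    files="freq-dawg word-dawg user-words inttemp normproto pffmtable unicharset DangAmbigs"

    # First pass: refuse to rename anything if a file is missing.
    for name in $files; do
        if [ ! -f "$dir/$name" ]; then
            echo "missing: $dir/$name" >&2
            return 1
        fi
    done

    # Second pass: add the language prefix to every file.
    for name in $files; do
        mv "$dir/$name" "$dir/$lang.$name" || return 1
    done
}
```

Calling `prefix_traineddata xxx .` in the training directory would then leave `xxx.inttemp`, `xxx.unicharset` and the rest ready to be moved into the tessdata directory (step 13).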
thanks for your help
Great posting; your instructions work and are clearer than the wiki. Now I just need to know whether I can add my files, or additional files, to an existing language...
Excellent tutorial. Works just perfect. Thanks.
Very good tutorial!