Tesseract Training for Khmer Language
Please also see the training instructions on the Tesseract project page: http://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract
Similar Post :: http://crblpocr.blogspot.com/2008/07/how-to-train-bangla-and-devanagari.html
Hi Vanna,
Thanks for that.
Ms. Sochenda from PAN Localization Cambodia is also testing Khmer Kep font with Tesseract.
Please keep in touch,
ING LengIeng
PAN Localization Cambodia
Dear Lengleng,
You're welcome.
I am doing Khmer OCR research for my master thesis.
Thanks for posting here.
Thanks for this good tutorial, it helped me a lot!
On my machine, cnTraining.exe and mfTraining.exe from v2.04 did not work; I had to use the ones from v2.00.
Simple recap of the instructions (worked great, thanks).
You need to create 8 files:
1, freq-dawg 2, word-dawg 3, user-words (can be an empty file)
4, inttemp 5, normproto 6, pffmtable 7, unicharset
8, DangAmbigs (can be an empty file)
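A quick way to see which of the eight files listed above are still missing could look like the sketch below. The function name `missing_training_files` is made up for illustration and is not part of Tesseract; only the file names come from the list.

```shell
#!/bin/sh
# Hypothetical helper (not a Tesseract tool): print the name of each of the
# eight training files that does not yet exist in the given directory.
# Usage: missing_training_files [dir]   (dir defaults to the current directory)
missing_training_files() {
    dir="${1:-.}"
    for f in freq-dawg word-dawg user-words inttemp normproto pffmtable unicharset DangAmbigs; do
        # Only names of absent files are printed, one per line.
        [ -e "$dir/$f" ] || echo "$f"
    done
}
```

If the function prints nothing, all eight files are in place.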
Steps to get them:
Start with a file.tif that is ready to be used for OCR training.
-----------------
1: `tesseract file.tif file batch.nochop makebox`
2: `mv file.txt file.box`
3: edit file.box to match the appropriate text
-or you can use a helper tool-
windows tool: sites.google.com/site/spilkaondrej
linux tool: tesseractTrainer.py (download section)
4: `tesseract file.tif junk nobatch box.train`
5: `mftraining file.tr`
6: `cntraining file.tr`
7: `unicharset_extractor file.box`
8: create 'frequent_words_list' & 'words_list' with at least 1 word in each text file
(words separated by newlines)
9: `wordlist2dawg frequent_words_list freq-dawg`
10: `wordlist2dawg words_list word-dawg`
11: `touch DangAmbigs user-words`
12: `mv DangAmbigs xxx.DangAmbigs; mv freq-dawg xxx.freq-dawg; ...`
# rename all 8 files listed above in the same way
13: `mv xxx.* /usr/share/tesseract/tessdata/`
14: `tesseract file.tif output -l xxx`
You should now have a correct output.txt file.
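The renaming in step 12 can also be done in a loop instead of typing eight commands. This is only a sketch: `prefix_training_files` is a hypothetical helper name, and `xxx` stands for the language code as in the steps above.

```shell
#!/bin/sh
# Hypothetical helper (not a Tesseract tool): add a language prefix to the
# eight training files, e.g. inttemp -> xxx.inttemp.
# Usage: prefix_training_files <lang> [dir]
prefix_training_files() {
    lang="$1"
    dir="${2:-.}"
    files="freq-dawg word-dawg user-words inttemp normproto pffmtable unicharset DangAmbigs"

    if [ -z "$lang" ]; then
        echo "usage: prefix_training_files <lang> [dir]" >&2
        return 2
    fi

    # Check first, so nothing is renamed unless all eight files are present.
    for f in $files; do
        if [ ! -e "$dir/$f" ]; then
            echo "missing: $dir/$f" >&2
            return 1
        fi
    done

    # Rename each file to <lang>.<name>.
    for f in $files; do
        mv "$dir/$f" "$dir/$lang.$f"
    done
}
```

Running `prefix_training_files xxx` in the training directory would stand in for step 12; step 13 then moves the `xxx.*` files as before.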
thanks for your help
Great posting, your instructions work and are clearer than the wiki.
Now I just need to know if I can add my files or additional files to an existing language...
Excellent tutorial. Works just perfect. Thanks.
Very good tutorial!