Skritter | OCR and learning Chinese and Japanese



ChrisClark April 5th, 2011 2:25a.m.

Has anyone been able to successfully use OCR (optical character recognition) in their language studies? This morning I tried out:

Google Tesseract (a free command-line OCR program; supports simplified and traditional, left-to-right only)

Google Docs OCR (also free - simplified left-to-right only)

the trial version of ABBYY FineReader 10 (commercial software, supports... everything)

Tesseract and Google Docs didn't seem flexible or accurate enough to really be of any use. FineReader is awesome (as are other similar commercial packages I'm sure), easily handling right-to-left, vertical text mixed with graphics, but I'm probably not prepared to shell out $400-$600 for the full version. But it would be phenomenal to be able to freely convert most any reading material I encounter into a Pleco-compatible text.

And that brings me to another question: does anyone have experience with the Pleco OCR features? That's probably the least labor-intensive option! I can't use it because I have an iPod, not an iPhone.

wb April 5th, 2011 2:50a.m.

Depending on the version of your iPod, OCR might still work, either with the iPod camera or at least in "still photo" mode (I think that's what it's called). The camera works for larger characters, but I would say the characters in many textbooks are large enough. The problem with still-image OCR is not the recognition but, I think, the workflow for handling many image files.

ChrisClark April 5th, 2011 5:05a.m.

I have the first generation iPod touch (same generation as the 3rd generation iPhone), so there's no camera. Thanks though!

alxx April 30th, 2011 8:13a.m.

The Pleco OCR works reasonably well on my iPhone 4.

A lot depends on the lighting, the contrast, and the colour of the characters and the background.
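To make the contrast point concrete, here is a minimal sketch of a linear contrast stretch on a grayscale image stored as a plain list of rows. The function name `stretch_contrast` and the sample values are made up for illustration; this is not what Pleco or any particular OCR engine does internally, just one common way to spread washed-out pixel values over the full 0-255 range before recognition.

```python
def stretch_contrast(pixels):
    """Linearly rescale grayscale values so the darkest becomes 0 and the brightest 255."""
    flat = [v for row in pixels for v in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:
        # A flat image has no contrast to stretch; return a copy unchanged.
        return [list(row) for row in pixels]
    scale = 255 / (hi - lo)
    return [[round((v - lo) * scale) for v in row] for row in pixels]


# Hypothetical washed-out 2x3 patch: dark-gray strokes (100) on a light-gray page (160).
patch = [[100, 160, 160],
         [160, 100, 120]]
stretched = stretch_contrast(patch)
```

After stretching, the strokes sit at 0 and the page at 255, which is the kind of separation that tends to help recognition.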

I've seen both Tesseract and ABBYY used in LPR (license plate recognition),
along with OpenCV and others.

A lot depends on the preparation of the image, contrast, the lighting, cropping.
One trick is to break the image down into tiles and process the tiles, or to have a recognition window that is stepped/moved across the image.
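The tiling / stepped-window trick can be sketched roughly like this. The helper name `sliding_windows` and the tile and step sizes are assumptions chosen for illustration; the right values depend on the character size in the image. Choosing a step smaller than the tile makes neighbouring windows overlap, so a character cut by one window edge is more likely to appear whole in the next.

```python
def sliding_windows(width, height, tile, step):
    """Yield (left, top, right, bottom) boxes that step across an image."""
    if tile <= 0 or step <= 0:
        raise ValueError("tile and step must be positive")

    def starts(length):
        # Window origins along one axis; the last one is pulled back so
        # it ends exactly at the edge instead of running past it.
        if length <= tile:
            return [0]
        positions = list(range(0, length - tile, step))
        positions.append(length - tile)
        return positions

    for top in starts(height):
        for left in starts(width):
            yield (left, top, min(left + tile, width), min(top + tile, height))


# Hypothetical 100x60 image, 40-pixel windows moved 30 pixels at a time.
boxes = list(sliding_windows(width=100, height=60, tile=40, step=30))
```

Each box uses the (left, upper, right, lower) order that Pillow's `Image.crop` expects, so the crops could be passed one at a time to whatever OCR engine is in use.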

The training of the program/library is the critical part.
Changing the font can change the recognition rate for documents if the training has been limited.

ChrisClark April 30th, 2011 8:47a.m.

Thanks, @alxx!

Chris
