
OCR for Hebrew?

Posted: Sat Oct 19, 2013 11:25 am
by Jonathan Robie
What is the best OCR for Hebrew?

What is the best open source OCR for Hebrew? Does anyone have experience with this one?

https://code.google.com/p/qhocr/

Re: OCR for Hebrew?

Posted: Sat Oct 19, 2013 10:15 pm
by Ken M. Penner
The only success I've had with Hebrew OCR is ABBYY FineReader Pro. It doesn't handle pointed Hebrew out of the box, but it can be trained to do so.

Re: OCR for Hebrew?

Posted: Sun Oct 20, 2013 8:56 am
by Sebastian Walter
Do you need OCR with mixed character recognition or only Hebrew character recognition?

Of course, there are ReadIRIS and ABBYY FineReader, but they are pretty expensive (and they aren't open source).
I tried QHOCR, and it seems to work well only with really high-quality scans.
I tried it again because of your question: I took a screenshot of the main text of this page (after zooming in twice with "+") and OCRed it. This is the result:
זָ'*-שִלָלם ב*צָט' םִכ*'ָ חַשָזָת'*
כ' *ל*תַ' עָברוּ רבשָ'
'פַשז כָבִד 'כבדָו פםנִ'י
דבאִ'שו נָ*קו חַבוּרתָ'
*כָ*' זָוַלרָ'*
נ*ו'ת' שדָוהִ' פד-עפָד
כָל--'ֹו* לדר דל-ח'*
כְ'-כָבָלַ' םָלאָו נק?ה
וגָ'ן םהב כּ-'*רִ'י
נִ'ו*ת' ו*ד*'תִ' עד-טפָד
שָאנח' *נהםָח לב* .
אדנָֹ' נָ'דךָ כָ?-חּזְוָחָ'
וִאַנחָת' פִכָךָ לגָ-נםהרָהי
לב' *-רחר ם'ָבנ' כֹח'
וִאור-**'נ'ָ *ב דב א'ן' גָת'*
אְֹהִב' וִרִ*' ם*יד ננפָ' 'ַפְשָ*
*רובַ' פרָחָ* עָםדוי
וַ'נַקשָו םככִ'זִ' נַכָש'
But the best one for mixed characters seems to be Tesseract anyway. It's a command-line tool, so you'll probably want a GUI; here are some links. Roi Dayan has even uploaded Hebrew training data, but I haven't tried it out yet.
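For anyone who wants to skip the GUI, here is a minimal Python sketch of driving the Tesseract command line. It assumes `tesseract` is on the PATH with the standard Hebrew pack (`heb`) installed; whether Roi Dayan's training data uses that same language code is not stated here, so treat the codes as assumptions. `build_tesseract_cmd`, `ocr_image`, and `page.png` are hypothetical names for illustration. Tesseract combines language packs with `+` (e.g. `heb+eng`) for mixed-script pages, and newer versions accept `stdout` as the output base to print the text instead of writing a `.txt` file.

```python
import shutil
import subprocess
from typing import List


def build_tesseract_cmd(image_path: str, langs: List[str]) -> List[str]:
    """Build a Tesseract command line that prints recognized text to stdout.

    Language packs are joined with '+', e.g. 'heb+eng' for pages that
    mix Hebrew and English.
    """
    if not langs:
        raise ValueError("at least one language code is required")
    # 'stdout' as the output base makes Tesseract print the text
    # instead of writing <outputbase>.txt (supported in newer versions).
    return ["tesseract", image_path, "stdout", "-l", "+".join(langs)]


def ocr_image(image_path: str, langs: List[str]) -> str:
    """Run Tesseract on one image and return the recognized text."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract is not installed or not on PATH")
    result = subprocess.run(
        build_tesseract_cmd(image_path, langs),
        capture_output=True,
        check=True,  # raise CalledProcessError if Tesseract fails
    )
    return result.stdout.decode("utf-8")


if __name__ == "__main__":
    # Hypothetical file name: a screenshot of a mixed Hebrew/English page.
    print(build_tesseract_cmd("page.png", ["heb", "eng"]))
```

Keeping the command construction separate from the subprocess call makes it easy to inspect (or log) exactly what is being run before pointing it at real scans.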

Edit: Oh...
But I'm also the guy who runs the software here.
Well, perhaps you won't need the GUI :)

Re: OCR for Hebrew?

Posted: Mon Oct 21, 2013 9:48 am
by Ken M. Penner
Here's what ABBYY does with the page Sebastian mentioned (after zooming in twice with "+"), with "Hebrew" set as the language.
אין־שלום בעצמי מפני חטאתי:
בי עונתי עברו ראשי
כמשא כבד יכבדו מטוני:
הבאישו נמקו חבוריתי
• : .יד ־ 1 **
מפני אולתי:
נערתי שהותי עד־מאד
כל־היום ר{דר הלכתי:
כי־כסלי מלאו נקלה
•• .י: ד- ▼ : ו. • '1 •־־*
ואיך מתם בכשרי:
נפוגתי ונל־כיתי עד־מאד
שאגתי מנהמת לבי:
אדני נגךןן כל־תאותי
!אנחתי ממןדלא־נסתרה:
לבי סחךחר עזבני כחי
ואור־עיני גם־הם איךאתי:
אבי,צי ו ורעי מנגד נגעי!עמדו
וקרובי מרחק עמדו:
וינקשו ו מבקשי נפשי
As you can see, ABBYY is vastly superior to QHOCR for the consonantal text.
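If you want to put a number on "superior for the consonantal text," one option (my own suggestion, not something either poster did) is to strip the vowel points and cantillation from each OCR output and compute a character error rate against a hand-corrected transcription of the page. The sketch below assumes such a reference transcription exists; `strip_points`, `edit_distance`, and `consonantal_cer` are hypothetical helper names. It relies on the fact that Hebrew points, dagesh, and accents are Unicode nonspacing marks (category `Mn`), while maqaf and sof pasuq are punctuation and survive.

```python
import unicodedata


def strip_points(text: str) -> str:
    """Remove Hebrew vowel points, dagesh, and cantillation marks.

    These are all nonspacing marks (category 'Mn'). NFD first splits
    precomposed forms such as U+FB2A (shin with shin dot). Maqaf (U+05BE)
    and sof pasuq (U+05C3) are not 'Mn', so they are kept.
    """
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute
        prev = curr
    return prev[-1]


def consonantal_cer(reference: str, ocr_output: str) -> float:
    """Character error rate on unpointed text: edits / reference length.

    Whitespace and punctuation count as characters here.
    """
    ref = strip_points(reference)
    hyp = strip_points(ocr_output)
    if not ref:
        raise ValueError("reference text is empty")
    return edit_distance(ref, hyp) / len(ref)
```

A lower rate means fewer consonant-level mistakes; run it line by line on the QHOCR and ABBYY outputs against the same reference to compare them.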