OCR for Hebrew?

Jonathan Robie
Posts: 9
Joined: Wed Sep 11, 2013 2:53 pm

OCR for Hebrew?

Post by Jonathan Robie »

What is the best OCR for Hebrew?

What is the best open source OCR for Hebrew? Does anyone have experience with this one?

https://code.google.com/p/qhocr/
Ken M. Penner
Posts: 83
Joined: Wed Sep 11, 2013 12:31 pm

Re: OCR for Hebrew?

Post by Ken M. Penner »

The only success I've had with Hebrew OCR is ABBYY FineReader Pro. It doesn't handle pointed Hebrew out of the box, but it can be trained to do so.
Ken M. Penner, Ph.D.
St. Francis Xavier University
Sebastian Walter
Posts: 17
Joined: Thu Oct 10, 2013 3:06 am

Re: OCR for Hebrew?

Post by Sebastian Walter »

Do you need OCR with mixed character recognition or only Hebrew character recognition?

Of course there are ReadIRIS and ABBYY FineReader, but they are pretty expensive (and they aren't open source).
I tried QHOCR, but it seems to work well only with really high-quality scans.
I tried it again because of your question. I took a screenshot of the main text of this page (2x "+") and OCRed it. This is the result:
זָ'*-שִלָלם ב*צָט' םִכ*'ָ חַשָזָת'*
כ' *ל*תַ' עָברוּ רבשָ'
'פַשז כָבִד 'כבדָו פםנִ'י
דבאִ'שו נָ*קו חַבוּרתָ'
*כָ*' זָוַלרָ'*
נ*ו'ת' שדָוהִ' פד-עפָד
כָל--'ֹו* לדר דל-ח'*
כְ'-כָבָלַ' םָלאָו נק?ה
וגָ'ן םהב כּ-'*רִ'י
נִ'ו*ת' ו*ד*'תִ' עד-טפָד
שָאנח' *נהםָח לב* .
אדנָֹ' נָ'דךָ כָ?-חּזְוָחָ'
וִאַנחָת' פִכָךָ לגָ-נםהרָהי
לב' *-רחר ם'ָבנ' כֹח'
וִאור-**'נ'ָ *ב דב א'ן' גָת'*
אְֹהִב' וִרִ*' ם*יד ננפָ' 'ַפְשָ*
*רובַ' פרָחָ* עָםדוי
וַ'נַקשָו םככִ'זִ' נַכָש'
But the best one for mixed characters seems to be Tesseract anyway. It's a command-line tool, so you'll probably need a GUI; here are some links. Roi Dayan has even uploaded Hebrew training data, but I haven't tried it out yet.
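For anyone who would rather script Tesseract than use a GUI, here is a minimal Python sketch of calling the command-line tool. It assumes the `tesseract` binary is on your PATH and that Hebrew training data is installed under the language code `heb`; the helper names `build_tesseract_command` and `ocr_image` and the file names are hypothetical, made up for illustration. Joining language codes with `+` (for example `heb+eng`) is how the CLI takes several languages at once, which is what mixed-character pages need.

```python
import shutil
import subprocess
from pathlib import Path


def build_tesseract_command(image, output_base, languages=("heb",)):
    """Build the argument list for one Tesseract run.

    Language codes are joined with '+', e.g. ("heb", "eng") -> "heb+eng".
    """
    return ["tesseract", str(image), str(output_base), "-l", "+".join(languages)]


def ocr_image(image, output_base, languages=("heb",)):
    """Run Tesseract on an image and return the recognized text.

    Assumption: the tesseract binary and the requested training data
    are already installed on this machine.
    """
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract executable not found on PATH")
    subprocess.run(build_tesseract_command(image, output_base, languages), check=True)
    # Tesseract writes its plain-text result to "<output_base>.txt".
    return Path(f"{output_base}.txt").read_text(encoding="utf-8")
```

Something like `ocr_image("page.png", "page_out", ("heb", "eng"))` would then cover a page with both Hebrew and Latin script, if both language packs are present.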

edit: Oh...
But I'm also the guy who runs the software here
Well, perhaps you won't need the GUI :)
Ken M. Penner
Posts: 83
Joined: Wed Sep 11, 2013 12:31 pm

Re: OCR for Hebrew?

Post by Ken M. Penner »

Here's what ABBYY does with the page Sebastian mentioned (after 2x "+"), with "Hebrew" set as the language.
אין־שלום בעצמי מפני חטאתי:
בי עונתי עברו ראשי
כמשא כבד יכבדו מטוני:
הבאישו נמקו חבוריתי
• : .יד ־ 1 **
מפני אולתי:
נערתי שהותי עד־מאד
כל־היום ר{דר הלכתי:
כי־כסלי מלאו נקלה
•• .י: ד- ▼ : ו. • '1 •־־*
ואיך מתם בכשרי:
נפוגתי ונל־כיתי עד־מאד
שאגתי מנהמת לבי:
אדני נגךןן כל־תאותי
!אנחתי ממןדלא־נסתרה:
לבי סחךחר עזבני כחי
ואור־עיני גם־הם איךאתי:
אבי,צי ו ורעי מנגד נגעי!עמדו
וקרובי מרחק עמדו:
וינקשו ו מבקשי נפשי
As you can see, ABBYY is vastly superior to QHOCR for the consonantal text.
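If someone wants to put a number on that comparison, one common approach is a character error rate: strip the vowel points and accents from both the OCR output and a hand-checked transcription, then divide the edit distance by the length of the reference. The sketch below is only an illustration under those assumptions; the function names are hypothetical, and you would have to supply your own verified reference text.

```python
import unicodedata


def strip_points(text):
    """Remove Hebrew vowel points and accents, keeping consonants and maqaf."""
    return "".join(
        ch for ch in text
        # Combining marks (category Mn) in the Hebrew block are the
        # cantillation accents and vowel points.
        if not ("\u0591" <= ch <= "\u05C7" and unicodedata.category(ch) == "Mn")
    )


def edit_distance(a, b):
    """Levenshtein distance between two strings (row-by-row dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def char_error_rate(reference, hypothesis):
    """Edit distance on the consonantal text, divided by the reference length."""
    ref = strip_points(reference)
    hyp = strip_points(hypothesis)
    if not ref:
        raise ValueError("reference text is empty after stripping points")
    return edit_distance(ref, hyp) / len(ref)
```

Running `char_error_rate(reference, ocr_output)` once per engine against the same reference would give directly comparable figures, with lower being better.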
Ken M. Penner, Ph.D.
St. Francis Xavier University