Entering the Fourth Dimension of OCR with Tesseract

04.11.2020, 12:00-13:00
Talk, EN

Optical Character Recognition (OCR) has come a long way since the first image-scanning inventions of the early 1900s. Today, accuracy rates of over 90% are readily achievable on high-quality text scans, and many OCR engines can reach them. One of these engines is Tesseract.

Tesseract has become popular among software developers because of its accuracy, its open-source license and its active development, which Google sponsored for many years. With Tess4J, a Java wrapper that calls the native Tesseract library through JNA (Java Native Access), it is easy to integrate into a Java project.

In this session, I will introduce Tesseract, discuss its pros and cons, and explain when and how to use it. I will also demonstrate a Java application that uses Tesseract and Tess4J to process some example documents, so you can assess its accuracy for yourself.
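The abstract does not say how accuracy will be measured in the demo. A minimal sketch, assuming the common character-accuracy metric (one minus the Levenshtein edit distance between the OCR output and a hand-typed ground-truth transcription, divided by the length of that transcription), could look like this in plain Java. `OcrAccuracy`, `levenshtein` and `characterAccuracy` are hypothetical helpers, not part of Tesseract or Tess4J, and the OCR string in `main` is invented for illustration rather than real Tesseract output.

```java
import java.util.Locale;

// Hypothetical helper (not part of Tesseract or Tess4J): scores OCR output
// against a ground-truth transcription using character accuracy.
final class OcrAccuracy {

    // Minimum number of single-character insertions, deletions and
    // substitutions needed to turn a into b (two-row dynamic programming).
    public static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // turning an empty prefix of a into b[0..j) needs j insertions
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // turning a[0..i) into an empty string needs i deletions
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev; // reuse the arrays: the current row becomes the previous one
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Character accuracy = 1 - errors / ground-truth length, clamped at 0
    // because OCR output with many extra characters can exceed the length.
    public static double characterAccuracy(String ocrText, String groundTruth) {
        if (groundTruth.isEmpty()) {
            return ocrText.isEmpty() ? 1.0 : 0.0;
        }
        int errors = levenshtein(ocrText, groundTruth);
        return Math.max(0.0, 1.0 - (double) errors / groundTruth.length());
    }

    public static void main(String[] args) {
        String truth = "Optical Character Recognition";
        String ocr = "0ptical Charactor Recognitlon"; // invented output with three misread characters
        System.out.println(String.format(Locale.ROOT, "Character accuracy: %.1f%%",
                100 * characterAccuracy(ocr, truth)));
    }
}
```

With three misread characters out of 29, the example scores about 89.7%, just under the 90% mark. In a real run, the first argument would be the text Tess4J returns (for example from `Tesseract.doOCR(File)`), and the ground truth would be typed by hand for each test document.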

In geometry, a ‘tesseract’ is the four-dimensional analogue of a cube. So will the Tesseract OCR library live up to its name and help your project ‘enter the fourth dimension’? Join me for this session to find out!