Hanno Embregts

Entering the Fourth Dimension of OCR with Tesseract

Java

04.11.2020, 12:00-13:00

Talk, EN

Optical Character Recognition has come a long way since the first image-scanning inventions in the early 1900s. Nowadays, accuracy rates of over 90% are easily achievable on high-quality text scans. Many OCR engines capable of reaching these rates exist today, one of which is Tesseract.

Tesseract has become quite popular amongst software developers because of its accuracy, its open-source status and its active development by Google. With the Tess4J JNA wrapper, it is easily integrated into your Java project.

During this session, I will introduce Tesseract, its pros and cons, and how and when to use it. I will also demonstrate a Java application that uses Tesseract and Tess4J to process some example documents, so you’ll be able to assess its accuracy for yourself.
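
To make "assessing accuracy" concrete: one common way to put a number on OCR quality is character-level accuracy, computed as one minus the edit distance between the recognised text and a hand-typed reference, divided by the reference length. The sketch below is only an illustration of that idea. The class name `OcrAccuracy`, its methods and the sample strings are hypothetical and not taken from the demo application, and the OCR output is a plain string here rather than something returned by Tesseract.

```java
final class OcrAccuracy {

    // Levenshtein distance: the minimum number of single-character insertions,
    // deletions and substitutions needed to turn a into b.
    static int levenshtein(String a, String b) {
        int[] previous = new int[b.length() + 1];
        int[] current = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            previous[j] = j; // turning "" into the first j characters of b
        }
        for (int i = 1; i <= a.length(); i++) {
            current[0] = i; // turning the first i characters of a into ""
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                int substitution = previous[j - 1] + cost;
                int deletion = previous[j] + 1;
                int insertion = current[j - 1] + 1;
                current[j] = Math.min(substitution, Math.min(deletion, insertion));
            }
            int[] swap = previous;
            previous = current;
            current = swap;
        }
        return previous[b.length()];
    }

    // Character accuracy in the range 0.0 to 1.0, relative to the reference text.
    // Assumption: accuracy is defined as 1 - (edit distance / reference length),
    // clamped at 0 when the OCR output is much longer than the reference.
    static double characterAccuracy(String groundTruth, String ocrOutput) {
        if (groundTruth.isEmpty()) {
            return ocrOutput.isEmpty() ? 1.0 : 0.0;
        }
        double errorRate = (double) levenshtein(groundTruth, ocrOutput) / groundTruth.length();
        return Math.max(0.0, 1.0 - errorRate);
    }

    public static void main(String[] args) {
        // Hypothetical sample: a reference line and a made-up OCR result with two misread characters.
        String groundTruth = "Optical Character Recognition";
        String ocrOutput = "Optical Charaoter Recognitlon";
        System.out.printf("Character accuracy: %.1f%%%n",
                100 * characterAccuracy(groundTruth, ocrOutput));
    }
}
```

In practice you would pass in the text produced by the OCR engine for a scanned page together with a manually verified transcript of that same page.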

In geometry, a ‘tesseract’ is the four-dimensional analog of a cube. So will the Tesseract OCR library live up to its name and help your project to ‘enter the fourth dimension’? Join me for this session and find out for yourself!
