Optical Character Recognition (OCR) Scanning
TownsWeb Archiving’s optical character recognition (OCR) scanning service uses leading professional OCR software to identify typed text within digital images and convert it into usable digital text, which can be added to digital archives as metadata and searched against.
OCR is invaluable in making the information within digitised magazine, journal and newspaper collections far more accessible and making information retrieval much faster.
How does OCR scanning work?
Our professional OCR software scans your JPEG and TIFF image collections (often produced via digitisation), recognises typed or printed text within the images, and converts that text into machine-readable digital text documents.
The OCR’d text can then be added as metadata to a digital archive and associated with the image it was scanned from, to allow keyword searching of the text content, either via collections management software or on a digital archive website.
For example, if you digitised a collection of printed magazines, then put the digital images through the OCR process to extract the article text, this would then allow searching by keyword against the articles’ content.
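As a rough illustration of how keyword searching against OCR'd text can work, the minimal Python sketch below builds a simple word index that maps each word to the images it came from. It is not a description of how PastView or any collections management software is implemented; the filenames, sample text, and function names are all hypothetical.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(ocr_text_by_image):
    """Map each word to the set of image filenames whose OCR text contains it."""
    index = defaultdict(set)
    for image_name, text in ocr_text_by_image.items():
        for word in tokenize(text):
            index[word].add(image_name)
    return index


def search(index, query):
    """Return the sorted image filenames that contain every word in the query."""
    words = tokenize(query)
    if not words:
        return []
    matches = set.intersection(*(index.get(word, set()) for word in words))
    return sorted(matches)


# Hypothetical OCR output, keyed by the image each text was extracted from.
pages = {
    "magazine_1923_p01.tif": "Harvest festival held at the village hall",
    "magazine_1923_p02.tif": "New railway station opens near the village",
}
index = build_index(pages)
```

With this index, `search(index, "railway village")` returns only the second page, because it is the only image whose text contains both words.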
To see examples of this in action, take a look at our PastView digital collections management system.
What data can be captured?
Using OCR scanning we can capture the full content of digitised items – including books, magazines, newspapers, and diaries.
Our team can also index your digitised files by incorporating metadata created from the OCR within the filenames of the digital images.
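One way to picture filename indexing is the short Python sketch below, which appends a few words of OCR'd text (for instance a title) to the original image filename. The naming pattern, the five-word limit, and the function names are assumptions made for illustration only, not our actual naming convention.

```python
import re
from pathlib import PurePath


def slugify(text, max_words=5):
    """Turn OCR'd text into a short, filename-safe string of hyphenated words."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return "-".join(words[:max_words])


def indexed_filename(original_name, ocr_title):
    """Append a slug of the OCR'd title to the filename, keeping the extension."""
    path = PurePath(original_name)
    slug = slugify(ocr_title)
    if not slug:
        # Nothing usable was recognised, so leave the filename unchanged.
        return original_name
    return f"{path.stem}_{slug}{path.suffix}"
```

For example, `indexed_filename("scan_0042.tif", "The Parish Gazette, March 1923")` gives `scan_0042_the-parish-gazette-march-1923.tif`.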
High accuracy OCR scanning service
Performing our OCR process on typed or printed text can be very accurate. If the text is clear (e.g. the text colour contrasts strongly with the background colour), typed in a standard font (e.g. Arial, Times New Roman), and in a standard size (e.g. size 10 upwards), the OCR results are on average 95% accurate. Our OCR software additionally applies pre-processing techniques to improve the chances of successful recognition, such as de-skewing the document if it is not aligned correctly.
Note, however, that formatting anomalies, such as tables, can negatively affect the results.
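Accuracy figures for OCR can be calculated in more than one way. A common approach, shown in the Python sketch below, is character-level accuracy: count the minimum number of character edits needed to turn the OCR output into a hand-checked reference text, then divide by the length of the reference. This is an assumption for illustration and not necessarily the measure behind the 95% figure quoted above; the function names are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance: the minimum number of single-character
    insertions, deletions, and substitutions needed to turn a into b."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute, or keep if equal
            ))
        previous = current
    return previous[-1]


def character_accuracy(reference, ocr_output):
    """Fraction of the reference text recovered correctly, from 0.0 to 1.0."""
    if not reference:
        return 1.0 if not ocr_output else 0.0
    errors = edit_distance(reference, ocr_output)
    return max(0.0, 1 - errors / len(reference))
```

For instance, if the reference is "village hall" and the OCR output is "vi11age hall", two of the twelve characters are wrong, so the character accuracy is about 83%.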
File Formats and Accessing the Data
We can produce output files of the OCR data in any format you specify, such as PDF, PDF/A, MS Word, HTML or Rich Text documents.
We can help you import the data into your own systems; alternatively, we can import the data into our systems. Take a look at our PC-based Viewing Software or our Internet-based TWA PastView System, which allows you to link the transcribed data to your digitised images.
If you are interested in displaying your data or images online for the public to view, we also offer a web design service and can build a website to showcase your collections.
Can handwritten text be scanned using OCR?
Whilst our OCR service can be very accurate when converting printed or typed text into digital format, in our experience the accuracy of OCR on handwritten text is very poor. For this reason, we recommend using our transcription service to capture handwritten text.