Optical Character Recognition (OCR) Scanning
TownsWeb Archiving’s optical character recognition (OCR) scanning service uses leading professional OCR software to identify typed text within digital images and convert it into usable digital text, which can be added to digital archives as searchable metadata.
OCR is invaluable in making the information within digitised magazine, journal and newspaper collections far more accessible, allowing your content to be retrieved quickly by keyword search. TownsWeb Archiving are happy to deploy their OCR service during the digitisation phase or on already digitised materials. Following an initial survey and sample run, we can provide example outputs for your approval.
Our technicians will expertly prepare your digital material to improve the readability of your typed text. Then, using our specialist software, we scan your content to produce highly accurate OCR results.
We can capture typed text from any of your digitised material, such as books, magazines and newspapers, outputting your OCR results in any format you may require.
We can import your OCR data straight into PastView or your own collections management system, creating fully searchable metadata that makes your archive far more discoverable.
Get in touch and tell us about your requirements for a free, no obligation quote.
We can either digitise your material or you can provide us with your digital assets.
We can interpolate, deskew, reduce background noise and much more to produce the best possible OCR results.
Using our specialist OCR software we scan your digital material, providing the results in any format you require.
We can import your OCR data straight into PastView or into your own system, ready to link to your digital images.
Optical Character Recognition (OCR) Frequently Asked Questions
Our professional OCR software scans your JPEG and TIFF image collections (often produced via digitisation), recognises typed or printed text within the images, and converts that text into machine-readable digital text documents.
The OCR’d text can then be added as metadata to a digital archive and associated with the image it was scanned from, to allow keyword searching of the text content, either via collections management software or on a digital archive website.
For example, if you digitised a collection of printed magazines and then put the digital images through the OCR process to extract the article text, users could then search the articles’ content by keyword.
To see examples of this in action, take a look at our PastView digital collections management system.
Using OCR scanning we can capture the full content of digitised items – including books, magazines, newspapers, and diaries.
Our team can also index your digitised files by incorporating metadata created from the OCR within the filenames of the digital images.
Our OCR process can be very accurate on typed or printed text. If the text is clear (e.g. the text colour contrasts strongly with the background colour), set in a standard font (e.g. Arial, Times New Roman) and in a standard size (e.g. 10 point upwards), the OCR results are on average 95% accurate. Our OCR software also applies pre-processing techniques to improve the chances of successful recognition, such as de-skewing a document that is not aligned correctly.
It should be noted, however, that complex layouts such as tables can reduce the accuracy of the results.
We can produce output files of the OCR data in any format you specify, such as PDF, PDF/A, MS Word, HTML or Rich Text documents.
We can help you import the data into your own system; alternatively, we can import it into our collections management system, PastView. Take a look at our internet-based BookViewing Software, which allows you to link the transcribed data to your digitised images.
If you are interested in publishing your digital collection and OCR data online, find a PastView package to suit your organisation.
Whilst our OCR service can be very accurate when converting printed text into digital format, in our experience the accuracy of OCR on handwritten text is very poor. For this reason, for capturing handwritten text we recommend our transcription service.
Digitise your physical material with TownsWeb Archiving
Digitising your material is the first step in providing improved access to your collections. We have worked with thousands of organisations throughout the UK, safeguarding their invaluable archives with our digitisation expertise.
Publishing your digitised content with PastView
Once your collection has been digitised, you can upload the data to PastView for full management control and publish your collection through a purpose-built, bespoke PastView website.