Optical Character Recognition (OCR)
TownsWeb Archiving’s optical character recognition (OCR) scanning service uses leading professional OCR software to identify typed text within digital images and convert it into usable digital text, which can be added to digital archives as metadata and searched against.
OCR is invaluable in making the information within digitised magazine, journal and newspaper collections far more accessible, providing the most efficient and instantaneous retrieval of your valuable content. TownsWeb Archiving are happy to deploy their OCR service during the digitisation phase or for already digitised materials. Following an initial survey and sample we can provide examples of our outputs for your approval.Frequently Asked Questions – Find out more
Our technicians will expertly prepare your digital material to improve the readability of your typed-text. Then, using our specialist software, we scan your content, creating highly accurate OCR results.
We can capture typed text from any of your digitised material, such as books, magazines and newspapers, outputting your OCR results in any format you may require.
We can import your OCR data straight into PastView or your own collections management system. Creating fully searchable metadata to make your archive infinitely more discoverable.
Get in touch and tell us about your requirements for a free, no obligation quote.
We can either digitise your material or you can provide us with your digital assets.
We can interpolate, deskew, reduce background noise and much more to produce the best possible OCR results.
Using our specialist OCR software we scan your digital material, providing the results in any format you require.
We can import your OCR data straight into PastView or into your own system, ready to link to your digital images.
Hear what our clients have to say about our data capture services
Thanks to the digitisation carried out by TownsWeb, we have been able to extend the reach of our collections, for example by using the high quality files on our website. This makes the material available for those that cannot physically get to our museum due to its rural location, but also globally for research and educational purposes
Museum Curator and Project Leader
I’m very pleased that we’ve managed to add Personal Information Sheets, Staff Records and the 19th Century Woolwich History book to our website. It’s very exciting watching our digital archive grow online. We would again like to thank the staff at TownsWeb Archiving for their skill in seamlessly integrating the new content in to the website.
The improvement in searching has been tremendous. The good feedback we get from users about searching the new site has really made the update worthwhile. It has also made us think more about attributes we have associated with items, so as we improve these the search results that users get will also improve.
Optical Character Recognition (OCR) Frequently Asked Questions
Our professional OCR software scans your JPEG and TIFF image collections (often produced via digitisation), recognises typed or printed text within the images, and converts that typed text into machine readable digital text documents.
The OCR’d text can then be added as metadata to a digital archive and associated with the image it was scanned from, to allow keyword searching of the text content, either via collections management software or on a digital archive website.
For example, if you digitised a collection of printed magazines, then put the digital images through the OCR process to extract the article text, this would then allow searching by keyword against the articles’ content.
To see examples of this in action take a look at our PastView digital collections management system.
Using OCR scanning we can capture the full content of digitised items – including books, magazines, newspapers, and diaries.
Our team can also index your digitised files by incorporating metadata created from the OCR within the filenames of the digital images.
Performing our OCR process on typed or printed text can be very accurate. If the text is clear (e.g. the text colour is a strong contrast to background colour), typed in a standard font (e.g. Arial, Times New Roman), and in a standard size (e.g. size 10 upwards); the OCR results are on average 95% accurate. Our OCR software additionally performs pre-process techniques to improve the chances of successful recognition, such as if de-skewing the document if it is not aligned correctly.
Though it should be noted that formatting anomalies, such as tables, can negatively impact the results.
We can produce output files of the OCR data in any format you specify, such as PDF, PDF/A, MS Word, HTML or Rich Text documents.
We can help you import the data into your own system, alternatively we can import the data into our collection management systems, PastView. Take a look at our internet based BookViewing Software, which allows you to link the transcribed data to your digitised images.
If you are interested in publishing your digital collection and OCR data online, find a PastView package here to suit your organisation.
Whilst our OCR service can be very accurate when converting printed type text into digital format, in our experience the accuracy of OCR on handwritten text is very poor. For this reason, in the case of capturing hand written text we recommend using our transcription service.
Calculate the cost of your data capture project
Enter your data capture requirements and tick the box to confirm whether you would like us to get in touch to discuss your requirements further. If you’re not quite ready to use our online quote calculator, feel free to call on 01536 713834 or email us [email protected]
Quote assumptions and specifications
- These costings are based on the assumption that TownsWeb Archiving have digitised the material prior to transcription
- If you are using already digitised files, TownsWeb Archiving will need sight of the images in order to check suitability and agreeing to the work