Rescribe Ltd offers custom-tailored Optical Character Recognition (OCR) Services for early modern Latin texts. Our OCR training packages are adapted to the Tesseract engine and can be downloaded and used for free from latinocr.org. For more information on our commercial service, read on or contact firstname.lastname@example.org.
Our service comprises three steps: preprocessing the images or PDFs, running OCR on them, and postprocessing the resulting output. Upon request, the output can be proofread for maximum accuracy.
Preprocessing the scans or image files serves to deskew pages, reduce noise in the scans, and convert the result into a binarized image that can be processed by Tesseract, our OCR engine.
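To illustrate the binarization part of this step, here is a minimal sketch in Python. It assumes Otsu's method as the thresholding technique and a greyscale page held as a list of rows of 0-255 values; both are assumptions made for the example, not a description of the pipeline actually used, and the function names are hypothetical. Deskewing and noise reduction are not shown.

```python
def otsu_threshold(pixels):
    """Return an Otsu threshold for an iterable of 8-bit grey values."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = sum(hist)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_var = 0, -1.0
    w_lo = 0    # number of pixels at or below the candidate threshold
    sum_lo = 0  # sum of their grey values
    for t in range(256):
        w_lo += hist[t]
        sum_lo += t * hist[t]
        w_hi = total - w_lo
        if w_lo == 0 or w_hi == 0:
            continue  # one class is empty, so no split at this threshold
        mean_lo = sum_lo / w_lo
        mean_hi = (sum_all - sum_lo) / w_hi
        # Between-class variance (up to a constant factor).
        between = w_lo * w_hi * (mean_lo - mean_hi) ** 2
        if between > best_var:
            best_var, best_t = between, t
    return best_t


def binarize(image):
    """Map a greyscale image (rows of 0-255 ints) to 0 (ink) / 255 (paper)."""
    t = otsu_threshold(p for row in image for p in row)
    return [[0 if p <= t else 255 for p in row] for row in image]
```

On a toy "page" such as `[[20, 30, 200], [210, 25, 220]]`, the three dark values end up as ink and the three light values as paper.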
In the second step, we run the OCR on the preprocessed files, using our specially trained packages and adapting language and character settings to the document at hand.
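As a rough sketch of what such a run can look like, the Python helper below assembles a Tesseract command line. The helper name and its parameters are hypothetical. It assumes that the `tesseract` executable is installed and that the Latin training data is available under the language code `lat`. The options used (`-l`, `--psm`, `-c`) and the output configs (`txt`, `hocr`, `pdf`) are standard Tesseract command-line features.

```python
def build_tesseract_command(image_path, output_base, lang="lat", psm=None,
                            variables=None, outputs=("txt",)):
    """Build an argument list for the tesseract CLI (hypothetical helper).

    lang      -- traineddata name; "lat" is an assumption about what is installed
    psm       -- optional page segmentation mode (e.g. 6 for a uniform text block)
    variables -- optional dict of Tesseract config variables passed via -c
    outputs   -- output configs such as "txt", "hocr" or "pdf"
    """
    cmd = ["tesseract", image_path, output_base, "-l", lang]
    if psm is not None:
        cmd += ["--psm", str(psm)]
    for name, value in (variables or {}).items():
        cmd += ["-c", f"{name}={value}"]
    cmd += list(outputs)  # config names go last on the command line
    return cmd


# To actually run it (requires Tesseract to be installed):
# import subprocess
# subprocess.run(build_tesseract_command("page.png", "page",
#                                        outputs=("txt", "hocr")), check=True)
```

Building the argument list separately from running it makes it easy to inspect or log the settings chosen for each document.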
The resulting output is further refined in an automated postprocessing step: special characters and ligatures can optionally be expanded and adapted to modern typographic conventions (e.g. æ to ae). These settings can be adjusted to the customer’s preferences. The final output can be delivered as raw text files, searchable PDFs or hOCR files.
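The expansion of special characters can be pictured as a simple, configurable substitution table. The Python sketch below is illustrative only: apart from æ to ae, which is the example given above, the mapping entries and the function name are assumptions, and a real configuration would follow the customer's preferences.

```python
# Illustrative default table; only "æ" -> "ae" comes from the text above,
# the remaining entries are assumptions made for the example.
DEFAULT_EXPANSIONS = {
    "æ": "ae", "Æ": "Ae",
    "œ": "oe", "Œ": "Oe",
    "ſ": "s",             # long s
    "ﬁ": "fi", "ﬂ": "fl", "ﬀ": "ff", "ﬆ": "st",
}


def expand_ligatures(text, mapping=None):
    """Replace each character found in `mapping` by its expansion."""
    table = str.maketrans(DEFAULT_EXPANSIONS if mapping is None else mapping)
    return text.translate(table)


print(expand_ligatures("cælum, pœna, ſapientia"))
# A customer who wants to keep the long s could pass a smaller table:
print(expand_ligatures("cælum ſapientia", {"æ": "ae"}))
```

Passing a different mapping is enough to switch individual expansions on or off without changing the code.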
We naturally aim to deliver output of the highest possible accuracy; however, quality depends on a number of factors beyond our control (such as the quality of the page and scan, the fonts used and unusual characters). A set of analytical tools allows us to quickly assess the relative quality of the output and devote more attention to problem areas. Where higher accuracy is required, the output can optionally be proofread to ensure that requirements are met.
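One common way to get a quick quality estimate is to look at the per-word confidence values (`x_wconf`) that Tesseract writes into its hOCR output. The Python sketch below is an assumption about how such a check could work, not a description of the analytical tools mentioned above; the function names and the threshold of 60 are hypothetical, and the regular expression assumes hOCR markup in the shape Tesseract produces.

```python
import html
import re

# Matches Tesseract-style word spans, e.g.
# <span class='ocrx_word' id='word_1_1' title='bbox 1 2 3 4; x_wconf 95'>Arma</span>
_WORD_RE = re.compile(
    r"<span class=['\"]ocrx_word['\"][^>]*x_wconf (\d+(?:\.\d+)?)[^>]*>(.*?)</span>",
    re.DOTALL,
)


def word_confidences(hocr):
    """Return a list of (word, confidence) pairs found in an hOCR string."""
    pairs = []
    for conf, inner in _WORD_RE.findall(hocr):
        word = html.unescape(re.sub(r"<[^>]+>", "", inner)).strip()
        pairs.append((word, float(conf)))
    return pairs


def low_confidence_words(hocr, threshold=60.0):
    """Return the words whose confidence is below `threshold` (assumed cut-off)."""
    return [w for w, c in word_confidences(hocr) if c < threshold]
```

Pages with a low average confidence, or with many flagged words, would be natural candidates for closer inspection or manual proofreading.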
We charge an hourly rate for our services, which allows the different steps of the process to be combined to suit both needs and budget. Whilst preprocessing and OCR are always needed, manual proofreading is optional and charged at a lower rate. For an estimate, contact us at email@example.com with a sample page, a description of the document to be processed and your requirements for the output.
Rescribe Ltd is a not-for-profit, spin-out company based on research carried out at Durham University. The company formation and initial development of the software have been funded by a Proof-of-Concept grant awarded by the European Research Council, on the basis of research developed as part of the project Living Poets: A New Approach to Ancient Poetry, directed by Prof. Barbara Graziosi and funded by the European Research Council.