Rescribe

Rescribe Ltd offers custom-tailored Optical Character Recognition (OCR) services for historical texts.

Our OCR training packages are designed for the Tesseract and OCRopus engines and can be downloaded and used for free from latinocr.org and github.com. The software and tools we create are all released as free and open source software.

We have a blog which contains various guides and articles, written to be useful for humanists, librarians and technologists.

For more information on our commercial service, read on or contact info@rescribe.xyz.

We built a brand new free and open source desktop tool for historic OCR on Mac, Linux and Windows.

Service

Our service comprises three steps: preprocessing, OCR of images or PDFs, and postprocessing. We use bespoke binarization software to convert images to black and white, despeckle them and optionally clear the margins. Our OCR uses the open source engine Tesseract, with bespoke models trained for the respective texts. The postprocessing step includes a basic sanity check to ensure the quality of the output. The final output can be delivered as raw text files, searchable PDFs or hOCR.

We naturally aim to deliver output of optimum accuracy; however, the quality can depend on a number of factors beyond our control (such as quality of page and scan, fonts and otherworldly characters). A set of analytical tools allows us to quickly assess the relative quality of the output and devote more attention to problem areas. Where higher accuracy is required, the output can be optionally proofread to ensure that requirements are met.
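One rough way to gauge relative quality, sketched here in Go, is to average the per-word confidence values that Tesseract records in hOCR output as x_wconf properties. This is an assumption about how such a check could look, not a description of our analytical tools; the function name meanWordConfidence and the sample markup are made up for the example.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// wconfRe matches the hOCR word-confidence property, e.g. "x_wconf 93".
var wconfRe = regexp.MustCompile(`x_wconf\s+(\d+(?:\.\d+)?)`)

// meanWordConfidence returns the average x_wconf value found in an hOCR
// document and the number of words counted. It is a hypothetical helper:
// a low mean suggests a page that deserves closer attention.
func meanWordConfidence(hocr string) (mean float64, words int) {
	var sum float64
	for _, m := range wconfRe.FindAllStringSubmatch(hocr, -1) {
		v, err := strconv.ParseFloat(m[1], 64)
		if err != nil {
			continue // skip anything that does not parse as a number
		}
		sum += v
		words++
	}
	if words == 0 {
		return 0, 0
	}
	return sum / float64(words), words
}

func main() {
	// Made-up hOCR fragment with two recognised words.
	page := `<span class='ocrx_word' title='bbox 10 10 60 30; x_wconf 90'>arma</span>
<span class='ocrx_word' title='bbox 70 10 150 30; x_wconf 70'>virumque</span>`

	mean, n := meanWordConfidence(page)
	fmt.Printf("%.1f %d\n", mean, n) // prints "80.0 2"
}
```

Engine confidence is only a proxy for accuracy, so a figure like this is best used to rank pages against each other rather than as an absolute measure.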

Recent blog posts

Rescribe Desktop Tool v1.0.0 released
2022-03-22
Today we’re very happy to announce the v1.0.0 release of the desktop OCR tool Rescribe. It’s free to download now from our Rescribe page, and it will work on Mac OS X, Windows and Linux. This release brings several major new features: a graphical interface, support for reading PDFs directly, and an integrated Google Book downloader chief among them.

Turning OCR output into great PDFs
2021-10-25
Recently we have been putting some effort into improving the PDF output from our tools, and those improvements have all made it into the latest release of rescribe (v0.5.1). While they may seem simple, PDFs are a surprisingly complex, sometimes tricky file format to produce correctly, so here we’ll run through some of the ways we get really good PDFs out of our pipeline, and exactly what “good” means in this context.

Current Projects

Previous Projects

Publications

Tools

We release all of the tools we create as free software under the GPLv3 license. Our recent tools have mostly been written in our favourite language, Go. While our workflow primarily uses a distributed system with virtual servers, all useful functionality, including image preprocessing, postprocessing analysis and final PDF generation, is also made available as separate self-contained commands within the packages.

There is a tour of our tools on our blog that describes what they all do, how they fit together, and which tools and libraries are likely to be particularly useful for others.

Videos

Acknowledgements

Rescribe Ltd is a not-for-profit, spin-out company based on research carried out at Durham University. The company formation and initial development of the software have been funded by a Proof-of-Concept grant awarded by the European Research Council, on the basis of research developed as part of the project Living Poets: A New Approach to Ancient Poetry, directed by Prof. Barbara Graziosi and funded by the European Research Council.
