Rescribe OCR

Rescribe is a research collective with a focus on Optical Character Recognition (OCR) software and training for historical texts.

Our OCR training packages are designed for the Tesseract and OCRopus engines and can be downloaded and used for free from latinocr.org and github.com. The software and tools we create are all released as free and open source software.

We have a blog which contains various guides and articles, written to be useful for humanists, librarians and technologists.

We have built a brand new free and open source desktop tool for OCR of historical texts, available for Mac, Linux and Windows.

Recent blog posts

Rescribe Desktop Tool v1.2.0 released

2024-02-16

We have a fresh new release of our desktop OCR tool Rescribe out now, v1.2.0. This fixes all known bugs, and adds a nice new feature. We also released v1.1.0 last year, very quietly. As with the previous releases, this works on macOS, Windows and Linux, and is designed to be easy to install and use. Read more…

Turning OCR output into great PDFs

2021-10-25

Recently we have been putting some effort into improving the PDF output from our tools, and these improvements have all made it into the latest release of rescribe (v0.5.1). While they may seem simple, PDFs are a surprisingly complex, sometimes tricky file format to produce correctly, so here we’ll run through some of the ways we get really good PDFs out of our pipeline, and exactly what “good” means in this context. Read more…

Current Projects

Rescribe Open Source Desktop tool, financed by Kickstarter campaign (2021-Present)

Previous Projects

University of Groningen, The normalisation of natural philosophy, ERC Starting Grant project, processing 600 printed books from the 17th to 20th centuries (2019-2021)
Durham Priory Library, Various manuscripts (2017)
Middle Temple Library, Gabriel Powel's De adiaphoris theses theologicæ ac scholasticæ (2016)

Publications

Reading in the mist: High-quality optical character recognition based on freely available early modern digitized books, Digital Scholarship in the Humanities, Volume 37, Issue 4, December 2022, Pages 1197-1209, https://doi.org/10.1093/llc/fqac014
Modelling Medieval Hands: Practical OCR for Caroline Minuscule, Digital Humanities Quarterly 13.1, 2019
Training Tesseract for Ancient Greek OCR, The Eutypon 28-29, 2012

Tools

We release all of the tools we create as free software under the GPLv3 license. Our recent tools have mostly been written in our favourite language, Go. While our workflow primarily uses a distributed system with virtual servers, all useful functionality, including image preprocessing, postprocessing analysis, and final PDF generation, is also made available as separate self-contained commands within the packages.

There is a tour of our tools on our blog that describes what they all do, how they fit together, and which tools and libraries are likely to be particularly useful for others.

Rescribe desktop tool - an all-in-one preprocessing, OCR and analysis tool for Mac, Windows and Linux, based on our server bookpipeline package
rescribe.xyz/bookpipeline package - various tools and functions for the OCR of books, with a focus on distributed OCR using short-lived virtual servers
rescribe.xyz/preproc package - various image processing methods which are useful for preprocessing page images
rescribe.xyz/utils package - miscellaneous commands and small packages

Videos

A Scholar's Guide to using Optical Character Recognition, a 5 lesson course on Academia.edu, 2021.
A Pipeline for the Ages: Medieval Manuscript OCR from the comfort of your own home, Lightning Talk for the Schoenberg Symposium 2020.

Contact

We always welcome emails, whether asking for advice, offering feedback, or anything else. Write to us at info@rescribe.xyz.

Acknowledgements

Rescribe Ltd was originally a not-for-profit spin-out company based on research carried out at Durham University. The company formation and initial development of the software were funded by a Proof-of-Concept grant awarded by the European Research Council, on the basis of research developed as part of the project Living Poets: A New Approach to Ancient Poetry, directed by Prof. Barbara Graziosi and funded by the European Research Council.