A desktop tool for OCR, using Tesseract and modern, efficient preprocessing and analysis pipelines. It is optimised for historical printed works, and includes models for a variety of historical scripts.
rescribe 0.5.1 for Windows (2021-08-30)
rescribe 0.5.1 for Mac (2021-08-30)
rescribe 0.5.1 for Mac (M1) (2021-08-30)
rescribe 0.5.1 for Linux (2021-08-30)
Just click to download the tool, and save it to the folder where you want to do your OCR.
Rescribe is a command-line tool, but don't worry, it isn’t hard to use!
Start by opening up a terminal window. If you’re on Windows, you can type cmd.exe into the Run box; on OSX it’s under Applications → Utilities → Terminal; and if you’re on Linux, I bet you already know where to find your terminal.
Next, navigate to the folder where you downloaded the tool by running the cd command followed by that folder’s path.
If you’re on Linux or OSX, you will probably need to make the program executable after downloading it, so do that now by running chmod +x rescribe. You’ll only have to do that once.
You use rescribe by giving it the name of the directory containing the book or manuscript pages you want to OCR. Basic usage looks like this:
./rescribe mybook
This will run rescribe over all pages in the directory mybook. A successful run will add several new files and directories to mybook:
- A PDF of the book (mybook.pdf in the above example), which is fully searchable.
- A text directory, containing plain text versions of the OCR results for each page.
- A hocr directory, containing hOCR-formatted OCR results for each page.
- A graph.png file, which shows the OCR confidence of each page (a rough indicator of the quality of the OCR across the book).
- A conf file, which lists the OCR confidence of each page at each preprocessing binarisation threshold attempted.
Rescribe has a set of OCR models built in, and defaults to one trained specifically for historical printed Latin books. To see the other available models, run ./rescribe by itself and you will see the list. You can then choose an alternative model with the -t flag; for example, to use a model trained for Caroline minuscule manuscripts, you would run:
./rescribe -t carolinemsv1_fast.traineddata mybook
If you have another model you would like to use, you can just put it in the same folder as rescribe and use its file name after -t.
One limitation at the moment is that the rescribe tool is very sensitive to how page images are named. It will only work on pages named …, where 0001 is any four-digit number (and <anything> is anything!).
Rescribe is published under the GPLv3 license, and its source code can be found by cloning its git repository with git clone https://git.rescribe.xyz/bookpipeline. It's written in Go and is easy to hack on; if you have any patches or questions, please send them along to firstname.lastname@example.org.
For more information on the inner workings of rescribe, take a look at our blog.