Languages and scripts for Rescribe OCR
The Rescribe OCR tool includes several languages and scripts, and many more are freely available to download and use with it.
To use one of the files below, download it to your computer, then choose the "Other..." option in the "Language / Script" box in Rescribe and select the file you downloaded.
Beta models
These are models we have created that have not yet been tested enough, or have not been trained on a wide enough sample of ground truth, to be released with the Rescribe tool.
Tesseract models
These are all training files produced by the Tesseract OCR project, whose OCR engine Rescribe uses. Any other modern Tesseract training files you find online should also work with Rescribe.
- Afrikaans
- Amharic
- Albanian
- Arabic
- Armenian
- Assamese
- Azeri
- Azeri, Cyrillic
- Basque
- Belarusian
- Bengali
- Bosnian
- Breton
- Bulgarian
- Burmese
- Catalan
- Cebuano
- Central Khmer, Khmer
- Central Tibetan
- Cherokee
- Chinese, Simplified
- Chinese, Simplified vertical
- Chinese, Traditional
- Chinese, Traditional vertical
- Corsican
- Croatian
- Czech
- Danish
- German
- Dhivehi, Maldivian
- Dutch, Flemish
- Dzongkha
- Greek, Modern
- English, Modern
- English, Middle
- Esperanto
- Estonian
- Faroese
- Filipino
- Finnish
- Frankish
- French, Middle
- French, Modern
- Frisian, Western
- Gaelic
- Galician
- Greek, Ancient
- Gujarati
- Haitian, Haitian Creole
- Hebrew
- Hindi
- Hungarian
- Icelandic
- Inuktitut
- Indonesian
- Irish
- Italian
- Japanese
- Javanese
- Japanese vertical
- Kannada
- Georgian
- Kazakh
- Kirghiz, Kyrgyz
- Korean
- Korean vertical
- Kurdish, Northern
- Lao
- Latin
- Latvian
- Lithuanian
- Letzeburgesch, Luxembourgish
- Macedonian
- Malayalam
- Maltese
- Mongolian
- Maori
- Malay
- Marathi
- Moldavian, Moldovan, Romanian
- Nepali
- Norwegian
- Occitan (post 1500)
- Oriya
- Panjabi, Punjabi
- Persian
- Polish
- Portuguese
- Pashto/Pushto
- Quechua
- Russian
- Sanskrit
- Sindhi
- Sinhala, Sinhalese
- Slovak
- Slovenian
- Spanish
- Serbian
- Serbian Latin script
- Sundanese
- Swahili
- Swedish
- Syriac
- Tajik
- Tamil
- Tatar
- Telugu
- Thai
- Tigrinya
- Tonga
- Turkish
- Uighur
- Ukrainian
- Urdu
- Uzbek
- Uzbek, Cyrillic
- Vietnamese
- Welsh
- Yiddish
- Yoruba
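Because these are standard Tesseract training files, a downloaded file can also be checked outside Rescribe if you happen to have the tesseract command-line program installed. The following is a minimal sketch, not part of Rescribe itself. It assumes the downloaded file keeps Tesseract's usual ".traineddata" extension, and the helper name and the example paths are hypothetical. It only builds the command; running it is left to you.

```python
from pathlib import Path


def tesseract_command(model_file: str, image_file: str) -> list[str]:
    """Build a tesseract command line that uses a downloaded training file.

    Assumption: the file is named <language>.traineddata, as Tesseract
    training files conventionally are.
    """
    model = Path(model_file)
    if model.suffix != ".traineddata":
        raise ValueError(f"expected a .traineddata file, got {model.name}")
    return [
        "tesseract",
        image_file,
        "stdout",  # print the recognised text instead of writing a file
        "--tessdata-dir", str(model.parent),  # folder holding the downloaded file
        "-l", model.stem,  # language name is the file name without its extension
    ]


if __name__ == "__main__":
    # Hypothetical paths, for illustration only.
    print(" ".join(tesseract_command("models/lat.traineddata", "page.png")))
```

The resulting list can be passed to `subprocess.run` if you want to run the check from a script. If Tesseract reads the page image without complaining about the language, the file should also load in Rescribe through the "Other..." option.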
License
These training files are all released by the Tesseract OCR project and are licensed under the Apache 2.0 license.