2. Training Text Recognition Models

A Text Recognition model is an AI algorithm trained on a set of data (images and their transcriptions) that predicts the most probable sequence of characters for each segmented text line.

Many public models, trained by the Transkribus community, are already available and can be used by every Transkribus user, as explained on this page.

However, if no public model works well on your documents, you can train a customised Text Recognition model to recognise your documents’ specific script. Text Recognition models can be trained for any language and script from any time and place.

The model learns the writing style of your documents by being shown images of them together with their accurate transcriptions. After training, it can recognise that writing style with a certain degree of accuracy.

Follow the steps on the next pages to train a powerful text recognition model!

To improve the recognition accuracy of your model, you need to create new training data (Ground Truth pages) and start a new training job. Correcting the recognised text in the Editor does not, by itself, improve the model's performance.

Next step: Data Preparation
