Selecting the best Text Recognition model for your documents is crucial for obtaining good automatic transcripts
Previous step: Automatically transcribing your documents
The most important thing for good transcripts is to select a model that suits your documents. No single general model covers all types of handwriting, and specialized models are expected to remain necessary for the next few years.
When choosing a text model, you need to consider the following:
- the type of material, handwritten or printed;
- the language;
- the period;
- the type of script;
- the Character Error Rate (CER):
The performance of a model is determined based on the “distance” between a perfect transcription and the automatically recognised text. It is measured by the Character Error Rate, i.e. the percentage of characters that have been transcribed incorrectly by the Text Recognition model. Look at this page to learn more about the Character Error Rate.
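As a rough illustration of the definition above, the sketch below computes a CER in Python. It assumes that the “distance” is the Levenshtein edit distance (the minimum number of character insertions, deletions and substitutions) divided by the length of the perfect transcription. The function names are made up for this example, and the exact calculation used inside Transkribus may differ.

```python
def edit_distance(reference: str, hypothesis: str) -> int:
    """Levenshtein distance: the minimum number of single-character
    insertions, deletions and substitutions that turn `hypothesis`
    into `reference`."""
    # previous[j] = distance between the reference prefix processed so far
    # and the first j characters of the hypothesis.
    previous = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        current = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            substitution = previous[j - 1] + (ref_char != hyp_char)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """CER in percent, relative to the length of the perfect transcription.
    Assumption: this normalisation is the common convention, not a
    confirmed description of the Transkribus implementation."""
    if not reference:
        raise ValueError("the reference transcription must not be empty")
    return 100.0 * edit_distance(reference, hypothesis) / len(reference)


if __name__ == "__main__":
    perfect = "the quick brown fox"       # manually corrected transcription
    recognised = "the quick brovvn fax"   # invented recognition output
    print(f"CER: {character_error_rate(perfect, recognised):.1f}%")
```

A lower value means fewer characters need to be corrected by hand, which is why the CER is a useful figure to compare when several models fit your material.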
All public models can either be found on this page or in the user interface section for Models in the Gallery, which shows an overview of all Models available for you to use.
Click “Show Description” to read the description of the model and look at its statistics (for example, the number of words, lines and pages the model has been trained on).
Most of the models that can be used for Text Recognition have been trained with PyLaia, which is the Handwritten Text Recognition engine currently available within Transkribus. It has been developed by UPVLC (Universitat Politècnica de València) and is open-source.
The Super Models are the exception to the PyLaia-trained models: they are transformer-based text recognition models designed for a broader variety of materials and writings.
Next step: Public Models
Transkribus eXpert (deprecated)
The most important thing for good transcripts is to select a model that suits your documents. No single general model covers all types of handwriting, and specialized models are expected to remain necessary for the next few years.
When you click "Select HTR model", a window opens: on the left side of the window, you can see an overview of the available models; on the top right side of the window, the details of the model are shown.
When choosing a text model, you need to consider the following:
- the type of material, handwritten or printed;
- the language;
- the period;
- the type of script;
- the Character Error Rate.
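The criteria listed above can be read as a simple filter followed by a ranking. The sketch below is only an illustration: the `ModelInfo` record, the `shortlist` function and all catalogue entries are invented for this example and do not correspond to real Transkribus models or to any Transkribus API.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ModelInfo:
    """Hypothetical record holding the details shown in a model description."""
    name: str
    material: str    # "handwritten" or "printed"
    language: str
    start_year: int  # first year of the period covered by the training data
    end_year: int    # last year of that period
    script: str
    cer: float       # Character Error Rate in percent


def shortlist(models: List[ModelInfo], *, material: str, language: str,
              year: int, script: str) -> List[ModelInfo]:
    """Keep the models that match the documents, lowest CER first."""
    matches = [
        m for m in models
        if m.material == material
        and m.language == language
        and m.start_year <= year <= m.end_year
        and m.script == script
    ]
    return sorted(matches, key=lambda m: m.cer)


# Invented catalogue entries, for illustration only.
CATALOGUE = [
    ModelInfo("Example A", "handwritten", "German", 1700, 1850, "Kurrent", 6.2),
    ModelInfo("Example B", "handwritten", "German", 1750, 1900, "Kurrent", 4.8),
    ModelInfo("Example C", "printed", "German", 1500, 1800, "Fraktur", 1.1),
    ModelInfo("Example D", "handwritten", "Dutch", 1600, 1800, "Cursive", 5.5),
]

if __name__ == "__main__":
    candidates = shortlist(CATALOGUE, material="handwritten",
                           language="German", year=1800, script="Kurrent")
    for model in candidates:
        print(f"{model.name}: CER {model.cer}%")
```

The CER is used here only to rank models that already match the material, language, period and script, because a low CER on very different documents says little about how a model will perform on yours.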
All the models that can be used for Text Recognition have been trained with PyLaia, which is the Handwritten Text Recognition engine currently available within Transkribus. It has been developed by UPVLC (Universitat Politècnica de València) and is open-source.