2. Choosing a Model
Selecting the best Text Recognition model for your documents is crucial for obtaining good automatic transcripts.
The most important thing for good transcripts is to select a model suited to your documents. No general model exists that works for all handwriting styles, and specialised models are expected to remain necessary for the next few years.
All public models can be found in the Models workspace, which gives an overview of all the models available to you.
Find the right model
To find the right model for your material, use the search and filter options.
- Search Models: search by model name
- Model Type: select whether you are looking for a Text, Layout, Field or Table model
- Select a language: pick the language of the material you are working with
- Sort: choose the order in which the search results are displayed
- Filter: specify whether you need a model for handwritten or printed text, and from which time period
After you apply the search and filter options, the number of models shown in the Models workspace should be much smaller: only those likely to work well with your material, based on your filters and the model descriptions, remain.
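To make the filtering logic concrete, here is a minimal sketch of what the search and filter options do conceptually: keep only the models that match every criterion, then sort the rest by Character Error Rate. This does not call any Transkribus API. The `ModelInfo` record, the `find_models` function and all model names are hypothetical and exist only for illustration; in practice you apply these filters in the Models workspace itself.

```python
from dataclasses import dataclass


# Hypothetical record: the fields mirror the filter options described above,
# not any real Transkribus data structure.
@dataclass
class ModelInfo:
    name: str
    model_type: str        # "Text", "Layout", "Field" or "Table"
    languages: set[str]
    material: str          # "handwritten" or "printed"
    centuries: set[int]
    cer: float             # Character Error Rate in percent


def find_models(models, model_type, language, material, century, query=""):
    """Keep models matching every filter, then sort by CER (lowest first)."""
    matches = [
        m for m in models
        if m.model_type == model_type
        and language in m.languages
        and m.material == material
        and century in m.centuries
        and query.lower() in m.name.lower()  # simple name search
    ]
    return sorted(matches, key=lambda m: m.cer)


# Hypothetical catalogue used only to demonstrate the filtering.
catalog = [
    ModelInfo("kurrent-19c-demo", "Text", {"German"}, "handwritten", {19}, 4.2),
    ModelInfo("kurrent-mixed-demo", "Text", {"German"}, "handwritten", {17, 18, 19}, 7.1),
    ModelInfo("kurrent-17c-demo", "Text", {"German"}, "handwritten", {17}, 3.0),
    ModelInfo("fraktur-print-demo", "Text", {"German"}, "printed", {18, 19}, 1.5),
]

for m in find_models(catalog, "Text", "German", "handwritten", 19):
    print(m.name, m.cer)
```

Here the 17th-century model is dropped by the period filter and the printed model by the material filter, leaving two candidates ordered by CER. A low CER only tells you how well a model performed on its own test data, so the other criteria still decide which models are worth comparing at all.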
When choosing a text model, you need to consider the following:
- the type of material, handwritten or printed;
- the language;
- the period;
- the type of script;
- the Character Error Rate (CER):
A model's performance is judged by the “distance” between a perfect transcription and the automatically recognised text. This distance is measured by the Character Error Rate, i.e. the percentage of characters that the Text Recognition model has transcribed incorrectly. Look at this page to learn more about the Character Error Rate.
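A common way to compute this “distance” is the Levenshtein edit distance: the minimum number of character substitutions (S), deletions (D) and insertions (I) needed to turn the recognised text into the correct one, divided by the number of characters N in the correct transcription, i.e. CER = (S + D + I) / N × 100. The sketch below assumes this standard definition; Transkribus's own calculation may differ in details such as whitespace handling. The function names and the sample line are illustrative only.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of character substitutions, deletions and insertions turning a into b."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if characters match)
            ))
        prev = curr
    return prev[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """CER in percent: edit distance divided by the length of the reference transcription."""
    if not reference:
        raise ValueError("reference transcription must not be empty")
    return 100 * levenshtein(reference, hypothesis) / len(reference)


reference = "Im Jahre 1848"   # correct transcription (13 characters)
hypothesis = "Im Jahr 1843"   # recognised text: one deletion, one substitution
print(f"CER: {character_error_rate(reference, hypothesis):.1f}%")  # prints: CER: 15.4%
```

Two errors in a 13-character line already give a CER above 15%, which shows why a few percentage points of CER make a noticeable difference across a whole page. Because insertions are counted too, the CER can exceed 100% for very poor output.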
By clicking “Show Description”, you can read the model's description and view its statistics (for example, the number of words, lines and pages it was trained on). Here you can also test the model with the Quick text recognition option.
Most of the models that can be used for Text Recognition have been trained with PyLaia, the Handwritten Text Recognition engine currently available within Transkribus. PyLaia was developed by UPVLC (Universitat Politècnica de València) and is open-source.
The exception is the Super Models: transformer-based text recognition models designed for a broader variety of materials and writing styles.
Transkribus eXpert (deprecated)
The most important thing for good transcripts is to select a model suited to your documents. No general model exists that works for all handwriting styles, and specialised models are expected to remain necessary for the next few years.
When you click "Select HTR model", a window opens: the left side lists the available models, and the top right side shows the details of the selected model.
When choosing a text model, you need to consider the following:
- the type of material, handwritten or printed;
- the language;
- the period;
- the type of script;
- the Character Error Rate.
All the models that can be used for Text Recognition have been trained with PyLaia, the Handwritten Text Recognition engine currently available within Transkribus. PyLaia was developed by UPVLC (Universitat Politècnica de València) and is open-source.