2. Choosing a Model

Previous step: Automatically transcribing your documents

The most important step towards good transcripts is selecting a model that suits your documents. No single general model covers every kind of handwriting, and specialised models are expected to remain necessary for the next few years.

All public models can be found in the Models workspace, which shows an overview of all Models available for you to use.

Find the right model

To find the right model for your material, use the search and filter options.

Search Models: Search by model name
Task: Select whether you are looking for a Text, Layout, Field or Table Model
Select a language: Pick the language of the material you are working with
Sort: Decide how the search results are displayed
Filter: Specify whether you are looking for models for handwritten or printed text and from what time period

After you apply the search and filter options, the number of models shown in the Models workspace should drop significantly, leaving only those that are likely to work well with your material based on your filters and the model descriptions.

When choosing a text model, you need to consider the following:

- the type of material, handwritten or printed;
- the language;
- the period;
- the type of script;
- the Character Error Rate (CER):
  The performance of a model is determined based on the “distance” between a perfect transcription and the automatically recognised text. It is measured by the Character Error Rate, i.e. the percentage of characters that have been transcribed incorrectly by the Text Recognition model. Look at this page to learn more about the Character Error Rate.
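To make the idea of "distance" concrete, here is a minimal sketch of how a Character Error Rate can be computed. It assumes the common definition: the Levenshtein (edit) distance between the reference transcription and the recognised text, divided by the length of the reference and expressed as a percentage. The function names are illustrative, and the exact calculation used in Transkribus (for example, how whitespace or punctuation is normalised) may differ.

```python
def levenshtein(reference: str, hypothesis: str) -> int:
    """Minimum number of character insertions, deletions and
    substitutions needed to turn `hypothesis` into `reference`."""
    # prev[j] holds the distance between the reference prefix processed
    # so far and the first j characters of the hypothesis.
    prev = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        curr = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            cost = 0 if ref_char == hyp_char else 1
            curr.append(min(
                prev[j] + 1,         # character missing from the hypothesis
                curr[j - 1] + 1,     # extra character in the hypothesis
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """CER in percent, relative to the length of the reference text
    (assumed definition, see the note above)."""
    if not reference:
        raise ValueError("reference transcription must not be empty")
    return 100.0 * levenshtein(reference, hypothesis) / len(reference)


if __name__ == "__main__":
    perfect = "hello world"      # ground-truth transcription
    recognised = "hallo world"   # automatic output with one wrong character
    print(f"CER: {character_error_rate(perfect, recognised):.2f}%")
```

In this example one of eleven characters is wrong, so the CER is roughly 9%. A lower CER means the recognised text is closer to the perfect transcription.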

Click on a model's name to read its description and view its statistics (for example, the number of words, lines and pages the model has been trained on). From here you can also test the model using the Quick text recognition option.

Most of the models that can be used for Text Recognition have been trained with PyLaia, which is the Handwritten Text Recognition engine currently available within Transkribus. It has been developed by UPVLC (Universitat Politècnica de València) and is open-source.

The Super Models are the exception: they are transformer-based text recognition models, designed for a broader variety of materials and writing styles, and are not trained with PyLaia.

Next step: Public Models
