
1. Automatic Layout Recognition

If the page layout of your documents is complex, or if you are transcribing manually, run Layout Recognition as a separate step to detect text regions and lines.

Layout Recognition segments the image into text regions and lines, which links the text to its position in the image.

The text region is a rectangle, encasing the handwritten text contained in the image/page. 


The line is a polyline, running along the bottom of the handwritten text line, and is the most important reference point for text recognition.

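To make the two concepts concrete, here is a minimal sketch of how a text region (a rectangle) and its lines (polylines) could be modelled. The class names, field names and coordinates are hypothetical and chosen only for illustration; they are not Transkribus's internal data model.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Point = Tuple[int, int]  # (x, y) in image pixels; hypothetical convention


@dataclass
class Line:
    # Polyline running along the bottom of the handwritten text line.
    baseline: List[Point]


@dataclass
class TextRegion:
    # Rectangle described by its left/top and right/bottom edges.
    left: int
    top: int
    right: int
    bottom: int
    lines: List[Line] = field(default_factory=list)

    def contains(self, point: Point) -> bool:
        """True if the point lies inside or on the border of the rectangle."""
        x, y = point
        return self.left <= x <= self.right and self.top <= y <= self.bottom


def lines_outside_region(region: TextRegion) -> List[int]:
    """Return indices of lines with at least one point outside the region."""
    return [
        i
        for i, line in enumerate(region.lines)
        if not all(region.contains(p) for p in line.baseline)
    ]


# Made-up example: the second line runs past the right edge of the region.
region = TextRegion(
    left=100, top=50, right=900, bottom=400,
    lines=[
        Line(baseline=[(120, 100), (500, 104), (880, 98)]),
        Line(baseline=[(120, 160), (500, 163), (950, 158)]),
    ],
)
print(lines_outside_region(region))
```

A check of this kind mirrors what you look for when reviewing the result by eye: lines that are missing or that have been assigned to the wrong text region.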

The Layout Recognition is performed automatically when you start a Text Recognition job, but it can also be run as a separate step. There are many reasons to do that: for instance, when you want to use Transkribus to manually transcribe your documents; when you are preparing the transcriptions to train a new model; or when the page layout is complex (a table, for example).

To run Layout Recognition as a separate step, select the page(s) or document(s) to process, then click “Recognize” in the toolbar above the preview(s). This opens the recognition menu: switch to “Layout” at the top of the page to manage the layout recognition options.


The “Universal Lines” model is selected by default; you just need to click “Start Recognition” to launch the recognition. You can check its progress with the “Jobs” button.

Once finished, open the page and check the automatic layout recognition result on the image, which is now segmented into text regions and lines.


If the automatic Layout Recognition has performed poorly (e.g. it has missed some lines, or lines have been grouped into the wrong text regions), you can change the advanced configuration settings, as explained on this page.

 


 

Transkribus eXpert (deprecated)

Layout Recognition is the segmentation of the image into text regions, lines and baselines to connect the text and the image.

The text region is a rectangle, encasing all of the handwritten text contained in the image/page.  

 

The baseline is a polyline, running along the bottom of the handwritten text line, and is the most important reference point for text recognition.

The lines are regions located within a text region and can be described as polygons, encasing all of the handwritten text in a line. 
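The nesting described above (text region, then line polygon, then baseline) can be inspected programmatically if the layout is available as PAGE XML. The sketch below assumes such an export; the sample document, its namespace version, the ids and the helper names `local`, `parse_points` and `read_layout` are illustrative assumptions, not taken from this page.

```python
import xml.etree.ElementTree as ET

# Made-up sample in the style of a PAGE XML export (assumed format).
SAMPLE = """<PcGts xmlns="http://schema.primaresearch.org/PAGE/gts/pagecontent/2013-07-15">
  <Page imageFilename="page1.jpg" imageWidth="1000" imageHeight="800">
    <TextRegion id="r1">
      <Coords points="100,50 900,50 900,400 100,400"/>
      <TextLine id="r1l1">
        <Coords points="120,70 880,70 880,110 120,110"/>
        <Baseline points="120,100 500,104 880,98"/>
      </TextLine>
    </TextRegion>
  </Page>
</PcGts>"""


def local(tag):
    """Strip the XML namespace so the code does not depend on its version."""
    return tag.rsplit("}", 1)[-1]


def parse_points(text):
    """Turn 'x1,y1 x2,y2 ...' into a list of (x, y) integer tuples."""
    return [tuple(int(v) for v in pair.split(",")) for pair in text.split()]


def read_layout(xml_text):
    """Collect each text region with its line polygons and baselines."""
    root = ET.fromstring(xml_text)
    regions = []
    for region in root.iter():
        if local(region.tag) != "TextRegion":
            continue
        lines = []
        for child in region:
            if local(child.tag) != "TextLine":
                continue
            entry = {"id": child.get("id"), "polygon": [], "baseline": []}
            for part in child:
                if local(part.tag) == "Coords":
                    entry["polygon"] = parse_points(part.get("points"))
                elif local(part.tag) == "Baseline":
                    entry["baseline"] = parse_points(part.get("points"))
            lines.append(entry)
        regions.append({"id": region.get("id"), "lines": lines})
    return regions


layout = read_layout(SAMPLE)
for region in layout:
    for line in region["lines"]:
        print(region["id"], line["id"], len(line["baseline"]), "baseline points")
```

Matching on local tag names rather than a fixed namespace is a deliberate choice here, since the exact schema version of an export may differ from the one assumed in the sample.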

The Layout Recognition is performed automatically when you start a Text Recognition job, but it can also be run as a separate step.

To run the Layout Analysis as a separate step, go to the “Tools” tab in the Managing & Tools Bar (on the left side of the screen). The section we are interested in is named “Layout Analysis”.


Select the current page, a range of pages or the document(s) you want to process, then click “Run” to launch the Layout Analysis. It is performed with the default settings (Horizontal Text Line Orientation model; General region detection method).

To check the progress of the job, click on the “Jobs” button. When the job is finished, reload the page(s); the text regions, lines and baselines will then appear in the Image Window. You can also see the layout structure in the “Layout” tab of the Managing & Tools Bar.

If the automatic Layout Recognition has performed poorly (e.g. it has missed some lines, or lines have been grouped into the wrong text regions), you can change the advanced configuration settings, as explained on this page.